Reverse Engineering Android Apps (part 3)

I was going to end at part 2, but the recent shock news story about the iPhone and location data has made some of the other things I was thinking about when I took the Android apps apart seem more relevant.

One of the things I mentioned about the adverts was that they were ‘location aware’. This of course means your phone sends information about your location to the ad server. It’s obviously impractical to send all ads for all locations to the phone, so the phone sends its location to the server and the server serves up the relevant ads.

The other thing ad servers need to know is how many unique views there are of the ads. This again requires more data to go to the ad server. For Android devices this normally appears to be done using the ‘ANDROID_ID’. This is a random number generated by the device when it first boots and is supposed to remain constant for the lifetime of the device. The ad library I was looking at further anonymized this by MD5 hashing it before sending it to their servers. So that’s a random number that’s then hashed before being sent. Not too bad really!
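To make the ‘random number that’s then hashed’ idea concrete, here is a minimal sketch in Perl. The ID value is made up, and I’m assuming the library takes a plain, unsalted MD5 of the ID as a hex string; the real library may format its input differently. What it does show is that the hash is deterministic, which is exactly what lets the ad server count unique devices.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical ANDROID_ID, written as 16 hex digits (a 64-bit value).
my $android_id = 'a1b2c3d4e5f60718';

# Assumption: the ad library sends an unsalted MD5 of the ID string.
sub hashed_device_id {
    my ($id) = @_;
    return md5_hex($id);
}

# Two separate ad requests from the same phone.
my $first_view  = hashed_device_id($android_id);
my $second_view = hashed_device_id($android_id);

print "view 1: $first_view\n";
print "view 2: $second_view\n";
print "same device on both views\n" if $first_view eq $second_view;
```

The same input always gives the same 32 character hex string, so the hashed value works as a stable pseudonym for the device rather than something that changes per request.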

Of course how anonymous is anonymous data? It really depends on who’s looking at it.

The ad company isn’t able to determine who was where and when. That number was fairly meaningless after the MD5 hash and was pretty meaningless before. There is one scenario where the data can be linked though. If you have the original data from the phone and the data from the ad company you can tie up the records. With that it’s possible to demonstrate where the phone believed it was when it connected to the ad server, and when. The main place I can see that being used is by law enforcement and other government agencies.
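Tying up the records could look something like the sketch below. Everything in it is hypothetical: the log layout, the field names, the IDs and the coordinates are invented for illustration, and it assumes the same unsalted MD5 scheme as before. The point is only that anyone holding the phone’s raw ID can recompute the hash and pick that phone’s rows out of the ad server’s log.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical ad-server log: hashed device id, time and reported location.
my @ad_log = (
    { device => md5_hex('a1b2c3d4e5f60718'), time => '2011-04-20T09:15', lat => 51.50, lon => -0.12 },
    { device => md5_hex('0011223344556677'), time => '2011-04-20T09:16', lat => 53.48, lon => -2.24 },
    { device => md5_hex('a1b2c3d4e5f60718'), time => '2011-04-21T18:02', lat => 51.45, lon => -0.97 },
);

# Given the raw ID recovered from a phone, return that phone's ad hits.
sub hits_for_phone {
    my ( $android_id, @log ) = @_;
    my $wanted = md5_hex($android_id);
    return grep { $_->{device} eq $wanted } @log;
}

for my $hit ( hits_for_phone( 'a1b2c3d4e5f60718', @ad_log ) ) {
    printf "%s at %.2f, %.2f\n", @{$hit}{qw(time lat lon)};
}
```

Without the raw ID from the handset the hashed column is just an opaque label, which is why the linking needs both data sets.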

Anonymity, as with most things, comes in degrees. Forensic analysis of data after the fact is, as ever, a lot simpler than attempting to spot things in real time.

Of course I should probably mention this has nothing to do with the recent iPhone/iPad fun and games. Even if the iPhone does send your location to their servers regularly it probably isn’t even sending over that hashed device id because they don’t need to know about unique visitors (or worry about tracking visitors) for building up the location database.

Actually, there may be another way your data is not properly anonymous to those looking at the data collected on the ad server. The server technically has the IP address and port you connected from, so if they store that in a way that can be cross referenced with the location of the ad hit, then someone could use it to find out the identity of a person along with where they were. Again that probably requires due process.

Should you be scared? Only as much as usual.

Reverse engineering android applications (part 2)

Taking a look at a single application, an ebook I bought from O’Reilly that also came in an Android app format, it looks like my suspicions were correct.

When I take it apart, it looks like advertising and usage information are the main reasons it needs all the networking.

The O’Reilly application appears to use AdMob and Google Analytics.  AdMob does advertising which can be geo-aware which causes it to want to find your location.  The advertising probably explains a lot of the permissions requested.  I understand the need for ads in a lot of applications and I definitely sympathise with using analytics to allow you to figure out basic things about your user base so that you can improve your application in ways users are more likely to care about.  The problem is I still didn’t install the app because of those permission requirements.  And there are a bunch more I haven’t installed either.

I would be a lot happier if Google came up with a standard API so that the permissions an app asks for could be labelled ‘advertising’ and ‘collecting usage statistics’ instead of access to networks etc.  That way an app can use those facilities without needing to light up all the other permissions if it doesn’t need them.

There would be friction if they were the only allowed supplier of ads on the platform, but I would have thought that they could come up with an API that restricts what information is communicated from the device while still allowing it to hook up to arbitrary ad/analytics providers.

The key thing is giving the user a clearer and narrower set of permissions to allow while allowing the app developers to use key services not directly related to the functionality of their applications.  If I saw ‘advertising, collecting usage statistics and sd card access’ as a requirement I’d be much more likely to allow it than the current, ‘wireless, gps, mobile data, sd card, your life, your wallet….’.

It’s not necessarily an easy problem but it ought to be solvable.

Of course they might have already fixed this.  My phone is only running Android 1.6 and I’m not an Android developer, at least not yet…

Reverse engineering android applications (part 1)

I’ve been getting irritated by the number of android applications that want access to everything on my phone despite the function I want to use not really requiring it. I figure it’s time to take these applications apart and figure out what the reason for this problem is.

I figure in most cases the likelihood isn’t that the app wants to do evil, but instead it’s something else that’s causing the need for all the extra permissions. The problem is that there is no way to tell by simply looking at the permissions required because they want to do so much. When you look at a word search game you have to wonder why it wants to use GPS, wireless and internet.

There appear to be two main schools of Android reverse engineering. One is to unzip the .apk and then decode the .dex file into the raw instructions. This is the approach taken by baksmali, a disassembler that turns the dex files into essentially an assembly-like version of the byte code with annotations. The alternative is to turn the .dex into a more regular .jar with .class files and then feed it to a Java decompiler. See dex2jar and Java Decompiler for that approach.
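Both approaches start from the fact that an .apk is just a zip archive. As a small sketch of that first step, the following lists what is inside one using the core IO::Uncompress::Unzip module. The helper name apk_members is my own invention, and this only lists the members; decoding the .dex is still a job for the tools above.

```perl
use strict;
use warnings;
use IO::Uncompress::Unzip qw($UnzipError);

# Return the names of all members in a zip archive such as an .apk.
# (apk_members is a hypothetical helper name, not part of any tool.)
sub apk_members {
    my ($apk) = @_;
    my $zip = IO::Uncompress::Unzip->new($apk)
        or die "Cannot open $apk: $UnzipError\n";

    my @names;
    my $status;
    # nextStream skips any unread data and moves to the next member.
    for ( $status = 1; $status > 0; $status = $zip->nextStream ) {
        push @names, $zip->getHeaderInfo->{Name};
    }
    die "Error reading $apk: $UnzipError\n" if $status < 0;
    return @names;
}

# Print the member names when a file is given on the command line.
if (@ARGV) {
    print "$_\n" for apk_members( $ARGV[0] );
}
```

Run against an app package, I’d expect the listing to include the manifest and a classes.dex, which is the file the disassembler or dex2jar then works on.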

The decompiler approach produces much easier to understand output for most code, and at worst it produces assembly-like output, much like baksmali, for code it can’t decompile properly.

Test::Most

When testing in Perl I’ve come around to the idea of using Test::Most in all my tests.  At first I resisted because I like to minimize my dependencies to just what I need, but a few realisations have turned me around in this case.  Test::Most automatically includes a bunch of the common test modules, and the synopsis kind of makes it seem like that’s most of what it does, but that’s underselling it.  Those modules include useful functions like eq_or_diff and dies_ok, but Test::Most also sets up a bunch of useful defaults and adds some valuable features.

The bail_on_fail function is almost worth the price of admission by itself.  The fact that you can turn on bail on fail with an environment variable (BAIL_ON_FAIL) is seriously useful too.

The explain function deserves an honourable mention too.  It’s basically a note Dumper($var).
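A small test file pulling those pieces together might look like the sketch below. The parse_pair function is a made-up example to have something to test; the rest are the Test::Most features mentioned above.

```perl
use strict;
use warnings;
use Test::Most;

# Stop at the first failure. Setting BAIL_ON_FAIL=1 in the environment
# has the same effect without editing the test file.
bail_on_fail;

# Hypothetical function under test.
sub parse_pair {
    my ($text) = @_;
    my ( $key, $value ) = $text =~ /^(\w+)=(\w+)$/
        or die "cannot parse '$text'\n";
    return { $key => $value };
}

my $got = parse_pair('colour=red');

# Dumps the structure as a note, visible with prove -v.
explain $got;

# From Test::Differences: shows a diff when the structures differ.
eq_or_diff $got, { colour => 'red' }, 'pair parsed into a hash';

# From Test::Exception: passes if the code block dies.
dies_ok { parse_pair('nonsense') } 'bad input dies';

done_testing;
```

Running it as `BAIL_ON_FAIL=1 prove -v t/` is handy when one early failure would otherwise bury you in follow-on noise.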

Android/gmail data cleanup

I’ve been a long time user of Gmail and so it was natural I got an Android phone. Well, eventually anyway, after a few Windows phones (which I did generally like). The one problem I’ve found is that there are a few bits of data that really need tidying up for the phone to work ideally. My data has come from a few sources: stuff that’s been added manually to Gmail and stuff that was imported from Outlook (since I used to keep my phone numbers synced there with the Windows phones).

If the Android (1.6) phone’s contact doesn’t have an actual name then it won’t find it when you try to type it in. This might sound obvious but I was used to typing things like ‘Dominos’ (okay, I should cut down). The data looks relatively correct in that the company name is in the field allocated for company names but unfortunately the phone doesn’t pick that up. The simplest way to work around that is to copy the company name into the regular name field. Since I have quite a lot of contacts like that I figured it would be simplest to script that. Luckily there’s a module for that, WWW::Google::Contacts.

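#!perl

use strict;
use warnings;
use WWW::Google::Contacts;
use Term::ReadKey;

# The Google account name is the only argument.
my $user = shift or die "Usage: $0 username\n";

# Read the password without echoing it to the terminal.
print "Enter password: ";
ReadMode('noecho');
my $pass = ReadLine(0);
print "\n";
ReadMode('restore');
chomp $pass;    # ReadLine keeps the trailing newline

my $google = WWW::Google::Contacts->new( username => $user, password => $pass );

my @contacts = $google->contacts->search();

foreach my $c ( @contacts ) {
    # Only touch contacts that have no name but do have a company.
    if(!$c->full_name) {
        if($c->organization) {
            # Copy the first organization's name into the name field.
            my $name = $c->organization->[0]->name;
            print "Updating $name\n";
            $c->name($name);
            $c->update;
        }
    }
}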

There is one other annoyance that I have with alarms and the calendar but I’ll save that for another day. I actually needed to extend an existing module so I really need to contribute that work back.