Searching stackexchange

It’s funny, but I’ve just realised that the best way to search the Stack Exchange sites (Stack Overflow, Ask Ubuntu, etc.) is to start asking a question. As you type in a title, and then the actual question, the site suggests related questions. It’s always been a good feature, but I’ve finally realised it works a lot better than simply using the search box in the conventional way. I assume that’s because you type in a lot more detail when you formulate a proper question, so it has a lot more keywords to search with.
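As a toy illustration of that guess (this is not Stack Exchange’s actual algorithm, just an assumption-laden sketch with made-up helper names), here is a naive keyword-overlap ranker in Python. A one-word query can’t tell two vim questions apart, while the fuller question pulls the right one to the top.

```python
import re

# A few filler words that carry little meaning in a question (illustrative list)
STOP_WORDS = {"a", "an", "the", "i", "do", "how", "to", "in", "on", "it", "when"}

def tokens(text):
    """Lower-case word tokens; a deliberately naive tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score(query, document):
    """Count the query's non-filler words that also appear in the document."""
    doc_terms = set(tokens(document))
    return sum(1 for term in tokens(query)
               if term not in STOP_WORDS and term in doc_terms)

def rank(query, documents):
    """Return documents sorted by descending score (ties keep their original order)."""
    return sorted(documents, key=lambda d: score(query, d), reverse=True)

questions = [
    "How do I install vim on Ubuntu?",
    "Save a file as root from vim without restarting",
    "How do I restart Apache on Ubuntu?",
]
```

With the bare query "vim" the install question comes first; with "How do I save a file in vim when I forgot to open it as root?" the save-as-root question wins, because five of its words match instead of one.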

The more I use those sites, the more intelligent ideas I find in them.


vim tricks (from stackoverflow)

There are a couple of vim tricks that I keep needing to look up in my Stack Overflow favourites. The first is saving a file that needs root access when you didn’t start vim as root.

:w !sudo tee %

The second is using vim as a hex editor,

:%!xxd -g 1

and to reverse it,

:%!xxd -r
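The :%! prefix filters the whole buffer through an external command, so vim itself never has to understand the binary data. The round trip can be sketched in Python as below; the layout only approximates xxd -g 1 (offset, one hex pair per byte, ASCII column), and the function names are my own. The reverse step in this sketch reads back only the hex column, so that is where edits have to go.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Render bytes roughly like `xxd -g 1`: offset, one hex pair per byte, ASCII column."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Printable ASCII is shown as-is; everything else becomes '.'
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # Pad the hex column so the ASCII column lines up on short final lines
        lines.append(f"{offset:08x}: {hex_part:<{width * 3 - 1}}  {text}")
    return "\n".join(lines)

def unhexdump(dump: str) -> bytes:
    """Reverse hexdump(): keep only the hex column, ignore offsets and ASCII."""
    out = bytearray()
    for line in dump.splitlines():
        _, _, rest = line.partition(": ")     # drop the offset
        hex_part = rest.split("  ", 1)[0]     # drop the ASCII column
        out.extend(bytes.fromhex(hex_part))   # fromhex skips the single spaces
    return bytes(out)
```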

Reverse Engineering Android Apps (part 3)

I was going to end at part 2, but the recent shock news story about the iPhone and location data has made some of the other things I was thinking about when I took the Android apps apart seem more relevant.

One of the things I mentioned about the adverts was that they were ‘location aware’. This, of course, means your phone sends information about your location to the ad server. It’s obviously impractical to send every ad for every location to the phone, so the phone sends its location to the server and the server serves up the relevant ads.

The other thing ad servers need to know is how many unique views the ads get. This again requires more data to go to the ad server. On Android devices this normally appears to be done using the ‘ANDROID_ID’. This is a random number generated by the device when it first boots, and it is supposed to remain constant for the lifetime of the device (although a factory reset generates a new one). The ad library I was looking at anonymized this further by MD5 hashing it before sending it to their servers. So that’s a random number that’s hashed before being sent. Not too bad really!
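To show what that hashing step amounts to, here is a minimal Python sketch. As an assumption, it hashes the ID’s hex string as UTF-8 (the real library may format or salt it differently), and the example ID is made up. What it demonstrates is that the same ID always produces the same digest, which is exactly what lets the ad server count unique views without seeing the raw number.

```python
import hashlib

def hashed_device_id(android_id: str) -> str:
    """MD5 hex digest of a device ID string.

    Assumption: the ID is hashed as its UTF-8 text form; the real ad library
    may format or salt it differently.
    """
    return hashlib.md5(android_id.encode("utf-8")).hexdigest()

example_id = "0123456789abcdef"  # hypothetical 64-bit ANDROID_ID in hex

# Stable across calls, so repeat visits from one device look like one visitor
digest = hashed_device_id(example_id)
```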

Of course, how anonymous is anonymous data? It really depends on who’s looking at it.

The ad company isn’t able to determine who was where and when. The number was fairly meaningless before the MD5 hash and is no more meaningful after it. There is one scenario where the data can be linked, though. If you have the original data from the phone and the data from the ad company, you can tie up the records. With that, it’s possible to demonstrate where the phone believed it was, and when, each time it connected to the ad server. The main place I can see that being used is by law enforcement and other government agencies.
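The linking works because MD5 is deterministic: anyone holding the raw ID from the phone can recompute the digest and look it up in the ad server’s records. A minimal Python sketch, assuming a hypothetical ad log made of (hashed ID, timestamp, location) tuples; the function names are illustrative.

```python
import hashlib

def md5_hex(value: str) -> str:
    """MD5 hex digest of a string, assuming UTF-8 text was hashed."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()

def link_records(device_ids, ad_log):
    """Re-attach raw device IDs to ad-server log entries.

    device_ids: raw IDs recovered from the phones themselves (hypothetical input).
    ad_log: iterable of (hashed_id, timestamp, location) tuples (hypothetical format).
    Returns (raw_id, timestamp, location) for every entry that matches a known phone.
    """
    # Precompute the digest of every known ID once, then do plain lookups
    lookup = {md5_hex(device_id): device_id for device_id in device_ids}
    return [(lookup[h], ts, loc) for h, ts, loc in ad_log if h in lookup]
```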

Anonymity, as with most things, comes in degrees. Forensic analysis of data after the fact is, as ever, a lot simpler than attempting to spot things in real time.

Of course, I should probably mention that this has nothing to do with the recent iPhone/iPad fun and games. Even if the iPhone does send your location to Apple’s servers regularly, it probably isn’t sending a hashed device ID along with it, because Apple doesn’t need to count unique visitors (or worry about tracking visitors) to build up its location database.

Actually, there may be another way your data isn’t properly anonymous to those looking at what was collected on the ad server. The server has the IP address and port you connected from, so if those are stored in a way that can be cross-referenced with the location of each ad hit, someone could work out a person’s identity, along with where they were, from that information. Again, that probably requires due process.

Should you be scared? Only as much as usual.

Reverse engineering android applications (part 2)

Taking a look at a single application, an O’Reilly ebook I bought that also came in an Android app format, it looks like my suspicions were correct.

Taking it apart, I can see that advertising and usage tracking look like the main reasons for all the networking.

The O’Reilly application appears to use AdMob and Google Analytics. AdMob serves advertising, which can be geo-aware, which is why it wants to find your location. The advertising probably explains a lot of the permissions requested. I understand the need for ads in a lot of applications, and I definitely sympathise with using analytics to figure out basic things about your user base so that you can improve your application in ways users are more likely to care about. The problem is that I still didn’t install the app because of those permission requirements. And there are a bunch more I haven’t installed either.

I would be a lot happier if Google came up with a standard API so that apps could request ‘advertising’ and ‘collecting usage statistics’ as permissions, instead of access to networks and so on. That way an app could use those facilities without lighting up all the other permissions it doesn’t need.

There would be friction if Google were the only allowed supplier of ads on the platform, but I would have thought they could come up with an API that restricts what information is communicated from the device while still allowing apps to hook up to arbitrary ad and analytics providers.

The key thing is giving the user a clearer and narrower set of permissions to allow, while still letting app developers use key services not directly related to the functionality of their applications. If I saw ‘advertising, collecting usage statistics and SD card access’ as the requirements, I’d be much more likely to allow them than the current ‘wireless, GPS, mobile data, SD card, your life, your wallet…’.

It’s not necessarily an easy problem, but it ought to be solvable.

Of course, they might have already fixed this. My phone is only running Android 1.6, and I’m not an Android developer, at least not yet…

Reverse engineering android applications (part 1)

I’ve been getting irritated by the number of Android applications that want access to everything on my phone even though the function I want to use doesn’t really require it. I figure it’s time to take these applications apart and find out why.

I suspect that in most cases the app isn’t trying to do evil; instead, something else is causing the need for all the extra permissions. The problem is that there is no way to tell simply by looking at the permissions required, because they ask for so much. When you look at a word search game, you have to wonder why it wants to use GPS, wireless and the internet.

There appear to be two main schools of Android reverse engineering. One is to unzip the .apk and disassemble the .dex file into the raw instructions. This is the approach taken by baksmali, a disassembler that turns dex files into an annotated, assembly-like version of the bytecode. The alternative is to convert the .dex into a more regular .jar full of .class files and feed that to a Java decompiler; see dex2jar and Java Decompiler for that approach.
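Both approaches start from the same place: an .apk is just a zip archive, and the compiled code lives in classes.dex inside it. The Python sketch below does only that first step with the standard zipfile module, finding the .dex entries and reading the version from their header magic (the bytes "dex", a newline, then a three-digit version such as 035 and a NUL). The real decoding is what baksmali or dex2jar are for, and the function name is illustrative.

```python
import io
import zipfile

# DEX files begin with b"dex\n" followed by a version like b"035\0"
DEX_MAGIC_PREFIX = b"dex\n"

def dex_entries(apk_bytes: bytes):
    """Return (name, version) for each .dex file inside an APK (a zip archive)."""
    results = []
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as apk:
        for name in apk.namelist():
            if not name.endswith(".dex"):
                continue
            with apk.open(name) as f:
                header = f.read(8)  # magic (4 bytes) + version (3 bytes) + NUL
            if header.startswith(DEX_MAGIC_PREFIX):
                results.append((name, header[4:7].decode("ascii")))
    return results

# Usage: dex_entries(open("some_app.apk", "rb").read())  # hypothetical file name
```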

The decompiler approach produces much easier-to-understand output for most code, and at worst it falls back to assembly-like output, similar to baksmali’s, for code it can’t decompile properly.

Microsoft Marketing

The marketing people at Microsoft have really been busy lately. Lots of American TV shows I’ve been watching have had plenty of Microsoft plugs: things like glowing Windows logos on the backs of the characters’ monitors (both laptop and desktop). Then there was the episode of NCIS: Los Angeles where a character uses Windows 7’s Aero Shake, shaking one window to minimize all the others.

Now there are major sites suggesting you use IE8 because it works so well with them.  There’s the Guardian – and who’ve developed a toolbar –

While it’s neat to see the Apple bias in the movies being offset, this is getting a little weird. They seem to be making a lot more impact these days. Ever since their rather cool comeback to the Mac vs. PC ads with the “I’m a PC” campaign, they’ve been a lot more pervasive in their marketing. Heck, there’s even one of those “I’m a PC” stickers on my laptop.

Birmingham Perl QA Hackathon

I’m back from the 2009 Perl QA Hackathon, where I acted as a spare programmer.

Barbie and JJ and the Birmingham Perl Mongers organized an excellent event. I’m really going to have to do some exercise after eating so well, too.

For my own reference if no-one else’s, here are some of the places I’ve just eaten in Birmingham so that I can remember them.

  • Shimla Pinks – An upmarket Indian restaurant
  • The Handmade Burger Company – decent burgers
  • The Thai Orchid on Bennetts Hill – excellent food and service
  • Bar Room Bar at the Mailbox – excellent pizzas

lftp rocks!

I’ve known for years that lftp is a more advanced FTP client than ftp. I’ve not used it very often, though, because I move between platforms so often that I’m never sure whether it will be available. Getting dependent on non-standard features can be frustrating.

I just got an email from my backup service saying that I’ve gone over my quota. I was scrabbling about for a way to grab an entire directory listing so that I could analyze the situation when I realised that lftp has a ‘du’ command built in. How cool is that? 😀
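lftp’s built-in du is the easy answer. Where lftp isn’t installed, a rough equivalent can be sketched with Python’s standard ftplib, assuming the server supports the MLSD listing command (RFC 3659) and reports file sizes; plenty of older servers don’t. The function name ftp_du is my own.

```python
import posixpath

def ftp_du(ftp, path="."):
    """Total size in bytes of all files under `path`, walking MLSD listings.

    `ftp` is a logged-in ftplib.FTP object. Assumes the server supports MLSD
    and reports the "type" and "size" facts.
    """
    total = 0
    for name, facts in ftp.mlsd(path, facts=["type", "size"]):
        kind = facts.get("type")
        if kind == "file":
            total += int(facts.get("size", 0))
        elif kind == "dir":
            total += ftp_du(ftp, posixpath.join(path, name))
        # "cdir" and "pdir" (the directory itself and its parent) are skipped
    return total

# Example (hypothetical host and credentials):
#   from ftplib import FTP
#   with FTP("ftp.example.com") as ftp:
#       ftp.login("user", "secret")
#       print(ftp_du(ftp, "/"))
```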

Figuring out what’s listening

A trick I learned a long time ago for Linux and other *nix systems is to use fuser to check what’s listening on a port. It’s actually a more generic, file-based command, but most versions also let you check sockets. The standard syntax is

fuser -v 81/tcp

That will display a list of the processes that have the port open, including whatever is listening on it.

It overlaps with lsof, the ‘list open files’ tool, which is generally useful on *nix; lsof -i :81 shows similar information.
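On Linux, the socket information these tools report comes from kernel tables such as /proc/net/tcp. As I understand it, fuser finds the socket’s inode there and then scans each process’s /proc/<pid>/fd links for a matching socket:[inode] entry. A minimal Python sketch of the first half, assuming IPv4 and the usual /proc/net/tcp column layout (local address in hex, state code 0A meaning LISTEN, inode in the tenth column):

```python
LISTEN = "0A"  # TCP state code for LISTEN in /proc/net/tcp

def listening_sockets(proc_net_tcp: str):
    """Return (port, inode) pairs for IPv4 sockets in LISTEN state.

    Parses the text of Linux's /proc/net/tcp; ports are stored there in hex.
    """
    results = []
    for line in proc_net_tcp.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 10:
            continue
        local_address, state, inode = fields[1], fields[3], fields[9]
        if state == LISTEN:
            port = int(local_address.split(":")[1], 16)  # e.g. "0051" -> 81
            results.append((port, int(inode)))
    return results

# Usage on a Linux box (reads the live table):
#   with open("/proc/net/tcp") as f:
#       print(listening_sockets(f.read()))
```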

Internet Explorer and Foreign Languages

Internet Explorer is pretty good at displaying foreign languages. There are several mechanisms for displaying languages whose character sets differ from our own.

The two most portable approaches for web browsers are UTF-8 and the older ASCII-based code pages.

ASCII Character Sets

If you want to use a different code page, you simply specify the character set. To specify the Russian code page, use the following meta tag:

It is worth noting that you should always specify a character set for your pages. Never assume that a page will be interpreted with the standard character set. If it is just the normal Latin-1 set, set the charset to iso-8859-1.
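As a hedged illustration, assuming Windows-1251 for the Russian code page (KOI8-R is the other common choice), the Python sketch below builds the traditional http-equiv form of the charset meta tag and shows why the declaration matters: the same Russian word produces different bytes under the two code pages, and decoding with the wrong one gives garbage. charset_meta is a hypothetical helper name.

```python
def charset_meta(charset: str) -> str:
    """Build an HTML meta tag declaring the page's character set."""
    return f'<meta http-equiv="Content-Type" content="text/html; charset={charset}">'

# The same Russian word becomes different bytes under different code pages,
# which is why the browser has to be told which one the page uses.
word = "Привет"
cp1251_bytes = word.encode("cp1251")   # Windows Cyrillic, one byte per letter
koi8_bytes = word.encode("koi8_r")     # KOI8-R, also one byte per letter

# Reading Windows-1251 bytes as if they were KOI8-R yields the wrong letters
misread = cp1251_bytes.decode("koi8_r")
```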

For more information about character sets, a good resource is

Unicode Character Sets

Unicode is a single character set; it defines a unique code point for every character in every language. At least, that’s the idea. I understand that they are still defining some of the characters from obscure medieval languages, but most current languages have their characters in place already. This means we don’t have to flip code pages to display different languages; we can use the same character set throughout the whole application and it will have Russian characters, Chinese, and even good old Latin characters.

The bit where most people start to get confused is how many bytes are used to represent each character. I did so myself until I had spent quite a lot of time looking at the issue. On Windows we have ‘wide’ characters, which are Unicode and are stored in 16 bits. This gives rise to the impression that Unicode characters are stored in 16 bits. Then you come across UTF-8 being used in web browsers, and things get murkier. UTF stands for Unicode Transformation Format, and the various UTFs are simply alternative representations (encodings) of the same Unicode character set. The representation Windows NT originally used is UCS-2, a 16-bit representation that can only reach the first 65,536 code points of the Unicode character set (the Basic Multilingual Plane). Characters outside that range are now defined, such as rarer CJK ideographs and some historic scripts; UTF-16, which Windows 2000 and later use, reaches them with pairs of 16-bit units called surrogate pairs.
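To make the 16-bit point concrete, here is a small Python sketch using the standard utf-16-le codec: ordinary Latin and Cyrillic letters take a single 16-bit unit, while U+20000 (the first ideograph of CJK Extension B) needs two units, a surrogate pair, which UCS-2 has no way to express. The helper names are illustrative.

```python
def utf16_units(ch: str) -> list[int]:
    """Return the 16-bit code units UTF-16 uses for a single character."""
    data = ch.encode("utf-16-le")  # little-endian, no byte-order mark
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

def fits_ucs2(ch: str) -> bool:
    """UCS-2 can only represent code points up to U+FFFF (the Basic Multilingual Plane)."""
    return ord(ch) <= 0xFFFF

# "A" -> [0x0041], Cyrillic "Я" (U+042F) -> [0x042F],
# U+20000 -> [0xD840, 0xDC00], a high surrogate followed by a low surrogate
```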

UTF-8 encodes each Unicode character as a varying number of bytes (one to four). The one quirk that makes it so loveable is that the 7-bit US-ASCII characters are represented by their 7-bit ASCII values. In other words, standard English text is encoded no differently in UTF-8 than in ASCII.
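The variable-length scheme is simple enough to write out by hand. This Python sketch encodes a single code point and can be checked against the built-in str.encode("utf-8"); anything below 0x80 comes out as the same single byte, which is the ASCII compatibility just described. For brevity it doesn’t reject the surrogate range U+D800 to U+DFFF, which a real encoder must.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 by hand (1 to 4 bytes)."""
    if code_point < 0x80:            # 0xxxxxxx: 7-bit ASCII, unchanged
        return bytes([code_point])
    if code_point < 0x800:           # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:         # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x110000:        # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([0xF0 | (code_point >> 18),
                      0x80 | ((code_point >> 12) & 0x3F),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    raise ValueError("code point out of Unicode range")

# "A" (0x41) -> b"A"; the euro sign (0x20AC) -> b"\xe2\x82\xac"
```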

If you want to specify UTF-8 text, use this meta tag.

For a lot more detailed information go to