Grokking source code

I like automating the analysis and fixing of code wherever possible. While new languages that let us examine the parse tree will give us a lot of control, there is plenty we can do with existing languages.

A tool that looks at the source code can do an awful lot without a great deal of understanding.

Even apparently ambiguous syntax may not be as problematic as it first appears. Take Visual Objects for example. This language has the standard syntax for accessing arrays using square brackets, a[1]. It also allows string literals to be delimited by square brackets as well as by double and single quotes, like [a string].

Telling whether or not you are inside a string is quite a fundamental problem when dealing with source code. If you get it wrong, that has a knock-on effect on handling comments too, because comment markers inside a string are just text.

Is [/* x+ */1] a string or an array index?

The answer comes from looking at how these appear in real code. In practice an array access must always come directly after an identifier, the array being indexed. A string literal does not. If there’s an operator before the opening bracket then it must be a string.
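As a rough illustration of that rule, here is a minimal Python sketch. The function name classify_bracket is made up for this post, and treating a closing ] or ) as "identifier-like" (so that things like a[1][2] count as indexing) is my own assumption rather than something taken from the Visual Objects documentation. A keyword directly before the bracket would fool it, so treat it as a starting point only.

```python
def classify_bracket(source: str, pos: int) -> str:
    """Classify the '[' at source[pos] as 'index' or 'string'.

    Heuristic only: look at the nearest non-blank character to the left.
    """
    if source[pos] != "[":
        raise ValueError("pos must point at '['")
    i = pos - 1
    # Skip spaces and tabs between the previous token and the bracket.
    while i >= 0 and source[i] in " \t":
        i -= 1
    # Identifier character (or, by assumption, a closing ']' or ')'):
    # the bracket is indexing whatever came before it.
    if i >= 0 and (source[i].isalnum() or source[i] in "_])"):
        return "index"
    # Start of input, an operator, or other punctuation: a string literal.
    return "string"
```

With this, the [ in a[/* x+ */1] is classified as an index because it follows the identifier a, while the same bracket after an operator, as in x := [/* x+ */1], is classified as a string.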

This means that writing a tool for automated source code analysis doesn’t have to be anywhere near as complex as it sounds. You don’t necessarily have to fully parse the code; you can effectively leave it at little more than lexical analysis.

By figuring this out and accounting for the various comment types you can do things like extracting all the strings from an application, which is useful for making your application available in other languages.
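To give a flavour of what such an extractor might look like, here is a small Python sketch of a scanner. It is not the tool I actually use: the name extract_strings is hypothetical, it only knows about /* */ and // comments (other comment styles are left out), and it assumes string literals have no escape sequences and do not span lines.

```python
def extract_strings(source: str) -> list[str]:
    """Collect string literals, skipping comments (heuristic sketch)."""
    strings = []
    i, n = 0, len(source)
    prev = ""  # last non-blank character seen outside strings and comments
    while i < n:
        ch = source[i]
        if source.startswith("/*", i):
            # Block comment: jump past the closing marker.
            end = source.find("*/", i + 2)
            i = n if end == -1 else end + 2
        elif source.startswith("//", i):
            # Line comment: jump to the end of the line.
            end = source.find("\n", i)
            i = n if end == -1 else end
        elif ch in "\"'" or (
            ch == "[" and not (prev.isalnum() or prev in ("_", "]", ")"))
        ):
            # A quote, or a '[' that does not follow an identifier,
            # opens a string literal (assumed to have no escapes).
            close = "]" if ch == "[" else ch
            end = source.find(close, i + 1)
            if end == -1:
                break  # unterminated literal: stop scanning
            strings.append(source[i + 1:end])
            i = end + 1
            prev = close
        else:
            if not ch.isspace():
                prev = ch
            i += 1
    return strings
```

Because comments are only recognised outside strings, and strings only outside comments, ? [/* x+ */1] yields the single string /* x+ */1, while a[/* x+ */1] yields nothing.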

