Debugging Web API traffic

Note that this blog post mostly assumes you’re operating on Linux. If you’re using Windows, just use Fiddler; it probably does everything you need. Having just looked at their page, it appears it may well run on platforms other than Windows too, so it might be a good option either way.

When developing programs that consume APIs built on HTTP at some level, it’s often useful to check what is actually going over the wire. Note that this post deals with unencrypted traffic. If you’re talking to a server over HTTPS you will need to MITM your connection or capture the keys for the session using the SSLKEYLOGFILE environment variable.
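For example, something like this should work with clients that honour the variable (Firefox and Chrome do, and I believe curl does too when built against NSS); you can then point Wireshark at the key file via Preferences > Protocols > SSL > (Pre)-Master-Secret log filename:

SSLKEYLOGFILE=/tmp/ssl-keys.log chromium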

The simplest way to capture traffic is generally to use tcpdump. Wireshark is often a good tool for looking at network traces, but for lots of HTTP requests it tends to feel clunky. This is where I turn to a python utility named pcap2har, which converts a packet capture to a HAR file. A HAR file is essentially a json file containing the HTTP requests: an array of requests, with each one noting the headers and content of both the request and the response. The HAR file format is documented here.

You will actually find that Google Chrome allows you to export a set of requests from the Network tab of its Developer Tools as a HAR file too.

The pcap2har utility isn’t packaged in a particularly pythonic way, and it doesn’t actually extract the request body, so I created a minor tweak to it on a branch. This branch does extract the request body, which is often significant in API calls. You’ll need to pip install dpkt; the rest of the dependencies are bundled in the repo. Then you run it like this:

# grab the tweaked branch and its one unbundled dependency
git clone https://github.com/OpusVL/pcap2har.git --branch request_body
cd pcap2har
sudo pip install dpkt
# capture the traffic (8069 here being the port the API server listens on)
tcpdump port 8069 -w packets.dump
# convert the capture to a HAR file
pcap2har packets.dump traffic.har

Having said that HAR is a lot easier to consume, there are viewers for it, but I’ve not found one that I particularly liked. I tend to either pretty print the json and look at it in a text editor, or use grep or a small script to extract the information I want.
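As a minimal sketch of the script approach (assuming a traffic.har produced as above):

import json

with open('traffic.har') as f:
    har = json.load(f)

# each entry in a HAR file is one request/response pair
for entry in har['log']['entries']:
    req, resp = entry['request'], entry['response']
    print req['method'], resp['status'], req['url']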

For OpenERP/Odoo API calls I created a quick script to explode the contents of the API calls to make them easier to read. It expands the xml/json content within the requests/responses out to json at the same level as the rest of the HAR data, rather than leaving encoded data embedded within json.

Hand Coding SQL with psycopg2 for Odoo

I thought it would be a good idea to write down some guidelines on writing hand coded SQL for Odoo/OpenERP. One of the general themes should be ‘avoid SQL injection’, and Odoo themselves provide some documentation explaining that. They also make the point that you should stick with the ORM as much as possible. Rolling your own SQL bypasses all the access controls on records, so if you are writing some SQL, always consider whether the user is supposed to have access to the data it extracts or manipulates.

Assuming you’re still set on writing SQL, it’s worth noting that Odoo uses psycopg2, a mature and respectable library for accessing Postgres from Python. It makes dealing with values in SQL clauses simple, allowing you to pass them through as parameters just as you’d hope:

cr.execute('select * from res_partner where name = %s', (partner_name,))

Unfortunately it’s too simplistic to say never use string concatenation. In practice you often want to stitch together a complex query based on some data. You might want to vary the search criteria based on user input.

Here are some concrete scenarios demonstrating more complex situations. First, read the psycopg2 documentation on how to pass in values and how it converts the various Python types.
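As a quick illustration (a sketch, not tested against a live cursor), mogrify shows you what psycopg2 would actually send, which makes the type adaptation easy to see:

# the output in the comment is indicative; exact quoting can vary with
# psycopg2 version and server settings
print cr.mogrify('select %s, %s, %s, %s',
                 ("O'Reilly", None, [1, 2], (1, 2)))
# select 'O''Reilly', NULL, ARRAY[1,2], (1, 2)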

In clause

This is the most common place people give up on parameterisation. It’s clearly documented in psycopg2, but since people often don’t read the docs, they don’t realise %s works here too, and end up doing something involving join and their own format strings. The syntax is really simple; the only catch is that the values must be passed as a tuple, since psycopg2 adapts a plain list to a PostgreSQL array instead:

cr.execute('select * from res_partner where name in %s', (('Partner 1', 'Partner 2'),))

Optional clauses

The trick with building up optional parts is to maintain lists of parameters alongside lists of the bits of SQL you need. The exact mechanism for appending the strings isn’t terribly important, and in fact you might find it useful to write yourself a nice class to make all that string construction simple and reliable. The key thing is to never mix untrusted data into the SQL you are generating.

clauses = []
params = []
if partner_name:
    clauses.append('name = %s')
    params.append(partner_name)
if website:
    clauses.append('website = %s')
    params.append(website)

where_clause = ""
if clauses:
    where_clause = "where " + ' and '.join(clauses)

sql = "select id, name from res_partner " + where_clause
cr.execute(sql, params)

Refactoring out chunks of SQL generation

If you find yourself needing to reuse snippets of SQL or subqueries, it’s quite common to refactor them out into a method. This is another common place for mistakes. Whenever you do this, return the SQL and a list of parameters, rather than just a block of SQL.

def stock_level(product_ids):
    sql = """
    select location_dest_id as location_id, product_uom, sum(product_qty) as qty
    from stock_move
    where product_id IN %s
    and state = 'done'
    group by location_dest_id, product_uom
    union all
    select location_id, product_uom, 0 - sum(product_qty) as qty
    from stock_move
    where product_id IN %s
    and state = 'done'
    group by location_id, product_uom
    """
    # note the tuple() conversions: psycopg2 expects a tuple for an IN clause
    return (sql, [tuple(product_ids), tuple(product_ids)])

snippet, params = stock_level(product_ids)
sql = """
select location_id, product_uom, sum(qty) as stock_level
from (
    %s
) as stock
group by location_id, product_uom
having sum(qty) > 0 ;
""" % snippet
cr.execute(sql, params)

Using the same parameter twice

The previous example uses the same parameter twice in the query, and passes it in twice. This can be avoided by using named arguments like %(name)s. This is also useful when you have a very large query requiring lots of parameters, as it allows you to pass in a dictionary and not worry about the ordering of the arguments.

def stock_level(product_ids):
    sql = """
    select location_dest_id as location_id, product_uom, sum(product_qty) as qty
    from stock_move
    where product_id IN %(product_id)s
    and state = 'done'
    group by location_dest_id, product_uom
    union all
    select location_id, product_uom, 0 - sum(product_qty) as qty
    from stock_move
    where product_id IN %(product_id)s
    and state = 'done'
    group by location_id, product_uom
    """
    # the IN clause still wants a tuple, but it only needs passing once now
    return (sql, {"product_id": tuple(product_ids)})

snippet, params = stock_level(product_ids)
sql = """
select location_id, product_uom, sum(qty) as stock_level
from (
    %s
) as stock
group by location_id, product_uom
having sum(qty) > 0 ;
""" % snippet
cr.execute(sql, params)

Fixing badly refactored code

If you do find yourself with a function that returns SQL only, and you can’t change the function prototype because external code depends on it, psycopg2 cursors provide the mogrify method, which can help. It does require the cursor to be in scope.

def stock_level(cr, uid, product_ids):
    sql = """
    select location_dest_id as location_id, product_uom, sum(product_qty) as qty
    from stock_move
    where product_id IN %s
    and state = 'done'
    group by location_dest_id, product_uom
    union all
    select location_id, product_uom, 0 - sum(product_qty) as qty
    from stock_move
    where product_id IN %s
    and state = 'done'
    group by location_id, product_uom
    """
    # mogrify returns the SQL with the parameters safely interpolated
    return cr.mogrify(sql, [tuple(product_ids), tuple(product_ids)])

snippet = stock_level(cr, uid, product_ids)
sql = """
select location_id, product_uom, sum(qty) as stock_level
from (
    %s
) as stock
group by location_id, product_uom
having sum(qty) > 0 ;
""" % snippet
cr.execute(sql)

Allowing the caller to specify fields/tables

This is something where the libraries generally don’t help; you need to sanitize this input yourself. In general, try to avoid fully untrusted input. It’s better to have a whitelist of what you expect to allow, so that callers can only select from known good tables/fields. Also remember that you can query the database for its structure if you don’t know it when writing the code. If you really can’t check either way, sanitize the input heavily: allow the absolute minimum of characters within identifiers, and consider using an extra qualifier so that you can identify programmatically created fields/tables and prevent name clashes with core database functions.
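As a minimal sketch of the whitelist approach (the field names and helper here are invented for illustration):

ALLOWED_SORT_FIELDS = ('name', 'create_date', 'website')

def partners_sorted_by(cr, sort_field):
    # refuse anything that isn't a known good identifier
    if sort_field not in ALLOWED_SORT_FIELDS:
        raise ValueError('unexpected sort field: %r' % sort_field)
    # sort_field is now known to be safe to interpolate
    cr.execute('select id, name from res_partner order by ' + sort_field)
    return cr.fetchall()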

In summary

Keep your string manipulation safe and never append untrusted input (note that data you retrieved from the database should generally be treated as untrusted too). Assuming you are doing it right, any query without parameters should start to look suspicious. There are of course simple queries that take no parameters, but they should be easy to verify as safe.

Note that the code is not from a live system and may not actually work (it was typed up for this post and not tested). It’s designed to illustrate the principles rather than be blindly copy-pasted.

New Salt version break

I started getting these error messages last week, and discovered they were caused by an incompatibility between the new version 2014.1 and an old 0.17 server. In general, if you’re seeing strange errors, particularly from configuration that used to work, double check that your version numbers match. You can always do a ‘salt machine test.version’ to check the client’s version.

salt machine state.highstate
machine:
----------
    State: - no
    Name:      states
    Function:  None
    Result:    False
    Comment:   No Top file or external nodes data matches found
    Changes:

Or via salt-call state.highstate

[ERROR   ] Got a bad pillar from master, type bool, expecting dict: False

Getting started with Salt

I’ve finally started to use the Salt Stack in anger and I thought I’d make some quick notes on the things I was slow to pick up on.

Specifying that a box needs to have certain services installed is all done in the configuration. I understood that there are two aspects to salt: one is running things via the command line, the other is configuration management. I thought I’d be able to do something like ‘salt provision box-a service-b’. Instead, you generally put the machine names and what they should have in top.sls in the /srv/salt directory, along with all the other config, as sketched below.
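For example, a minimal /srv/salt/top.sls might look like this (box-a and service-b are made-up names for illustration):

base:
  'box-a':
    - service-b

Running ‘salt 'box-a' state.highstate’ then applies whatever the service-b state file declares to that machine.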

Pillars appear to be intended to provide the site-specific data. In general the salt .sls files are the config that can be used everywhere, sometimes templated, and the pillar data can be used in those templates to insert data specific to a site/machine. This means you should be able to use the same basic salt setup on multiple salt-masters, and simply change the data on them to generate machines with different users and setups.
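As a sketch of how the two halves fit together (the file contents are invented for illustration, and assume the pillar top.sls assigns site.sls to the machine), the pillar holds the site-specific value and the templated state pulls it in:

# /srv/pillar/site.sls
admin_user: alice

# /srv/salt/users.sls
{{ pillar['admin_user'] }}:
  user.present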

Salt appears to be very opaque to start with. It took me too long to realise that, like most things, it logs. In fact it logs very well, so you can turn the log level right up if you are troubleshooting. It can be interesting to turn it up just to see what it’s doing in real time; otherwise you only see problems logged.
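For instance, running the minion side by hand with the log level turned up makes it much less opaque:

salt-call -l debug state.highstate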

It’s well worth taking a look at the state documentation as you’re looking at the tutorials. I found the examples in the tutorial relatively hard to pick apart until I was able to see the references for the various states.

http://docs.saltstack.com/ref/states/all

The file state is particularly worth taking a look at. A lot of deployment is pushing files about and modifying them.

http://docs.saltstack.com/ref/states/all/salt.states.file.html#module-salt.states.file

Make sure you run the latest versions from salt’s official repositories if possible. The ones packaged with your OS are likely to be old. And don’t run inconsistent versions between your masters and minions. That way leads to fail.

If something obvious doesn’t appear to work, try it with a different package. I found my simple couchdb installation didn’t work quite right out of the box. I tried exactly the same thing with memcached and it did. This turned out to be a bug.

I’m still early on my journey into salt, but so far it appears to be very useful.

Skipping Python unit tests if a dependency is missing (fixed)

I got some feedback on my previous post about skipping tests in python unittest pointing out that my solution was flawed. As Mu Mind and the other denizens of stackoverflow pointed out, the solution has a problem when the tests are run directly via python. At first I didn’t realise how flawed: technically I had run my tests via python and nosetests regularly, but I’d never run them via python while the dependency was missing. If you do that you get this ugly error:

ERROR: test_openihm_gui_interface_db_mixin (unittest.loader.ModuleImportFailure)
----------------------------------------------------------------------
ImportError: Failed to import test module: test_openihm_gui_interface_db_mixin
Traceback (most recent call last):
  File "python2.7/unittest/loader.py", line 252, in _find_tests
    module = self._get_module_from_name(name)
  File "python2.7/unittest/loader.py", line 230, in _get_module_from_name
    __import__(name)
  File "tests/test_openihm_gui_interface_db_mixin.py", line 6, in 
    raise unittest.SkipTest("Need PyQt4 installed to do gui tests")
SkipTest: Need PyQt4 installed to do gui tests

It does tell you clearly what the problem was, but that really wasn’t the intention. The idea was to silently skip the tests.

From the answers and comments on the stackoverflow post I stitched together this ugly but hopefully working solution, whichever unit test runner you favour.

import unittest
try:
    import PyQt4
    # the rest of the imports


    class TestDataEntryMixin(unittest.TestCase):
        def test_somefeature(self):
            # actual tests go here.
            pass

except ImportError, e:
    if e.message.find('PyQt4') >= 0:
        # stand-in test case so the missing dependency is reported as a skip
        class TestMissingDependency(unittest.TestCase):

            @unittest.skip('Missing dependency - ' + e.message)
            def test_fail(self):
                pass
    else:
        raise

if __name__ == '__main__':
    unittest.main()

I dislike the fact that I can’t hide the logic away at the top, but surrounding your whole test module with the code works. If the import fails we create a dummy test case that does a skip to indicate the problem. I’ve also tried to ensure that we only catch the exception we’re expecting, and pass through any we aren’t.

Now if you run the tests in verbose mode you’ll see this when there is a missing dependency.

test_fail (test_openihm_gui_interface_mixins.TestMissingDependency) ... skipped 'Missing dependency - No module named PyQt4'

Hooking into the OpenERP ORM

I’ve been hooking into the OpenERP ORM layer of a few of the models to add full text search via an external engine. Thanks to the way OpenERP is structured this appears to be quite a reliable approach. As I was doing it I found that I wanted a common set of hooks on several models, which suggested refactoring them into a mixin or a base class. After playing about with my OpenERP module I’ve come to the conclusion that creating a base class seems to be the most reliable way to hook the methods. The one trick you need to be aware of is the _register flag, which you want to set to False on your base class.

from openerp.osv import osv  # OpenERP 7 style import


class search_base(osv.osv):

    _register = False

    def write(self, cr, user, ids, vals, context=None):
        success = super(search_base, self).write(cr, user, ids,
                                                 vals, context)
        # ... do stuff here
        return success


class product_template_search(search_base):
    _name = "product.template"
    _inherit = "product.template"
    _register = True


class product_search(search_base):
    _name = "product.product"
    _inherit = "product.product"
    _register = True

product_search()
product_template_search()

Without that you’ll end up with the orm whining that the search_base class has no _name attribute as it tries to register it as a model.

openerp.osv.orm: The class search_base has to have a _name attribute
openerp.netsvc: ValueError
The class search_base has to have a _name attribute
> /usr/lib/pymodules/python2.7/openerp/osv/orm.py(967)__init__()
-> raise except_orm('ValueError', msg)

Skipping Python unit tests if a dependency is missing

Update: this is a flawed method of skipping the tests; I’ve written up an improved version based on the feedback I received.

In all the examples of using python unittest to skip tests I’ve not seen anyone explaining how to skip tests if a library isn’t installed.

The simplest way to do it appears to be to try to import the relevant libraries and catch the exception thrown if the library isn’t there.

import unittest
try:
    import PyQt4.QtCore
    import PyQt4.QtGui
except ImportError:
    raise unittest.SkipTest("Need PyQt4 installed to do gui tests")

This example at the top of a test file simply skips all the tests if PyQt4 isn’t available.