PAUSE permissions code outline

This is a quick brain dump from my brief coding session on PAUSE working on the permissions. It won’t make sense without reading the code yourself, so unless you’re interested in modifying PAUSE or reviewing my changes I would ignore this blog post. I’m writing this for my future self as much as anyone. The behaviour I modified in my pull request is in italics.

The code for setting permissions when a new distribution is indexed is triggered by ./cron/mldistwatch and PAUSE::mldistwatch, but the code that actually sets the permissions is largely in PAUSE::package.

perm_check checks that the current user has permission to be uploading this distro and bails if not. If this is a new package (and they do have permission) then it also adds the user to the perms table, as they are effectively first come.

give_regdowner_perms adds people to the perms table where necessary. This was supporting the module permissions amongst other things. It is now also where we copy the permissions from the main module.

The checkin method, as well as adding the record to the packages table, also adds to the primeur table via the checkin_into_primeur function. If the x_authority flag is used then that user is added; otherwise the main module is looked up, or failing that the uploading user is used.

Note that the first come (primeur) users appear in both the perms and primeur tables.

There are special checks for Perl distributions in the permissions code that will change the behaviour of the permissions. I am purposely not mentioning them as I never took the time to understand them.

A quick note on testing. As well as the tests which work off the dummy distributions in the corpus directory, there is a test utility. To get a look at the permissions database in practice, use the build-fresh-cpan script. This will build a one-off CPAN environment that you can examine. Just pass it a package to index and then you can check the permissions database.

$ perl one-off-utils/build-fresh-cpan corpus/mld/009/authors/O/OO/OOOPPP/Jenkins-Hack-0.14.tar.gz
Initialized empty Git repository in /tmp/h4L3GU6Awi/git/.git/
$ ls
cpan  db  git  Maildir  pause.log  run
$ cd db
$ ls
authen.sqlite  mod.sqlite
$ sqlite3 mod.sqlite 
SQLite version 3.8.7.1 2014-10-29 13:59:56
Enter ".help" for usage hints.
sqlite> select * from primeur;
sqlite> select * from perms;
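
If you’d rather script the inspection than poke at the sqlite3 prompt, a few lines of Python will dump the same tables. A minimal sketch, assuming the db directory layout produced above,

import sqlite3

# Dump the PAUSE permissions tables from the database created by
# build-fresh-cpan; run from within the generated db directory.
conn = sqlite3.connect('mod.sqlite')
for table in ('primeur', 'perms'):
    print('--', table)
    for row in conn.execute('select * from %s' % table):
        print(row)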

Docker logging and buffering

When you start using Docker, one of the things you’ve quite possibly hit is output buffering. Docker encourages you to log to stdout/err from your program, and then use docker to feed your regular logging solution.

This generally exhibits itself as you seeing no logging at first; then, after the program has been running for a while, you come back and find the logs are all there. It doesn’t get as much press as caching, but it can be just as head-scratching.

With Perl, for example, the buffering kicks in when output is not connected to a terminal. For that, this somewhat idiomatic snippet can be useful,

select( ( select(\*STDERR), $|=1 )[0] );
select( ( select(\*STDOUT), $|=1 )[0] );

As ever with Perl there is more than one way to do it of course…
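
The same issue bites other languages under Docker too. In Python, for instance, you can set PYTHONUNBUFFERED=1 (or run python -u), or reconfigure the streams at startup. A minimal sketch, assuming Python 3.7+,

import sys

# Make stdout and stderr line-buffered so each log line is flushed
# as soon as it is written, even when not attached to a terminal.
sys.stdout.reconfigure(line_buffering=True)
sys.stderr.reconfigure(line_buffering=True)

print("this line reaches `docker logs` immediately")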

Note that you could experience this sort of thing with other logging setups too. If you pipe to logger to feed rsyslog, for example, you could hit the same issues.

Debugging Web API traffic

Note that this blog post mostly assumes you’re operating on Linux. If you’re using Windows, just use Fiddler; it probably does everything you need. Actually, having just looked at their page, it may well work in lots of places other than Windows too, so it might be a good option generally.

When developing programs that consume APIs that make use of HTTP at some level, it’s often useful to check what is actually going over the wire. Note that this is talking about unencrypted traffic at this level. If you’re talking to a server over HTTPS you will need to MITM your connection or get the keys for the session using the SSLKEYLOGFILE environment variable.
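
If the client is a Python script of your own, you can produce that key log file yourself. A minimal sketch, assuming Python 3.8+ (which exposes keylog_filename on SSL contexts),

import os
import ssl
import urllib.request

# Write the TLS session keys to a file that Wireshark can use to
# decrypt a capture of this session.
ctx = ssl.create_default_context()
ctx.keylog_filename = os.environ.get('SSLKEYLOGFILE', 'tlskeys.log')

urllib.request.urlopen('https://example.com/', context=ctx)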

The simplest way to capture traffic is generally to use tcpdump. Wireshark is often a good tool for looking at network traces, but for lots of HTTP requests it tends to feel clunky. This is where I turn to a python utility named pcap2har. This converts a packet capture to a HAR file. A HAR file is essentially a JSON file containing the HTTP requests: an array of the requests, with each request noting the headers and content of the request/response. The HAR file format is documented here.

You will actually find that Google Chrome also allows you to export a set of requests from the Network tab of its Developer tools as a HAR file.

The pcap2har utility isn’t packaged in a particularly pythonic way, and it doesn’t actually extract the request body, so I created a minor tweak to it on a branch. This branch does extract the request body, which is often used in API calls. You’ll need to pip install dpkt; the rest of the dependencies are bundled in the repo. Then you run it like this,

git clone --branch request_body
cd pcap2har
sudo pip install dpkt
tcpdump port 8069 -w packets.dump
pcap2har packets.dump traffic.har

Having said that, HAR is a lot easier to consume. There are viewers, but I’ve not found one that I particularly liked. I tend to either pretty-print the JSON and look at it in a text editor, or use grep or code to extract the information I want.
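
For quick extraction, a few lines of Python often beat a dedicated viewer. A sketch, assuming the traffic.har produced above,

import json

# Print a one-line summary of each captured request:
# status code, method and URL.
with open('traffic.har') as f:
    har = json.load(f)

for entry in har['log']['entries']:
    request = entry['request']
    response = entry['response']
    print(response['status'], request['method'], request['url'])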

For OpenERP/Odoo API calls I created a quick script to explode the contents of the API calls to make them easier to read. It explodes the XML/JSON contents within the requests/responses out to JSON at the same level as the rest of the HAR data, rather than having encoded data nested within JSON.

Perl QA Hackathon report (#perlqah2016)

Thank you to all the people who sponsored the Perl QAH hackathon and all those that provided their time. It was a very productive environment for a lot of projects.

I worked on two things relating to PAUSE permissions: PAUSE (the indexer) itself, and a new module for testing that permissions are consistent, named Test::PAUSE::ConsistentPermissions.

The permissions of distributions on CPAN are generally something you don’t think about until you become a co-maintainer of a distribution and you see red on search.cpan.org or MetaCPAN. Then you discover that there is a system by which each module in your distribution is effectively owned by somebody, and that some others might also have been granted permission to upload a file.

Let’s take Test::DBIx::Class as an example. I’m a co-maintainer, and JJNAPIORK is the owner. There is also another co-maintainer, PHAYLON.

Using the script from Test::PAUSE::ConsistentPermissions I can look at the permissions currently assigned.

$ pause-check-distro-perms Test-DBIx-Class Test::DBIx::Class
Distribution: Test-DBIx-Class
Module: Test::DBIx::Class

When I upload a new release of Test::DBIx::Class to PAUSE, any new modules added will be given to me, so I will be the owner and no-one will have comaint. I then need to grant PHAYLON comaint (which I can do as owner), and then pass ownership back to JJNAPIORK.

There is an alternative mechanism involving a bit of metadata named ‘x_authority’. With that we could ensure that JJNAPIORK retains ownership of all the modules within the distribution. The downside is that while I, as the uploader, would gain comaint on those new modules, PHAYLON would not. Since I wouldn’t be the owner either, I couldn’t assign him comaint, and would have to ask JJNAPIORK to do that instead.

I believe there were historically alternative methods for managing this, but PAUSE has been through some rationalisation and simplification, and those features don’t exist anymore.

I came to this hackathon wanting to work on this problem having encountered Karen Etheridge (ETHER)’s prototype Dist::Zilla::Plugin::AuthorityFromModule which suggested a potential solution.

We had a meeting with the interested parties at the hackathon about how we could deal with this scenario better. Ricardo Signes (RJBS) suggested that we could make use of the fact that there is a designated ‘main module’ for permissions, and use that for the defaults. This is one step better than the previously suggested solution, as it builds on the earlier rationalisation of PAUSE permissions and won’t require authors to provide extra metadata. This should mean that permissions are much more likely to just work out of the box.

With that potential solution suggested RJBS gave me some assistance to get working on the PAUSE indexer. I started with adding tests, then wrote the code.

The change made so far is very minimal, only affecting the indexing of new packages. No changes have been made to the user interface, or the complex permissions you can currently end up with. The pull request is here – note that it also benefits from Peter Sergeant (Perl Careers)’s work to hook the PAUSE tests into Travis, giving it a green tick indicating the tests passed.

The other thing I worked on was Test::PAUSE::ConsistentPermissions, to allow us to check that a distribution has consistent permissions. I created a script for checking a distro on PAUSE (not too dissimilar to Neil Bowers’ App::PAUSE::CheckPerms module) and a test function for use in a distro’s release tests. This is a bit like Test::PAUSE::Permissions, but rather than checking whether you have permission to upload the distribution, it checks whether the permissions are consistent. These two properties don’t necessarily coincide.

During the event I was also able to create a couple of minor pull requests.

Here’s the obligatory thank you to the full list of sponsors of the Perl QA Hackathon 2016. Thank you all.

Hand Coding SQL with psycopg2 for Odoo

I thought it would be a good idea to write down some guidelines on writing hand-coded SQL for Odoo/OpenERP. One of the general themes should be ‘avoid SQL injection’, and Odoo themselves provide some documentation explaining that. They also make the point that you should stick with the ORM as much as possible. Rolling your own SQL bypasses all the access controls on records, so if you are writing SQL, always consider whether the user is supposed to have access to the data extracted or manipulated by it.

Assuming you’re still set on writing SQL, it’s worth noting that Odoo uses psycopg2, a mature and respectable library for accessing Postgres from Python. Dealing with values in SQL clauses is simple: it allows you to pass them through as parameters, just as you’d hope.

cr.execute('select * from res_partner where name = %s', (partner_name,))
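
(Note the trailing comma in (partner_name,): the second argument to execute must be a sequence or mapping, so a single value is passed as a one-element tuple.)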

Unfortunately it’s too simplistic to say ‘never use string concatenation’. In practice you often want to stitch together a complex query based on some data; you might want to vary the search criteria based on user input.

Here are some concrete scenarios demonstrating more complex situations. Read the psycopg2 documentation for how values are passed in and how the various types are converted first.

In clause

This is the most common place people give up on parameterisation. It’s clearly documented in psycopg2, but since people often don’t read the docs they don’t realise %s works here too, and end up doing something involving join and their own format strings. The syntax is really simple; note that psycopg2 adapts a Python tuple to a parenthesised list of values, while a plain list is adapted to a Postgres ARRAY (which won’t work with IN), so pass a tuple,

cr.execute('select * from res_partner where name in %s', (('Partner 1', 'Partner 2'),))

Optional clauses

The trick with building up optional parts is to maintain lists of parameters alongside lists of the bits of SQL you need. The exact mechanism for appending the strings isn’t terribly important, and in fact you might find it useful to write yourself a nice class to make all that string construction simple and reliable. The key thing is to never mix untrusted data into the SQL you are generating.

clauses = []
params = []
if partner_name:
    clauses.append('name = %s')
    params.append(partner_name)
if website:
    clauses.append('website = %s')
    params.append(website)
where_clause = ""
if clauses:
    where_clause = "where " + ' and '.join(clauses)
sql = "select id, name from res_partner " + where_clause
cr.execute(sql, params)

Refactoring out chunks of SQL generation

If you find yourself wanting to reuse snippets of SQL or subqueries, it’s quite common to refactor them out into a method. This is another common place for mistakes: whenever you do this, return the SQL together with a list of parameters, rather than just a block of SQL.

def stock_level(product_ids):
    sql = """
    select location_dest_id as location_id, product_uom, sum(product_qty) as qty
    from stock_move
    where product_id IN %s
    and state = 'done'
    group by location_dest_id, product_uom
    union all
    select location_id, product_uom, 0 - sum(product_qty) as qty
    from stock_move
    where product_id IN %s
    and state = 'done'
    group by location_id, product_uom
    """
    # psycopg2 adapts a tuple to a parenthesised value list for IN
    return (sql, [tuple(product_ids), tuple(product_ids)])

snippet, params = stock_level(product_ids)
sql = """
select location_id, product_uom, sum(qty) as stock_level
from (
) as stock
group by location_id, product_uom
having sum(qty) > 0 ;
""" % snippet
cr.execute(sql, params)

Using the same parameter twice

The previous example uses the same parameter twice in the query, and passes it in twice. This can be avoided by using named arguments like %(name)s. That can also be useful when you have a very large query requiring lots of parameters, as it allows you to pass in a dictionary and not worry about the ordering of the arguments.

def stock_level(product_ids):
    sql = """
    select location_dest_id as location_id, product_uom, sum(product_qty) as qty
    from stock_move
    where product_id IN %(product_id)s
    and state = 'done'
    group by location_dest_id, product_uom
    union all
    select location_id, product_uom, 0 - sum(product_qty) as qty
    from stock_move
    where product_id IN %(product_id)s
    and state = 'done'
    group by location_id, product_uom
    """
    return (sql, {"product_id": tuple(product_ids)})

snippet, params = stock_level(product_ids)
sql = """
select location_id, product_uom, sum(qty) as stock_level
from (
) as stock
group by location_id, product_uom
having sum(qty) > 0 ;
""" % snippet
cr.execute(sql, params)

Fixing badly refactored code

If you do find yourself with a function that returns SQL only, and you can’t change its prototype because external code depends on it, psycopg2 provides the mogrify method, which can help. It does require the cursor to be in scope.

def stock_level(cr, uid, product_ids):
    sql = """
    select location_dest_id as location_id, product_uom, sum(product_qty) as qty
    from stock_move
    where product_id IN %s
    and state = 'done'
    group by location_dest_id, product_uom
    union all
    select location_id, product_uom, 0 - sum(product_qty) as qty
    from stock_move
    where product_id IN %s
    and state = 'done'
    group by location_id, product_uom
    """
    return cr.mogrify(sql, [tuple(product_ids), tuple(product_ids)])

snippet = stock_level(cr, uid, product_ids)
sql = """
select location_id, product_uom, sum(qty) as stock_level
from (
) as stock
group by location_id, product_uom
having sum(qty) > 0 ;
""" % snippet

Allowing the caller to specify fields/tables

This is somewhere the libraries generally don’t help; you need to sanitise the input yourself. In general, avoid fully untrusted input: it’s better to have a whitelist of the tables/fields you expect, and let callers select from those known-good names. Remember too that you can query the database for its own structure if you don’t know it when writing the code. If you really can’t whitelist, sanitise the input heavily: allow the absolute minimum of characters within identifiers, and consider using an extra qualifier so that you can identify programmatically created fields/tables and prevent name clashes with core database objects.
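
As a sketch of the whitelist approach (the field names here are made up for illustration, not Odoo API),

ALLOWED_FIELDS = {'name', 'website', 'email'}

def partner_field_query(field):
    # Refuse anything we haven't explicitly approved.
    if field not in ALLOWED_FIELDS:
        raise ValueError("unexpected field: %r" % field)
    # Safe to interpolate: the identifier is one of our own constants,
    # not the caller's raw input.
    return "select id, %s from res_partner" % field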

In summary

Keep your string manipulation safe and never append untrusted input (note that data you retrieved from the database should generally be treated as untrusted too). If you are doing it right, any query without parameters should start to look suspicious. There are of course simple queries that take no parameters, but they should be easy to verify as safe.

Note that the code is not from a live system and may not actually work (it was typed up for this post and not tested). It’s designed to illustrate the principles rather than be blindly copy pasted.

DBIx::Class join pruning

DBIx::Class has a join optimiser that prunes out any unnecessary joins from the queries it generates. Occasionally this can cause code that looks good to fail unexpectedly. The most obvious time this happens is when using literal SQL.

The most common example is doing an aggregate operation with a group by. Here I’m trying to sum and count some values on a joined table. This looks like pretty standard DBIC code, but it fails: I don’t use the joined table in the group by, just in the columns, where it’s my own raw SQL.

sub total {
    my $self = shift;
    my @columns = $self->result_source->columns;
    my $me = $self->current_source_alias;
    return $self->search(undef, {
        join => ['items'],
        '+columns' => [
            { total => \'sum(amount) as total' },
            { lines => \'count(amount) as lines' },
        ],
        group_by => [map {"$me.$_"} @columns],
    });
}

# then filtered on the aggregate, wrapped as a subselect
$rs->total->as_subselect_rs->search({
    total => { '>' => 10 },
}, {
    '+columns' => ['total', 'lines'],
});


DBI Exception: DBD::Pg::st execute failed: ERROR:  column "amount" does not exist
LINE 1: ... (SELECT "me"."id", "me"."name", "me"."type", sum(amount) as...
                                                             ^ [for Statement "SELECT "me"."id", "me"."name", "me"."type", "me"."total", "me"."lines" FROM (SELECT "me"."id", "me"."name", "me"."type", sum(amount) as total, count(amount) as lines FROM "orders" "me" GROUP BY "me"."id", "me"."name", "me"."type") "me" WHERE ( "total" > ? )" with ParamValues: 1='10']

Looking closely at the SQL generated, the items alias isn’t joined in. The optimiser has pruned that join because it couldn’t see any reference to the columns. It turns out that DBIC does try to do the right thing; we just didn’t give it enough information.

Adding the ‘items.’ qualifier to the column names in the literal SQL allows DBIC to realise that the join is necessary.

    return $self->search(undef, {
        join => ['items'],
        '+columns' => [
            # NOTE: fully qualified names,
            # even though they are not ambiguous
            { total => \'sum(items.amount) as total' },
            { lines => \'count(items.amount) as lines' },
        ],
        group_by => [map {"$me.$_"} @columns],
    });

With that table alias, not strictly necessary for the SQL since those column names are unambiguous, DBIC realises that it needs the join.

SELECT "me"."id", "me"."name", "me"."type", "me"."total", "me"."lines"
  FROM (
    SELECT "me"."id", "me"."name", "me"."type", sum( items.amount ) AS total, count( items.amount ) AS lines
      FROM "orders" "me"
      LEFT JOIN "order_items" "items"
        ON "items"."order_id" = "me"."id"
    GROUP BY "me"."id", "me"."name", "me"."type"
   ) "me"
WHERE "total" > '10'

Querying debian/ubuntu packages

The Debian package management system is something I’ve been using for years, and it’s served me well for all those years. For the first decade I rarely needed more than the common apt-get install or dpkg -i package.deb. Listing the installed packages with dpkg -l has been useful too, but apart from apt-cache search those were all the commands I needed most of the time.

Over the past five years I’ve been using slightly more complex queries to figure out information about packages I’ve installed. Working at a company where we deploy and support Debian servers on a regular basis has meant it’s been useful to poke about the .deb infrastructure a little more. In truth, most of the time I just use two commands, but they get me through most of what I want to do.

If I don’t know which package a program comes from I can use dpkg-query with the -S flag.

$ dpkg-query -S /bin/ss
iproute2: /bin/ss

To figure out what is provided by a package I can use the -L flag.

$ dpkg-query -L iproute2 | grep etc

That’s also useful for checking a package is still fully installed,

for f in $(dpkg-query -L python-lxml); do if [ ! -e "$f" ]; then echo "Missing $f"; fi; done
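
The same check in Python, if you prefer something more readable (a sketch; python-lxml is just the example package),

import os
import subprocess

# Ask dpkg for every file the package claims to own and report any
# that are missing from the filesystem.
files = subprocess.check_output(
    ['dpkg-query', '-L', 'python-lxml'], text=True).splitlines()
for path in files:
    if path and not os.path.exists(path):
        print('Missing', path)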

Of course there are other commands that I use periodically, but these are the ones I’ve been using frequently enough to start remembering by heart.

Vim word breaks and perl

I use vim for coding my Perl, and in the most recent versions of Ubuntu I’ve not been keen on one of the changes to the syntax defaults. The word definition has started to include ‘:’, which means that class names like Blah::Thing count as one word, which isn’t how I’m used to working. Luckily vim is very configurable, and the vim IRC channel is really helpful, so I was able to find where the configuration was set and how to override it quickly.

Finding out that it’s related to the iskeyword setting was pretty simple, but changing that didn’t have any effect. It turns out it is set in a Perl-specific syntax configuration, so overriding it is a little more tricky.

Figuring out where a configuration setting was set is done using :verb set <option>?

:verb set isk?
Last set from /usr/share/vim/vim74/ftplugin/perl.vim

Knowing where that was set I could then override it with an ‘after’ configuration file.

" ~/.vim/after/ftplugin/perl.vim
set iskeyword-=:

Then when I check in vim,

:verb set isk?
Last set from ~/.vim/after/ftplugin/perl.vim

For more discussion of the language-specific configuration and the after files see this vim wiki entry –

LXC failed to bring up eth0

I came across an issue today where new lxc containers on a machine were failing to get an IP address. In fact it was worse than that: they didn’t appear to have an interface for networking at all, just the loopback adaptor.

root@lxc-container:~# ifup eth0
Internet Systems Consortium DHCP Client 4.2.2
Copyright 2004-2011 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/

Cannot find device "eth0"
Bind socket to interface: No such device
Failed to bring up eth0.

On the host machine brctl didn’t show any interfaces on the bridge.

# brctl show
bridge name	bridge id		STP enabled	interfaces
lxcbr0		8000.56847afe4a99	no		

It turns out the answer was in the machine’s config file, usually located in /var/lib/lxc/$machine_name/config.

Normally these contain a section like this defining the network interfaces. In this case it was entirely missing from the configuration file.

lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = lxcbr0
lxc.network.hwaddr = 00:16:3e:99:99:99

Manually adding those settings to the configuration file set up a working eth0 on the machine when it was restarted.

In this case the reason for the missing section was a dodgy setting in the lxc.default_config file that caused the machine setup to choke.

# cat `lxc-config lxc.default_config`
lxc.lxcpath = /lxc
lxc.network.type = veth
lxc.network.link = lxcbr0
lxc.network.flags = up
lxc.network.hwaddr = 00:16:3e:xx:xx:xx

In this case we had a rogue parameter, lxc.lxcpath, that shouldn’t have been in the default_config. There wasn’t any visible error running lxc-create, however, so we didn’t spot it at first. It was only when we added -x to the shebang line in /usr/share/lxc/templates/lxc-debian that we saw an error,

root@host-machine:~# lxc-create -n nettest3 -t debian
lxc_container: unknown key lxc.lxcpath
lxc_container: Failed to parse config: lxc.lxcpath = /lxc

Once we removed the rogue lxc.lxcpath configuration setting, new boxes were created as usual again.

Developing against the bleeding edge – Odoo 8.0 (before release)

I’ve been working on a project using Odoo 8.0, which at the time of writing isn’t fully released yet. These are some notes on working at the bleeding edge as the release stabilises.

Working with a git version that changes frequently means you need to upgrade your database on a fairly regular basis. For that, the -u all option to openerp-server is very handy.

./openerp-server -d mydatabase -u all --stop-after-init --config=/etc/openerp/openerp-server.conf

The one feature that caught us out initially was the new reports. The QWeb and Bootstrap 3 were fine; it was the caching that surprised us. As developers we tend to try lots of incremental changes on reports, and at first it looked like our changes weren’t working. Luckily we realised it was actually a new feature: report caching. This can be turned off against the report, assuming you have access to the technical features. It’s not on for every report, but invoices are one area where it is used, partly because the reports are intended to be legal documents that shouldn’t change once they have been printed.

At the moment these are the open bugs on GitHub I’m tracking. If you need to apply one of the patches to get on with things, just stick ‘.patch’ onto the GitHub URL and you will have a patch file you can apply in the usual fashion, i.e. patch -p1 < nnnn.patch

Some of these are caused by constant upgrading on the bleeding edge; others are things I hope get fixed before the final cut. Note that these pull requests may not be the best way to fix the issues, but they do generally fix the symptoms, so I have found them useful for getting on with my own development.

Issue 1110

Error context:
View `product.template.form`
[view_id: 259, xml_id: product.product_template_form_view_variant_button, model: product.template, parent_id: 252]

Issue 1267 – Fixed

KeyError: 'section_id'

Issue 1187

This one is just a tweak to the writeability of a few fields.

Issue 1284 – Fixed

XML RPC and None values