PAUSE permissions code outline

This is a quick brain dump from my brief coding session on PAUSE working on the permissions. This won’t make sense without reading the code yourself. Unless you’re interested in modifying PAUSE or reviewing my changes I would ignore this blog post. I’m writing this for my future self as much as anyone. The behaviour I modified in my pull request is in italics.

The code for setting the permissions when a new distribution is indexed is largely in PAUSE::package. It’s triggered by ./cron/mldistwatch and PAUSE::mldistwatch, but the code that actually sets the permissions is in PAUSE::package.

perm_check checks that the current user has permission to be uploading this distro and bails if not. If this is a new package (and they do have permission) then it also adds the user to the perms table as they are effectively first come.

give_regdowner_perms adds people to the perms table where necessary. This was supporting the module permissions in amongst other things. It is now also where we copy the permissions from the main module.

The checkin method as well as adding the record to the packages table also adds to the primeur table via the checkin_into_primeur function. If the x_authority flag is used then that user is added, otherwise the main module is looked up, or failing that the uploading user.

Note that the first come (primeur) users appear in both the perms and primeur tables.

There are special checks for Perl distributions in the permissions code that will change the behaviour of the permissions. I am purposely not mentioning them as I never took the time to understand them.

A quick note on testing. As well as the tests which work of the dummy distributions in the corpus directory there is a test utility. To get a look at the permissions database in practice use the build-fresh-cpan script. This will build a one off cpan environment that you can examine. Just pass it a package to index and then you can check the permissions database.

$ perl one-off-utils/build-fresh-cpan corpus/mld/009/authors/O/OO/OOOPPP/Jenkins-Hack-0.14.tar.gz
Initialized empty Git repository in /tmp/h4L3GU6Awi/git/.git/
$ ls
cpan  db  git  Maildir  pause.log  run
$ cd db
$ ls
authen.sqlite  mod.sqlite
$ sqlite3 mod.sqlite 
SQLite version 2014-10-29 13:59:56
Enter ".help" for usage hints.
sqlite> select * from primeur;
sqlite> select * from perms;

Docker logging and buffering

When you start using Docker one of the things it’s quite possibly you’ve hit is output buffering. Docker encourages you to log to stdout/err from your program, and then use docker to feed your regular logging solution.

This generally exhibits itself as you seeing no logging, then after the program has been running for a while you come back to the logs and find they are all there. It doesn’t get as much press as caching but it can be just as head scratching.

With Perl for example the buffering kicks in when it’s not connected to a terminal. For that this somewhat idiomatic snippet can be useful,

select( ( select(\*STDERR), $|=1 )[0] );
select( ( select(\*STDOUT), $|=1 )[0] );

As ever with Perl there is more than one way to do it of course…

Note that you could experience this sort of thing when using other logging setups. If you pipe to logger to output to rsyslog you could experience the same issues.

Debugging Web API traffic

Note that this blog post mostly assumes you’re operating on Linux. If you’re using Windows just use Fiddler. It probably does everything you need. Actually, having just looked at their page it looks like it may well work on lots of places other than Windows too, so it might be a good option.

When developing programs that consume API’s that make use of HTTP at some level it’s often useful to check what is actually going over the wire. Note that this is talking about unencrypted traffic at this level. If you’re talking to a server over HTTPS you will need to MITM your connection or get the keys for the session using the SSLKEYLOGFILE environment variable.

The simplest way to capture traffic is generally to use tcpdump. Wireshark is often a good tool for looking at network traces, but for lots of HTTP requests it tends to feel clunky. This is where I turn to a python utility named pcap2har. This converts a packet capture to a HAR file. A HAR file is essentially a json file containing the HTTP requests. It has an array of the requests with each request noting the headers and content of the parts of the request/response. The HAR file format is documented here.

You will actually find that Google Chrome allows you to export a set of requests from it’s Network tab of the Developer toolbar as a HAR file too.

The pcap2har utility isn’t packaged in a particularly pythonic way, and it doesn’t actually extract the request body so I created a minor tweak to it on a branch. This branch does extract the request body which is often used in API calls. You’ll need to pip install dpkt, the rest of the dependencies are bundled in the repo. Then you run it like this,

git clone --branch request_body
cd pcap2har
sudo pip install dpkt
tcpdump port 8069 -w packets.dump
pcap2har packets.dump traffic.har

Having said that HAR is a lot easier to consume, there are viewers, but I’ve not found one that I particularly liked. I tend to either pretty print the json and look at it in a text editor, or then use grep or code to extract the information I want.

For OpenERP/Odoo API calls I created a quick script to explode the contents of the API calls out to make it easier to read. It explodes the xml/json contents within the requests/responses out to json at the same level as the rest of the HAR data, rather than having encoded data encoded within json.