LXC Networking

My infrequently used laptop appears to have developed some issues with the networking on new lxc containers (Ubuntu 13.10) so I finally started to dig into how it’s setup. I first blew away the caches in /var/cache/lxc, then tried installing a fresh box with a completely stock template. The network interface didn’t acquire one of the usual network ip addresses. A look in the logs suggested the dhcp server was receiving a DHCP request, and it was sending back a lease, but the client wasn’t picking that up.

tail /var/log/syslog
dnsmasq-dhcp[1261]: DHCPDISCOVER(lxcbr0) 00:22:3e:ea:aa:fa 
dnsmasq-dhcp[1261]: DHCPOFFER(lxcbr0) 00:22:3e:ea:aa:fa 
dnsmasq-dhcp[1261]: DHCPDISCOVER(lxcbr0) 00:22:3e:ea:aa:fa 
dnsmasq-dhcp[1261]: DHCPOFFER(lxcbr0) 00:22:3e:ea:aa:fa 
dnsmasq-dhcp[1261]: DHCPDISCOVER(lxcbr0) 00:22:3e:ea:aa:fa 
dnsmasq-dhcp[1261]: DHCPOFFER(lxcbr0) 00:22:3e:ea:aa:fa 

A google suggested my problem was the UDP checksums – bug 1204069. A look using tcpdump suggested that indeed my checksums were corrupt; note the ‘bad udp cksum’ on the second packet.

sudo tcpdump -vvv -i lxcbr0
21:36:34.009788 IP (tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 328) > [udp sum ok] BOOTP/DHCP, Request from 00:22:3e:ea:aa:fa (oui Unknown), length 300, xid 0xb407cc75, secs 49, Flags [none] (0x0000)
	  Client-Ethernet-Address 00:22:3e:ea:aa:fa (oui Unknown)
	  Vendor-rfc1048 Extensions
	    DHCP-Message Option 53, length 1: Discover
	    Parameter-Request Option 55, length 13: 
	      Subnet-Mask, BR, Time-Zone, Default-Gateway
	      Domain-Name, Domain-Name-Server, Option 119, Hostname
	      Netbios-Name-Server, Netbios-Scope, MTU, Classless-Static-Route
	    END Option 255, length 0
	    PAD Option 0, length 0, occurs 41
21:36:34.010027 IP (tos 0xc0, ttl 64, id 44699, offset 0, flags [none], proto UDP (17), length 328) > [bad udp cksum 0x1c2d -> 0x1a4c!] BOOTP/DHCP, Reply, length 300, xid 0xb407cc75, secs 49, Flags [none] (0x0000)
	  Client-Ethernet-Address 00:22:3e:ea:aa:fa (oui Unknown)
	  Vendor-rfc1048 Extensions
	    DHCP-Message Option 53, length 1: Offer
	    Server-ID Option 54, length 4:
	    Lease-Time Option 51, length 4: 3600
	    RN Option 58, length 4: 1800
	    RB Option 59, length 4: 3150
	    Subnet-Mask Option 1, length 4:
	    BR Option 28, length 4:
	    Default-Gateway Option 3, length 4:
	    Domain-Name-Server Option 6, length 4:
	    END Option 255, length 0
	    PAD Option 0, length 0, occurs 8

The bug suggests that that should no longer be a problem, so I carried on googling. Turning my search criteria to DHCP udp problems. That came up with bug 930962 which gave a potential work around and suggested that it should be fixed. Since it said there should already be a firewall fix in the config I decided to take a look at the configuration. To figure out where that was I used dpkg -L

dpkg -L lxc

After a little digging I found the network configuration in /etc/init/lxc-net.conf where the setup of all the networking out of the box that I so like about ubuntu was all there. There was no mangle line as suggested in the bug report. At that point I checked the version I was running, and the version on the bug report and realised that was a pretty recent bug fix, and I simply don’t have it. Rather than figure out where the fixed package is, I figured I may as well just try to fix it myself for now since I’ve come so far.

sudo iptables -t mangle -A POSTROUTING -o lxcbr0 -p udp --dport bootpc -j CHECKSUM --checksum-fill

After that the networking came up, or more specifically the server obtained an ip address and I was able to ssh to it. I added pretty much that line into my /etc/init.lxc-net.conf and it now works out of the box again and I have a better understanding of how lxc sets up it’s network. I’ve also finally solved a case of checksum offload issues for myself rather than just hearing about them. Possibly in the strangest way, i.e. not turning off the network offload of the checksums.

Of course the adventure didn’t end there. I then found that salt wasn’t working. When I looked for salt keys I didn’t find any requests from new servers. A tcpdump showed a similar checksum issue with the salt traffic. A quick hack with iptables again solved that problem.

sudo iptables -t mangle -A POSTROUTING -o lxcbr0 -p udp --dport 4505 -j CHECKSUM --checksum-fill
sudo iptables -t mangle -A POSTROUTING -o lxcbr0 -p udp --sport 4505 -j CHECKSUM --checksum-fill
sudo iptables -t mangle -A POSTROUTING -o lxcbr0 -p udp --dport 4506 -j CHECKSUM --checksum-fill
sudo iptables -t mangle -A POSTROUTING -o lxcbr0 -p udp --sport 4506 -j CHECKSUM --checksum-fill

Testing out uninstall to fix a perl modules dependency issue

Just recently after an upgrade to some CPAN modules I started getting this crash on one of my machines when the Catalyst::View::JSON was loaded.

#     Error:  Couldn't instantiate component "TestApp::View::JSON", "Recursive inheritance detected in package 'Types::Serialiser::BooleanBase' at (eval 1547) line 76."Compilation failed in require at (eval 4) line 2.

The actual source of that error appears to be JSON::XS rather than Types::Serialiser::BooleanBase or TestApp::View::JSON.

I didn’t investigate the error properly, or really fix it properly. Instead I tested out one of the newer features of cpanm, the -U uninstall flag. I simply uninstalled JSON::XS and hey presto, no more crash.

cpanm -U JSON::XS
JSON::XS contains the following files:


Are you sure you want to uninstall JSON::XS? [y] y

That probably warrants some explanation. The new Catalyst::Runtime now appears to pull in the new alternative to JSON::XS, Cpanel::JSON::XS so this can now be used instead, and so things just worked. It’s probably a bit drastic a solution for most systems at the moment, I’m sure it will demonstrate any places where I have direct dependencies on JSON::XS. On my development box that should be handy however. I’d rather be using a single library for that single purpose.

Getting started with Salt

I’ve finally started to use the salt stack in anger now and I thought I’d make some quick notes on what I thought I was slow to pick up on.

Specifying that a box needs to have x services installed is all done in the configuration. I understood that there are two aspects to salt, one is running things via the command line, the second is configuration management. I thought I’d be able to do something like ‘salt provision box-a service-b’. Instead you generally put the machine names and what they should have in top.sls in the /srv/salt directory along with all the other config.

Pillars appears to be intended to provide the site specific data generally. In general the salt .sls files are the config that can be used everywhere, sometimes templated, and the pillar data can be used in those templates to insert data specific to a site/machine. This means you should be able to use the same basic salt setup on multiple salt-masters, and simply change the data on them to generate machines with different users and setups.

Salt appears to be very opaque to start with. It took me too long to realise that like most things, it logs. In fact it logs very well so you can really turn the log level up to a very high level if you are troubleshooting. It can be interesting to turn it up if you want to see what it’s doing in real time, otherwise you just see problems logged.

It’s well worth taking a look at the state documentation as you’re looking at the tutorials. I found the examples in the tutorial relatively hard to pick apart until I was able to see the references for the various states.


The file state is particularly worth taking a look at. A lot of deployment is pushing files about and modifying them.


Make sure you run the latest versions from salt’s official repositories if possible. The ones packaged with your OS are likely to be old. And don’t run inconsistent versions between your masters and minions. That way leads to fail.

If something obvious doesn’t appear to work, try it with a different package. I found my simple couchdb installation didn’t work quite right out of the box. I tried exactly the same thing with memcached and it did. This turned out to be a bug.

I’m still early on my journey into salt, but so far it appears to be very useful.

Salt service running

If you’re using salt to ensure you’re running your services and you constantly see your service being started it might be because the service doesn’t support the status command.

    State: - service
    Name:      openerp
    Function:  running
        Result:    True
        Comment:   Started Service openerp
        Changes:   openerp: True

If you tweak the salt minion to log trace messages you’ll see that salt calls service openerp status.

log_level: trace

# service salt-minion restart

2013-10-02 19:26:32,602 [salt.loaded.int.module.cmdmod][INFO    ] Executing command 'service openerp status' in directory '/root'
2013-10-02 19:26:32,611 [salt.loaded.int.module.cmdmod][INFO    ] Executing command 'service openerp start' in directory '/root'

If we try that on the command line we’ll see that the openerp service doesn’t support the status option.

root@openerp-aq:~# service openerp status
Usage: openerp-server {start|stop|restart|force-reload}

That means salt gets an error and assumes the service isn’t running so it starts it.

The service state has an alternative way to check for the service running by looking at ps and grepping for the process. This is done by specifying the sig key.

    - running
    - sig: openerp
    - require:
      - pkg: openerp

Now we finally see what we want, salt leaving the service running.

    State: - service
    Name:      openerp
    Function:  running
        Result:    True
        Comment:   The service openerp is already running

In the log we now see,

2013-10-02 19:31:15,371 [salt.loaded.int.module.cmdmod][INFO    ] Executing command "ps -efH | grep 'openerp' | grep -v grep | awk '{print $2}'" in directory '/root'

Grepping for a single, arbitrary character in a bunch of files.

I had a random error complaining about being unable to read a source file without any real explanation of which file was the problem.

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 25: ordinal not in range(128)

The most logical way to rip through a number of files is find. On linux xxd is the simplest tool for hex dumping and grep can then look at it’s output. The problem occurs when you realise you can’t use | in an -exec with find, or at least I haven’t figured out how. The simplest way to work around that is to put your command into a tiny shell script and -exec that. Obvious really, but it always takes me a moment to remember that.

Note that this is for grepping for a single byte. Grepping for multiple bytes would require a different approach entirely.

cat > bingrep.sh
xxd -g 1 $2 | grep -i "\<$1\>" -q
chmod +x bingrep.sh
find . -type f -name "*.py" -exec ./bingrep.sh c3 {} \; -print

That allows me to look for the rogue 0xc3 characters in the source code I was dealing with.

I suspect I ought to be able to do something similar with regex’s, but you run the risk of stupid interpretation problems. Sometimes it’s simpler to just look at it in the raw, relatively speaking.

A few tips on using the OpenERP XML RPC API

I’ve been developing applications that talk to OpenERP via it’s API for a little while now and I figured it would be worth noting down some general pointers for using it. The API is used by all the UI clients for OpenERP so it definitely allows you to do everything that they do. It’s also got a reasonable amount of access to the data layer so you can do an awful lot via it. From simply creating interfaces to it, to streamlining a process with a slick UI, there are a lot of possibilities with it.

Here are my tips for developing with it,

  1. Run OpenERP with the --log-request and --log-response command line flags to see how OpenERP achieves it’s tasks. It’s also helpful to see what your application is sending.
  2. Don’t be afraid to read the code. Watching OpenERP is a great starting point, but sometimes you need to look at the code to understand what a parameter is for, or how it’s getting the data.
  3. Don’t be afraid to use the debugger. Putting --debug on the command line allows the python debugger to kick in when there is an exception. It also allows you to stick ‘import pdb; pdb.set_trace()’ onto a line in the code you want to investigate and drop into there in the debugger.
  4. Pass the context like you are told to, it makes life easier if you are setup to simply pass it along with your calls (but also be ready to add data to it). The context contains things like the language which is used for translations. You want your applications language to be consistent with any users of OpenERP itself otherwise things will get weird. Also note that some calls require extra information to be passed in via it. It’s not a completely opaque blob that you simply pass about between calls.
  5. Limit the columns you ask for. When you make a read call you can specify which columns you are interested in. Do that from the start or you’ll end up with performance problems later on. Once you have a partner with 1000’s of orders and invoices etc. a simple read of the res.partner will take a significant chunk of time if you aren’t limiting what you read from it.
  6. Don’t be afraid to extend OpenERP. Even if you aren’t a Python developer by trade, if you’re doing serious data modification you’re better off creating a module and calling a method on that than making lots of API calls. The main reason for that is to get everything into a single transaction. The other reason is speed.
  7. Be careful about filtering by calculated fields, they generally get turned into a ‘TRUE’ statement in the SQL. This can really screw things up when you have OR conditions in your filter. Use the --log-sql if you’re unsure what is happening.
  8. False is used for Null/None in the XML RPC
  9. Returning a dictionary from an OpenERP method via the API requires the keys to be strings. i.e you can’t simply return { id: quantity }. You’d need to do { str(id): quantity } to prevent a crash. None is also a no-no for dictionary values. Convert them to False if you want to be consistent with OpenERP.
  10. Formatting of numbers to n decimal places is largely done client side. OpenERP provides you all the info you need, but you need to deal with that yourself.
  11. Beware of gapless ir.sequence sequences in batch jobs. They have a high potential for causing contention.

Postgres table locks

I’ve just been looking into some issues with locking in Postgres and the documentation as ever has been excellent.


A closer look at the queries provided suggests they don’t show table level locks. At the very least if you do a pg_dump of your database while checking those queries you see nothing, despite there definitely being some locks going on. This query probably isn’t perfect, and is simply based on a quick practical test of running pg_dump against a test db but it may help spot the table locks which could be blocking things.

select pid, usename, datname, current_query 
from pg_catalog.pg_locks l 
inner join pg_catalog.pg_stat_activity a on a.procpid = pid 
where mode like '%ExclusiveLock%';

OpenERP debugging tip – turn off cron

While we’ve been doing a lot of OpenERP deployment we have been discovering various ways to configure it and one turns out to be very handy for debugging. If you’ve ever debugged OpenERP using the --debug flag and dropped into the debugger you have probably noticed the system carries on doing things while you’re sat at the debugger prompt. Often obliterating what you were looking at. This generally happens because OpenERP generally runs with multiple threads out of the box, and some of those threads do the ‘cron’ jobs, the background tasks, so even if you haven’t tried to do anything on the website, there will be activity. If you want to prevent the cron activity from making your debugging session more confusing than it needs to be add the --max-cron-threads=0 flag when you run OpenERP.

DBIx::Class and Postgres tweaks at startup

After connection you can do simple commands thanks to the on_connect_do connection setting. One thing I sometimes do is turn down the whining. Postgres can be quite noisy when you’re creating tables,

NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "blah_pkey" for table "blah"
NOTICE:  CREATE TABLE will create implicit sequence "blah_id_seq" for serial column "blah.id"

So I sometimes SET client_min_messages=WARNING at connection. That makes deploying a schema a lot less verbal.

If you want to deploy to a different schema within a database you can also do a ‘SET search_path TO’ statement at that point too. That can be quite handy if you want to deploy the same tables again to an alternative schema within a database.

perl -I lib -MMyModule::Schema::DB -e "MyModule::Schema::DB->connect('dbi:Pg:dbname=database;host=postgres', 'username', 'password', { on_connect_do => 'SET search_path TO temp' })->deploy"

If you’re setting up a Catalyst config for DBIC you can set the connection options like this,

    connect_info dbi:Pg:dbname=database
    connect_info username
    connect_info password
        on_connect_do  [ SET client_min_messages=WARNING ]
        quote_names 1

Adhoc parameters to joins in DBIx::Class

Update: there are a few updates to the caveats on this post based on the comments by Peter Rabbitson.

I’ve been using DBIx::Class for a couple of years now and I’ve always been impressed with it’s flexibility.  I’ve rarely felt the need to write my own SQL queries because it’s simply so good at it, and it’s generally easy to get it to do what I want.

The one exception to that was custom adhoc joins.  In general DBIx::Class wants to know about how you’re going to join up front.  Anything else tends to require a custom view.  

The other night I realised I could come up with a way to deal with slightly more complex joins while still making the most of all my result definitions.  The extended relationships explained by frew on his blog demonstrate how to add decent join conditions.  The one thing missing was adhoc parameters.  They can be added by providing a bind parameter.  Since relationships don’t traditionally require any extra search parameters, I’d recommend indicating that the relationship isn’t for public consumption, and providing a wrapper method around it.

For example, here is a the relationship in the Result class,

  _date_range => 'DBICTest::Schema::CD',
  sub {
    my $args = shift;
    return (
      { "$args->{foreign_alias}.artist" => { -ident => "$args->{self_alias}.artistid" },
        -and => [
            "$args->{foreign_alias}.year"   => { '>' => \"?" },
            "$args->{foreign_alias}.year"   => { '<' => \"?" },

And then a method to exploit it in the ResultSet class.

sub join_date_range
    my $self = shift;
    my $start = shift;
    my $end = shift;
    $self->search(undef, {
        join => ['_by_name'],
        bind => [ $start, $end ],

Then you can use it like this,

$schema->resultset("Artist")->join_date_range(1990, 1992)->search(...)->all;

You can of course do more complex joins too, even joining up multiple times.  All you need to do is specify all the necessary bind parameters.

The most impressive thing is that the joins work fine even with the chained searches that DBIC is so good at.  You can of course also do search_related and most of the usual links, you just need to specify the bind parameters.  

There are a few of caveats however.  This isn’t strictly speaking an intentional feature, or at least it wasn’t consciously designed to work this way (as far as I know).

  1. Attempting to do a prefetch across the join fails with the error “Can't prefetch has_many _date_range (join cond too complex)”. Update: there is a new version of DBIC in the pipeline that fixes this. Right now that version is 0.08241-TRIAL. The feature you’re probably looking for in the Changes file is “Multiple has_many prefetch” when checking if a new live release has it.
  2. You might have looked at that -and => [] construction and thought that’s a bit pointless, a standard hash would be simpler and achieve the same effect. Unfortunately the bind parameters are positional, and a hash doesn’t have a guaranteed order. That means you need to be extra careful with your query when you have multiple parameters you need to specify, to ensure the binds happen to the correct place holders. Update: as Peter Rabbitson pointed out, it’s not actually that simple. DBIC does try to make sure you have a guaranteed order by sorting the keys of the hashes so that it always produces the same SQL. This means that you probably just need to try it and see which order in which it requires the parameters most of the time.
  3. Update: I was incorrect about not being able to use extra binding information with bind, the syntax Peter Rabbitson suggested works perfectly. i.e. bind => [ [ $opt1 => $val1 ], [ $opt => $val 2 ]… ]The final caveat is that the bind parameters don’t currently take any extra type information. Normally most of the places you are exposed directly to bindings you can specify types in order to help DBIC create the correct query. It doesn’t appear to be possible to provide that information via the bind parameter on a search.

This isn’t strictly a documented feature, but hopefully it’s helpful to a few people. If you’re wondering why you’d need to do this at all, consider the difference between these two queries.

FROM a LEFT JOIN b ON a.id = b.owner_id AND a.name = b.name


FROM a LEFT JOIN b ON a.id = b.owner_id
WHERE a.name = b.name

In the course of figuring this out, I also discovered the -ident key which indicates that you’re dealing with something like a column name, and should therefore be quoted if you have column name quoting turned on.  A handy feature to go along with using references to pass in literal bits of SQL.