Testing out uninstall to fix a Perl module dependency issue

Just recently, after upgrading some CPAN modules, I started getting this crash on one of my machines when Catalyst::View::JSON was loaded.

#     Error:  Couldn't instantiate component "TestApp::View::JSON", "Recursive inheritance detected in package 'Types::Serialiser::BooleanBase' at (eval 1547) line 76."Compilation failed in require at (eval 4) line 2.

The actual source of that error appears to be JSON::XS rather than Types::Serialiser::BooleanBase or TestApp::View::JSON.

I didn’t investigate the error properly, or really fix it properly. Instead I tested out one of the newer features of cpanm, the -U uninstall flag. I simply uninstalled JSON::XS and hey presto, no more crash.

cpanm -U JSON::XS
JSON::XS contains the following files:


Are you sure you want to uninstall JSON::XS? [y] y

That probably warrants some explanation. The new Catalyst::Runtime now appears to pull in Cpanel::JSON::XS, the new alternative to JSON::XS, so that gets used instead and things just worked. It’s probably too drastic a solution for most systems at the moment, and I’m sure it will expose any places where I have direct dependencies on JSON::XS. On my development box that should be handy, however. I’d rather be using a single library for that single purpose.


Grepping for a single, arbitrary character in a bunch of files.

I had a random error complaining about being unable to read a source file without any real explanation of which file was the problem.

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 25: ordinal not in range(128)

The most logical way to rip through a number of files is find. On Linux, xxd is the simplest tool for hex dumping, and grep can then look at its output. The problem occurs when you realise you can’t use | directly in an -exec with find, or at least I haven’t figured out how. The simplest way to work around that is to put your command into a tiny shell script and -exec that. Obvious really, but it always takes me a moment to remember that.

Note that this is for grepping for a single byte. Grepping for multiple bytes would require a different approach entirely.

cat > bingrep.sh <<'EOF'
#!/bin/sh
# usage: bingrep.sh HEXBYTE FILE
xxd -g 1 "$2" | grep -qi "\<$1\>"
EOF
chmod +x bingrep.sh
find . -type f -name "*.py" -exec ./bingrep.sh c3 {} \; -print

That allows me to look for the rogue 0xc3 characters in the source code I was dealing with.
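The same single-byte search could also be sketched in Python, reading each file as raw bytes so that no decoding gets in the way. This is only a sketch under that assumption; the function name files_containing_byte and its parameters are hypothetical, not part of any existing tool.

```python
import fnmatch
import os


def files_containing_byte(root, byte_value, pattern="*.py"):
    """Yield paths under root whose raw contents include the given byte."""
    needle = bytes([byte_value])
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, pattern):
            path = os.path.join(dirpath, name)
            # Binary mode: the file is never decoded, so odd bytes are harmless.
            with open(path, "rb") as handle:
                if needle in handle.read():
                    yield path
```

Calling list(files_containing_byte(".", 0xC3)) would then give the .py files below the current directory that contain a 0xc3 byte.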

I suspect I ought to be able to do something similar with regexes, but you run the risk of stupid interpretation problems. Sometimes it’s simpler to just look at it in the raw, relatively speaking.

A few tips on using the OpenERP XML RPC API

I’ve been developing applications that talk to OpenERP via its API for a little while now and I figured it would be worth noting down some general pointers for using it. The API is used by all the UI clients for OpenERP, so it definitely allows you to do everything that they do. It also has a reasonable amount of access to the data layer, so you can do an awful lot via it. From simply creating interfaces to it, to streamlining a process with a slick UI, there are a lot of possibilities.

Here are my tips for developing with it,

  1. Run OpenERP with the --log-request and --log-response command line flags to see how OpenERP achieves its tasks. It’s also helpful to see what your application is sending.
  2. Don’t be afraid to read the code. Watching OpenERP is a great starting point, but sometimes you need to look at the code to understand what a parameter is for, or how it’s getting the data.
  3. Don’t be afraid to use the debugger. Putting --debug on the command line allows the python debugger to kick in when there is an exception. It also allows you to stick ‘import pdb; pdb.set_trace()’ onto a line in the code you want to investigate and drop into there in the debugger.
  4. Pass the context like you are told to. It makes life easier if you are set up to simply pass it along with your calls (but also be ready to add data to it). The context contains things like the language which is used for translations. You want your application’s language to be consistent with any users of OpenERP itself, otherwise things will get weird. Also note that some calls require extra information to be passed in via it. It’s not a completely opaque blob that you simply pass about between calls.
  5. Limit the columns you ask for. When you make a read call you can specify which columns you are interested in. Do that from the start or you’ll end up with performance problems later on. Once you have a partner with thousands of orders, invoices and so on, a simple read of res.partner will take a significant chunk of time if you aren’t limiting what you read from it.
  6. Don’t be afraid to extend OpenERP. Even if you aren’t a Python developer by trade, if you’re doing serious data modification you’re better off creating a module and calling a method on that than making lots of API calls. The main reason for that is to get everything into a single transaction. The other reason is speed.
  7. Be careful about filtering by calculated fields; they generally get turned into a ‘TRUE’ statement in the SQL. This can really screw things up when you have OR conditions in your filter. Use the --log-sql flag if you’re unsure what is happening.
  8. False is used for Null/None in the XML RPC API.
  9. Returning a dictionary from an OpenERP method via the API requires the keys to be strings, i.e. you can’t simply return { id: quantity }. You’d need to do { str(id): quantity } to prevent a crash. None is also a no-no for dictionary values. Convert them to False if you want to be consistent with OpenERP.
  10. Formatting of numbers to n decimal places is largely done client side. OpenERP provides you all the info you need, but you need to deal with that yourself.
  11. Beware of gapless ir.sequence sequences in batch jobs. They have a high potential for causing contention.
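
Tips 8 and 9 can be sketched as a small helper that cleans up a return value before it goes over XML RPC. This is an illustration only: xmlrpc_safe is a hypothetical name, not something OpenERP provides, and the sketch uses Python 3’s xmlrpc.client simply to show that the converted value marshals while the raw one does not.

```python
import xmlrpc.client


def xmlrpc_safe(value):
    """Recursively convert a result into something XML RPC can marshal."""
    if value is None:
        return False  # tip 8: False stands in for None
    if isinstance(value, dict):
        # tip 9: dictionary keys have to be strings
        return {str(key): xmlrpc_safe(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [xmlrpc_safe(item) for item in value]
    return value


quantities = {7: 2.0, 8: None}  # made-up example data: id -> quantity
safe = xmlrpc_safe(quantities)
# Marshalling the converted value works; the raw dict would raise TypeError.
payload = xmlrpc.client.dumps((safe,), methodresponse=True)
```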

Postgres table locks

I’ve just been looking into some issues with locking in Postgres and the documentation as ever has been excellent.


A closer look at the queries provided suggests they don’t show table level locks. At the very least if you do a pg_dump of your database while checking those queries you see nothing, despite there definitely being some locks going on. This query probably isn’t perfect, and is simply based on a quick practical test of running pg_dump against a test db but it may help spot the table locks which could be blocking things.

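-- Column names are for PostgreSQL before 9.2; from 9.2 onwards
-- procpid is named pid and current_query is named query.
select l.pid, a.usename, a.datname, a.current_query
from pg_catalog.pg_locks l
inner join pg_catalog.pg_stat_activity a on a.procpid = l.pid
where l.mode like '%ExclusiveLock%';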

OpenERP debugging tip – turn off cron

While we’ve been doing a lot of OpenERP deployment we have been discovering various ways to configure it, and one turns out to be very handy for debugging. If you’ve ever debugged OpenERP using the --debug flag and dropped into the debugger, you have probably noticed the system carries on doing things while you’re sat at the debugger prompt, often obliterating what you were looking at. This happens because OpenERP generally runs with multiple threads out of the box, and some of those threads do the ‘cron’ jobs, the background tasks, so even if you haven’t tried to do anything on the website there will be activity. If you want to prevent the cron activity from making your debugging session more confusing than it needs to be, add the --max-cron-threads=0 flag when you run OpenERP.

DBIx::Class and Postgres tweaks at startup

After connection you can do simple commands thanks to the on_connect_do connection setting. One thing I sometimes do is turn down the whining. Postgres can be quite noisy when you’re creating tables,

NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "blah_pkey" for table "blah"
NOTICE:  CREATE TABLE will create implicit sequence "blah_id_seq" for serial column "blah.id"

So I sometimes SET client_min_messages=WARNING at connection. That makes deploying a schema a lot less verbose.

If you want to deploy to a different schema within a database you can also do a ‘SET search_path TO’ statement at that point too. That can be quite handy if you want to deploy the same tables again to an alternative schema within a database.

perl -I lib -MMyModule::Schema::DB -e "MyModule::Schema::DB->connect('dbi:Pg:dbname=database;host=postgres', 'username', 'password', { on_connect_do => 'SET search_path TO temp' })->deploy"

If you’re setting up a Catalyst config for DBIC you can set the connection options like this,

    connect_info dbi:Pg:dbname=database
    connect_info username
    connect_info password
        on_connect_do  [ SET client_min_messages=WARNING ]
        quote_names 1

Adhoc parameters to joins in DBIx::Class

Update: there are a few updates to the caveats on this post based on the comments by Peter Rabbitson.

I’ve been using DBIx::Class for a couple of years now and I’ve always been impressed with its flexibility.  I’ve rarely felt the need to write my own SQL queries because it’s simply so good at it, and it’s generally easy to get it to do what I want.

The one exception to that was custom adhoc joins.  In general DBIx::Class wants to know about how you’re going to join up front.  Anything else tends to require a custom view.  

The other night I realised I could come up with a way to deal with slightly more complex joins while still making the most of all my result definitions.  The extended relationships explained by frew on his blog demonstrate how to add decent join conditions.  The one thing missing was adhoc parameters.  They can be added by providing a bind parameter.  Since relationships don’t traditionally require any extra search parameters, I’d recommend indicating that the relationship isn’t for public consumption, and providing a wrapper method around it.

For example, here is the relationship in the Result class,

__PACKAGE__->has_many(
  _date_range => 'DBICTest::Schema::CD',
  sub {
    my $args = shift;
    return (
      { "$args->{foreign_alias}.artist" => { -ident => "$args->{self_alias}.artistid" },
        # an array rather than a hash, so the two ? placeholders keep their order
        -and => [
            "$args->{foreign_alias}.year"   => { '>' => \"?" },
            "$args->{foreign_alias}.year"   => { '<' => \"?" },
        ],
      }
    );
  }
);

And then a method to exploit it in the ResultSet class.

sub join_date_range {
    my $self = shift;
    my $start = shift;
    my $end = shift;
    return $self->search(undef, {
        join => ['_date_range'],
        bind => [ $start, $end ],
    });
}

Then you can use it like this,

$schema->resultset("Artist")->join_date_range(1990, 1992)->search(...)->all;

You can of course do more complex joins too, even joining up multiple times.  All you need to do is specify all the necessary bind parameters.

The most impressive thing is that the joins work fine even with the chained searches that DBIC is so good at.  You can of course also do search_related and most of the usual links, you just need to specify the bind parameters.  

There are a few caveats however.  This isn’t strictly speaking an intentional feature, or at least it wasn’t consciously designed to work this way (as far as I know).

  1. Attempting to do a prefetch across the join fails with the error “Can't prefetch has_many _date_range (join cond too complex)”. Update: there is a new version of DBIC in the pipeline that fixes this. Right now that version is 0.08241-TRIAL. The feature you’re probably looking for in the Changes file is “Multiple has_many prefetch” when checking if a new live release has it.
  2. You might have looked at that -and => [] construction and thought that’s a bit pointless, a standard hash would be simpler and achieve the same effect. Unfortunately the bind parameters are positional, and a hash doesn’t have a guaranteed order. That means you need to be extra careful with your query when you have multiple parameters you need to specify, to ensure the binds happen to the correct placeholders. Update: as Peter Rabbitson pointed out, it’s not actually that simple. DBIC does try to make sure you have a guaranteed order by sorting the keys of the hashes so that it always produces the same SQL. This means that, most of the time, you probably just need to try it and see in which order it requires the parameters.
  3. Update: I was incorrect about not being able to use extra binding information with bind; the syntax Peter Rabbitson suggested works perfectly, i.e. bind => [ [ $opt1 => $val1 ], [ $opt2 => $val2 ]… ]. The original caveat was that the bind parameters don’t take any extra type information: in most of the places where you are exposed directly to bindings you can specify types to help DBIC create the correct query, and it didn’t appear to be possible to provide that information via the bind parameter on a search.

This isn’t strictly a documented feature, but hopefully it’s helpful to a few people. If you’re wondering why you’d need to do this at all, consider the difference between these two queries.

FROM a LEFT JOIN b ON a.id = b.owner_id AND a.name = b.name


FROM a LEFT JOIN b ON a.id = b.owner_id
WHERE a.name = b.name
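
The difference between those two queries can be seen with a small sketch using Python’s built-in sqlite3 module. The tables a and b and their rows are made up purely for illustration; the point is that a condition inside the ON clause keeps unmatched rows from a, while the same condition in the WHERE clause throws them away.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, name TEXT);
    CREATE TABLE b (owner_id INTEGER, name TEXT);
    INSERT INTO a VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO b VALUES (1, 'alice'), (2, 'robert');
""")

# Extra condition inside ON: unmatched rows of a survive, with NULLs for b.
in_join = conn.execute("""
    SELECT a.id, b.name
    FROM a LEFT JOIN b ON a.id = b.owner_id AND a.name = b.name
    ORDER BY a.id
""").fetchall()

# Same condition in WHERE: rows of a without a matching b are filtered out.
in_where = conn.execute("""
    SELECT a.id, b.name
    FROM a LEFT JOIN b ON a.id = b.owner_id
    WHERE a.name = b.name
    ORDER BY a.id
""").fetchall()

print(in_join)   # [(1, 'alice'), (2, None)]
print(in_where)  # [(1, 'alice')]
```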

In the course of figuring this out, I also discovered the -ident key which indicates that you’re dealing with something like a column name, and should therefore be quoted if you have column name quoting turned on.  A handy feature to go along with using references to pass in literal bits of SQL.


I only occasionally use gdb so I end up spending my time relearning it each time.  Hopefully these notes will make that process easier in the future.  This document assumes you already know the s and n commands and how to list the source code (l).  These are the commands I don’t know off by heart, and have found useful. It’s just a subset, but a useful one so far.  It’s also well worth downloading the manual pdf.

To compile a program with the debug symbols use the -ggdb command line switch.

Here is a basic summary of some of the useful commands.

set args [command line args]
bt # stack trace
up/down # move up the stack
p # print
p/x # display hex value
p/x (int[5]) *0xffffd320 # display values at address as integers
x address # display contents of memory location
x/16b address # display 16 bytes
x/32cb address # display 32 bytes as characters
x/5i address # display as instructions (i.e. assembly)
x/5i $pc - 6 # displays current code
info registers # display register contents
$pc, $sp # program counter and stack pointer, can also use $eip
set $sp += 4 # add 4 to stack pointer
set var variablename = 4 # set a program variable
set {int}0x800321 = 4 # sets memory location 0x800321 to 4
j 0x32211 # jump to 0x32211
find # search memory
b *0xaddress # set breakpoint at an address (b nnn sets at a source line)

I’ve also been using lxc a lot recently and being able to create an x86 based machine is useful.  This creates a machine named x86 for testing x86 code.  The apt-get line is to be run on the machine once it’s loaded to install gdb and the compilers.

lxc-create -n x86 -t ubuntu -- -a i386
apt-get install build-essential gdb

Catalyst Config Hack

With a lot of modules for our Catalyst systems we have separate models. We then use a subset of them in a single application, and it makes sense to actually store all those database models in a single physical database. This means we end up with a lot of duplicate model config keys in our catalyst config.

    connect_info dbi:Pg:dbname=app_db;host=db-server
    connect_info dbi:Pg:dbname=app_db;host=db-server
    connect_info dbi:Pg:dbname=app_db;host=db-server

A lot of database configurations aren’t just a single line, and you end up spending forever copy/pasting and then modifying the config. I wanted to come up with a way to avoid all that repetition.

The Catalyst::Plugin::ConfigLoader provides two potential hooks for things to do after the configuration has been loaded. One is a finalize_config, the other is config_substitutions, via the substitutions settings. Because we are using CatalystX::AppBuilder the finalize_config doesn’t appear to be hookable, or at least I didn’t figure out how to. The substitutions is however perfectly usable because that just requires config setup in code.

   $config->{'Plugin::ConfigLoader'}->{substitutions} = {
        # copy the config stored under one key to another key
        duplicate => sub {
            my $c = shift;
            my $from = shift;
            my $to = shift;
            $c->config->{$to} = $c->config->{$from};
        },
   };

Then this lets me do this in the config file.

    connect_info dbi:Pg:dbname=app_db;host=db-server
    connect_info dbusername
    connect_info dbpassword
      quote_char "\""
      name_sep .


This copies the configuration specified for the Processor to the SysParams, AuditTrail and AuthDB model config settings. This happens right after the configuration has been loaded, and before the models are loaded so all the settings are there just in time. That saves me lots of copy/paste, and even more editing. I don’t even need to copy those directives into my _local.conf because the _local.conf settings for the Processor model will be what get copied.

Skipping Python unit tests if a dependency is missing (fixed)

I got some feedback on my previous post about skipping tests in Python unit tests, pointing out that my solution was flawed.  As Mu Mind noted, the denizens of stackoverflow had pointed out that the solution has a problem when run directly from python.  At first I didn’t realise how flawed it was; technically I had run my tests via both python and nosetest regularly.  I just hadn’t realised that I’d never run the tests via python when I was missing the dependency.  If you do that you get this ugly error,

ERROR: test_openihm_gui_interface_db_mixin (unittest.loader.ModuleImportFailure)
ImportError: Failed to import test module: test_openihm_gui_interface_db_mixin
Traceback (most recent call last):
  File "python2.7/unittest/loader.py", line 252, in _find_tests
    module = self._get_module_from_name(name)
  File "python2.7/unittest/loader.py", line 230, in _get_module_from_name
  File "tests/test_openihm_gui_interface_db_mixin.py", line 6, in <module>
    raise unittest.SkipTest("Need PyQt4 installed to do gui tests")
SkipTest: Need PyQt4 installed to do gui tests

It does tell you what the problem was clearly, but it really wasn’t the intention.  The idea was to silently skip the test.

From the answers and comments on the stackoverflow post I stitched together this ugly but hopefully working solution for whatever your unit test runner of choice is.

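import unittest

try:
    import PyQt4
    # the rest of the imports

    class TestDataEntryMixin(unittest.TestCase):
        def test_somefeature(self):
            # actual tests go here.
            pass

except ImportError, e:
    if e.message.find('PyQt4') >= 0:
        # the dependency is missing: replace the tests with one skipped test
        class TestMissingDependency(unittest.TestCase):

            @unittest.skip('Missing dependency - ' + e.message)
            def test_fail(self):
                pass
    else:
        # not the import we were expecting to fail, so pass it through
        raise

if __name__ == '__main__':
    unittest.main()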

I dislike the fact that I can’t hide away the logic at the top, but surrounding your whole test with the code works.  Then if the import fails we create a dummy test case that does a skip to indicate the problem. I’ve also tried to ensure that we only catch the exception we’re expecting, and pass through any we aren’t.

Now if you run the tests in verbose mode you’ll see this when there is a missing dependency.

test_fail (test_openihm_gui_interface_mixins.TestMissingDependency) ... skipped 'Missing dependency - No module named PyQt4'
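
A different technique, for comparison, is to record whether the import worked and put a skip on the test class with unittest.skipUnless. This is a minimal sketch in Python 3 rather than the Python 2.7 used above; the HAVE_PYQT4 flag and the run_suite helper are hypothetical names. It only works when nothing at module level, such as other imports or base classes, needs PyQt4.

```python
import unittest

try:
    import PyQt4  # optional dependency
    HAVE_PYQT4 = True
except ImportError:
    HAVE_PYQT4 = False


@unittest.skipUnless(HAVE_PYQT4, "Missing dependency - PyQt4")
class TestDataEntryMixin(unittest.TestCase):
    def test_somefeature(self):
        # Only touch PyQt4 inside test methods, so the class still
        # defines cleanly when the import failed.
        self.assertTrue(hasattr(PyQt4, "__name__"))


def run_suite():
    """Run the tests in this sketch and return the TestResult."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(TestDataEntryMixin)
    return unittest.TextTestRunner(verbosity=2).run(suite)
```

With PyQt4 missing, run_suite() should report the test as skipped with the reason given to skipUnless, rather than as an import error.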