DBIx::Class join pruning

DBIx::Class has a join optimiser to prune out any unnecessary joins from the queries it generates.  Occasionally this can cause code that looks good to fail unexpectedly.  The most obvious time this normally happens when using literal SQL.

The most common example of this is when doing an aggregate operation with a group by.  Here I’m trying to sum and count some values on a joined table.  This looks like pretty standard DBIC code, but it fails.  I don’t use the linked table in the group by, just in the columns where it’s my own raw SQL.

sub total
    my $self = shift;
    my @columns = $self->result_source->columns;
    my $me = $self->current_source_alias;
    return $self->search(undef, {
        join => ['items'],
        '+columns' => [
            { total => \'sum(amount) as total' },
            { lines => \'count(amount) as lines' },
        group_by => [map {"$me.$_"} @columns],

        total => { '>' => 10 }
        '+columns' => ['total', 'lines']


DBI Exception: DBD::Pg::st execute failed: ERROR:  column "amount" does not exist
LINE 1: ... (SELECT "me"."id", "me"."name", "me"."type", sum(amount) as...
                                                             ^ [for Statement "SELECT "me"."id", "me"."name", "me"."type", "me"."total", "me"."lines" FROM (SELECT "me"."id", "me"."name", "me"."type", sum(amount) as total, count(amount) as lines FROM "orders" "me" GROUP BY "me"."id", "me"."name", "me"."type") "me" WHERE ( "total" > ? )" with ParamValues: 1='10']

Looking closely at the SQL generated the items alias isn’t joined in.  The optimiser has pruned that join because it couldn’t see any reference to the column.  It turns out that DBIC does really try to do the right thing, it’s just we didn’t give it enough info to give it a hint.

Adding the ‘items.’ qualifier to the column names in the SQL allows DBIC to realise that that join is necessary.

   return $self->search(undef, {
        join => ['items'],
        '+columns' => [
            # NOTE: fully qualified name
            # even though they are not ambiguous.
            { total => \'sum(items.amount) as total' },
            { lines => \'count(items.amount) as lines' },
        group_by => [map {"$me.$_"} @columns],

With that table alias, not strictly necessary for the SQL since those column names are unambiguous, DBIC realises that it is needs the join.

SELECT "me"."id", "me"."name", "me"."type", "me"."total", "me"."lines"
  FROM (
    SELECT "me"."id", "me"."name", "me"."type", sum( items.amount ) AS total, count( items.amount ) AS lines
      FROM "orders" "me"
      LEFT JOIN "order_items" "items"
        ON "items"."order_id" = "me"."id"
    GROUP BY "me"."id", "me"."name", "me"."type"
   ) "me"
WHERE "total" > '10'

Vim word breaks and perl

I use vim for coding my Perl and in the most recent versions of Ubuntu I’ve not been keen on one of the changes to the syntax defaults.  The word definition has start to include : which means that class names like Blah::Thing count as one word which isn’t how I’m used to working.  Luckily vim is very configurable and the vim irc channel is also really helpful so I was able to find where the configuration was and how to override it quickly.

Finding out that it’s related to the iskeyword configuration was pretty simple, but changing that didn’t have any effect.  It turns out it is set in a Perl syntax specific configuration so overriding it is a little more tricky.

Figuring out where a configuration setting was set is done using :verb set ?

:verb set isk?
Last set from /usr/share/vim/vim74/ftplugin/perl.vim

Knowing where that was set I could then override it with an ‘after’ configuration file.

" ~/.vim/after/ftplugin/perl.vim
set iskeyword-=:

Then when I check in vim,

:verb set isk?
Last set from ~/.vim/after/ftplugin/perl.vim

For more of a discussion of the language specific configuration and the after files see this vim wiki entry – http://vim.wikia.com/wiki/Keep_your_vimrc_file_clean

DBIx::Class Hackathon

I attended the first DBIx::Hackathon this weekend and I want to thank Jess for organising it, and Shadowcat who put a lot of time and effort into the event. Kudos to the sponsors Eligo and CTRL-O who provided backing.

I spent the day trying to improve the ‘where’ feature. It was my first patch to the core DBIx::Class module. Previously, I’ve reported little bugs to the project. Ribasushi, as ever was a great help with explaining what was needed in terms of fixing the issue. In fact he knows the code base so well he practically spoon fed me the solution as it’s just a minor tweak in his mind ;)

I suspect however this patch will never see the light of day. In some ways, I wish I had done more preparation for the hackathon so that I could have discussed it more intelligently in person. However, I’m glad I implemented the patch, because now it makes it easy to prove whether the feature is a good idea or not. One of the nice things about a hackathon is that you can free up a chunk of (relatively) guaranteed time for a project, something I find a lot harder to do in general life.

To try to demonstrate and test the various types of relationship I created a simple project on github. I’ve used DBIx::Class::Candy which is a small sugar layer on top of DBIx::Class so I hope you’ll forgive the slightly simpler syntax for my code. It’s a very trivial contrived 2 table database at the moment that allows me to play with some of the relationships.

My suspicion since I encountered the where feature is that it’s unfinished* and that extending it in a meaningful way will be tricky without causing lots of surprises to existing code that uses it.

The where attribute is described as useful for filtering relationships. As you’d imagine when you use it it sticks a where clause onto the search you do when following the relationship.

package …::Order;
has_many delivery_charge => '...::OrderItem', 'order_id',
            { where => { type => 'delivery' } };

In my example I have a relationship that allows me to pick out the delivery charges from the order as they are line items of the type ‘delivery’. Assuming you have fetched an order from the database and you follow the ‘delivery_charge’ accessor you’ll get an OrderItem resultset filtered to the type ‘delivery’.

$order->delivery_charge; # resultset containing order items of type ‘delivery’
# SELECT me.id, me.order_id, me.name, me.type, me.quantity, me.price 
#   FROM order_items me 
# WHERE me.order_id = '1' AND type = 'delivery'

That’s all pretty obvious really.

The problem really becomes obvious when you do a prefetch on the relationship.

my $order = $schema->resultset(‘Order’)->search({ id => 1 }, { 
prefetch => ['delivery_charge']
$order->id; # 1
my @delivery_charges = $order->delivery_charge->all; # actually returns all order lines, where clause not applied.

SELECT me.id, me.name, me.active, delivery_charge.id, delivery_charge.order_id, delivery_charge.name, delivery_charge.type, delivery_charge.quantity, delivery_charge.price 
  FROM orders me 
  LEFT JOIN order_items delivery_charge 
    ON delivery_charge.order_id = me.id 

The logical fix is to apply the where clause when joins happen. This is what the patch I created does.

Only where clause restrictions and joins are tricky beasts. There are two fundamental problems.

  1. Conceptually you probably really want the where clause to become a clause in a left outer join. I’ll explain more in a moment.
  2. You need to worry about table aliases. Since you’re now dealing with multiple tables you need to worry about ambiguity.

The first problem is the more serious and, in my opinion, will kill the current patch, and possibly any realistic attempt to fixing this feature.

Lets cover the first point now. The prefetch is a good example of how we want to get a list of orders, and we want the delivery charge items loaded at the same time so that we don’t have to make subsequent database calls. With the where clause being generated a query looks something like this,

SELECT me.id, me.name, me.active, delivery_charge.id, delivery_charge.order_id, delivery_charge.name, delivery_charge.type, delivery_charge.quantity, delivery_charge.price 
  FROM orders me 
  LEFT JOIN order_items delivery_charge 
    ON delivery_charge.order_id = me.id 
WHERE type = 'delivery'

Now consider a query for orders placed for today. When we make that query we expect to see all the orders placed today. If however we have some orders with delivery charges, and some that have none because they were collected in store, guess what happens when that where clause kicks in. It ends up filtering out the orders without any delivery charges. This is an SQL 101 type of problem that you generally solve with a left join and the clause promoted into the join condition. In fact the query generated already half does that, it does just that with the id link, so that the missing items shouldn’t prevent the main thing from being found. It’s just the where clause is in the wrong place. With the join ‘fixed’ you end up being surprised by your results when you join on one a relationship with a where clause.

The second problem will cause things to fail more visibly. Lets consider if we add a type field to our Order class now. A perfectly logical thing, we could add types like ‘store’, ‘internet’, ‘mail-order’. Then the query generated blows up again. You could hard code in a table alias into the where clause but that will fail when you have complex queries and you haven’t predicted the generated table alias correctly. DBIC doesn’t know what to do to automatically fix the reference and there isn’t any special syntax to do so.

I think the first problem is the more serious, and is the reason the feature should be left as it is. The second I think will ensure that it is because it’s likely to visibly blow up legacy code. It’s the legacy code that really makes fixing this awkward. As ribasushi made clear on the day, it would be fixable in a reasonably clear way by making the join condition a sub, but the existing code as it stands will fail if we add it to the join condition. It would be possible to add new semantics that enable newer style behaviour and fall back to the current behaviour when the old style declaration is used, but is it worth it? The where keyname seems like a good idea, except is it really? Assuming you don’t want problem 1 – the overly restricted search results – it won’t actually be a where clause in a join, so ‘where’ doesn’t make so much sense. If we wanted syntactic sugar to make this feature easy to use and just work we’d be better off coming up with a new key and semantics. The ‘where’ feature I would rather leave as a footnote in the documentation as something to avoid in newer code.

The sane way to achieve the same result these days is with the ‘Custom join conditions’ documented in DBIx::Class:Relationship::Base. They allow you to produce arbitrarily complex conditions. Because it takes a sub it becomes easy to deal with the ambiguity issue clearly and it also becomes possible to do far more complex things too.

has_many standard_products => '...::OrderItem',
            sub {         
                my $args = shift;
                return ( 
                        "$args->{self_alias}.id" => { -ident => "$args->{foreign_alias}.order_id" },
                        "$args->{foreign_alias}.type" => 'standard',                            
            }; # 'Custom join conditions' perldoc DBIx::Class:Relationship::Base

The downside of this compared to the where keyword is that it requires you to specify the whole condition, even the id join, rather than simply bolting on an extra condition. It’s great for allowing you to do anything, but the simple case is a little cumbersome.

See my previous blog post for an example of altering the join at runtime.

* apologies to the person whose baby I am calling ugly.

Testing out uninstall to fix a perl modules dependency issue

Just recently after an upgrade to some CPAN modules I started getting this crash on one of my machines when the Catalyst::View::JSON was loaded.

#     Error:  Couldn't instantiate component "TestApp::View::JSON", "Recursive inheritance detected in package 'Types::Serialiser::BooleanBase' at (eval 1547) line 76."Compilation failed in require at (eval 4) line 2.

The actual source of that error appears to be JSON::XS rather than Types::Serialiser::BooleanBase or TestApp::View::JSON.

I didn’t investigate the error properly, or really fix it properly. Instead I tested out one of the newer features of cpanm, the -U uninstall flag. I simply uninstalled JSON::XS and hey presto, no more crash.

cpanm -U JSON::XS
JSON::XS contains the following files:


Are you sure you want to uninstall JSON::XS? [y] y

That probably warrants some explanation. The new Catalyst::Runtime now appears to pull in the new alternative to JSON::XS, Cpanel::JSON::XS so this can now be used instead, and so things just worked. It’s probably a bit drastic a solution for most systems at the moment, I’m sure it will demonstrate any places where I have direct dependencies on JSON::XS. On my development box that should be handy however. I’d rather be using a single library for that single purpose.

DBIx::Class and Postgres tweaks at startup

After connection you can do simple commands thanks to the on_connect_do connection setting. One thing I sometimes do is turn down the whining. Postgres can be quite noisy when you’re creating tables,

NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "blah_pkey" for table "blah"
NOTICE:  CREATE TABLE will create implicit sequence "blah_id_seq" for serial column "blah.id"

So I sometimes SET client_min_messages=WARNING at connection. That makes deploying a schema a lot less verbal.

If you want to deploy to a different schema within a database you can also do a ‘SET search_path TO’ statement at that point too. That can be quite handy if you want to deploy the same tables again to an alternative schema within a database.

perl -I lib -MMyModule::Schema::DB -e "MyModule::Schema::DB->connect('dbi:Pg:dbname=database;host=postgres', 'username', 'password', { on_connect_do => 'SET search_path TO temp' })->deploy"

If you’re setting up a Catalyst config for DBIC you can set the connection options like this,

    connect_info dbi:Pg:dbname=database
    connect_info username
    connect_info password
        on_connect_do  [ SET client_min_messages=WARNING ]
        quote_names 1

Adhoc parameters to joins in DBIx::Class

Update: there are a few updates to the caveats on this post based on the comments by Peter Rabbitson.

I’ve been using DBIx::Class for a couple of years now and I’ve always been impressed with it’s flexibility.  I’ve rarely felt the need to write my own SQL queries because it’s simply so good at it, and it’s generally easy to get it to do what I want.

The one exception to that was custom adhoc joins.  In general DBIx::Class wants to know about how you’re going to join up front.  Anything else tends to require a custom view.  

The other night I realised I could come up with a way to deal with slightly more complex joins while still making the most of all my result definitions.  The extended relationships explained by frew on his blog demonstrate how to add decent join conditions.  The one thing missing was adhoc parameters.  They can be added by providing a bind parameter.  Since relationships don’t traditionally require any extra search parameters, I’d recommend indicating that the relationship isn’t for public consumption, and providing a wrapper method around it.

For example, here is a the relationship in the Result class,

  _date_range => 'DBICTest::Schema::CD',
  sub {
    my $args = shift;
    return (
      { "$args->{foreign_alias}.artist" => { -ident => "$args->{self_alias}.artistid" },
        -and => [
            "$args->{foreign_alias}.year"   => { '>' => \"?" },
            "$args->{foreign_alias}.year"   => { '<' => \"?" },

And then a method to exploit it in the ResultSet class.

sub join_date_range
    my $self = shift;
    my $start = shift;
    my $end = shift;
    $self->search(undef, {
        join => ['_by_name'],
        bind => [ $start, $end ],

Then you can use it like this,

$schema->resultset("Artist")->join_date_range(1990, 1992)->search(...)->all;

You can of course do more complex joins too, even joining up multiple times.  All you need to do is specify all the necessary bind parameters.

The most impressive thing is that the joins work fine even with the chained searches that DBIC is so good at.  You can of course also do search_related and most of the usual links, you just need to specify the bind parameters.  

There are a few of caveats however.  This isn’t strictly speaking an intentional feature, or at least it wasn’t consciously designed to work this way (as far as I know).

  1. Attempting to do a prefetch across the join fails with the error “Can't prefetch has_many _date_range (join cond too complex)”. Update: there is a new version of DBIC in the pipeline that fixes this. Right now that version is 0.08241-TRIAL. The feature you’re probably looking for in the Changes file is “Multiple has_many prefetch” when checking if a new live release has it.
  2. You might have looked at that -and => [] construction and thought that’s a bit pointless, a standard hash would be simpler and achieve the same effect. Unfortunately the bind parameters are positional, and a hash doesn’t have a guaranteed order. That means you need to be extra careful with your query when you have multiple parameters you need to specify, to ensure the binds happen to the correct place holders. Update: as Peter Rabbitson pointed out, it’s not actually that simple. DBIC does try to make sure you have a guaranteed order by sorting the keys of the hashes so that it always produces the same SQL. This means that you probably just need to try it and see which order in which it requires the parameters most of the time.
  3. Update: I was incorrect about not being able to use extra binding information with bind, the syntax Peter Rabbitson suggested works perfectly. i.e. bind => [ [ $opt1 => $val1 ], [ $opt => $val 2 ]… ]The final caveat is that the bind parameters don’t currently take any extra type information. Normally most of the places you are exposed directly to bindings you can specify types in order to help DBIC create the correct query. It doesn’t appear to be possible to provide that information via the bind parameter on a search.

This isn’t strictly a documented feature, but hopefully it’s helpful to a few people. If you’re wondering why you’d need to do this at all, consider the difference between these two queries.

FROM a LEFT JOIN b ON a.id = b.owner_id AND a.name = b.name


FROM a LEFT JOIN b ON a.id = b.owner_id
WHERE a.name = b.name

In the course of figuring this out, I also discovered the -ident key which indicates that you’re dealing with something like a column name, and should therefore be quoted if you have column name quoting turned on.  A handy feature to go along with using references to pass in literal bits of SQL.

Catalyst Config Hack

With a lot of modules for our Catalyst systems we have separate models. We then use a subset of them in a single application, and it makes sense to actually store all those database models in a single physical database. This means we end up with a lot of duplicate model config keys in our catalyst config.

    connect_info dbi:Pg:dbname=app_db;host=db-server
    connect_info dbi:Pg:dbname=app_db;host=db-server
    connect_info dbi:Pg:dbname=app_db;host=db-server

A lot of database configurations aren’t just a single line, and you end up spending forever copy/pasting and then modifying the config. I wanted to come up with a way to avoid all that repetition.

The Catalyst::Plugin::ConfigLoader provides two potential hooks for things to do after the configuration has been loaded. One is a finalize_config, the other is config_substitutions, via the substitutions settings. Because we are using CatalystX::AppBuilder the finalize_config doesn’t appear to be hookable, or at least I didn’t figure out how to. The substitutions is however perfectly usable because that just requires config setup in code.

   $config->{'Plugin::ConfigLoader'}->{substitutions} = {
        duplicate => sub {
            my $c = shift;
            my $from = shift;
            my $to = shift;
            $c->config->{$to} = $c->config->{$from};

Then this lets me do this in the config file.

    connect_info dbi:Pg:dbname=app_db;host=db-server
    connect_info dbusername
    connect_info dbpassword
      quote_char "\""
      name_sep .


This copies the configuration specified for the Processor to the SysParams, AuditTrail and AuthDB model config settings. This happens right after the configuration has been loaded, and before the models are loaded so all the settings are there just in time. That saves me lots of copy/paste, and even more editing. I don’t even need to copy those directives into my _local.conf because the _local.conf settings for the Processor model will be what get copied.

Perlbrew with options like threading

This blog post on the *n*x blog gives a great description of how to install perl with threading, something that you need to do if you want to run padre.

The only thing I’d add that you can use the –as to install perl with an alias. This is useful if you want to build a threaded version of a perl you already have installed. You can simply do,

perlbrew install perl-5.14.2 -Dusethreads -Duselargefiles -Dcccdlflags=-fPIC -Doptimize=-O2 -Duseshrplib -Duse64bitall -Darchname=x86_64-linux-gnu -Dccflags=-DDEBIAN --as threaded-perl-5.14.2

(Note that I’ve customised this for my Ubuntu 64 bit os).

Perl debugger

Since writing my initial post on my settings for the Perl debugger I’ve found another setting that I’ve found to be invaluable. If you create a file, ~/.perldb it will load your settings each time, mine now has two lines on whatever machine I’m using,

$DB::deep = 1000;

The dumpDepth=2 trick is something I picked up from Chisel’s blog post on the subject and simply limits your x to a depth of 2 by default. This makes life a lot simpler and means I get caught out by accidentally dumping the entire state of an application less often.

Faking it with Test::WWW::Mechanize::PSGI

Or should that be getting real?  The Test::WWW::Mechanize::Catalyst module has a really handy feature, with the CATALYST_SERVER environment variable you can set to point your tests at a real live server.  This is really handy for a couple of things.  One of those being monitoring real traffic for those cases where you can’t quite decide whether it’s your test or your server that’s broken.  You can fire up Wireshark and watch the actual traffic going over the wire.

With Test:::WWW:Mechanize::PSGI there doesn’t appear to be that option, and Test::WWW::Mechanize and the modules it wrap don’t really appear to have any simple way to provide that.  A look at the T:W:M:Catalyst module suggests it’s actually a nice chunk of work that they did to implement that feature.  Since I’m lazy and I wanted to solve a problem I came up with a simple way to fake it for now.  I converted the urls from simple /path to proper qualified http://localhost:5000/path urls in my tests then added a simple bit of code to flip between Test::WWW:Mechanize and Test::WWW::Mechanize::PSGI.

my $mech;
    $mech = Test::WWW::Mechanize->new;
    my $app = Plack::Util::load_psgi 'app.psgi';
    $mech = Test::WWW::Mechanize::PSGI->new( app => $app );

Now if I run the tests with EXTERNAL_SERVER=1 it goes to a real server rather than straight to code.  That means I can listen on the loopback adaptor in Wireshark and see what’s actually going over the wire simply.  It’s not as neat as the CATALYST_SERVER feature, but it does for now.