DBIx::Class Hackathon

I attended the first DBIx::Hackathon this weekend and I want to thank Jess for organising it, and Shadowcat who put a lot of time and effort into the event. Kudos to the sponsors Eligo and CTRL-O who provided backing.

I spent the day trying to improve the ‘where’ feature. It was my first patch to the core DBIx::Class module. Previously, I’ve reported little bugs to the project. Ribasushi, as ever was a great help with explaining what was needed in terms of fixing the issue. In fact he knows the code base so well he practically spoon fed me the solution as it’s just a minor tweak in his mind 😉

I suspect however this patch will never see the light of day. In some ways, I wish I had done more preparation for the hackathon so that I could have discussed it more intelligently in person. However, I’m glad I implemented the patch, because now it makes it easy to prove whether the feature is a good idea or not. One of the nice things about a hackathon is that you can free up a chunk of (relatively) guaranteed time for a project, something I find a lot harder to do in general life.

To try to demonstrate and test the various types of relationship I created a simple project on github. I’ve used DBIx::Class::Candy which is a small sugar layer on top of DBIx::Class so I hope you’ll forgive the slightly simpler syntax for my code. It’s a very trivial contrived 2 table database at the moment that allows me to play with some of the relationships.

My suspicion since I encountered the where feature is that it’s unfinished* and that extending it in a meaningful way will be tricky without causing lots of surprises to existing code that uses it.

The where attribute is described as useful for filtering relationships. As you’d imagine when you use it it sticks a where clause onto the search you do when following the relationship.

package …::Order;
has_many delivery_charge => '...::OrderItem', 'order_id',
            { where => { type => 'delivery' } };

In my example I have a relationship that allows me to pick out the delivery charges from the order as they are line items of the type ‘delivery’. Assuming you have fetched an order from the database and you follow the ‘delivery_charge’ accessor you’ll get an OrderItem resultset filtered to the type ‘delivery’.

$order->delivery_charge; # resultset containing order items of type ‘delivery’
# SELECT me.id, me.order_id, me.name, me.type, me.quantity, me.price 
#   FROM order_items me 
# WHERE me.order_id = '1' AND type = 'delivery'

That’s all pretty obvious really.

The problem really becomes obvious when you do a prefetch on the relationship.

my $order = $schema->resultset(‘Order’)->search({ id => 1 }, { 
prefetch => ['delivery_charge']
$order->id; # 1
my @delivery_charges = $order->delivery_charge->all; # actually returns all order lines, where clause not applied.

SELECT me.id, me.name, me.active, delivery_charge.id, delivery_charge.order_id, delivery_charge.name, delivery_charge.type, delivery_charge.quantity, delivery_charge.price 
  FROM orders me 
  LEFT JOIN order_items delivery_charge 
    ON delivery_charge.order_id = me.id 

The logical fix is to apply the where clause when joins happen. This is what the patch I created does.

Only where clause restrictions and joins are tricky beasts. There are two fundamental problems.

  1. Conceptually you probably really want the where clause to become a clause in a left outer join. I’ll explain more in a moment.
  2. You need to worry about table aliases. Since you’re now dealing with multiple tables you need to worry about ambiguity.

The first problem is the more serious and, in my opinion, will kill the current patch, and possibly any realistic attempt to fixing this feature.

Lets cover the first point now. The prefetch is a good example of how we want to get a list of orders, and we want the delivery charge items loaded at the same time so that we don’t have to make subsequent database calls. With the where clause being generated a query looks something like this,

SELECT me.id, me.name, me.active, delivery_charge.id, delivery_charge.order_id, delivery_charge.name, delivery_charge.type, delivery_charge.quantity, delivery_charge.price 
  FROM orders me 
  LEFT JOIN order_items delivery_charge 
    ON delivery_charge.order_id = me.id 
WHERE type = 'delivery'

Now consider a query for orders placed for today. When we make that query we expect to see all the orders placed today. If however we have some orders with delivery charges, and some that have none because they were collected in store, guess what happens when that where clause kicks in. It ends up filtering out the orders without any delivery charges. This is an SQL 101 type of problem that you generally solve with a left join and the clause promoted into the join condition. In fact the query generated already half does that, it does just that with the id link, so that the missing items shouldn’t prevent the main thing from being found. It’s just the where clause is in the wrong place. With the join ‘fixed’ you end up being surprised by your results when you join on one a relationship with a where clause.

The second problem will cause things to fail more visibly. Lets consider if we add a type field to our Order class now. A perfectly logical thing, we could add types like ‘store’, ‘internet’, ‘mail-order’. Then the query generated blows up again. You could hard code in a table alias into the where clause but that will fail when you have complex queries and you haven’t predicted the generated table alias correctly. DBIC doesn’t know what to do to automatically fix the reference and there isn’t any special syntax to do so.

I think the first problem is the more serious, and is the reason the feature should be left as it is. The second I think will ensure that it is because it’s likely to visibly blow up legacy code. It’s the legacy code that really makes fixing this awkward. As ribasushi made clear on the day, it would be fixable in a reasonably clear way by making the join condition a sub, but the existing code as it stands will fail if we add it to the join condition. It would be possible to add new semantics that enable newer style behaviour and fall back to the current behaviour when the old style declaration is used, but is it worth it? The where keyname seems like a good idea, except is it really? Assuming you don’t want problem 1 – the overly restricted search results – it won’t actually be a where clause in a join, so ‘where’ doesn’t make so much sense. If we wanted syntactic sugar to make this feature easy to use and just work we’d be better off coming up with a new key and semantics. The ‘where’ feature I would rather leave as a footnote in the documentation as something to avoid in newer code.

The sane way to achieve the same result these days is with the ‘Custom join conditions’ documented in DBIx::Class:Relationship::Base. They allow you to produce arbitrarily complex conditions. Because it takes a sub it becomes easy to deal with the ambiguity issue clearly and it also becomes possible to do far more complex things too.

has_many standard_products => '...::OrderItem',
            sub {         
                my $args = shift;
                return ( 
                        "$args->{self_alias}.id" => { -ident => "$args->{foreign_alias}.order_id" },
                        "$args->{foreign_alias}.type" => 'standard',                            
            }; # 'Custom join conditions' perldoc DBIx::Class:Relationship::Base

The downside of this compared to the where keyword is that it requires you to specify the whole condition, even the id join, rather than simply bolting on an extra condition. It’s great for allowing you to do anything, but the simple case is a little cumbersome.

See my previous blog post for an example of altering the join at runtime.

* apologies to the person whose baby I am calling ugly.