Tag Archives: perl5

DBIx::Class Hackathon

I attended the first DBIx::Hackathon this weekend and I want to thank Jess for organising it, and Shadowcat who put a lot of time and effort into the event. Kudos to the sponsors Eligo and CTRL-O who provided backing.

I spent the day trying to improve the ‘where’ feature. It was my first patch to the core DBIx::Class module. Previously, I’ve reported little bugs to the project. Ribasushi, as ever was a great help with explaining what was needed in terms of fixing the issue. In fact he knows the code base so well he practically spoon fed me the solution as it’s just a minor tweak in his mind ;)

I suspect however this patch will never see the light of day. In some ways, I wish I had done more preparation for the hackathon so that I could have discussed it more intelligently in person. However, I’m glad I implemented the patch, because now it makes it easy to prove whether the feature is a good idea or not. One of the nice things about a hackathon is that you can free up a chunk of (relatively) guaranteed time for a project, something I find a lot harder to do in general life.

To try to demonstrate and test the various types of relationship I created a simple project on github. I’ve used DBIx::Class::Candy which is a small sugar layer on top of DBIx::Class so I hope you’ll forgive the slightly simpler syntax for my code. It’s a very trivial contrived 2 table database at the moment that allows me to play with some of the relationships.

My suspicion since I encountered the where feature is that it’s unfinished* and that extending it in a meaningful way will be tricky without causing lots of surprises to existing code that uses it.

The where attribute is described as useful for filtering relationships. As you’d imagine when you use it it sticks a where clause onto the search you do when following the relationship.

package …::Order;
...
has_many delivery_charge => '...::OrderItem', 'order_id',
            { where => { type => 'delivery' } };

In my example I have a relationship that allows me to pick out the delivery charges from the order as they are line items of the type ‘delivery’. Assuming you have fetched an order from the database and you follow the ‘delivery_charge’ accessor you’ll get an OrderItem resultset filtered to the type ‘delivery’.

$order->delivery_charge; # resultset containing order items of type ‘delivery’
# SELECT me.id, me.order_id, me.name, me.type, me.quantity, me.price 
#   FROM order_items me 
# WHERE me.order_id = '1' AND type = 'delivery'

That’s all pretty obvious really.

The problem really becomes obvious when you do a prefetch on the relationship.

my $order = $schema->resultset(‘Order’)->search({ id => 1 }, { 
prefetch => ['delivery_charge']
})->first;
$order->id; # 1
my @delivery_charges = $order->delivery_charge->all; # actually returns all order lines, where clause not applied.

SELECT me.id, me.name, me.active, delivery_charge.id, delivery_charge.order_id, delivery_charge.name, delivery_charge.type, delivery_charge.quantity, delivery_charge.price 
  FROM orders me 
  LEFT JOIN order_items delivery_charge 
    ON delivery_charge.order_id = me.id 

The logical fix is to apply the where clause when joins happen. This is what the patch I created does.

Only where clause restrictions and joins are tricky beasts. There are two fundamental problems.

  1. Conceptually you probably really want the where clause to become a clause in a left outer join. I’ll explain more in a moment.
  2. You need to worry about table aliases. Since you’re now dealing with multiple tables you need to worry about ambiguity.

The first problem is the more serious and, in my opinion, will kill the current patch, and possibly any realistic attempt to fixing this feature.

Lets cover the first point now. The prefetch is a good example of how we want to get a list of orders, and we want the delivery charge items loaded at the same time so that we don’t have to make subsequent database calls. With the where clause being generated a query looks something like this,

SELECT me.id, me.name, me.active, delivery_charge.id, delivery_charge.order_id, delivery_charge.name, delivery_charge.type, delivery_charge.quantity, delivery_charge.price 
  FROM orders me 
  LEFT JOIN order_items delivery_charge 
    ON delivery_charge.order_id = me.id 
WHERE type = 'delivery'

Now consider a query for orders placed for today. When we make that query we expect to see all the orders placed today. If however we have some orders with delivery charges, and some that have none because they were collected in store, guess what happens when that where clause kicks in. It ends up filtering out the orders without any delivery charges. This is an SQL 101 type of problem that you generally solve with a left join and the clause promoted into the join condition. In fact the query generated already half does that, it does just that with the id link, so that the missing items shouldn’t prevent the main thing from being found. It’s just the where clause is in the wrong place. With the join ‘fixed’ you end up being surprised by your results when you join on one a relationship with a where clause.

The second problem will cause things to fail more visibly. Lets consider if we add a type field to our Order class now. A perfectly logical thing, we could add types like ‘store’, ‘internet’, ‘mail-order’. Then the query generated blows up again. You could hard code in a table alias into the where clause but that will fail when you have complex queries and you haven’t predicted the generated table alias correctly. DBIC doesn’t know what to do to automatically fix the reference and there isn’t any special syntax to do so.

I think the first problem is the more serious, and is the reason the feature should be left as it is. The second I think will ensure that it is because it’s likely to visibly blow up legacy code. It’s the legacy code that really makes fixing this awkward. As ribasushi made clear on the day, it would be fixable in a reasonably clear way by making the join condition a sub, but the existing code as it stands will fail if we add it to the join condition. It would be possible to add new semantics that enable newer style behaviour and fall back to the current behaviour when the old style declaration is used, but is it worth it? The where keyname seems like a good idea, except is it really? Assuming you don’t want problem 1 – the overly restricted search results – it won’t actually be a where clause in a join, so ‘where’ doesn’t make so much sense. If we wanted syntactic sugar to make this feature easy to use and just work we’d be better off coming up with a new key and semantics. The ‘where’ feature I would rather leave as a footnote in the documentation as something to avoid in newer code.

The sane way to achieve the same result these days is with the ‘Custom join conditions’ documented in DBIx::Class:Relationship::Base. They allow you to produce arbitrarily complex conditions. Because it takes a sub it becomes easy to deal with the ambiguity issue clearly and it also becomes possible to do far more complex things too.

has_many standard_products => '...::OrderItem',
            sub {         
                my $args = shift;
                return ( 
                    {    
                        "$args->{self_alias}.id" => { -ident => "$args->{foreign_alias}.order_id" },
                        "$args->{foreign_alias}.type" => 'standard',                            
                    }                                                                           
                );                      
            }; # 'Custom join conditions' perldoc DBIx::Class:Relationship::Base

The downside of this compared to the where keyword is that it requires you to specify the whole condition, even the id join, rather than simply bolting on an extra condition. It’s great for allowing you to do anything, but the simple case is a little cumbersome.

See my previous blog post for an example of altering the join at runtime.

* apologies to the person whose baby I am calling ugly.

Tagged

Testing out uninstall to fix a perl modules dependency issue

Just recently after an upgrade to some CPAN modules I started getting this crash on one of my machines when the Catalyst::View::JSON was loaded.

#     Error:  Couldn't instantiate component "TestApp::View::JSON", "Recursive inheritance detected in package 'Types::Serialiser::BooleanBase' at (eval 1547) line 76."Compilation failed in require at (eval 4) line 2.

The actual source of that error appears to be JSON::XS rather than Types::Serialiser::BooleanBase or TestApp::View::JSON.

I didn’t investigate the error properly, or really fix it properly. Instead I tested out one of the newer features of cpanm, the -U uninstall flag. I simply uninstalled JSON::XS and hey presto, no more crash.

cpanm -U JSON::XS
JSON::XS contains the following files:

  /home/colin/perl5/perlbrew/perls/perl-5.14.4/bin/json_xs
  /home/colin/perl5/perlbrew/perls/perl-5.14.4/lib/site_perl/5.14.4/x86_64-linux/JSON/XS.pm
  /home/colin/perl5/perlbrew/perls/perl-5.14.4/lib/site_perl/5.14.4/x86_64-linux/JSON/XS/Boolean.pm
  /home/colin/perl5/perlbrew/perls/perl-5.14.4/lib/site_perl/5.14.4/x86_64-linux/auto/JSON/XS/XS.bs
  /home/colin/perl5/perlbrew/perls/perl-5.14.4/lib/site_perl/5.14.4/x86_64-linux/auto/JSON/XS/XS.so
  /home/colin/perl5/perlbrew/perls/perl-5.14.4/man/man1/json_xs.1
  /home/colin/perl5/perlbrew/perls/perl-5.14.4/man/man3/JSON::XS.3
  /home/colin/perl5/perlbrew/perls/perl-5.14.4/man/man3/JSON::XS::Boolean.3

Are you sure you want to uninstall JSON::XS? [y] y

That probably warrants some explanation. The new Catalyst::Runtime now appears to pull in the new alternative to JSON::XS, Cpanel::JSON::XS so this can now be used instead, and so things just worked. It’s probably a bit drastic a solution for most systems at the moment, I’m sure it will demonstrate any places where I have direct dependencies on JSON::XS. On my development box that should be handy however. I’d rather be using a single library for that single purpose.

Tagged , ,

DBIx::Class and Postgres tweaks at startup

After connection you can do simple commands thanks to the on_connect_do connection setting. One thing I sometimes do is turn down the whining. Postgres can be quite noisy when you’re creating tables,

NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "blah_pkey" for table "blah"
NOTICE:  CREATE TABLE will create implicit sequence "blah_id_seq" for serial column "blah.id"
...

So I sometimes SET client_min_messages=WARNING at connection. That makes deploying a schema a lot less verbal.

If you want to deploy to a different schema within a database you can also do a ‘SET search_path TO’ statement at that point too. That can be quite handy if you want to deploy the same tables again to an alternative schema within a database.

perl -I lib -MMyModule::Schema::DB -e "MyModule::Schema::DB->connect('dbi:Pg:dbname=database;host=postgres', 'username', 'password', { on_connect_do => 'SET search_path TO temp' })->deploy"

If you’re setting up a Catalyst config for DBIC you can set the connection options like this,

<Model::MyDB>
    connect_info dbi:Pg:dbname=database
    connect_info username
    connect_info password
    <connect_info>
        on_connect_do  [ SET client_min_messages=WARNING ]
        quote_names 1
...
Tagged ,

Adhoc parameters to joins in DBIx::Class

Update: there are a few updates to the caveats on this post based on the comments by Peter Rabbitson.

I’ve been using DBIx::Class for a couple of years now and I’ve always been impressed with it’s flexibility.  I’ve rarely felt the need to write my own SQL queries because it’s simply so good at it, and it’s generally easy to get it to do what I want.

The one exception to that was custom adhoc joins.  In general DBIx::Class wants to know about how you’re going to join up front.  Anything else tends to require a custom view.  

The other night I realised I could come up with a way to deal with slightly more complex joins while still making the most of all my result definitions.  The extended relationships explained by frew on his blog demonstrate how to add decent join conditions.  The one thing missing was adhoc parameters.  They can be added by providing a bind parameter.  Since relationships don’t traditionally require any extra search parameters, I’d recommend indicating that the relationship isn’t for public consumption, and providing a wrapper method around it.

For example, here is a the relationship in the Result class,

__PACKAGE__->has_many(
  _date_range => 'DBICTest::Schema::CD',
  sub {
    my $args = shift;
    return (
      { "$args->{foreign_alias}.artist" => { -ident => "$args->{self_alias}.artistid" },
        -and => [
            "$args->{foreign_alias}.year"   => { '>' => \"?" },
            "$args->{foreign_alias}.year"   => { '<' => \"?" },
        ],
      }
    );
  }
);

And then a method to exploit it in the ResultSet class.

sub join_date_range
{
    my $self = shift;
    my $start = shift;
    my $end = shift;
    $self->search(undef, {
        join => ['_by_name'],
        bind => [ $start, $end ],
    });
}

Then you can use it like this,

$schema->resultset("Artist")->join_date_range(1990, 1992)->search(...)->all;

You can of course do more complex joins too, even joining up multiple times.  All you need to do is specify all the necessary bind parameters.

The most impressive thing is that the joins work fine even with the chained searches that DBIC is so good at.  You can of course also do search_related and most of the usual links, you just need to specify the bind parameters.  

There are a few of caveats however.  This isn’t strictly speaking an intentional feature, or at least it wasn’t consciously designed to work this way (as far as I know).

  1. Attempting to do a prefetch across the join fails with the error “Can't prefetch has_many _date_range (join cond too complex)”. Update: there is a new version of DBIC in the pipeline that fixes this. Right now that version is 0.08241-TRIAL. The feature you’re probably looking for in the Changes file is “Multiple has_many prefetch” when checking if a new live release has it.
  2. You might have looked at that -and => [] construction and thought that’s a bit pointless, a standard hash would be simpler and achieve the same effect. Unfortunately the bind parameters are positional, and a hash doesn’t have a guaranteed order. That means you need to be extra careful with your query when you have multiple parameters you need to specify, to ensure the binds happen to the correct place holders. Update: as Peter Rabbitson pointed out, it’s not actually that simple. DBIC does try to make sure you have a guaranteed order by sorting the keys of the hashes so that it always produces the same SQL. This means that you probably just need to try it and see which order in which it requires the parameters most of the time.
  3. Update: I was incorrect about not being able to use extra binding information with bind, the syntax Peter Rabbitson suggested works perfectly. i.e. bind => [ [ $opt1 => $val1 ], [ $opt => $val 2 ]… ]The final caveat is that the bind parameters don’t currently take any extra type information. Normally most of the places you are exposed directly to bindings you can specify types in order to help DBIC create the correct query. It doesn’t appear to be possible to provide that information via the bind parameter on a search.

This isn’t strictly a documented feature, but hopefully it’s helpful to a few people. If you’re wondering why you’d need to do this at all, consider the difference between these two queries.

SELECT *
FROM a LEFT JOIN b ON a.id = b.owner_id AND a.name = b.name

and

SELECT *
FROM a LEFT JOIN b ON a.id = b.owner_id
WHERE a.name = b.name

In the course of figuring this out, I also discovered the -ident key which indicates that you’re dealing with something like a column name, and should therefore be quoted if you have column name quoting turned on.  A handy feature to go along with using references to pass in literal bits of SQL.

Tagged

Catalyst Config Hack

With a lot of modules for our Catalyst systems we have separate models. We then use a subset of them in a single application, and it makes sense to actually store all those database models in a single physical database. This means we end up with a lot of duplicate model config keys in our catalyst config.

<Model::Processor>
    connect_info dbi:Pg:dbname=app_db;host=db-server
</Model::Processor>
<Model::SysParams>
    connect_info dbi:Pg:dbname=app_db;host=db-server
</Model::SysParams>
<Model::AuditTrail>
    connect_info dbi:Pg:dbname=app_db;host=db-server
</Model::AuditTrail>

A lot of database configurations aren’t just a single line, and you end up spending forever copy/pasting and then modifying the config. I wanted to come up with a way to avoid all that repetition.

The Catalyst::Plugin::ConfigLoader provides two potential hooks for things to do after the configuration has been loaded. One is a finalize_config, the other is config_substitutions, via the substitutions settings. Because we are using CatalystX::AppBuilder the finalize_config doesn’t appear to be hookable, or at least I didn’t figure out how to. The substitutions is however perfectly usable because that just requires config setup in code.

   $config->{'Plugin::ConfigLoader'}->{substitutions} = {
        duplicate => sub {
            my $c = shift;
            my $from = shift;
            my $to = shift;
            $c->config->{$to} = $c->config->{$from};
        }
    };

Then this lets me do this in the config file.

<Model::Processor>
    connect_info dbi:Pg:dbname=app_db;host=db-server
    connect_info dbusername
    connect_info dbpassword
    <connect_info>
      quote_char "\""
      name_sep .
    </connect_info>
</Model::Processor>

__duplicate(Model::TokenProcessor,Model::SysParams)__
__duplicate(Model::TokenProcessor,Model::AuditTrail)__
__duplicate(Model::TokenProcessor,Model::AuthDB)__

This copies the configuration specified for the Processor to the SysParams, AuditTrail and AuthDB model config settings. This happens right after the configuration has been loaded, and before the models are loaded so all the settings are there just in time. That saves me lots of copy/paste, and even more editing. I don’t even need to copy those directives into my _local.conf because the _local.conf settings for the Processor model will be what get copied.

Tagged ,

Perlbrew with options like threading

This blog post on the *n*x blog gives a great description of how to install perl with threading, something that you need to do if you want to run padre.

The only thing I’d add that you can use the –as to install perl with an alias. This is useful if you want to build a threaded version of a perl you already have installed. You can simply do,

perlbrew install perl-5.14.2 -Dusethreads -Duselargefiles -Dcccdlflags=-fPIC -Doptimize=-O2 -Duseshrplib -Duse64bitall -Darchname=x86_64-linux-gnu -Dccflags=-DDEBIAN --as threaded-perl-5.14.2

(Note that I’ve customised this for my Ubuntu 64 bit os).

Tagged

Perl debugger

Since writing my initial post on my settings for the Perl debugger I’ve found another setting that I’ve found to be invaluable. If you create a file, ~/.perldb it will load your settings each time, mine now has two lines on whatever machine I’m using,

$DB::deep = 1000;
parse_options('dumpDepth=2');

The dumpDepth=2 trick is something I picked up from Chisel’s blog post on the subject and simply limits your x to a depth of 2 by default. This makes life a lot simpler and means I get caught out by accidentally dumping the entire state of an application less often.

Tagged

Faking it with Test::WWW::Mechanize::PSGI

Or should that be getting real?  The Test::WWW::Mechanize::Catalyst module has a really handy feature, with the CATALYST_SERVER environment variable you can set to point your tests at a real live server.  This is really handy for a couple of things.  One of those being monitoring real traffic for those cases where you can’t quite decide whether it’s your test or your server that’s broken.  You can fire up Wireshark and watch the actual traffic going over the wire.

With Test:::WWW:Mechanize::PSGI there doesn’t appear to be that option, and Test::WWW::Mechanize and the modules it wrap don’t really appear to have any simple way to provide that.  A look at the T:W:M:Catalyst module suggests it’s actually a nice chunk of work that they did to implement that feature.  Since I’m lazy and I wanted to solve a problem I came up with a simple way to fake it for now.  I converted the urls from simple /path to proper qualified http://localhost:5000/path urls in my tests then added a simple bit of code to flip between Test::WWW:Mechanize and Test::WWW::Mechanize::PSGI.

my $mech;
if($ENV{EXTERNAL_SERVER})
{
    $mech = Test::WWW::Mechanize->new;
}
else
{
    my $app = Plack::Util::load_psgi 'app.psgi';
    $mech = Test::WWW::Mechanize::PSGI->new( app => $app );
}

Now if I run the tests with EXTERNAL_SERVER=1 it goes to a real server rather than straight to code.  That means I can listen on the loopback adaptor in Wireshark and see what’s actually going over the wire simply.  It’s not as neat as the CATALYST_SERVER feature, but it does for now.

Tagged

POST testing JSON REST API’s with WWW::Mechanize

Having just read the article on POST and PUT in REST API’s I realised I’d goofed a couple of my operations on one of my API’s.

I have tests and this is Perl so how hard can it be to convert over? With Catalyst::Action::REST indeed it is pretty simple to convert my calls, in fact it’s a case of changing the word PUT to POST in some of my function names. It’s the tests where things got interesting. I’m using Test::WWW::Mechanize variants to do my testing because it’s nice and simple. Unfortunately switching from put_ok to post_ok didn’t produce the desired results. When it came to the API it wasn’t reading the data at all. A bit of digging revealed that the post_ok call encoded the parameters in the application/x-www-form-urlencoded style before posting, where as the put_ok call just passed the json through raw.

Some digging into the Catalyst::Action::REST module revealed that they may well have had a similar issue because they created a little helper module called Test::Rest (not to be confused with Test::Rest on CPAN) which created the requests by hand for use in the test suite. Of course they may have simply been avoiding dependencies, and just been lucky to avoid the magic.

I didn’t manage to figure out a way to turn it off so in the end I did a similar thing. The fix for my test suite was to create a simple sub like this that rolled my own POST request with out any magic, then to call $mech->request to simply pass the request through like I was already doing with the DELETE’s. It’s basically a dumbed down version of a method from the Test::Rest from Catalyst::Action::REST.

use HTTP::Request;

sub construct_post
{
    my $url = shift;
    my $data = shift;

    my $req = HTTP::Request->new( "POST" => $url );
    $req->content_type( 'application/json' );
    $req->content_length(
        do { use bytes; length( $data ) }
    );
    $req->content( $data );
    return $req;
}
Tagged

Catalyst and Plack testing

Since Catalyst has switched to Plack for it’s underlying engine it’s opened up lots of funky new possibilities. You can move parts of your infrastructure outside of your catalyst app, while still making use of the catalyst configuration, and still keeping it in the code for the project.

When it comes to testing it does not appear that the Catalyst::Test and Test::WWW::Mechanize::Catalyst modules automatically pick up your .psgi file when building the test server. This might be a feature for some tests, but sometimes you’ll definitely want to test the whole lot together. Luckily that’s fairly simple. When it comes to replacing Test::WWW::Mechanize::Catalyst you can switch to PSGI instead of Catalyst. Probably the closest thing to Catalyst::Test is Plack::Test. Both test modules require you to provide a $app object which Plack::Util makes it easy to load from your existing .psgi file.

Here’s a really simple test converted over to use a .psgi file.

use Test::Most;
use Test::WWW::Mechanize::PSGI;
use Plack::Util;

my $app = Plack::Util::load_psgi 'app.psgi';
my $mech = Test::WWW::Mechanize::PSGI->new( app => $app );

$mech->get_ok('/');

done_testing();
Tagged , ,
Follow

Get every new post delivered to your Inbox.

Join 58 other followers