DBIx::Class Hackathon

I attended the first DBIx::Class Hackathon this weekend and I want to thank Jess for organising it, and Shadowcat, who put a lot of time and effort into the event. Kudos to the sponsors Eligo and CTRL-O who provided backing.

I spent the day trying to improve the ‘where’ feature. It was my first patch to the core DBIx::Class module; previously I’d only reported little bugs to the project. Ribasushi, as ever, was a great help in explaining what was needed to fix the issue. In fact he knows the code base so well that he practically spoon-fed me the solution, as it’s just a minor tweak in his mind ;)

I suspect, however, that this patch will never see the light of day. In some ways I wish I had done more preparation for the hackathon so that I could have discussed the feature more intelligently in person. Still, I’m glad I implemented the patch, because it now makes it easy to prove whether the feature is a good idea or not. One of the nice things about a hackathon is that you can free up a chunk of (relatively) guaranteed time for a project, something I find a lot harder to do in general life.

To try to demonstrate and test the various types of relationship I created a simple project on GitHub. I’ve used DBIx::Class::Candy, a small sugar layer on top of DBIx::Class, so I hope you’ll forgive the slightly simpler syntax of my code. At the moment it’s a trivial, contrived two-table database that lets me play with some of the relationships.
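For a flavour of the schema, here is roughly what the OrderItem class looks like in Candy. The namespace and column definitions here are my sketch, chosen to match the columns that appear in the queries below, rather than code lifted verbatim from the project.

# (illustrative namespace; the real classes live in the GitHub project)
package MyApp::Schema::Result::OrderItem;
use DBIx::Class::Candy;

table 'order_items';

column id       => { data_type => 'int', is_auto_increment => 1 };
column order_id => { data_type => 'int' };
column name     => { data_type => 'varchar', size => 255 };
column type     => { data_type => 'varchar', size => 32 };
column quantity => { data_type => 'int' };
column price    => { data_type => 'numeric' };

primary_key 'id';

# each line item belongs to an order
belongs_to order => 'MyApp::Schema::Result::Order', 'order_id';

1;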

My suspicion, ever since I first encountered the where feature, has been that it’s unfinished* and that extending it in a meaningful way will be tricky without causing lots of surprises for existing code that uses it.

The where attribute is described as useful for filtering relationships. As you’d imagine, when you use it, it sticks a where clause onto the search you do when following the relationship.

package …::Order;
...
has_many delivery_charge => '...::OrderItem', 'order_id',
            { where => { type => 'delivery' } };

In my example I have a relationship that lets me pick out the delivery charges from an order, as they are line items of type ‘delivery’. Assuming you have fetched an order from the database, following the ‘delivery_charge’ accessor gives you an OrderItem resultset filtered to the type ‘delivery’.

$order->delivery_charge; # resultset containing order items of type 'delivery'
# SELECT me.id, me.order_id, me.name, me.type, me.quantity, me.price 
#   FROM order_items me 
# WHERE me.order_id = '1' AND type = 'delivery'

That’s all pretty obvious really.

The problem becomes apparent when you do a prefetch on the relationship.

my $order = $schema->resultset('Order')->search({ id => 1 }, {
    prefetch => ['delivery_charge'],
})->first;
$order->id; # 1
my @delivery_charges = $order->delivery_charge->all; # actually returns all order lines; the where clause is not applied

SELECT me.id, me.name, me.active, delivery_charge.id, delivery_charge.order_id, delivery_charge.name, delivery_charge.type, delivery_charge.quantity, delivery_charge.price 
  FROM orders me 
  LEFT JOIN order_items delivery_charge 
    ON delivery_charge.order_id = me.id 

The logical fix is to apply the where clause when joins happen. This is what the patch I created does.

The trouble is that where clause restrictions and joins are tricky beasts. There are two fundamental problems.

  1. Conceptually you probably really want the where clause to become a clause in a left outer join. I’ll explain more in a moment.
  2. You need to worry about table aliases: since you’re now dealing with multiple tables, column references become ambiguous.

The first problem is the more serious and, in my opinion, will kill the current patch, and possibly any realistic attempt at fixing this feature.

Let’s cover the first point now. The prefetch is a good example: we want a list of orders with the delivery charge items loaded at the same time, so that we don’t have to make subsequent database calls. With the where clause being generated, the query looks something like this:

SELECT me.id, me.name, me.active, delivery_charge.id, delivery_charge.order_id, delivery_charge.name, delivery_charge.type, delivery_charge.quantity, delivery_charge.price 
  FROM orders me 
  LEFT JOIN order_items delivery_charge 
    ON delivery_charge.order_id = me.id 
WHERE type = 'delivery'

Now consider a query for orders placed today. When we make that query we expect to see all the orders placed today. If, however, some orders have delivery charges and some have none because they were collected in store, guess what happens when that where clause kicks in: it ends up filtering out the orders without any delivery charges. This is an SQL 101 type of problem that you generally solve with a left join, with the restriction promoted into the join condition. In fact the generated query already half does that – it does exactly that with the id link, so that missing items don’t prevent the main thing from being found. It’s just that the where clause is in the wrong place. So with the join ‘fixed’ by the patch, you end up being surprised by your results whenever you join on a relationship with a where clause.
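What you conceptually want instead is something like the following, with the restriction promoted into the left join’s ON clause so that orders without delivery charges still come back. This is a sketch of the desired SQL, not what DBIC generates:

SELECT me.id, me.name, me.active, delivery_charge.id, delivery_charge.order_id, delivery_charge.name, delivery_charge.type, delivery_charge.quantity, delivery_charge.price 
  FROM orders me 
  LEFT JOIN order_items delivery_charge 
    ON delivery_charge.order_id = me.id 
   AND delivery_charge.type = 'delivery'

An order with no matching items then still produces a row, just with NULLs in the item columns.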

The second problem will cause things to fail more visibly. Let’s consider adding a type field to our Order class – a perfectly logical thing; we could add types like ‘store’, ‘internet’ and ‘mail-order’. Then the generated query blows up, because the bare ‘type’ in the where clause is now ambiguous. You could hard-code a table alias into the where clause, but that will fail on complex queries where you haven’t predicted the generated alias correctly. DBIC doesn’t know how to fix the reference automatically, and there isn’t any special syntax to let you do so.

I think the first problem is the more serious, and is the reason the feature should be left as it is. The second will probably ensure that it is, because it’s likely to visibly blow up legacy code, and it’s the legacy code that really makes fixing this awkward. As ribasushi made clear on the day, it would be fixable in a reasonably clean way by making the join condition a sub, but existing code as it stands will fail if we move the restriction into the join condition. It would be possible to add new semantics that enable the newer-style behaviour and fall back to the current behaviour when the old-style declaration is used, but is it worth it?

The where keyname seems like a good idea, except is it really? Assuming you don’t want problem 1 – the overly restricted search results – it won’t actually be a where clause in a join, so ‘where’ doesn’t make much sense. If we wanted syntactic sugar to make this feature easy to use and just work, we’d be better off coming up with a new key and new semantics. I would rather leave the ‘where’ feature as a footnote in the documentation, as something to avoid in newer code.

The sane way to achieve the same result these days is with the ‘custom join conditions’ documented in DBIx::Class::Relationship::Base. They allow you to produce arbitrarily complex conditions. Because a custom condition takes a sub, it becomes easy to deal with the ambiguity issue cleanly, and it also becomes possible to do far more complex things.

has_many standard_products => '...::OrderItem',
    sub {
        my $args = shift;
        return (
            {
                "$args->{self_alias}.id"      => { -ident => "$args->{foreign_alias}.order_id" },
                "$args->{foreign_alias}.type" => 'standard',
            }
        );
    }; # 'Custom join conditions', perldoc DBIx::Class::Relationship::Base

The downside compared to the where keyword is that you have to specify the whole condition, even the id join, rather than simply bolting on an extra restriction. It’s great for letting you do anything, but the simple case is a little cumbersome.

See my previous blog post for an example of altering the join at runtime.

* apologies to the person whose baby I am calling ugly.


New Salt version break

I started getting these error messages last week, and it turned out they were caused by an incompatibility between the new version, 2014.01, and an old server running 0.17. In general, if you’re seeing strange errors, particularly with configuration that used to work, double-check that your version numbers match. You can always do a ‘salt machine test.version’ to check the client’s version.
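For example, you can compare the master’s own version against what a minion reports (‘machine’ here is a placeholder for your minion’s id):

salt --version                # version installed on the master
salt 'machine' test.version   # version reported by the minion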

salt machine state.highstate
machine:
----------
    State: - no
    Name:      states
    Function:  None
        Result:    False
        Comment:   No Top file or external nodes data matches found
        Changes:

Or via salt-call state.highstate

[ERROR   ] Got a bad pillar from master, type bool, expecting dict: False

LXC Networking

My infrequently used laptop appears to have developed some issues with the networking on new LXC containers (Ubuntu 13.10), so I finally started to dig into how it’s set up. I first blew away the caches in /var/cache/lxc, then tried installing a fresh box with a completely stock template. The network interface didn’t acquire an IP address on the usual 10.0.3.0 network. A look in the logs suggested the DHCP server was receiving a DHCP request and sending back a lease, but the client wasn’t picking it up.

tail /var/log/syslog
dnsmasq-dhcp[1261]: DHCPDISCOVER(lxcbr0) 00:22:3e:ea:aa:fa 
dnsmasq-dhcp[1261]: DHCPOFFER(lxcbr0) 10.0.3.231 00:22:3e:ea:aa:fa 
dnsmasq-dhcp[1261]: DHCPDISCOVER(lxcbr0) 00:22:3e:ea:aa:fa 
dnsmasq-dhcp[1261]: DHCPOFFER(lxcbr0) 10.0.3.231 00:22:3e:ea:aa:fa 
dnsmasq-dhcp[1261]: DHCPDISCOVER(lxcbr0) 00:22:3e:ea:aa:fa 
dnsmasq-dhcp[1261]: DHCPOFFER(lxcbr0) 10.0.3.231 00:22:3e:ea:aa:fa 

A Google search suggested my problem was the UDP checksums – bug 1204069. A look with tcpdump confirmed that my checksums were indeed corrupt; note the ‘bad udp cksum’ on the second packet.

sudo tcpdump -vvv -i lxcbr0
21:36:34.009788 IP (tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 328)
    0.0.0.0.bootpc > 255.255.255.255.bootps: [udp sum ok] BOOTP/DHCP, Request from 00:22:3e:ea:aa:fa (oui Unknown), length 300, xid 0xb407cc75, secs 49, Flags [none] (0x0000)
	  Client-Ethernet-Address 00:22:3e:ea:aa:fa (oui Unknown)
	  Vendor-rfc1048 Extensions
	    DHCP-Message Option 53, length 1: Discover
	    Parameter-Request Option 55, length 13: 
	      Subnet-Mask, BR, Time-Zone, Default-Gateway
	      Domain-Name, Domain-Name-Server, Option 119, Hostname
	      Netbios-Name-Server, Netbios-Scope, MTU, Classless-Static-Route
	      NTP
	    END Option 255, length 0
	    PAD Option 0, length 0, occurs 41
21:36:34.010027 IP (tos 0xc0, ttl 64, id 44699, offset 0, flags [none], proto UDP (17), length 328)
    10.0.3.1.bootps > 10.0.3.231.bootpc: [bad udp cksum 0x1c2d -> 0x1a4c!] BOOTP/DHCP, Reply, length 300, xid 0xb407cc75, secs 49, Flags [none] (0x0000)
	  Your-IP 10.0.3.231
	  Server-IP 10.0.3.1
	  Client-Ethernet-Address 00:22:3e:ea:aa:fa (oui Unknown)
	  Vendor-rfc1048 Extensions
	    DHCP-Message Option 53, length 1: Offer
	    Server-ID Option 54, length 4: 10.0.3.1
	    Lease-Time Option 51, length 4: 3600
	    RN Option 58, length 4: 1800
	    RB Option 59, length 4: 3150
	    Subnet-Mask Option 1, length 4: 255.255.255.0
	    BR Option 28, length 4: 10.0.3.255
	    Default-Gateway Option 3, length 4: 10.0.3.1
	    Domain-Name-Server Option 6, length 4: 10.0.3.1
	    END Option 255, length 0
	    PAD Option 0, length 0, occurs 8

The bug suggests that this should no longer be a problem, so I carried on googling, turning my search criteria to DHCP UDP problems. That came up with bug 930962, which gave a potential workaround and suggested the issue should already be fixed. Since it said there should already be a firewall fix in the config, I decided to take a look at the configuration. To figure out where that lived I used dpkg -L:

dpkg -L lxc
…
/etc/init/lxc-net.conf
…

After a little digging I found the network configuration in /etc/init/lxc-net.conf, which contains all that out-of-the-box network setup I so like about Ubuntu. There was no mangle line as suggested in the bug report. At that point I compared the version I was running with the version on the bug report and realised the fix was pretty recent and I simply didn’t have it. Rather than figure out where the fixed package was, I figured I might as well fix it myself for now, since I’d come this far.

sudo iptables -t mangle -A POSTROUTING -o lxcbr0 -p udp --dport bootpc -j CHECKSUM --checksum-fill

After that the networking came up; more specifically, the server obtained an IP address and I was able to ssh to it. I added pretty much that line to my /etc/init/lxc-net.conf, so it now works out of the box again, and I have a better understanding of how LXC sets up its network. I’ve also finally solved a case of checksum offload issues for myself, rather than just hearing about them – possibly in the strangest way, i.e. without turning off the network checksum offload.

Of course the adventure didn’t end there. I then found that salt wasn’t working: when I looked for salt keys I didn’t find any requests from the new servers. A tcpdump showed a similar checksum issue with the salt traffic, and a quick hack with iptables solved that problem too.

sudo iptables -t mangle -A POSTROUTING -o lxcbr0 -p udp --dport 4505 -j CHECKSUM --checksum-fill
sudo iptables -t mangle -A POSTROUTING -o lxcbr0 -p udp --sport 4505 -j CHECKSUM --checksum-fill
sudo iptables -t mangle -A POSTROUTING -o lxcbr0 -p udp --dport 4506 -j CHECKSUM --checksum-fill
sudo iptables -t mangle -A POSTROUTING -o lxcbr0 -p udp --sport 4506 -j CHECKSUM --checksum-fill

Testing out uninstall to fix a Perl module dependency issue

Just recently, after an upgrade to some CPAN modules, I started getting this crash on one of my machines whenever Catalyst::View::JSON was loaded.

#     Error:  Couldn't instantiate component "TestApp::View::JSON", "Recursive inheritance detected in package 'Types::Serialiser::BooleanBase' at (eval 1547) line 76."Compilation failed in require at (eval 4) line 2.

The actual source of that error appears to be JSON::XS rather than Types::Serialiser::BooleanBase or TestApp::View::JSON.

I didn’t investigate the error properly, or really fix it properly. Instead I tested out one of the newer features of cpanm: the -U uninstall flag. I simply uninstalled JSON::XS and hey presto, no more crash.

cpanm -U JSON::XS
JSON::XS contains the following files:

  /home/colin/perl5/perlbrew/perls/perl-5.14.4/bin/json_xs
  /home/colin/perl5/perlbrew/perls/perl-5.14.4/lib/site_perl/5.14.4/x86_64-linux/JSON/XS.pm
  /home/colin/perl5/perlbrew/perls/perl-5.14.4/lib/site_perl/5.14.4/x86_64-linux/JSON/XS/Boolean.pm
  /home/colin/perl5/perlbrew/perls/perl-5.14.4/lib/site_perl/5.14.4/x86_64-linux/auto/JSON/XS/XS.bs
  /home/colin/perl5/perlbrew/perls/perl-5.14.4/lib/site_perl/5.14.4/x86_64-linux/auto/JSON/XS/XS.so
  /home/colin/perl5/perlbrew/perls/perl-5.14.4/man/man1/json_xs.1
  /home/colin/perl5/perlbrew/perls/perl-5.14.4/man/man3/JSON::XS.3
  /home/colin/perl5/perlbrew/perls/perl-5.14.4/man/man3/JSON::XS::Boolean.3

Are you sure you want to uninstall JSON::XS? [y] y

That probably warrants some explanation. Recent versions of Catalyst::Runtime appear to pull in Cpanel::JSON::XS, the newer alternative to JSON::XS, so that gets used instead and things just work. Uninstalling JSON::XS is probably too drastic a solution for most systems, but it will quickly flush out any places where I have direct dependencies on JSON::XS, which should be handy on my development box. I’d rather be using a single library for that single purpose.


Getting started with Salt

I’ve finally started to use the Salt stack in anger and I thought I’d make some quick notes on the things I was slow to pick up on.

Specifying that a box needs to have certain services installed is all done in the configuration. I understood that there are two aspects to salt: running things via the command line, and configuration management. I thought I’d be able to do something like ‘salt provision box-a service-b’. Instead you generally put the machine names and what they should have in top.sls in the /srv/salt directory, along with all the other config, as sketched below.
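A minimal /srv/salt/top.sls might look like this, mapping minions to the states they should have (the state names here are hypothetical):

# /srv/salt/top.sls – state names are hypothetical
base:
  '*':
    - common
  'box-a':
    - service-b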

Pillars appear to be intended to provide the site-specific data. In general the salt .sls files are config that can be used everywhere, sometimes templated, and the pillar data is used in those templates to insert data specific to a site or machine. This means you should be able to use the same basic salt setup on multiple salt-masters, and simply change the data on them to generate machines with different users and setups.
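As a sketch of how the pieces fit together (all the names here are made up): pillar data is defined under /srv/pillar, then referenced from a templated state file.

# /srv/pillar/top.sls
base:
  '*':
    - site

# /srv/pillar/site.sls – made-up data
admin_user: colin

# referenced from a templated state file under /srv/salt
deploy_user:
  user.present:
    - name: {{ pillar['admin_user'] }}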

Salt appears very opaque to start with. It took me too long to realise that, like most things, it logs – and it logs very well, so you can turn the log level right up when you are troubleshooting. It can be interesting to turn it up just to watch what salt is doing in real time; otherwise you only see problems logged.
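One easy way to watch it in real time is to stop the minion service and run the minion in the foreground with a higher log level:

salt-minion -l debug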

It’s well worth taking a look at the state documentation as you work through the tutorials. I found the examples in the tutorial relatively hard to pick apart until I could see the references for the various states.

http://docs.saltstack.com/ref/states/all

The file state is particularly worth taking a look at. A lot of deployment is pushing files about and modifying them.

http://docs.saltstack.com/ref/states/all/salt.states.file.html#module-salt.states.file
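As an illustration, a typical file.managed state looks something like this (the paths are placeholders):

# paths here are placeholders
/etc/motd:
  file.managed:
    - source: salt://motd/motd
    - user: root
    - group: root
    - mode: 644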

Make sure you run the latest versions from salt’s official repositories if possible; the ones packaged with your OS are likely to be old. And don’t run inconsistent versions between your masters and minions – that way leads to fail.

If something obvious doesn’t appear to work, try it with a different package. I found my simple couchdb installation didn’t work quite right out of the box, yet exactly the same thing with memcached did. This turned out to be a bug.

I’m still early on my journey into salt, but so far it appears to be very useful.


Salt service running

If you’re using salt to ensure your services are running and you constantly see a service being started, it might be because the service doesn’t support the status command.

----------
    State: - service
    Name:      openerp
    Function:  running
        Result:    True
        Comment:   Started Service openerp
        Changes:   openerp: True

If you tweak the salt minion to log trace messages, you’ll see that salt calls ‘service openerp status’.

/etc/salt/minion:
log_level: trace

# service salt-minion restart

/var/log/salt/minion
2013-10-02 19:26:32,602 [salt.loaded.int.module.cmdmod][INFO    ] Executing command 'service openerp status' in directory '/root'
2013-10-02 19:26:32,611 [salt.loaded.int.module.cmdmod][INFO    ] Executing command 'service openerp start' in directory '/root'

If we try that on the command line we’ll see that the openerp service doesn’t support the status option.

root@openerp-aq:~# service openerp status
Usage: openerp-server {start|stop|restart|force-reload}

That means salt gets an error, assumes the service isn’t running, and starts it.

The service state has an alternative way to check whether the service is running: looking at ps and grepping for the process. This is done by specifying the sig key.

openerp:
  service:
    - running
    - sig: openerp
    - require:
      - pkg: openerp

Now we finally see what we want: salt leaving the service running.

----------
    State: - service
    Name:      openerp
    Function:  running
        Result:    True
        Comment:   The service openerp is already running
        Changes:   

In the log we now see:

2013-10-02 19:31:15,371 [salt.loaded.int.module.cmdmod][INFO    ] Executing command "ps -efH | grep 'openerp' | grep -v grep | awk '{print $2}'" in directory '/root'

Grepping for a single, arbitrary character in a bunch of files.

I had a random error complaining about being unable to read a source file, without any real explanation of which file was the problem.

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 25: ordinal not in range(128)

The most logical way to rip through a number of files is find. On Linux, xxd is the simplest tool for hex dumping, and grep can then look at its output. The problem occurs when you realise you can’t use | in an -exec with find – or at least I haven’t figured out how. The simplest workaround is to put your command into a tiny shell script and -exec that. Obvious really, but it always takes me a moment to remember it.

Note that this is for grepping for a single byte. Grepping for multiple bytes would require a different approach entirely.

cat > bingrep.sh
#!/bin/sh
# $1 is the hex byte to look for, $2 the file to search
xxd -g 1 "$2" | grep -qi "\<$1\>"
^d
chmod +x bingrep.sh
find . -type f -name "*.py" -exec ./bingrep.sh c3 {} \; -print

That let me look for the rogue 0xc3 bytes in the source code I was dealing with.

I suspect I ought to be able to do something similar with regexes, but you run the risk of stupid interpretation problems. Sometimes it’s simpler to look at it in the raw, relatively speaking.
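For what it’s worth, GNU grep can also match a raw byte with a Perl-compatible escape, which avoids the helper script entirely (assuming your grep supports -P):

grep -rlP '\xc3' --include='*.py' .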


A few tips on using the OpenERP XML-RPC API

I’ve been developing applications that talk to OpenERP via its API for a little while now, and I figured it would be worth noting down some general pointers for using it. The API is used by all the UI clients for OpenERP, so it definitely allows you to do everything they do. It also has a reasonable amount of access to the data layer, so you can do an awful lot with it: from simply creating interfaces to it, to streamlining a process with a slick UI, there are a lot of possibilities.

Here are my tips for developing with it:

  1. Run OpenERP with the --log-request and --log-response command line flags to see how OpenERP achieves its tasks. It’s also helpful for seeing what your own application is sending.
  2. Don’t be afraid to read the code. Watching OpenERP is a great starting point, but sometimes you need to look at the code to understand what a parameter is for, or how it’s getting the data.
  3. Don’t be afraid to use the debugger. Putting --debug on the command line allows the Python debugger to kick in when there is an exception. It also allows you to stick ‘import pdb; pdb.set_trace()’ onto a line in the code you want to investigate and drop into the debugger there.
  4. Pass the context like you are told to; it makes life easier if you are set up to simply pass it along with your calls (but also be ready to add data to it – see the sketch after this list). The context contains things like the language, which is used for translations. You want your application’s language to be consistent with any users of OpenERP itself, otherwise things will get weird. Also note that some calls require extra information to be passed in via the context; it’s not a completely opaque blob that you simply pass about between calls.
  5. Limit the columns you ask for. When you make a read call you can specify which columns you are interested in. Do that from the start or you’ll end up with performance problems later. Once you have a partner with thousands of orders, invoices etc., a simple read of the res.partner will take a significant chunk of time if you aren’t limiting what you read from it.
  6. Don’t be afraid to extend OpenERP. Even if you aren’t a Python developer by trade, if you’re doing serious data modification you’re better off creating a module and calling a method on that than making lots of API calls. The main reason is to get everything into a single transaction; the other is speed.
  7. Be careful about filtering by calculated fields; they generally get turned into a ‘TRUE’ statement in the SQL. This can really screw things up when you have OR conditions in your filter. Use the --log-sql flag if you’re unsure what is happening.
  8. False is used for Null/None in the XML-RPC.
  9. Returning a dictionary from an OpenERP method via the API requires the keys to be strings, i.e. you can’t simply return { id: quantity }; you’d need to do { str(id): quantity } to prevent a crash. None is also a no-no for dictionary values – convert them to False if you want to be consistent with OpenERP.
  10. Formatting of numbers to n decimal places is largely done client side. OpenERP provides all the info you need, but you have to deal with it yourself.
  11. Beware of gapless ir.sequence sequences in batch jobs. They have a high potential for causing contention.
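To make tips 4, 5 and 8 concrete, here is a minimal sketch of talking to OpenERP over XML-RPC from Python 2 (the URL, database name and credentials are placeholders):

import xmlrpclib

url = 'http://localhost:8069/xmlrpc'
db, user, pwd = 'mydb', 'admin', 'admin'  # placeholders

# authenticate against the common endpoint to get a user id
common = xmlrpclib.ServerProxy(url + '/common')
uid = common.login(db, user, pwd)

models = xmlrpclib.ServerProxy(url + '/object')
context = {'lang': 'en_GB'}  # keep the language consistent (tip 4)

# search for some partners; note False where Python would use None (tip 8)
ids = models.execute(db, uid, pwd, 'res.partner', 'search',
                     [('customer', '=', True)], 0, 10, False, context)

# read only the columns we actually need (tip 5)
partners = models.execute(db, uid, pwd, 'res.partner', 'read',
                          ids, ['name', 'email'], context)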

Postgres table locks

I’ve just been looking into some issues with locking in Postgres and the documentation as ever has been excellent.

http://wiki.postgresql.org/wiki/Lock_Monitoring

A closer look at the queries provided suggests they don’t show table-level locks. At the very least, if you do a pg_dump of your database while watching those queries, you see nothing, despite there definitely being some locks taken. The query below probably isn’t perfect – it’s simply based on a quick practical test of running pg_dump against a test db – but it may help spot the table locks that could be blocking things.

SELECT l.pid, a.usename, a.datname, a.current_query
  FROM pg_catalog.pg_locks l
 INNER JOIN pg_catalog.pg_stat_activity a ON a.procpid = l.pid
 WHERE l.mode LIKE '%ExclusiveLock%';
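If you want to see it catch something without running a pg_dump, you can manufacture a table lock in one session and watch it show up from another (‘orders’ here is just an example table; use whatever you have handy):

BEGIN;
LOCK TABLE orders IN ACCESS EXCLUSIVE MODE;
-- leave the transaction open, run the monitoring query in another session,
-- then ROLLBACK to release the lock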

OpenERP debugging tip – turn off cron

While doing a lot of OpenERP deployment we have been discovering various ways to configure it, and one turns out to be very handy for debugging. If you’ve ever debugged OpenERP using the --debug flag and dropped into the debugger, you’ve probably noticed the system carries on doing things while you’re sat at the debugger prompt, often obliterating what you were looking at. This happens because OpenERP runs with multiple threads out of the box, and some of those threads run the ‘cron’ jobs – the background tasks – so even if you haven’t touched the website there will be activity. If you want to stop the cron activity from making your debugging session more confusing than it needs to be, add the --max-cron-threads=0 flag when you run OpenERP.
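So a typical debugging run looks something like this (the server script name varies between versions and packagings):

./openerp-server --debug --max-cron-threads=0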
