Setting up a new CPAN Mini Mirror

It’s time for me to set up a new dev laptop and I wanted to set up a CPAN Mirror quickly. Using a mirror has been something I’ve done for a long time now, it’s great for coding without internet. Unless you’re doing lots of module installs this is likely to use more bandwidth than downloading on an ad hoc basis, but the tradeoff is that I can install almost any module I realise I need regardless of whether I have connectivity or not. The only limitations will be down to external dependencies on things like operating system packages/tools.

I have previously setup mirrors using lxc and salt to provision the machine, but this time I decided to convert that set up to Docker. I also simplified the setup to just the mirror as I didn’t need to inject modules into the CPAN server any more. Well, in truth I never really did on my laptops, I simply did that because it was useful for work at the time. I created the salt configuration to make it easier to re-provision new servers for a work setup that allowed for private mirror that also had work modules allowing for a full CPAN type deployment process of both public and private code.

To do this I’ve used docker-compose, that’s almost always the best option for a laptop setup. Even with a single machine it generally makes life simpler as you can encode all the configuration in the file so that you have a few simple consistent commands to build/setup/run your containers. I’ve also set it to the highest version number currently available partly just to see what’s available, and partly because I want to make use of some of the newer features of volumes. It doesn’t appear that I can express everything I want to perfectly in the docker-compose file, but I suspect that they aren’t really targeting my situation. While docker is partly really popular because it works really well on developer laptops, cached file system layers and lightweight machines work really well in a constrained environment, fundamentally docker is aiming for servers and so they’re trying to deal with issues of sharing resources across multiple machines rather than working on a single machine.

What I really wanted to be able to do was specify the exact details of the volume I wanted shared between the containers, from the location down to the user id’s for them. If that’s currently fully possible I haven’t seen a way to do it.

This setup is not designed for the general internet or with security in mind (not that a simple mirror really should really have much in the way of threats). I don’t even expose the ports, just print out the url for use when running cpanm. I also just update the server manually rather than setup a cron job for it as I don’t really want to use that much bandwidth on this. I don’t use the mirror that often, but when I do it’s really valuable, even if it’s not all the latest and greatest versions of the modules.

Having all the modules locally can also be great when you want to do some analysis of what existing modules are doing. It’s reasonably easy to write scripts to say find all the XS modules and then extract their C code to see which call a particular function.

The configuration is on github here, The modules downloaded are kept in a volume outside the containers so updating/removing etc. should be easy. In theory it should even be possible to wrap the set up around an existing mirror if you already have files.

An Operations guide to Catalyst configuration in Docker

We use the Perl Catalyst web framework at $work.  A lot.  It’s got most of the stuff you want for web sites and services, and it’s pretty solid and it’s lovely and stable.

As with most systems it has a well established method of configuration and it allows you to use configuration files.  Theses are handy for all sorts of reasons, but in a docker environment most of the file system generally wants to be essentially static.  If you need to change config files on every different deployment of your container you’re probably going to need to do something ugly.  Docker is much happier with allowing you to push in settings via environment variables instead.  This fits much more neatly into the 12 factor app methodology that using docker itself fits neatly into.

Of course Catalyst has a solution to that that doesn’t require wholesale change, you can just load the Catalyst::Plugin::ConfigLoader::Environment plugin to your application.  That allows you to override parts of your configuration.

Perl modules generally have names like this, My::App, often (but not always) with capitalized names, and :: between words.  The corresponding catalyst configuration file for that application would be named my_app.conf.  It is simply a lowercase version of the app name, with :: replaced with _.

The environment variables you override the config with need to be an all uppercase version of the app name, again with :: replaced by ::.  So it would start MY_APP.  Then you put another _ after that, and then you specify the configuration section you want to override.

Let’s take a look at an example config,

using_frontend_proxy 1 

   connect_info dbi:Pg:host=db;dbname=patch
    connect_info someuser
    connect_info averysecurepassword
      quote_char "\""
      name_sep .
      pg_enable_utf8 1

Note that catalyst config files can generally be in lots of different formats.  This is kind of an apache like config style, but it can also be yaml and other formats.  The configuration loaded Cataylst apps commonly use is very flexible.

If we want to change the using_frontend_proxy setting we can set the environment variable MY_APP_using_frontend_proxy to 0.

To set more complex data structures we can set a json value.  This will get merged into the config, so if something exists in the hash/dictionary but isn’t overwritten then it will generally be left alone.

The configuration file above roughly translates to a dictionary looking like this in code (expressed in json),

    "Model::Cabbages": {
        "connect_info": [
                "name_sep": ".",
                "pg_enable_utf8": "1",
                "quote_char": "\""
    "using_frontend_proxy": "1"

Note how the repeated connect_info were translated into an array, and then a dictionary.  This may seem strange, but the connect_info block is a very common structure in the perl world as most if not all will be passed straight onto DBI, the workhorse doing most database work.

So to change the password in the database configuration we do,


Since that’s an array of configuration settings we end up having to specify the whole lot, otherwise we lose some of the configuration.  Also note that we turned the :: into __.  That is generally a good thing when working on the command line as putting :: into env variables via bash is tricky (if possible).  

The one oddity is if you have a configuration key with more than 1 set of double colons.  E.g. Model::Payment::Client.  Then you just need to live with the :: otherwise your override will be interpreted in an odd fashion.

Perl and docker-compose configurations can set environment variables with :: easily though so this generally isn’t a big deal.  env and set can confuse however as one will show environment variables with ::, and the other won’t.


Take the configuration filename, drop the extension, upper case it.

Add an underscore.

Take the first section or key, and append that to the environment variable.

That’s the variable that will target that config section.

Now set it’s value to either a string (if it’s a simple string value), or a json structure that matches the configurations effective structure.

Now you discover whether someones done something confusing and eschewed convention. Hopefully not.

There is something missing here, and it’s how to test changes, and generate the correct json easily. I have the tooling to make it reasonably simple, I just need to open source it.

PAUSE permissions code outline

This is a quick brain dump from my brief coding session on PAUSE working on the permissions. This won’t make sense without reading the code yourself. Unless you’re interested in modifying PAUSE or reviewing my changes I would ignore this blog post. I’m writing this for my future self as much as anyone. The behaviour I modified in my pull request is in italics.

The code for setting the permissions when a new distribution is indexed is largely in PAUSE::package. It’s triggered by ./cron/mldistwatch and PAUSE::mldistwatch, but the code that actually sets the permissions is in PAUSE::package.

perm_check checks that the current user has permission to be uploading this distro and bails if not. If this is a new package (and they do have permission) then it also adds the user to the perms table as they are effectively first come.

give_regdowner_perms adds people to the perms table where necessary. This was supporting the module permissions in amongst other things. It is now also where we copy the permissions from the main module.

The checkin method as well as adding the record to the packages table also adds to the primeur table via the checkin_into_primeur function. If the x_authority flag is used then that user is added, otherwise the main module is looked up, or failing that the uploading user.

Note that the first come (primeur) users appear in both the perms and primeur tables.

There are special checks for Perl distributions in the permissions code that will change the behaviour of the permissions. I am purposely not mentioning them as I never took the time to understand them.

A quick note on testing. As well as the tests which work of the dummy distributions in the corpus directory there is a test utility. To get a look at the permissions database in practice use the build-fresh-cpan script. This will build a one off cpan environment that you can examine. Just pass it a package to index and then you can check the permissions database.

$ perl one-off-utils/build-fresh-cpan corpus/mld/009/authors/O/OO/OOOPPP/Jenkins-Hack-0.14.tar.gz
Initialized empty Git repository in /tmp/h4L3GU6Awi/git/.git/
$ ls
cpan  db  git  Maildir  pause.log  run
$ cd db
$ ls
authen.sqlite  mod.sqlite
$ sqlite3 mod.sqlite 
SQLite version 2014-10-29 13:59:56
Enter ".help" for usage hints.
sqlite> select * from primeur;
sqlite> select * from perms;

Docker logging and buffering

When you start using Docker one of the things it’s quite possibly you’ve hit is output buffering. Docker encourages you to log to stdout/err from your program, and then use docker to feed your regular logging solution.

This generally exhibits itself as you seeing no logging, then after the program has been running for a while you come back to the logs and find they are all there. It doesn’t get as much press as caching but it can be just as head scratching.

With Perl for example the buffering kicks in when it’s not connected to a terminal. For that this somewhat idiomatic snippet can be useful,

select( ( select(\*STDERR), $|=1 )[0] );
select( ( select(\*STDOUT), $|=1 )[0] );

As ever with Perl there is more than one way to do it of course…

Note that you could experience this sort of thing when using other logging setups. If you pipe to logger to output to rsyslog you could experience the same issues.

Perl QA Hackathon report (#perlqah2016)

Thank you to all the people who sponsored the Perl QAH hackathon and all those that provided their time. It was a very productive environment for a lot of projects.

I worked on 2 primary things relating to PAUSE permissions. PAUSE (indexing) itself and a new module for testing that permissions are consistent named Test::PAUSE::ConsistentPermissions.

The permissions of distributions on CPAN are generally something you don’t think about until you become a co-maintainer of a distribution and you see red on or Then you discover that there is a system by which each module in your distribution is effectively owned by somebody, and that some others might also have been granted permission to upload a file.

Let’s take Test::DBIx::Class as an example. I’m a co-maintainer, and JJNAPIORK is the owner. There is also another Co-Maintainer.

Using the script from Test::PAUSE::ConsistentPermissions I can look at the permissions currently assigned.

$ pause-check-distro-perms Test-DBIx-Class Test::DBIx::Class
Distribution: Test-DBIx-Class
Module: Test::DBIx::Class

When I upload a new release of Test::DBIx::Class to PAUSE any new modules added will be given to me. So I will be owner and no-one will be Comaint. I then need to grant PHAYLON comaint (which I can do as owner), and then pass ownership back to JJNAPIORK.

There is an alternative mechanism involving a bit of metadata named ‘x_authority’. With that we could ensure that JJNAPIORK retains ownership of all the modules within the distribution. The downside of that is that while I would also gain comaint on those new modules as the uploader, PHAYLON would not. Since I wasn’t the owner either, I wouldn’t be able to assign him Comaint, and I would have to ask JJNAPIORK to do that instead.

I believe there were historically alternative methods for managing this, but PAUSE has been through some rationalisation and simplification of some features and those features don’t exist anymore.

I came to this hackathon wanting to work on this problem having encountered Karen Etheridge (ETHER)’s prototype Dist::Zilla::Plugin::AuthorityFromModule which suggested a potential solution.

We had a meeting with the interested parties at the hackathon about how we could deal with this scenario better. Ricardo Signes (RJBS) suggested that we could make use of the fact that there is a designated ‘main module’ for permissions. We could  use that for the defaults. This is one step better than the previously suggested solution as it makes use of some of the previous rationalisation of PAUSE permissions and won’t require authors to provide extra metadata. This should mean that permissions are much more likely to just work out of the box.

With that potential solution suggested RJBS gave me some assistance to get working on the PAUSE indexer. I started with adding tests, then wrote the code.

The change made so far is very minimal, only affecting the indexing of new packages. No changes have been made to the user interface, or the complex permissions you can currently end up with. The pull request is here – Note that it also benefits from Peter Sergeant (Perl Careers)’s work to hook the PAUSE tests into Travis giving it a green tick indicating the tests passed.

The other thing I worked on was Test::PAUSE::ConsistentPermissions to allow us to check that a distribution has consistent permissions. I created a script for checking a distro on PAUSE (not too dissimilar to Neil Bowers App::PAUSE::CheckPerms module) and a test function for using a distro’s release tests. This is a bit like Test::PAUSE::Permissions, but rather than check whether you have permission to upload the distribution, it checks whether the permissions are consistent. These two properties don’t necessarily coincide.

During the event I was also able to create a couple of minor pull requests.

Here’s the obligatory thank you to the full list of sponsors. Thank you all.

The sponsors for the Perl QA Hackathon 2016,

DBIx::Class join pruning

DBIx::Class has a join optimiser to prune out any unnecessary joins from the queries it generates.  Occasionally this can cause code that looks good to fail unexpectedly.  The most obvious time this normally happens when using literal SQL.

The most common example of this is when doing an aggregate operation with a group by.  Here I’m trying to sum and count some values on a joined table.  This looks like pretty standard DBIC code, but it fails.  I don’t use the linked table in the group by, just in the columns where it’s my own raw SQL.

sub total
    my $self = shift;
    my @columns = $self->result_source->columns;
    my $me = $self->current_source_alias;
    return $self->search(undef, {
        join => ['items'],
        '+columns' => [
            { total => \'sum(amount) as total' },
            { lines => \'count(amount) as lines' },
        group_by => [map {"$me.$_"} @columns],

        total => { '>' => 10 }
        '+columns' => ['total', 'lines']


DBI Exception: DBD::Pg::st execute failed: ERROR:  column "amount" does not exist
LINE 1: ... (SELECT "me"."id", "me"."name", "me"."type", sum(amount) as...
                                                             ^ [for Statement "SELECT "me"."id", "me"."name", "me"."type", "me"."total", "me"."lines" FROM (SELECT "me"."id", "me"."name", "me"."type", sum(amount) as total, count(amount) as lines FROM "orders" "me" GROUP BY "me"."id", "me"."name", "me"."type") "me" WHERE ( "total" > ? )" with ParamValues: 1='10']

Looking closely at the SQL generated the items alias isn’t joined in.  The optimiser has pruned that join because it couldn’t see any reference to the column.  It turns out that DBIC does really try to do the right thing, it’s just we didn’t give it enough info to give it a hint.

Adding the ‘items.’ qualifier to the column names in the SQL allows DBIC to realise that that join is necessary.

   return $self->search(undef, {
        join => ['items'],
        '+columns' => [
            # NOTE: fully qualified name
            # even though they are not ambiguous.
            { total => \'sum(items.amount) as total' },
            { lines => \'count(items.amount) as lines' },
        group_by => [map {"$me.$_"} @columns],

With that table alias, not strictly necessary for the SQL since those column names are unambiguous, DBIC realises that it is needs the join.

SELECT "me"."id", "me"."name", "me"."type", "me"."total", "me"."lines"
  FROM (
    SELECT "me"."id", "me"."name", "me"."type", sum( items.amount ) AS total, count( items.amount ) AS lines
      FROM "orders" "me"
      LEFT JOIN "order_items" "items"
        ON "items"."order_id" = "me"."id"
    GROUP BY "me"."id", "me"."name", "me"."type"
   ) "me"
WHERE "total" > '10'

Vim word breaks and perl

I use vim for coding my Perl and in the most recent versions of Ubuntu I’ve not been keen on one of the changes to the syntax defaults.  The word definition has start to include : which means that class names like Blah::Thing count as one word which isn’t how I’m used to working.  Luckily vim is very configurable and the vim irc channel is also really helpful so I was able to find where the configuration was and how to override it quickly.

Finding out that it’s related to the iskeyword configuration was pretty simple, but changing that didn’t have any effect.  It turns out it is set in a Perl syntax specific configuration so overriding it is a little more tricky.

Figuring out where a configuration setting was set is done using :verb set ?

:verb set isk?
Last set from /usr/share/vim/vim74/ftplugin/perl.vim

Knowing where that was set I could then override it with an ‘after’ configuration file.

" ~/.vim/after/ftplugin/perl.vim
set iskeyword-=:

Then when I check in vim,

:verb set isk?
Last set from ~/.vim/after/ftplugin/perl.vim

For more of a discussion of the language specific configuration and the after files see this vim wiki entry –

DBIx::Class Hackathon

I attended the first DBIx::Hackathon this weekend and I want to thank Jess for organising it, and Shadowcat who put a lot of time and effort into the event. Kudos to the sponsors Eligo and CTRL-O who provided backing.

I spent the day trying to improve the ‘where’ feature. It was my first patch to the core DBIx::Class module. Previously, I’ve reported little bugs to the project. Ribasushi, as ever was a great help with explaining what was needed in terms of fixing the issue. In fact he knows the code base so well he practically spoon fed me the solution as it’s just a minor tweak in his mind 😉

I suspect however this patch will never see the light of day. In some ways, I wish I had done more preparation for the hackathon so that I could have discussed it more intelligently in person. However, I’m glad I implemented the patch, because now it makes it easy to prove whether the feature is a good idea or not. One of the nice things about a hackathon is that you can free up a chunk of (relatively) guaranteed time for a project, something I find a lot harder to do in general life.

To try to demonstrate and test the various types of relationship I created a simple project on github. I’ve used DBIx::Class::Candy which is a small sugar layer on top of DBIx::Class so I hope you’ll forgive the slightly simpler syntax for my code. It’s a very trivial contrived 2 table database at the moment that allows me to play with some of the relationships.

My suspicion since I encountered the where feature is that it’s unfinished* and that extending it in a meaningful way will be tricky without causing lots of surprises to existing code that uses it.

The where attribute is described as useful for filtering relationships. As you’d imagine when you use it it sticks a where clause onto the search you do when following the relationship.

package …::Order;
has_many delivery_charge => '...::OrderItem', 'order_id',
            { where => { type => 'delivery' } };

In my example I have a relationship that allows me to pick out the delivery charges from the order as they are line items of the type ‘delivery’. Assuming you have fetched an order from the database and you follow the ‘delivery_charge’ accessor you’ll get an OrderItem resultset filtered to the type ‘delivery’.

$order->delivery_charge; # resultset containing order items of type ‘delivery’
# SELECT, me.order_id,, me.type, me.quantity, me.price 
#   FROM order_items me 
# WHERE me.order_id = '1' AND type = 'delivery'

That’s all pretty obvious really.

The problem really becomes obvious when you do a prefetch on the relationship.

my $order = $schema->resultset(‘Order’)->search({ id => 1 }, { 
prefetch => ['delivery_charge']
$order->id; # 1
my @delivery_charges = $order->delivery_charge->all; # actually returns all order lines, where clause not applied.

SELECT,,,, delivery_charge.order_id,, delivery_charge.type, delivery_charge.quantity, delivery_charge.price 
  FROM orders me 
  LEFT JOIN order_items delivery_charge 
    ON delivery_charge.order_id = 

The logical fix is to apply the where clause when joins happen. This is what the patch I created does.

Only where clause restrictions and joins are tricky beasts. There are two fundamental problems.

  1. Conceptually you probably really want the where clause to become a clause in a left outer join. I’ll explain more in a moment.
  2. You need to worry about table aliases. Since you’re now dealing with multiple tables you need to worry about ambiguity.

The first problem is the more serious and, in my opinion, will kill the current patch, and possibly any realistic attempt to fixing this feature.

Lets cover the first point now. The prefetch is a good example of how we want to get a list of orders, and we want the delivery charge items loaded at the same time so that we don’t have to make subsequent database calls. With the where clause being generated a query looks something like this,

SELECT,,,, delivery_charge.order_id,, delivery_charge.type, delivery_charge.quantity, delivery_charge.price 
  FROM orders me 
  LEFT JOIN order_items delivery_charge 
    ON delivery_charge.order_id = 
WHERE type = 'delivery'

Now consider a query for orders placed for today. When we make that query we expect to see all the orders placed today. If however we have some orders with delivery charges, and some that have none because they were collected in store, guess what happens when that where clause kicks in. It ends up filtering out the orders without any delivery charges. This is an SQL 101 type of problem that you generally solve with a left join and the clause promoted into the join condition. In fact the query generated already half does that, it does just that with the id link, so that the missing items shouldn’t prevent the main thing from being found. It’s just the where clause is in the wrong place. With the join ‘fixed’ you end up being surprised by your results when you join on one a relationship with a where clause.

The second problem will cause things to fail more visibly. Lets consider if we add a type field to our Order class now. A perfectly logical thing, we could add types like ‘store’, ‘internet’, ‘mail-order’. Then the query generated blows up again. You could hard code in a table alias into the where clause but that will fail when you have complex queries and you haven’t predicted the generated table alias correctly. DBIC doesn’t know what to do to automatically fix the reference and there isn’t any special syntax to do so.

I think the first problem is the more serious, and is the reason the feature should be left as it is. The second I think will ensure that it is because it’s likely to visibly blow up legacy code. It’s the legacy code that really makes fixing this awkward. As ribasushi made clear on the day, it would be fixable in a reasonably clear way by making the join condition a sub, but the existing code as it stands will fail if we add it to the join condition. It would be possible to add new semantics that enable newer style behaviour and fall back to the current behaviour when the old style declaration is used, but is it worth it? The where keyname seems like a good idea, except is it really? Assuming you don’t want problem 1 – the overly restricted search results – it won’t actually be a where clause in a join, so ‘where’ doesn’t make so much sense. If we wanted syntactic sugar to make this feature easy to use and just work we’d be better off coming up with a new key and semantics. The ‘where’ feature I would rather leave as a footnote in the documentation as something to avoid in newer code.

The sane way to achieve the same result these days is with the ‘Custom join conditions’ documented in DBIx::Class:Relationship::Base. They allow you to produce arbitrarily complex conditions. Because it takes a sub it becomes easy to deal with the ambiguity issue clearly and it also becomes possible to do far more complex things too.

has_many standard_products => '...::OrderItem',
            sub {         
                my $args = shift;
                return ( 
                        "$args->{self_alias}.id" => { -ident => "$args->{foreign_alias}.order_id" },
                        "$args->{foreign_alias}.type" => 'standard',                            
            }; # 'Custom join conditions' perldoc DBIx::Class:Relationship::Base

The downside of this compared to the where keyword is that it requires you to specify the whole condition, even the id join, rather than simply bolting on an extra condition. It’s great for allowing you to do anything, but the simple case is a little cumbersome.

See my previous blog post for an example of altering the join at runtime.

* apologies to the person whose baby I am calling ugly.

Testing out uninstall to fix a perl modules dependency issue

Just recently after an upgrade to some CPAN modules I started getting this crash on one of my machines when the Catalyst::View::JSON was loaded.

#     Error:  Couldn't instantiate component "TestApp::View::JSON", "Recursive inheritance detected in package 'Types::Serialiser::BooleanBase' at (eval 1547) line 76."Compilation failed in require at (eval 4) line 2.

The actual source of that error appears to be JSON::XS rather than Types::Serialiser::BooleanBase or TestApp::View::JSON.

I didn’t investigate the error properly, or really fix it properly. Instead I tested out one of the newer features of cpanm, the -U uninstall flag. I simply uninstalled JSON::XS and hey presto, no more crash.

cpanm -U JSON::XS
JSON::XS contains the following files:


Are you sure you want to uninstall JSON::XS? [y] y

That probably warrants some explanation. The new Catalyst::Runtime now appears to pull in the new alternative to JSON::XS, Cpanel::JSON::XS so this can now be used instead, and so things just worked. It’s probably a bit drastic a solution for most systems at the moment, I’m sure it will demonstrate any places where I have direct dependencies on JSON::XS. On my development box that should be handy however. I’d rather be using a single library for that single purpose.

DBIx::Class and Postgres tweaks at startup

After connection you can do simple commands thanks to the on_connect_do connection setting. One thing I sometimes do is turn down the whining. Postgres can be quite noisy when you’re creating tables,

NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "blah_pkey" for table "blah"
NOTICE:  CREATE TABLE will create implicit sequence "blah_id_seq" for serial column ""

So I sometimes SET client_min_messages=WARNING at connection. That makes deploying a schema a lot less verbal.

If you want to deploy to a different schema within a database you can also do a ‘SET search_path TO’ statement at that point too. That can be quite handy if you want to deploy the same tables again to an alternative schema within a database.

perl -I lib -MMyModule::Schema::DB -e "MyModule::Schema::DB->connect('dbi:Pg:dbname=database;host=postgres', 'username', 'password', { on_connect_do => 'SET search_path TO temp' })->deploy"

If you’re setting up a Catalyst config for DBIC you can set the connection options like this,

    connect_info dbi:Pg:dbname=database
    connect_info username
    connect_info password
        on_connect_do  [ SET client_min_messages=WARNING ]
        quote_names 1