LXC Networking

My infrequently used laptop appears to have developed some issues with the networking on new lxc containers (Ubuntu 13.10) so I finally started to dig into how it’s setup. I first blew away the caches in /var/cache/lxc, then tried installing a fresh box with a completely stock template. The network interface didn’t acquire one of the usual 10.0.3.0 network ip addresses. A look in the logs suggested the dhcp server was receiving a DHCP request, and it was sending back a lease, but the client wasn’t picking that up.

tail /var/log/syslog
dnsmasq-dhcp[1261]: DHCPDISCOVER(lxcbr0) 00:22:3e:ea:aa:fa 
dnsmasq-dhcp[1261]: DHCPOFFER(lxcbr0) 10.0.3.231 00:22:3e:ea:aa:fa 
dnsmasq-dhcp[1261]: DHCPDISCOVER(lxcbr0) 00:22:3e:ea:aa:fa 
dnsmasq-dhcp[1261]: DHCPOFFER(lxcbr0) 10.0.3.231 00:22:3e:ea:aa:fa 
dnsmasq-dhcp[1261]: DHCPDISCOVER(lxcbr0) 00:22:3e:ea:aa:fa 
dnsmasq-dhcp[1261]: DHCPOFFER(lxcbr0) 10.0.3.231 00:22:3e:ea:aa:fa 

A google suggested my problem was the UDP checksums – bug 1204069. A look using tcpdump suggested that indeed my checksums were corrupt; note the ‘bad udp cksum’ on the second packet.

sudo tcpdump -vvv -i lxcbr0
21:36:34.009788 IP (tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 328)
    0.0.0.0.bootpc > 255.255.255.255.bootps: [udp sum ok] BOOTP/DHCP, Request from 00:22:3e:ea:aa:fa (oui Unknown), length 300, xid 0xb407cc75, secs 49, Flags [none] (0x0000)
	  Client-Ethernet-Address 00:22:3e:ea:aa:fa (oui Unknown)
	  Vendor-rfc1048 Extensions
	    DHCP-Message Option 53, length 1: Discover
	    Parameter-Request Option 55, length 13: 
	      Subnet-Mask, BR, Time-Zone, Default-Gateway
	      Domain-Name, Domain-Name-Server, Option 119, Hostname
	      Netbios-Name-Server, Netbios-Scope, MTU, Classless-Static-Route
	      NTP
	    END Option 255, length 0
	    PAD Option 0, length 0, occurs 41
21:36:34.010027 IP (tos 0xc0, ttl 64, id 44699, offset 0, flags [none], proto UDP (17), length 328)
    10.0.3.1.bootps > 10.0.3.231.bootpc: [bad udp cksum 0x1c2d -> 0x1a4c!] BOOTP/DHCP, Reply, length 300, xid 0xb407cc75, secs 49, Flags [none] (0x0000)
	  Your-IP 10.0.3.231
	  Server-IP 10.0.3.1
	  Client-Ethernet-Address 00:22:3e:ea:aa:fa (oui Unknown)
	  Vendor-rfc1048 Extensions
	    DHCP-Message Option 53, length 1: Offer
	    Server-ID Option 54, length 4: 10.0.3.1
	    Lease-Time Option 51, length 4: 3600
	    RN Option 58, length 4: 1800
	    RB Option 59, length 4: 3150
	    Subnet-Mask Option 1, length 4: 255.255.255.0
	    BR Option 28, length 4: 10.0.3.255
	    Default-Gateway Option 3, length 4: 10.0.3.1
	    Domain-Name-Server Option 6, length 4: 10.0.3.1
	    END Option 255, length 0
	    PAD Option 0, length 0, occurs 8

The bug suggests that that should no longer be a problem, so I carried on googling. Turning my search criteria to DHCP udp problems. That came up with bug 930962 which gave a potential work around and suggested that it should be fixed. Since it said there should already be a firewall fix in the config I decided to take a look at the configuration. To figure out where that was I used dpkg -L

dpkg -L lxc
…
/etc/init/lxc-net.conf
…

After a little digging I found the network configuration in /etc/init/lxc-net.conf where the setup of all the networking out of the box that I so like about ubuntu was all there. There was no mangle line as suggested in the bug report. At that point I checked the version I was running, and the version on the bug report and realised that was a pretty recent bug fix, and I simply don’t have it. Rather than figure out where the fixed package is, I figured I may as well just try to fix it myself for now since I’ve come so far.

sudo iptables -t mangle -A POSTROUTING -o lxcbr0 -p udp --dport bootpc -j CHECKSUM --checksum-fill

After that the networking came up, or more specifically the server obtained an ip address and I was able to ssh to it. I added pretty much that line into my /etc/init.lxc-net.conf and it now works out of the box again and I have a better understanding of how lxc sets up it’s network. I’ve also finally solved a case of checksum offload issues for myself rather than just hearing about them. Possibly in the strangest way, i.e. not turning off the network offload of the checksums.

Of course the adventure didn’t end there. I then found that salt wasn’t working. When I looked for salt keys I didn’t find any requests from new servers. A tcpdump showed a similar checksum issue with the salt traffic. A quick hack with iptables again solved that problem.

sudo iptables -t mangle -A POSTROUTING -o lxcbr0 -p udp --dport 4505 -j CHECKSUM --checksum-fill
sudo iptables -t mangle -A POSTROUTING -o lxcbr0 -p udp --sport 4505 -j CHECKSUM --checksum-fill
sudo iptables -t mangle -A POSTROUTING -o lxcbr0 -p udp --dport 4506 -j CHECKSUM --checksum-fill
sudo iptables -t mangle -A POSTROUTING -o lxcbr0 -p udp --sport 4506 -j CHECKSUM --checksum-fill
Advertisements

Testing out uninstall to fix a perl modules dependency issue

Just recently after an upgrade to some CPAN modules I started getting this crash on one of my machines when the Catalyst::View::JSON was loaded.

#     Error:  Couldn't instantiate component "TestApp::View::JSON", "Recursive inheritance detected in package 'Types::Serialiser::BooleanBase' at (eval 1547) line 76."Compilation failed in require at (eval 4) line 2.

The actual source of that error appears to be JSON::XS rather than Types::Serialiser::BooleanBase or TestApp::View::JSON.

I didn’t investigate the error properly, or really fix it properly. Instead I tested out one of the newer features of cpanm, the -U uninstall flag. I simply uninstalled JSON::XS and hey presto, no more crash.

cpanm -U JSON::XS
JSON::XS contains the following files:

  /home/colin/perl5/perlbrew/perls/perl-5.14.4/bin/json_xs
  /home/colin/perl5/perlbrew/perls/perl-5.14.4/lib/site_perl/5.14.4/x86_64-linux/JSON/XS.pm
  /home/colin/perl5/perlbrew/perls/perl-5.14.4/lib/site_perl/5.14.4/x86_64-linux/JSON/XS/Boolean.pm
  /home/colin/perl5/perlbrew/perls/perl-5.14.4/lib/site_perl/5.14.4/x86_64-linux/auto/JSON/XS/XS.bs
  /home/colin/perl5/perlbrew/perls/perl-5.14.4/lib/site_perl/5.14.4/x86_64-linux/auto/JSON/XS/XS.so
  /home/colin/perl5/perlbrew/perls/perl-5.14.4/man/man1/json_xs.1
  /home/colin/perl5/perlbrew/perls/perl-5.14.4/man/man3/JSON::XS.3
  /home/colin/perl5/perlbrew/perls/perl-5.14.4/man/man3/JSON::XS::Boolean.3

Are you sure you want to uninstall JSON::XS? [y] y

That probably warrants some explanation. The new Catalyst::Runtime now appears to pull in the new alternative to JSON::XS, Cpanel::JSON::XS so this can now be used instead, and so things just worked. It’s probably a bit drastic a solution for most systems at the moment, I’m sure it will demonstrate any places where I have direct dependencies on JSON::XS. On my development box that should be handy however. I’d rather be using a single library for that single purpose.