My infrequently used laptop appears to have developed some issues with the networking on new lxc containers (Ubuntu 13.10) so I finally started to dig into how it’s setup. I first blew away the caches in /var/cache/lxc
, then tried installing a fresh box with a completely stock template. The network interface didn’t acquire one of the usual 10.0.3.0 network ip addresses. A look in the logs suggested the dhcp server was receiving a DHCP request, and it was sending back a lease, but the client wasn’t picking that up.
tail /var/log/syslog
dnsmasq-dhcp[1261]: DHCPDISCOVER(lxcbr0) 00:22:3e:ea:aa:fa
dnsmasq-dhcp[1261]: DHCPOFFER(lxcbr0) 10.0.3.231 00:22:3e:ea:aa:fa
dnsmasq-dhcp[1261]: DHCPDISCOVER(lxcbr0) 00:22:3e:ea:aa:fa
dnsmasq-dhcp[1261]: DHCPOFFER(lxcbr0) 10.0.3.231 00:22:3e:ea:aa:fa
dnsmasq-dhcp[1261]: DHCPDISCOVER(lxcbr0) 00:22:3e:ea:aa:fa
dnsmasq-dhcp[1261]: DHCPOFFER(lxcbr0) 10.0.3.231 00:22:3e:ea:aa:fa
A google suggested my problem was the UDP checksums – bug 1204069. A look using tcpdump suggested that indeed my checksums were corrupt; note the ‘bad udp cksum’ on the second packet.
sudo tcpdump -vvv -i lxcbr0
21:36:34.009788 IP (tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 328)
0.0.0.0.bootpc > 255.255.255.255.bootps: [udp sum ok] BOOTP/DHCP, Request from 00:22:3e:ea:aa:fa (oui Unknown), length 300, xid 0xb407cc75, secs 49, Flags [none] (0x0000)
Client-Ethernet-Address 00:22:3e:ea:aa:fa (oui Unknown)
Vendor-rfc1048 Extensions
DHCP-Message Option 53, length 1: Discover
Parameter-Request Option 55, length 13:
Subnet-Mask, BR, Time-Zone, Default-Gateway
Domain-Name, Domain-Name-Server, Option 119, Hostname
Netbios-Name-Server, Netbios-Scope, MTU, Classless-Static-Route
NTP
END Option 255, length 0
PAD Option 0, length 0, occurs 41
21:36:34.010027 IP (tos 0xc0, ttl 64, id 44699, offset 0, flags [none], proto UDP (17), length 328)
10.0.3.1.bootps > 10.0.3.231.bootpc: [bad udp cksum 0x1c2d -> 0x1a4c!] BOOTP/DHCP, Reply, length 300, xid 0xb407cc75, secs 49, Flags [none] (0x0000)
Your-IP 10.0.3.231
Server-IP 10.0.3.1
Client-Ethernet-Address 00:22:3e:ea:aa:fa (oui Unknown)
Vendor-rfc1048 Extensions
DHCP-Message Option 53, length 1: Offer
Server-ID Option 54, length 4: 10.0.3.1
Lease-Time Option 51, length 4: 3600
RN Option 58, length 4: 1800
RB Option 59, length 4: 3150
Subnet-Mask Option 1, length 4: 255.255.255.0
BR Option 28, length 4: 10.0.3.255
Default-Gateway Option 3, length 4: 10.0.3.1
Domain-Name-Server Option 6, length 4: 10.0.3.1
END Option 255, length 0
PAD Option 0, length 0, occurs 8
The bug suggests that that should no longer be a problem, so I carried on googling. Turning my search criteria to DHCP udp problems. That came up with bug 930962 which gave a potential work around and suggested that it should be fixed. Since it said there should already be a firewall fix in the config I decided to take a look at the configuration. To figure out where that was I used dpkg -L
dpkg -L lxc
…
/etc/init/lxc-net.conf
…
After a little digging I found the network configuration in /etc/init/lxc-net.conf
where the setup of all the networking out of the box that I so like about ubuntu was all there. There was no mangle line as suggested in the bug report. At that point I checked the version I was running, and the version on the bug report and realised that was a pretty recent bug fix, and I simply don’t have it. Rather than figure out where the fixed package is, I figured I may as well just try to fix it myself for now since I’ve come so far.
sudo iptables -t mangle -A POSTROUTING -o lxcbr0 -p udp --dport bootpc -j CHECKSUM --checksum-fill
After that the networking came up, or more specifically the server obtained an ip address and I was able to ssh to it. I added pretty much that line into my /etc/init.lxc-net.conf
and it now works out of the box again and I have a better understanding of how lxc sets up it’s network. I’ve also finally solved a case of checksum offload issues for myself rather than just hearing about them. Possibly in the strangest way, i.e. not turning off the network offload of the checksums.
Of course the adventure didn’t end there. I then found that salt wasn’t working. When I looked for salt keys I didn’t find any requests from new servers. A tcpdump showed a similar checksum issue with the salt traffic. A quick hack with iptables again solved that problem.
sudo iptables -t mangle -A POSTROUTING -o lxcbr0 -p udp --dport 4505 -j CHECKSUM --checksum-fill
sudo iptables -t mangle -A POSTROUTING -o lxcbr0 -p udp --sport 4505 -j CHECKSUM --checksum-fill
sudo iptables -t mangle -A POSTROUTING -o lxcbr0 -p udp --dport 4506 -j CHECKSUM --checksum-fill
sudo iptables -t mangle -A POSTROUTING -o lxcbr0 -p udp --sport 4506 -j CHECKSUM --checksum-fill