
Lessons from DigitalOcean Networking

If you’ve come from running virtual machines on AWS, Azure, or Google Cloud, you will be familiar with the idea that VMs can have a public Internet-facing IP address, a private IP address, or some combination or multiple of the two.

DigitalOcean offers something similar, but just different enough to throw you when you’re accustomed to the networking models of the other cloud providers. When you create a DigitalOcean Droplet via their Control Panel, or via their API, you have the option to enable “Private networking”, but the official documentation calls this feature “Shared private networking”, and the distinction is very important.

Where private networking in AWS, Azure, or Google Cloud gives your VM a private interface to a network shared only with your VMs, the shared private networking in DigitalOcean is, according to this DigitalOcean tutorial, “accessible to other VPSs in the same datacenter–which includes the VPSs of other customers in the same datacenter”. And I have verified that statement is true.

To clarify, if you enable private networking on a DigitalOcean VM in their SFO2 region, every other VM in the SFO2 region from every other DigitalOcean customer can route packets to your VM’s private network interface. While I advocate the use of strict firewall configurations in any cloud hosting environment, the importance of doing so correctly is much higher on DigitalOcean, even for non-Production environments where firewalls have a history of being more relaxed.

The bright side of all this is that DigitalOcean’s tag-based Cloud Firewall applies to both the public and private network interfaces and implements a deny-by-default behaviour. By using tags to restrict which other Droplets are permitted to communicate on specific ports and protocols, you can achieve a level of isolation very similar to that offered by other cloud providers.
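For example, to allow only Droplets tagged “app” to reach Droplets tagged “db” on TCP port 5432, a Cloud Firewall can be created via the API along these lines. The tag and firewall names are illustrative, and the field names are from memory, so treat this as a sketch and check DigitalOcean’s current API documentation before relying on it:

curl -X POST "https://api.digitalocean.com/v2/firewalls" \
  -H "Authorization: Bearer $DO_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "name": "db-from-app-only",
        "tags": ["db"],
        "inbound_rules": [
          { "protocol": "tcp", "ports": "5432", "sources": { "tags": ["app"] } }
        ]
      }'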

There is another caveat though: to improve the security of this shared private networking environment, DigitalOcean do not allow VMs to send packets with a source IP address that does not match their assigned private IP address. This prevents you, for example, from operating one DigitalOcean VM as a Virtual Private Network gateway for your other DigitalOcean VMs to connect through to another non-DigitalOcean private network.

In summary, while DigitalOcean is providing a great service, and adding new features seemingly every quarter, it offers a conceptual model slightly out of sync with the big-name cloud companies, and you need to be mindful of this; the same would, I guess, be true for people experienced with DigitalOcean moving to AWS or Azure.


Finding deleted code in git

Recently, Matt Hilton blogged about Source Control Antipatterns, which included the practice of commenting out code instead of deleting it.

As wholeheartedly as I agree with deleting code, I know that a popular objection is that deleted code is harder to find. While finding it might be harder than using your favourite editor’s Find In Files feature, it is important to know how to use the tools central to your development workflow.

For my work, and seemingly the majority of projects today, git is the version control tool of choice. So I’m sharing some git commands here that I have found useful for locating deleted code. I’m using the Varnish Cache repository for my examples if you want to try them yourself.

If you know some text from the code that was deleted, you can find the commit where it was deleted. In this example I’m looking for when the C structure named smu was deleted.

$ git log -G "struct +smu" --oneline

766dee0 Drop long broken umem code
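Once you have that commit hash, you should be able to view the removed code itself by showing that commit’s diff, narrowed here with the same pathspec wildcard used in the next example:

$ git show 766dee0 -- '**/storage_umem.c'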

If you know the name of a file that was deleted, but aren’t sure which directory the file was in, you can find the commit when the file was deleted with:

$ git log --oneline -- **/storage_umem.c

766dee0 Drop long broken umem code
b07c34f include cleanup - found by FlexeLint
75615a6 When I grow up, I want to learn to program in C


If you’re not even sure what the deleted file was named but just want to see recent commits with deleted files, you can use:

$ git log --diff-filter=D --summary

commit f4faa6e3c431d6ccf581f5683af56008e4d4be10
Author: Federico G. Schwindt <fgsch@lodoss.net>
Date: Fri Mar 10 18:59:14 2017 +0000

Fold r00936.vtc into vcc_action.c tests

delete mode 100644 bin/varnishtest/tests/r00936.vtc
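And if you want to bring a deleted file back, checking it out from the parent of the commit that deleted it should work, for example:

$ git checkout 766dee0^ -- '**/storage_umem.c'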

There is a lot more you can do with git log than just find deleted code but hopefully these examples are a useful start.

Beware Docker and sysctl defaults on GCE

On Google Compute Engine (GCE) the latest VM boot images (at the time of writing) for Ubuntu 14.04 and 16.04 (eg ubuntu-1604-xenial-v20170811) ship with a file at /etc/sysctl.d/99-gce.conf which contains:

net.ipv4.ip_forward = 0

This kernel parameter determines whether packets can be forwarded between network interfaces. On its own, the presence of this line isn’t a big deal.

Separately, when you start the Docker daemon (at least in version 17.06.0-ce), it sets this kernel parameter to 1 (assuming you haven’t specified --ip-forward=false in the Docker configuration). Docker needs packet forwarding enabled so that Docker containers using the default bridge network can communicate outside the host.

If you later execute sysctl --system or similar after Docker has started, for example to apply a new value for the nf_conntrack_max kernel parameter that you’ve specified in another file under /etc/sysctl.d/, then the ip_forward parameter will revert to 0, courtesy of GCE’s default conf file.

At this point you’ll find your containers cannot reach the outside world, for example this will fail to resolve:

docker run ubuntu:16.04 getent hosts google.com

This will remain broken for all existing or new containers until you set the ip_forward parameter back to 1 manually or by restarting the Docker daemon.
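For example, either of these (run as root, or via sudo) should restore forwarding:

sudo sysctl -w net.ipv4.ip_forward=1
sudo service docker restart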

If you’re using any Docker version since v1.8 (released about 2 years ago) you should see the following message when running a container with bridge networking if IP forwarding is disabled:

WARNING: IPv4 forwarding is disabled. Networking will not work.

Of course, that only helps if you’re using docker run interactively and does not help if the parameter gets changed after the containers are already running.

If you’re in this situation, add your own file to /etc/sysctl.d/ that follows 99-gce.conf alphabetically (eg 99-luftballon.conf) and ensure it contains:

net.ipv4.ip_forward = 1

You may also want to ensure the file has a trailing LF character to avoid any issues with processing it.
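For example, something like the following would create the file with a trailing newline and re-apply all the sysctl settings; because 99-luftballon.conf sorts after 99-gce.conf, its value wins:

printf 'net.ipv4.ip_forward = 1\n' | sudo tee /etc/sysctl.d/99-luftballon.conf
sudo sysctl --system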

You can check the current value of the ip_forward kernel parameter with one of these two commands:

sysctl net.ipv4.ip_forward
cat /proc/sys/net/ipv4/ip_forward

The case of the addled ARP

In recent weeks we started receiving alerts whenever a new AWS EC2 Instance running Ubuntu 14.04 LTS was launched for a specific Auto Scaling Group. On average, one new instance would be provisioned per day but the fault would only occur for about one or two of the new instances per week.

The alert was an indicator that the new instance was unable to communicate with the message broker located on another instance. However, after approximately 20 minutes the issue would self-resolve. Also, if we manually provisioned a new replacement instance, it would successfully communicate with the broker.

With the short window of failure and no consistent period between occurrences, this problem continued through several operations shifts and staff members before a plan was established to capture more details of the problem.

On the next alert we were able to investigate and establish several facts:

  1. The affected instance was unable to communicate due to a connection timeout. It was sending TCP SYN packets and receiving no reply.
  2. The message broker was receiving the TCP SYN packet from the affected instance and replying with a SYN+ACK packet but the MAC address on the reply packet did not match the MAC address on the incoming SYN packet.
  3. Running ip neigh show on the message broker instance reported that the IP address of the affected instance was associated with an unrelated MAC address and was in the STALE state but occasionally also in the REACHABLE state.
  4. The unrelated MAC address was not associated with any other instances running in the VPC nor any that had been recently terminated.

At this point we set up two monitors on the message broker instance while we waited for the problem to self-resolve. The first was a tcpdump to capture all ARP traffic and the second was a shell script to continuously poll and record the ARP table. The ARP traffic capture contained very little, and nothing at all helpful, but the ARP table records were very interesting.
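The monitors were roughly of this shape, run as root; the interface name and polling interval here are assumptions:

tcpdump -i eth0 -n -w arp.pcap arp &
while true; do date -u '+%FT%TZ'; ip neigh show; sleep 1; done >> arp-table.log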

While the affected instance was unable to connect to the message broker, the ARP entry for it cycled through the states REACHABLE, then STALE, then DELAY, and back to REACHABLE again, retaining the same incorrect MAC address association the whole time. The DELAY state never lasted as long as five seconds.

At the moment when the problem self-resolved, the DELAY state did last for five seconds and then transitioned to the PROBE state, then to the FAILED state and finally back to REACHABLE but this time with the correct MAC address.

This insight led one of our team members to find this Red Hat bug describing a Linux kernel issue that aligned exactly with the behaviour we were experiencing. Unfortunately the fix for this bug wasn’t merged until Linux kernel 4.11, which was only released in May and reportedly won’t be officially available in Ubuntu until Artful Aardvark 17.10.

Our assessment of all the stale ARP entries on the message broker combined with the known scaling behaviours of the messaging clients suggested that some entries had been there for at least 8 weeks. So this wasn’t a by-product of replacing instances rapidly and recycling IP addresses in the subnet too quickly.

As an interim solution we have implemented a cron job to remove any stale entries from the message broker’s ARP table and this has prevented the alerts from re-appearing for several weeks now.
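The exact job depends on the environment, but a minimal sketch of the idea (the interface name is an assumption) looks like this:

# delete any neighbour entries currently in the STALE state
ip neigh show nud stale dev eth0 | awk '{print $1}' | while read addr; do
  ip neigh del "$addr" dev eth0
done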


Always upgrade ixgbevf on AWS EC2

I recently had a frustrating experience with network connectivity for a set of AWS EC2 Instances running Ubuntu Trusty 14.04.

Three instances running Graphite and Carbon Cache 0.9.15 would intermittently become unreachable on the network for seconds or minutes at a time, several times a day. There was no obvious pattern to when these events would occur, and when they did there was no interesting change in their CPU utilisation, memory usage, or disk IO aside from the inevitable reduction in activity associated with a lack of data or queries coming from the network.

AWS reported the Graphite instances were failing their Instance Status Check. The external instances attempting to communicate with these Graphite machines just experienced TCP timeouts. When the Graphite instances themselves became network-reachable again, their system logs showed that processes had continued running as normal during the outage.

The first hint of a reason for this behaviour came from the Graphite instances’ syslog reporting No route to host during the outage while a cron job was attempting to connect to another instance on the same subnet in the same Availability Zone. This suggested something was wrong with either ARP or the network interface, but there were no logs or kernel messages suggesting the network interface had gone down, and EC2 resolves ARP at the hypervisor.

I configured collectd to harvest all the network-related metrics possible on the Graphite instances themselves and I configured VPC Flow Logs to record details of all the network traffic in the subnet. After the next period of failed connectivity, the Flow Logs showed that all packets were reaching the EC2 network interfaces of the Graphite instances, but the instances’ collectd data showed no packets received, and no network errors either.

These Graphite instances were now running the AWS M4 Instance Type, but they were not originally provisioned as such, which led me to investigate the Enhanced Networking features available to these instance types. I eventually found this suspicious paragraph in the AWS documentation:

In the above Ubuntu instance, the module is installed, but the version is 2.11.3-k, which does not have all of the latest bug fixes that the recommended version 2.14.2 does. In this case, the ixgbevf module would work, but a newer version can still be installed and loaded on the instance for the best experience.

Enabling Enhanced Networking with the Intel 82599 VF Interface on Linux Instances in a VPC
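As an aside, you can check which driver and version an interface is currently using with ethtool, for example (the interface name is an assumption):

ethtool -i eth0 | grep -E '^(driver|version)'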

Our instances were running the 2.11.3-k version of the ixgbevf driver mentioned in the documentation, which is the older “would work” version but also the most recent version included with Ubuntu Trusty. Some further research into this network driver on AWS revealed some other discussions of similarly flaky network connectivity, so I decided to upgrade the driver on one of the Graphite instances.

As per the same AWS documentation, the recommended version 2.14.2 does not build properly on some versions of Ubuntu, so I installed version 2.16.4, which required an OS restart. I monitored the upgraded instance for 24 hours and it remained healthy with no connectivity interruption for the whole period, whilst the other two instances continued to fail intermittently, so I upgraded the network driver on a second instance. After 72 hours of stable behaviour on the two upgraded instances, I upgraded the third and the problem is now completely resolved for those instances.

Expecting that these issues could easily recur on our other systems, I wanted to ensure they were all using the newest driver. However, because an OS restart is required for the new network driver to load, adding the driver install steps to the provisioning script was undesirable. Experimentation revealed that rmmod and modprobe seem to allow the upgraded network driver to become active without an OS restart, but I decided that baking a new AMI with the driver pre-installed was preferred.
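For reference, the module reload is roughly the following; note that it briefly removes the network interface, so connectivity over that interface (including any SSH session using it) will be interrupted while it happens:

sudo rmmod ixgbevf
sudo modprobe ixgbevf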

I have also discovered that the version of the ixgbevf driver included with Ubuntu Xenial 16.04 is more recent than Trusty’s but still older than the version recommended by AWS, so a custom AMI is still required.

I’ve shared my experience and findings with AWS Support and asked them to modify their documentation to more strongly recommend installing the newer driver.

One year in to the new world

My first anniversary of working with Squixa passed recently and I began to reflect on how different working with an entirely unfamiliar technology stack has been from working with the Microsoft platform, and how it has compared to my initial expectations.

In the early weeks of the new job I began writing down the names of tools and technologies that I was learning each day, but the list quickly grew to more than 50 entries and I stopped updating it. Looking back at that list now, it has become a list of things I use every day: most have formed muscle memories, many I have read the source code for, and a number I have submitted patches to for bug-fixes or enhancements.

On the Microsoft platform I was a regular user of and contributor to open-source projects on CodePlex and GitHub, and more often than not I trusted .NET Reflector over documentation to better understand how some component should work. Over in *nix land though, source code is unavoidable, sadly sometimes as an alternative to documentation, but more often simply as the preferred distribution method. I’ve certainly read a lot more source code each week than I did previously, and in a wider variety of languages, and although it is sometimes tedious it has also taught me a lot. It’s not just developers that need a compiler installed, but any user looking beyond what their favourite *nix flavour packages for them.

I was never particularly bothered by the lack of a system package manager on Windows even though I’d heard its absence was oft-maligned by *nix folk. Having now used a package manager in anger, I can appreciate just how much effort it saves when automating machine provisioning, but I have also found there are many challenges with version pinning and with distributions that do not stay current with new software releases. I’m sure the new Windows 10 PackageManagement (formerly OneGet) will be an awesome step forward for Microsoft in this space.

I wrote a lot of PowerShell before I changed jobs and I revelled in the language’s ability to work with objects and APIs. The typical *nix shell lacks this, but I’ve rarely had to deal with objects or APIs in my new job. Here everything is a file, normally a plain-text file at that, so languages focused on text manipulation are ample. Configuration management systems end up spending most of their time overwriting files generated from templates instead of trying to interact with an API in some idempotent manner. Personally though, dealing with pattern matching and character- or field-offsets still feels too brittle and harder to re-comprehend later.

There are some popular applications in Linux doing some really awesome tricks. One favourite example is the nginx web server, which can upgrade its binary, launch a new version of itself, hand over existing connections and listening sockets, and never drop a packet. It’s not that things like this are not achievable on the Windows platform; it’s just that, for some unknown reason, nobody is doing it. While Microsoft is still fighting hard against a “just restart it” culture to avoid unnecessary down-time, Torvalds recently merged live kernel patching in Linux.

Ultimately though, the problems are the same across both platforms. You need to make sure you understand exactly what each application needs access to so you can constrain it to the least possible privileges, but not everyone does. You hit resource limits on process counts, file handles, network connections, etc., but at different thresholds. You’re susceptible to the same failure conditions but they often have different failure modes, and rarely the one you would have preferred.

For every difference, there are double the similarities. The platforms have different driving principles guiding which solution to prefer for a given problem, but neither is necessarily better, simply idiomatic. At this point I’m expecting that I’ll continue to use whichever platform my current project requires without any favouritism and hopefully be switching back and forth enough to stay abreast of the latest developments on each.

Announcing VclFiddle for Varnish Cache

As part of my new job with Squixa I have been working with Varnish Cache every day. Varnish, together with its very capable Varnish Configuration Language (VCL), is a great piece of software for getting the best experience for websites that weren’t necessarily built with cache-ability or high-volume traffic in mind.

At the same time though, getting the VCL just right to achieve the desired caching outcome for particular resources can be an exercise in reliably reproducing the expected requests and carefully analysing the varnish logs. It isn’t always possible to find an environment where this can be done with minimal distraction and impact on others.

At a company retreat in October my colleagues and I were discussing this scenario and one of us pointed out how JSFiddle provides a great experience for dealing with similar concerns, albeit in the space of client-side JavaScript. I subsequently came to the conclusion that it should be possible to build a similar tool for Varnish, so I did, and you can use it now at www.vclfiddle.net; it is open-sourced on GitHub too.

VclFiddle enables you to specify a set of Varnish Configuration Language statements (including defining the backend origin server), and a set of HTTP requests and have them executed in a new, isolated Varnish Cache instance. In return you get the raw varnishlog output (including tracing) and all the response headers for each request, including a quick summary of which requests resulted in a cache hit or miss.

Each time a Fiddle is executed, a new Fiddle-specific URL is produced and displayed in the browser address bar and this URL can then be shared with anyone. So, much like JSFiddle, you can use VclFiddle to reproduce a difficult problem you might be having with Varnish and then post the Fiddle URL to your colleagues, or to Twitter, or to an online forum to seek assistance. Or you could share a Fiddle URL to demonstrate some cool behaviour you’ve achieved with Varnish.

VclFiddle is built with Sails.js (a Node.js MVC framework) and Docker. It is the power of Docker that makes it fast for the tool to spawn as many instances and versions of Varnish as needed for each Fiddle to execute and easy for people to add support for different Varnish versions. For example, it takes an average of 709 milliseconds to execute a Fiddle and it took my colleague Glenn less than an hour to add a new Docker image to provide Varnish 2.1 support.

The README in the VclFiddle repository has much more detail on how it works and how to use it. There is also a video demo, and a few example walk-throughs on the left-hand pane of the VclFiddle site. I hope that, if you’re a Varnish user you’ll find VclFiddle useful and it will become a regular tool in your belt. If you’re not familiar with Varnish Cache, perhaps VclFiddle will provide a good introduction to its capabilities so you can adopt it to optimize your web application. In any case, your feedback is welcome by contacting me, the @vclfiddle Twitter account, or via GitHub issues.