Announcing VclFiddle for Varnish Cache

As part of my new job with Squixa I have been working with Varnish Cache every day. Varnish, together with its very capable Varnish Configuration Language (VCL), is a great piece of software for getting the best experience from websites that weren’t necessarily built with cacheability or high-volume traffic in mind.

At the same time though, getting the VCL just right to achieve the desired caching outcome for particular resources can be an exercise in reliably reproducing the expected requests and carefully analysing the Varnish logs. It isn’t always possible to find an environment where this can be done with minimal distraction and minimal impact on others.

At a company retreat in October my colleagues and I were discussing this scenario and one of us pointed out how JSFiddle provides a great experience for dealing with similar concerns, albeit in the space of client-side JavaScript. I subsequently came to the conclusion that it should be possible to build a similar tool for Varnish, so I did: you can use it now at www.vclfiddle.net and it is open-sourced on GitHub too.

VclFiddle enables you to specify a set of Varnish Configuration Language statements (including the definition of the backend origin server) and a set of HTTP requests, and have them executed in a new, isolated Varnish Cache instance. In return you get the raw varnishlog output (including tracing) and all the response headers for each request, plus a quick summary of which requests resulted in a cache hit or miss.

Each time a Fiddle is executed, a new Fiddle-specific URL is produced and displayed in the browser address bar and this URL can then be shared with anyone. So, much like JSFiddle, you can use VclFiddle to reproduce a difficult problem you might be having with Varnish and then post the Fiddle URL to your colleagues, or to Twitter, or to an online forum to seek assistance. Or you could share a Fiddle URL to demonstrate some cool behaviour you’ve achieved with Varnish.

VclFiddle is built with Sails.js (a Node.js MVC framework) and Docker. It is the power of Docker that makes it fast for the tool to spawn as many instances and versions of Varnish as each Fiddle needs, and easy for people to add support for different Varnish versions. For example, it takes an average of 709 milliseconds to execute a Fiddle, and it took my colleague Glenn less than an hour to add a new Docker image to provide Varnish 2.1 support.

The README in the VclFiddle repository has much more detail on how it works and how to use it. There is also a video demo, and a few example walk-throughs on the left-hand pane of the VclFiddle site. I hope that, if you’re a Varnish user, you’ll find VclFiddle useful and that it will become a regular tool in your belt. If you’re not familiar with Varnish Cache, perhaps VclFiddle will provide a good introduction to its capabilities so you can adopt it to optimize your web application. In any case, your feedback is welcome: contact me directly, message the @vclfiddle Twitter account, or raise a GitHub issue.

Command line parsing in Windows and Linux

I have been working almost completely on the Linux platform for the last six months as part of my new job. While so much is new and different from the Windows view of the world, there is also a significant amount that is the same, which is not surprising given that the hardware underneath is common to both.

Just recently, while working on a new open source project, I discovered a particular nuance in a behavioural difference at the core of the two platforms. This difference is in how a new process is started.

When one process wants to launch another process, no matter which language you’re developing with, ultimately this task is performed by an operating system API. On Windows it is CreateProcess in kernel32.dll and on Linux it is execve (and friends), typically combined with fork.

The Windows API call expects a single string parameter containing all the command-line arguments to pass to the new process; the Linux API call, however, expects a parameter with an array of strings containing one command-line argument in each element. The key difference is in where the responsibility lies for tokenising a string of arguments into the array ultimately consumed in the new process’s entry point, commonly the “argv” array in the “main” function found in some form in almost every language.

On Windows it is the new process, or callee, that needs to tokenise the arguments, but the standard C library will normally handle that; for other scenarios the OS provides CommandLineToArgvW in shell32.dll to do the same thing.
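
As a rough illustration (not something you would normally need to do yourself), CommandLineToArgvW can be invoked directly via P/Invoke from PowerShell; the command-line string below is an arbitrary example:

Add-Type -Namespace Win32 -Name Shell32 -MemberDefinition @'
[DllImport("shell32.dll", SetLastError = true)]
public static extern System.IntPtr CommandLineToArgvW(
    [MarshalAs(UnmanagedType.LPWStr)] string lpCmdLine, out int pNumArgs);

[DllImport("kernel32.dll")]
public static extern System.IntPtr LocalFree(System.IntPtr hMem);
'@

$argc = 0
$argvPtr = [Win32.Shell32]::CommandLineToArgvW('app.exe "hello world" --verbose', [ref]$argc)
try {
    # The function returns a pointer to an array of string pointers, one per argument.
    for ($i = 0; $i -lt $argc; $i++) {
        $ptr = [System.Runtime.InteropServices.Marshal]::ReadIntPtr($argvPtr, $i * [System.IntPtr]::Size)
        [System.Runtime.InteropServices.Marshal]::PtrToStringUni($ptr)
    }
    # Output: app.exe, hello world, --verbose (one argument per line)
}
finally {
    [void][Win32.Shell32]::LocalFree($argvPtr)
}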

On Linux though it is the original process, or caller, that needs to tokenise the arguments first. Often in Linux it is the interactive shell (e.g. bash, ksh, zsh) that has its own semantics for handling quoting of arguments, variable expansion, and other features when tokenising a command-line into individual arguments. However, at least from my research, if you are developing a program on Linux which accepts a command-line from some user input, or is parsing an audit log, there is no OS function to help with tokenisation – you need to write it yourself.
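
To illustrate the kind of code involved, here is a deliberately naive PowerShell sketch that only honours simple double quotes; a real shell also deals with escaping, nested quoting, variable expansion, globbing, and much more:

function Split-CommandLine {
    param([string]$CommandLine)
    # Match either a double-quoted token (capturing its contents) or a run of non-whitespace.
    $pattern = '"([^"]*)"|(\S+)'
    foreach ($match in [regex]::Matches($CommandLine, $pattern)) {
        if ($match.Groups[1].Success) { $match.Groups[1].Value } else { $match.Groups[2].Value }
    }
}

Split-CommandLine 'cp "my file.txt" /tmp'
# Output: cp, my file.txt, /tmp (one token per line)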

Obviously, the Linux model allows greater choice in the kinds of advanced command-line interpretation features a shell can offer whereas Windows provides a fixed but consistent model to rely upon. This trade-off embodies the fundamental mindset differences between the two platforms, at least that is how it seems from my relatively limited experience.

PowerShell starts to blur the lines somewhat on the Windows platform as it has its own parsing semantics yet again, but these apply mostly to calling cmdlets, which have a very different contract from the single entry point of processes. PowerShell also provides a Parser API for use in your own code.
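
For example, here is a minimal sketch using the Parser API introduced in PowerShell 3.0 to tokenise an arbitrary command-line string:

$tokens = $null
$errors = $null
# ParseInput returns the AST and populates the token and error arrays via the [ref] parameters.
$ast = [System.Management.Automation.Language.Parser]::ParseInput(
    'Copy-Item "my file.txt" C:\temp -Verbose', [ref]$tokens, [ref]$errors)
$tokens | Select-Object Kind, Text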

New Job, New Platform

After about five and a half years I have resigned from my job with Readify. I have had a great time working for Readify as a software developer, a consultant, an ALM specialist, and an infrastructure coder. Had a new opportunity not presented itself I could have easily continued working for Readify for years to come. The decision to leave was definitely not easy.

Over the last 16 years working as an IT professional I’ve had the opportunity to gain experience with almost all aspects of software development, system administration, networking, and security, but all of it on the Microsoft platform. I did do some work with Perl and PHP on Apache and MySQL back in the late 90s (like everyone did, I’m sure) but I haven’t spent any quality time with Linux or Mac OS X since.

Starting on June 10th this year (2014) I will begin a new job with Squixa. Squixa provide a set of services for improving the end-user performance of existing web sites and exposing analytics to their owners. Squixa’s implementation currently involves very few Microsoft technologies, if any. Consequently, my future includes the exciting experience of learning a new set of operating systems, development languages, web servers, database systems, build tools, and so on.

I still have a passion for PowerShell, and I feel that the direction Microsoft is heading with Azure, Visual Studio Online, and Project K is exciting and promises a much better platform than exists today, so I will continue to stay informed of new developments. However, aside from small hobby projects, most of my time, effort, and daily challenges will come from the *nix world, and future blog posts will likely reflect this.

Queue a Team Build from another and pass parameters

I have previously blogged about queuing a new Team Build at the successful completion of another Team Build for Team Foundation Server 2010. Since then I’ve had a few people ask how to queue a new Team Build and pass information into it via the build process parameters. Recently I’ve needed to implement this exact behaviour for a client, this time with TFS 2013, which has quite different default build process templates, so I thought I’d share it here.

In my situation I’m building on top of the default TfvcTemplate.12.xaml process but the same approach can easily be applied to the Git build templates too. To begin, I have added two build process parameters to the template:

  1. Chained Build Definition Names – this is an optional array of strings which refer to the list of Build Definitions that should be queued upon successful completion of the current build. All the builds will be queued immediately and will execute as the controller and agents are available. The current build does not wait for the completion of the builds it queues. My simple implementation only supports queuing builds within the same Team Project.
  2. Source BuildUri – this is a single, optional string which accepts the unique Team Build identifier (URI) of the previous build that queued it. It is not intended to be specified by a human, but it could be. When empty, it is ignored. However, when provided by a preceding build, this URI will be used to retrieve the Build Number and Drop Location of that preceding build, and these values, plus the URI, will be made available to the projects and scripts executed within the new build. Following the new Team Build 2013 convention, these values are passed as environment variables named:
    • TF_BUILD_SOURCEBUILDURI
    • TF_BUILD_SOURCEBUILDNUMBER
    • TF_BUILD_SOURCEDROPLOCATION

The assumption is that a build definition based on my “chaining” template will only queue other builds based on the same template, or another template which also accepts a SourceBuildUri parameter. This also means that builds can be chained to any depth, each passing the BuildUri of itself to the next build in the chain.

The projects and scripts can use the TF_BUILD_SOURCEDROPLOCATION variable to access the output of the previous build – naturally, UNC file share drops are easier to consume than drops into TFS itself. The TF_BUILD_SOURCEBUILDURI value also means that the TFS API can be used to query every aspect of the preceding build, notably including its Information Nodes.
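
As a sketch, a build script in the queued build might consume these variables like this (the ‘Packages’ subfolder is just a placeholder for wherever the preceding build drops the files you need):

# Ignore the variables when the build was queued manually rather than by a preceding build.
if ($env:TF_BUILD_SOURCEBUILDURI) {
    Write-Host "Queued by build $env:TF_BUILD_SOURCEBUILDNUMBER ($env:TF_BUILD_SOURCEBUILDURI)"

    # Copy part of the preceding build's output (assumes a UNC file share drop location)
    # into this build's sources directory, just as an example destination.
    $packages = Join-Path $env:TF_BUILD_SOURCEDROPLOCATION 'Packages'
    if (Test-Path $packages) {
        Copy-Item -Path $packages -Destination $env:TF_BUILD_SOURCESDIRECTORY -Recurse -Force
    }
}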

Prior to TFS 2012, queuing a new build from the workflow and passing parameters would have required a custom activity. However, Team Build 2012 and 2013 use Windows Workflow 4.0, which includes a new InvokeMethod activity, making it possible to add items to the process parameters dictionary directly from the XAML.
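
If you want to experiment with the same parameter passing from outside the workflow, roughly the equivalent can be done with the TFS client object model from PowerShell. Below is a rough sketch; the collection URL, team project, definition name, and parameter values are all assumptions for illustration:

# Requires the TFS 2013 object model (installed with Visual Studio / Team Explorer 2013).
Add-Type -AssemblyName 'Microsoft.TeamFoundation.Client, Version=12.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a'
Add-Type -AssemblyName 'Microsoft.TeamFoundation.Build.Client, Version=12.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a'
Add-Type -AssemblyName 'Microsoft.TeamFoundation.Build.Workflow, Version=12.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a'

$collection = [Microsoft.TeamFoundation.Client.TfsTeamProjectCollectionFactory]::GetTeamProjectCollection('http://tfs:8080/tfs/DefaultCollection')
$buildServer = $collection.GetService([Microsoft.TeamFoundation.Build.Client.IBuildServer])

$definition = $buildServer.GetBuildDefinition('MyTeamProject', 'Downstream Build')
$request = $definition.CreateBuildRequest()

# Serialize the process parameters to pass into the queued build.
$parameters = New-Object 'System.Collections.Generic.Dictionary[string,object]'
$parameters['SourceBuildUri'] = 'vstfs:///Build/Build/1234'
$request.ProcessParameters = [Microsoft.TeamFoundation.Build.Workflow.WorkflowHelpers]::SerializeProcessParameters($parameters)

$buildServer.QueueBuild($request)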

The final XAML for the Build Process Template with support for queuing and passing parameters is available as a Gist. If you’d like to be able to integrate the same functionality with your own Team Build 2013 template you can see the four discrete edits I made to the default TfvcTemplate.12.xaml file from TFS 2013 in the Gist revisions.

When a build using this chaining template queues another build it explicitly sets the RequestedFor property to the same value as the current build so that the chain of builds will show in the My Builds view of the user who triggered the first build.

In my current implementation, the SourceBuildUri passed to each queued build is the URI of the immediately preceding build, but in some cases it may be more appropriate to propagate the BuildUri of the original build that triggered the entire chain. This would be a somewhat trivial change to the workflow for whoever needs this behaviour instead.

Effectively comparing Team Build Process Templates

I always prefer implementing .NET build customizations through MSBuild and I avoid modifying the Windows Workflow XAML files used by Team Build. However, some customizations are best implemented in the Team Build process, like chaining builds to execute in succession and pass information between them. As a consultant specializing in automated build and deployment I also spend a lot of time understanding Workflow customizations implemented by others.

For me the easiest way to understand the customizations implemented in a particular Team Build XAML file is to use a file differencing tool to compare the current workflow to a previous version of the workflow, or even to compare it to the default Team Build template it was based on. Unfortunately, the Windows Workflow designer in Visual Studio litters the XAML file with a lot of view state, obscuring the intended changes to the build process amongst irrelevant designer-implementation concerns.

To address this problem, I wrote a PowerShell script (available as a GitHub Gist) which removes all the elements and attributes from the XAML file which are known to be unimportant to the process it describes. Conveniently, the XAML file itself lists the set of XML namespace prefixes that can be safely removed in an mc:Ignorable attribute on the root document element.
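
The essence of the approach is roughly the following simplified sketch (the published Gist handles more cases, such as removing the mc:Ignorable attribute itself and the now-unused namespace declarations):

param([string]$Path)

# Load the XAML preserving whitespace so only the intended removals appear in the diff.
$xml = New-Object System.Xml.XmlDocument
$xml.PreserveWhitespace = $true
$xml.Load($Path)

# The root element's mc:Ignorable attribute lists the namespace prefixes that are safe to remove.
$root = $xml.DocumentElement
$ignorable = $root.GetAttribute('Ignorable', 'http://schemas.openxmlformats.org/markup-compatibility/2006')

foreach ($prefix in ($ignorable -split '\s+' | Where-Object { $_ })) {
    $ns = $root.GetNamespaceOfPrefix($prefix)

    # Remove attributes declared in an ignorable namespace.
    foreach ($attribute in @($xml.SelectNodes('//@*') | Where-Object { $_.NamespaceURI -eq $ns })) {
        [void]$attribute.OwnerElement.RemoveAttributeNode($attribute)
    }

    # Remove elements declared in an ignorable namespace.
    foreach ($element in @($xml.SelectNodes('//*') | Where-Object { $_.NamespaceURI -eq $ns })) {
        [void]$element.ParentNode.RemoveChild($element)
    }
}

$xml.Save($Path)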

Typically I use my XAML cleaning PowerShell script before each check-in to ensure the source control history stays clean but I have also used it on existing XAML files created by others to canonicalize them before opening them in a diff tool.

Using the script is as simple as:

.\Remove-IgnoreableXaml.ps1 -Path YourBuildTemplate.xaml

Or, if you don’t want to overwrite the file in place, specify an alternate destination:

.\Remove-IgnoreableXaml.ps1 -Path YourBuildTemplate.xaml -Destination YourCleanBuildTemplate.xaml

PowerShell Select-Xml versus Get-Content

In PowerShell, one of the most common examples you will see for parsing an XML file into a variable uses the Get-Content cmdlet and the cast operator, like this:

$Document = [xml](Get-Content -Path myfile.xml)

The resulting type of the $Document variable is an instance of System.Xml.XmlDocument. However, there is another approach to get the same, or better, result using the Select-Xml cmdlet:

$Document = ( Select-Xml -Path myfile.xml -XPath / ).Node

Sure, the second variant is slightly longer, but it comes with an important benefit over the first, and it’s not performance related.

In the first example, the file is first read into an array of strings and then cast. The casting operation (implemented by System.Management.Automation.LanguagePrimitives.ConvertToXml) uses an XmlReaderSettings instance with the IgnoreWhitespace property set to true and an XmlDocument instance with the PreserveWhitespace property set to false.

In the second example, the file is read directly into an XmlDocument (implemented by System.Management.Automation.InternalDeserializer.LoadUnsafeXmlDocument) using an XmlReaderSettings instance with the IgnoreWhitespace property set to false and an XmlDocument instance with the PreserveWhitespace property set to true – the opposite of the values used in the first example.

The Select-Xml approach won’t completely preserve all the original formatting from the source file, but it preserves much more than the Get-Content approach will. I’ve found this extremely useful when bulk-updating version-controlled XML files with a PowerShell script, where I want the resulting file diff to show the intended change rather than obscure it with formatting changes.
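
For example, a hypothetical bulk edit (the file filter, package id, and version are invented for illustration):

# Pin a NuGet package to a specific version across every packages.config in the tree.
Get-ChildItem -Recurse -Filter packages.config | ForEach-Object {
    $document = ( Select-Xml -Path $_.FullName -XPath / ).Node
    foreach ($package in $document.SelectNodes("//package[@id='Newtonsoft.Json']")) {
        $package.SetAttribute('version', '6.0.8')
    }
    $document.Save($_.FullName)
}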

You could construct the XmlDocument and XmlReaderSettings directly in PowerShell, but not in so few characters. You can also load the System.Xml.Linq assembly and use the XDocument class, which appears to give slightly better formatting consistency again, but it’s still not perfect and PowerShell doesn’t provide the same quick access to elements and attributes as properties on the object.
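
For comparison, a whitespace-preserving load written out explicitly looks something like this:

$settings = New-Object System.Xml.XmlReaderSettings
$settings.IgnoreWhitespace = $false

$document = New-Object System.Xml.XmlDocument
$document.PreserveWhitespace = $true

# Resolve the path first because XmlReader resolves relative paths against the process's
# current directory, not the PowerShell location.
$reader = [System.Xml.XmlReader]::Create((Resolve-Path 'myfile.xml').ProviderPath, $settings)
try {
    $document.Load($reader)
}
finally {
    $reader.Close()
}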

Override the TFS Team Build OutDir property in TFS 2013

I’ve blogged twice before about the OutDir MSBuild property set by Team Build and I’ve recently discovered that with the default build process templates included with Team Foundation Server 2013, the passing of the OutDir can be disabled via a simple Team Build process parameter.

The parameter I am referring to is the “Output location”:

[Screenshot: the “Output location” build process parameter]

This parameter’s default value, “SingleFolder”, gives the traditional Team Build behaviour – the OutDir property will be specified on the MSBuild command-line and, unless you’ve made other changes, all build outputs will be dropped into this single folder.

Another value this parameter accepts is “PerProject” but this name can be slightly misleading. The OutDir property will still be specified on the MSBuild command-line but Team Build will append a subfolder for each project that has been specified in the Build Definition. That is, you may choose to build SolutionA.sln and SolutionB.sln from a single Build Definition and the “PerProject” option will split these into “SolutionA” and “SolutionB” subfolders. It will not output to different subfolders for the projects contained within each solution – for this behaviour you should specify the GenerateProjectSpecificOutputFolder property as an MSBuild argument as I’ve blogged previously.

The value of the “Output location” that you’ve probably been looking for is “AsConfigured”. With this setting, Team Build will not pass the OutDir property to MSBuild at all and your projects will all build to their usual locations, just like they do in Visual Studio – presumably to a \bin\ folder under each project. With this setting, it is then your responsibility to configure a post-build target or script to copy the required files from their default build locations to the Team Build binaries share. For this purpose, Team Build provides a “TF_BUILD_BINARIESDIRECTORY” environment variable specifying the destination path to use. There are also some other environment variables populated by Team Build 2013 documented here.
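
A minimal post-build PowerShell script for this purpose might look like the following sketch (the project filter and folder layout are assumptions about your solution):

$binaries = $env:TF_BUILD_BINARIESDIRECTORY
$sources  = $env:TF_BUILD_SOURCESDIRECTORY
if (-not $binaries) { throw 'Not running under Team Build 2013.' }

# Copy each project's conventional \bin\ output into a per-project folder under the binaries directory.
Get-ChildItem -Path $sources -Recurse -Filter *.csproj | ForEach-Object {
    $outputDir = Join-Path $_.DirectoryName 'bin'
    if (Test-Path $outputDir) {
        Copy-Item -Path $outputDir -Destination (Join-Path $binaries $_.BaseName) -Recurse -Force
    }
}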

At the end of the build process, Team Build will then copy the contents of the TF_BUILD_BINARIESDIRECTORY to either the UNC path drop folder, or to storage within the TFS Collection database itself as you’ve chosen via the Staging Location setting on the Build Defaults page.

However, before you rush away to use this new capability, consider that MSBuild, or more accurately the set of Microsoft.*.targets files used by almost all projects, already contains a great deal of logic for handling which files to copy to the build drop. For example, Web Application projects will copy the contents of the \bin\ folder and all the other content files (e.g. CSS, JavaScript, and images) whilst excluding C# code files and the project file. Instead of re-implementing this behaviour yourself, leverage what MSBuild already provides and use the existing hook points to adjust this behaviour when you need to alter it for your situation.

If you’re interested, you’ll find that this new “Output location” behaviour is now implemented in the new RunMSBuild workflow activity, specifically within its RunMSBuildInternal private nested activity.