PowerShell Select-Xml versus Get-Content

In PowerShell, one of the most common examples you will see for parsing an XML file into a variable uses the Get-Content cmdlet and the cast operator, like this:

$Document = [xml](Get-Content -Path myfile.xml)

The resulting type of the $Document variable is an instance of System.Xml.XmlDocument. However, there is another approach to get the same, or better, result using the Select-Xml cmdlet:

$Document = ( Select-Xml -Path myfile.xml -XPath / ).Node

Sure, using the second variant is slightly longer, but with an important benefit over the first, and it’s not performance related.

In the first example, the file is first read into an array of strings and then cast. The casting operation (implemented by System.Management.Automation.LanguagePrimitives.ConvertToXml) is using an XmlReaderSettings instance with the IgnoreWhitespace property set to true and an XmlDocument instance with the PreserveWhitespace property set to false.

In the second example, the file is read directly into an XmlDocument (implemented by System.Management.Automation.InternalDeserializer.LoadUnsafeXmlDocument) using an XmlReaderSettings instance with the IgnoreWhitespace property set to false and an XmlDocument instance with the PreserveWhitespace property set to true – the opposite values of the first example.

The Select-Xml approach won’t completely preserve all the original formatting from the source file but it preserves much more than the Get-Content approach will and I’ve found this extremely useful when bulk updating version controlled XML files with a PowerShell script and wanting the resulting file diff to show the intended change and not be obscured by formatting changes.

You could construct the XmlDocument and XmlReaderSettings directly in PowerShell but not in so few characters. You can also load the System.Xml.Linq assembly and use the XDocument class which appears to give slightly better formatting consistency again but it’s still not perfect and PowerShell doesn’t provide the same quick access to elements and attributes as properties on the object.