Getting started with XML in SQL Server – Part 1 – Parsing XML

A lot of people don’t like XML. It has a lot of overhead, it’s not easy to read for those unfamiliar with the structure and parsing it is a pain in the ass. Well, let’s see what SQL Server offers to load it anyways.


Let’s use this simple XML as an example:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
  <item id="1">
    <name>Item 1</name>
    <comment>This is test item one</comment>
  <item id="2">
    <name>Item 2</name>
    <comment>This is the second test item</comment>

This example represents the data of 2 items. Each item has 3 data elements and one attribute. It is important to know the difference between elements and attributes as they are parsed differently. To keep it simple this example does not use a namespace.

One thing you will notice quickly when dealing with XML is that you need to know the structure as well as the datatypes, you will not be able to do much without that knowledge.

The code below assigns our sample data to a variable with the datatype XML, a SELECT statement displays the assigned value of the variable:

/* declare variable with XML datatype */ 

/* assign sample XML to variable */ 
SET @XML = '<?xml version="1.0" encoding="UTF-8" standalone="no" ?> 
				<item id="1"> 
					<name>Item 1</name> 
					<comment>This is test item one</comment> 
				<item id="2"> 
					<name>Item 2</name> 
					<comment>This is the second test item</comment> 

/* display variable */ 


XML output columns usually are represented as hyperlinks in SSMS (XML columns can – just like VARBINARY – hold quite a lot of data, so think twice before including them in your queries directly). If the XML is not too big you can click on it and a new tab will pop up with the XML code inside:


XML displayed in SSMS


Notice something? SQL Server got rid of the XML declaration. Keep this in mind when dealing with XML files, sometimes SQL Server refuses to convert a string to XML because of the declaration. In those cases you will have to get rid of it. A simple statement using STUFF in combination with PATINDEX will do the trick:

/* trim XML declaration */
SELECT @XML = STUFF(@String,1, PATINDEX('%?>%', @String) + 1, '')


No how do we parse our variable? SQL Server offers different ways to parse XML in SQL server, see 2 different approaches below.


1. Using CROSS APPLY and the value() method:

/* parse @XML using CROSS APPLY */
SELECT cb.item_node.value('./@id'						, 'INT')				AS id
	 , cb.item_node.value('(./name/text())[1]'			, 'VARCHAR(200)')		AS name
	 , cb.item_node.value('(./comment/text())[1]'		, 'VARCHAR(200)')		AS comment
	 , cb.item_node.value('(./quantity/text())[1]'		, 'INT')				AS quantity
	 , XML_Data.value('(/items/item/@id)[1]'			, 'INT')				AS id_without_crossapply
	 , XML_Data.value('(/items/item/name/text())[1]'	, 'VARCHAR(200)')		AS name_without_crossapply
	 , ca.root_node.value('(./item/name/text())[1]'		, 'VARCHAR(200)')		AS name_with_one_crossapply
CROSS APPLY xx.XML_Data.nodes('/items') ca(root_node)
CROSS APPLY ca.root_node.nodes('./item') cb(item_node);

If you execute this statement and have a look at the result you will hopefully see why more than one CROSS APPLY is needed to actually parse the data. Value() requires to specify an ordinal number to ensure it will return a singleton, this is done in square brackets. [1] will return the first available value matching the expression used, [2] the second and so on. To flatten the output a CROSS APPLY for each child element with multiple elements will do the trick, if only the root node is referenced the query will always return the first value taken from the first match for all results (take a look at the output on the columns id_without_crossaply / name_without_crossapply / name_with_one_crossapply).

As mentioned earlier attributes and elements are treated differently. To retrieve the value of an attribute you will add the @ sign to the name of the attribute. Elements will require the exact path, their value is retrieved using text().


2. Using OPENXML:

/* parse @XML using sp_xml_preparedocument */
DECLARE @iDoc int;

EXEC sp_xml_preparedocument @iDoc OUTPUT, @XML;

SELECT id, name, comment, quantity
FROM OPENXML (@iDoc, '/items/item',1)
	  id			INT				'./@id',
	  name			VARCHAR(200)	'./name/text()',
	  comment		VARCHAR(200)	'./comment/text()',
	  quantity		INT				'./quantity/text()'

EXEC sp_xml_removedocument @iDoc;

This approach is quite different. The system stored procedure sp_xml_preparedocument is used to create an object, referenced by an integer that is stored in a variable. This integer is required when using OPENXML to reference the right rowset provider. To access the data a WITH clause is used to define the output columns, allowing the use of aliases or the given name of attributes and elements. If you chose to use the same name for attributes as used in the XML you do not need to specify the path to the attribute (‘./@id’ is not required in this case). Elements however require a path and text() is used once again to retrieve the value.

Note that you should always destroy the object using sp_xml_removedocument once the XML is no longer required to free system resources.

In this example we chose to directly use the /items/item path, which allows to flatten the XML directly. See how the result changes if you change the path to /items (adding /item to the path of each element in the WITH clause): only the first row will be displayed.


Which approach is better? I don’t know. Each XML will be different, these example will not work for every XML out there. But at least now you have the basic means to start tackling XML.


Getting started with XML in SQL Server

Part 2 – Import XML files into your SQL Server

Part 3 (not available yet) will use SSIS to achieve the same results as in Part 1 & 2

Part 4 (not available yet) will show you how to use XQuery and create XML files out of SQL Server

Leave a Reply

Your email address will not be published. Required fields are marked *