Identifying SSIS execution errors in SQL Server 2012+

First of, if you are not using the project deployment model in SSIS 2012+ this blog post is not for you – sorry. The techniques described below assume you are storing your packages in SSISDB. However, you don’t necessarily need to execute your packages through SQL Agent jobs.

Okay, let’s get right into it. After getting too frustrated with waiting for SSIS execution reports out of SSMS, which are not only pretty slow but also give you a lot of information you don’t really need, I figured there has to be a better way to retrieve relevant information directly. What better place to look than tapping directly into SSISDB, the system database which is created automatically when you create an Integration Services Catalog?

Even though this is far from a deep dive into the table structure of SSISDB, I’d like to give you a little more background information on the tables we care about for this script (note that most of the documentation refers to objects with the catalog schema (Microsoft Documentation), those are views that sit on top of the internal schema tables which we use here):

  • internal.executions: lots of helpful information in this table, in particular the [package_name], which account was used to execute the package ([executed_as_name]), additional information about the project the package is part of ([project_name]) and if an environment was used to schedule the package we get information about this as well ([environment_name])
  • internal.operations: this table will tell us when exactly the package was executed ([start_time] and [end_time])
  • internal.operation_messages: holds the error message ([message]) we came here for in the first place. This table holds a LOT of information, so we will have to limit our query to those lines that hold errors, we use the [message_type] column to identify what we want to bring back. It might also help to know the exact time the error occurred ([message_time])
  • internal.event_messages: tells us which step in the SSIS package failed, given you name each component in your package ([message_source_name]), otherwise you’ll see a lot of [Data Flow Task]’s and [Execute SQL Task]’s here
  • internal.execution_parameter_values: all package parameters are listed in this table ([parameter_name] & [parameter_value])
  • internal.event_message_context: last but not least the probably second most useful piece of information is right here, execution-time variable and connection string values ([context_source_name]/[package_path] & [property_value])

To tie it all together there are 2 key identifiers we have to use. Unfortunately Microsoft decided to use [execution_id] on some tables as the primary key and [operation_id] on others, just like you find [event_message_id] here and [operation_message_id] there. Confusing at times and doesn’t make it easy to get a grasp of the structure at first but I am sure they have good reasons for doing so…?!

Now lets have a look at the script (finally!) and I’ll walk you through it:

DECLARE @StartDate DATETIME, @Offset INT = -24;

SET @StartDate = DATEADD(HOUR, @Offset, GETDATE())

SELECT ex.execution_id
	 , ex.package_name
	 , CASE 
           WHEN (om.message_source_type = 10) THEN 'ISServerExec' 
           WHEN (om.message_source_type = 20) THEN 'Transact-SQL stored procedure'
           ELSE em.message_source_name
       END AS task_name
	 , CAST(om.message_time AS DATETIME2(0)) AS ssis_run_datetime
	 , CAST(DATEADD(SECOND, DATEDIFF(SECOND, o.start_time, o.end_time), 0) AS TIME(0)) AS ssis_run_duration
	 , CAST(om.message AS XML) AS ssis_error_message
	 , CAST(ob.ssis_properties AS XML) ssis_properties
	 , CAST(oa.ssis_parameters AS XML) ssis_parameters
	 , ex.executed_as_name
	 , ex.project_name
	 , ex.environment_name
FROM SSISDB.internal.event_messages em WITH (NOLOCK)
INNER JOIN SSISDB.internal.operation_messages om WITH (NOLOCK)
	ON em.event_message_id = om.operation_message_id
INNER JOIN SSISDB.internal.operations o WITH (NOLOCK) 
	ON em.operation_id = o.operation_id
INNER JOIN SSISDB.internal.executions ex WITH (NOLOCK)
	ON o.operation_id = ex.execution_id
OUTER APPLY (SELECT parameter_name AS [name]
			 	  , parameter_value AS [value]
			 FROM ssisdb.internal.execution_parameter_values WITH (NOLOCK)
			 WHERE execution_id = ex.execution_id
			 	 AND object_type < 50
			 	 AND parameter_name NOT LIKE 'CM.%'
			 	 AND sensitive = 0
			 FOR XML RAW('parameter'), ROOT('parameters')) oa(ssis_parameters)
OUTER APPLY (SELECT CASE context_type WHEN 70 THEN context_source_name ELSE REPLACE(STUFF(package_path,1,CHARINDEX('[',package_path) ,''),']','') END [name]
			 	  , CASE context_type WHEN 70 THEN 'Variable' ELSE 'Connection' END [type]
			 	  , property_value [value]
			 FROM ssisdb.internal.event_message_context WITH (NOLOCK)
			 WHERE operation_id = ex.execution_id
				 AND event_message_id = em.event_message_id
			 	 AND ((context_type = 70 AND context_depth = 1)
					OR (context_type = 80 AND property_name = 'ConnectionString'))
			 ORDER BY 2,1
			 FOR XML RAW('property'), ROOT('properties')) ob(ssis_properties)
WHERE CAST(om.message_time AS DATETIME) >= @StartDate
	AND om.message_type = 120
GROUP BY ex.execution_id
		, ex.project_name
		, ex.package_name
		, ex.environment_name
		, ex.executed_as_name
		, om.message_source_type
		, em.message_source_name
		, CAST(om.message_time AS DATETIME2(0))
		, CAST(DATEADD(SECOND, DATEDIFF(SECOND, o.start_time, o.end_time), 0) AS TIME(0))
		, om.message
		, oa.ssis_parameters
		, ob.ssis_properties 
ORDER BY CAST(om.message_time AS DATETIME2(0)) DESC
OPTION(RECOMPILE);

Since you might want to limit your output to a specific time window (I usually only care about the last ~3 hours) we have the variable @Offset to allow us to define how many hours you want to go back in time. The script itself is pretty straight forward, a few INNER JOINs to link all the tables together and two OUTER APPLYs to generate XML outputs which come in handy when you try to identify if the error is caused by a wrong connection string, incorrect package parameter or variable. The WHERE clause limits the output to anything within the specified time window based on [message_time] and [message_type] = 120 ensures that we only return errors. I like to see most recent errors on top, but feel free to remove the ORDER BY in case you don’t care. Just go ahead and remove it!

If you check the code within the OUTER APPLYs you might notice that I limit the output somewhat, for example I am not returning any sensitive package parameters as those will be scrambled anyways and won’t help you figuring out anything. Once again, feel free to add or remove WHERE clauses so it works best for you!

Since I am lazy I chose to cast the error message as XML so you can quickly click on it and see it in a new window for convenience. You might ask what in the devils name made me use WITH (NOLOCK) all over the place, but in a busy environment you might find yourself waiting for quite some time as SSISDB tables are often used heavily, the logging is quite verbose but of course it mostly depends on your SSIS usage. I also threw in an OPTION(RECOMPILE) for good measure as the number of records you want to retrieve can vary quite a bit and more often than not it does more good than harm for this specific query.

Now let’s not stop here. Say we also want to gather any information we can get about the SQL Agent job which was actually executing the SSIS package (you are executing packages that way, RIGHT?!). Back in the old days™ you were able to use the SQL Agent job history to troubleshoot SSIS errors, but nowadays all you get is a generic error message:

Executed as user: [USERACCOUNT HERE]. Microsoft (R) SQL Server Execute Package Utility Version xx.x.xxxx.x for 64-bit Copyright (C) Microsoft Corporation. All rights reserved. Started: xx:xx:xx PM Package execution on IS Server failed. Execution ID: 12345, Execution Status:4. To view the details for the execution, right-click on the Integration Services Catalog, and open the [All Executions] report Started: xx:xx:xx PM Finished: xx:xx:xx PM Elapsed: xx.xxx seconds. The package execution failed. The step failed.

They tell you to open the reports for more information. But we don’t want to do that anymore, so I am eternally grateful that the error message contains the [execution_id], we will use it to tie everything back to SSISDB (hooray!). All we need to do now is join a few system tables in msdb and as if by magic we know the job name and step along with the command to execute the package (or whatever else you might want to retrieve):

DECLARE @StartDateINT INT, @StartDate DATETIME, @Offset INT = -24;

SET @StartDate = DATEADD(HOUR, @Offset, GETDATE())
SET @StartDateINT = CAST(CONVERT(CHAR(8), DATEADD(HOUR, @Offset, GETDATE()), 112) AS INT);

IF OBJECT_ID('tempdb..#jobs') IS NOT NULL DROP TABLE #jobs;

CREATE TABLE #jobs(name sysname, step_name sysname, message VARCHAR(MAX), run_status VARCHAR(20), run_datetime DATETIME, execution_id INT, command VARCHAR(MAX))

INSERT INTO #jobs(name, step_name, message, run_status, run_datetime, execution_id, command)
SELECT jj.name
		, jh.step_name
		, jh.message
		, CASE jh.run_status
			WHEN 0 THEN 'Failed'
			WHEN 1 THEN 'Succeeded'
			WHEN 2 THEN 'Retry'
			WHEN 3 THEN 'Canceled'
		END run_status
		, msdb.dbo.agent_datetime(jh.run_date, jh.run_time) run_datetime
		, IIF(jh.message LIKE '%execution id:%'
				, CAST(STUFF(STUFF(jh.message, 1, PATINDEX('%Execution ID:%', jh.message) + 13, ''), PATINDEX('%[^0-9]%', STUFF(jh.message, 1, PATINDEX('%Execution ID:%', jh.message) + 13, '')), LEN(jh.message),'') AS BIGINT)
				, NULL
			) execution_id
		, js.command
FROM msdb.dbo.sysjobhistory jh
INNER JOIN msdb.dbo.sysjobs jj
	ON jh.job_id = jj.job_id
INNER JOIN msdb.dbo.sysjobsteps js
	ON jh.job_id = js.job_id
	AND jh.step_id = js.step_id
WHERE js.subsystem = 'SSIS'
	AND jh.run_date >= @StartDateINT;

SELECT ex.execution_id
	 , ex.package_name
	 , CASE 
           WHEN (om.message_source_type = 10) THEN 'ISServerExec' 
           WHEN (om.message_source_type = 20) THEN 'Transact-SQL stored procedure'
           ELSE em.message_source_name
       END AS task_name
	 , CAST(om.message_time AS DATETIME2(0)) AS ssis_run_datetime
	 , CAST(DATEADD(SECOND, DATEDIFF(SECOND, o.start_time, o.end_time), 0) AS TIME(0)) AS ssis_run_duration
	 , CAST(om.message AS XML) AS ssis_error_message
	 , CAST(ob.ssis_properties AS XML) ssis_properties
	 , CAST(oa.ssis_parameters AS XML) ssis_parameters
	 , ex.executed_as_name
	 , ex.project_name
	 , ex.environment_name
	 , jh.name AS job_name
	 , jh.step_name AS job_step_name
	 , jh.run_datetime AS job_run_datetime
	 , jh.run_status AS job_run_status
	 , jh.message AS job_message
	 , jh.command AS job_command
FROM SSISDB.internal.event_messages em WITH (NOLOCK)
INNER JOIN SSISDB.internal.operation_messages om WITH (NOLOCK)
	ON em.event_message_id = om.operation_message_id
INNER JOIN SSISDB.internal.operations o WITH (NOLOCK) 
	ON em.operation_id = o.operation_id
INNER JOIN SSISDB.internal.executions ex WITH (NOLOCK)
	ON o.operation_id = ex.execution_id
OUTER APPLY (SELECT parameter_name AS [name]
			 	  , parameter_value AS [value]
			 FROM ssisdb.internal.execution_parameter_values WITH (NOLOCK)
			 WHERE execution_id = ex.execution_id
			 	 AND object_type < 50
			 	 AND parameter_name NOT LIKE 'CM.%'
			 	 AND sensitive = 0
			 FOR XML RAW('parameter'), ROOT('parameters')) oa(ssis_parameters)
OUTER APPLY (SELECT CASE context_type WHEN 70 THEN context_source_name ELSE REPLACE(STUFF(package_path,1,CHARINDEX('[',package_path) ,''),']','') END [name]
			 	  , CASE context_type WHEN 70 THEN 'Variable' ELSE 'Connection' END [type]
			 	  , property_value [value]
			 FROM ssisdb.internal.event_message_context WITH (NOLOCK)
			 WHERE operation_id = ex.execution_id
				 AND event_message_id = em.event_message_id
			 	 AND ((context_type = 70 AND context_depth = 1)
					OR (context_type = 80 AND property_name = 'ConnectionString'))
			 ORDER BY 2,1
			 FOR XML RAW('property'), ROOT('properties')) ob(ssis_properties)
LEFT JOIN #jobs jh
	ON ex.execution_id = jh.execution_id
WHERE CAST(om.message_time AS DATETIME) >= @StartDate
	AND om.message_type = 120 
GROUP BY ex.execution_id
		, ex.project_name
		, ex.package_name
		, ex.environment_name
		, ex.executed_as_name
		, om.message_source_type
		, em.message_source_name
		, CAST(om.message_time AS DATETIME2(0))
		, CAST(DATEADD(SECOND, DATEDIFF(SECOND, o.start_time, o.end_time), 0) AS TIME(0))
		, om.message
		, oa.ssis_parameters
		, ob.ssis_properties
		, jh.name
	    , jh.step_name
	    , jh.run_datetime
	    , jh.run_status
	    , jh.message
	    , jh.command 
ORDER BY CAST(om.message_time AS DATETIME2(0)) DESC
OPTION(RECOMPILE);

And there you have it. Let me know if it helped you troubleshoot your painful SSIS issues and how you would improve it. Who knows, maybe I even got it wrong somewhere?

 

 

Okay, you are still here, ey? I got one more for you then. Here is a bonus index for you to improve query performance:

USE [SSISDB]
GO
SET ANSI_PADDING ON
GO
CREATE NONCLUSTERED INDEX [x1_message_type_message_time_i_operation_message_id_operation_id_message_source_type_message] ON [internal].[operation_messages]
(
	[message_type] ASC,
	[message_time] ASC
)
INCLUDE ([operation_message_id],
	[operation_id],
	[message_source_type],
	[message])

That’s all folks!

Getting started with XML in SQL Server – PART 2 – Import XML files into your SQL Server

In order to parse XML data you need to have it ready in your SQL server. Since you will not always want to use SSIS or the Import Wizard let’s have a look at 2 different ways to import XML data into a table using TSQL.

 

1. OPENROWSET:

Probably the most common and easiest way to load any sort of BLOB data into your SQL server is the OPENROWSET function that comes with SQL server. To utilize it you need to ensure that it is enabled. Use the following code to enable it:

EXEC sp_configure 'show advanced options', 1
RECONFIGURE
GO
EXEC sp_configure 'ad hoc distributed queries', 1
RECONFIGURE
GO
EXEC sp_configure 'show advanced options', 0
RECONFIGURE
GO

Once that is taken care of you can import any XML file residing on your machine into a table. Let’s assume you have an XML file in your C: drive in a folder called SQL with the filename Collection.xml. The code below will insert it into a temporary table which we create on the fly:

/* create temp table to store the XML data */
IF (SELECT Object_Id('tempdb..#XML_Data')) IS NULL
	BEGIN
		CREATE TABLE #XML_Data (RowID INT IDENTITY(1,1), XML_Data_Column XML, ts datetime default getdate());
	END
ELSE
	BEGIN
		TRUNCATE TABLE #XML_Data;
	END;

/* load a locally saved XML file to a temp table */ 
BEGIN TRY
	/* use OPENROWSET and open file as a BLOB */
	INSERT INTO #XML_Data(XML_Data_Column)
	SELECT CONVERT(XML, BulkColumn) AS BulkColumn
	FROM OPENROWSET(BULK 'C:SQLCollection.xml', SINGLE_BLOB) AS x;

	SELECT *
	FROM #XML_Data
END TRY

BEGIN CATCH
	/* display error message */
	SELECT ERROR_NUMBER() ErrorCode
		 , ERROR_MESSAGE() ErrorMsg;	
END CATCH

You can now query the column XML_Data_Column and display the file you just loaded.

 

2.  OLE Automation Stored Procedures

Not always does the XML data reside on your server or local machine, often times it needs to be retrieved from the internet using various APIs out there. Even though this is a little more complicated than using OPENROWSET there is no need to worry, SQL server got you covered. Using OLE Automated Stored Procedures you will be able to retrieve your data directly with a few lines of TSQL code. Once again we need to ensure that the functionality is enabled in our SQL server, to make sure it works use the code below:

EXEC sp_configure 'show advanced options', 1
RECONFIGURE 
GO
EXEC sp_configure 'Ole Automation Procedures', 1
RECONFIGURE
GO
EXEC sp_configure 'show advanced options', 0
RECONFIGURE 
GO

In order to make the method call we need to do a few things:

  1. we need to create an object
  2. we need to open this object
  3. we need to let the object know what exactly we want to do
  4. we need to send the request
  5. optionally we can check the response status to see if our request was successful
  6. finally we need to save the response to a table.

Now this seems to be an awful lot to do, but it allows us very conveniently to do everything straight out of SQL server without having to set up a retrieval process outside of our code, so jumping through a few extra hoops might be worth it.

Let’s have a look at an example. The code below makes an API call to the website www.boardgamegeek.com and retrieves a list of all board games I own and have actually found the time to play. Since the site usually does not send the response we are looking for in the first go – they send an acknowledgement response first – we need to put the request in a loop until we get the output we really want. The code looks like this:

/* declare variables needed during the method call */
DECLARE @URL VARCHAR(8000), @Obj int, @HTTPStatus int, @i int;

/* create temp table to temporarily store the xml response */
IF (SELECT Object_Id('tempdb..#XML_Responses')) IS NULL
	BEGIN
		CREATE TABLE #XML_Responses (RowID INT IDENTITY(1,1), XML_Data XML, HTTPStatus int, ts datetime default getdate());
	END
ELSE
	BEGIN
		TRUNCATE TABLE #XML_Responses;
	END;

/* the call might have to be repeaded since the first time the website usually responds with a message stating it accepts the request, the second call renders the XML */
BEGIN TRY
	WHILE ISNULL(@HTTPStatus,0) <> 200
		BEGIN
			/* URL used to get the boardgame list from boardgamegeek.com for Thomas Schutte */
			SELECT @URL = 'http://www.boardgamegeek.com/xmlapi2/collection?username=thrond&own=1&played=1'

			EXEC sp_OACreate 'MSXML2.XMLHttp', @Obj OUT 
			EXEC sp_OAMethod @Obj, 'open', NULL, 'GET', @URL, false
			EXEC sp_OAMethod @Obj, 'setRequestHeader', NULL, 'Content-Type', 'application/x-www-form-urlencoded'
			EXEC sp_OAMethod @Obj, send, NULL, ''
			EXEC sp_OAGetProperty @Obj, 'status', @HTTPStatus OUT

			/* store the result to our temp table */
			INSERT INTO #XML_Responses(XML_Data)
			EXEC sp_OAGetProperty @Obj, 'responseXML.xml';
			
			/* retrieve identity */
			SET @i = @@IDENTITY;

			/* update HTTPStatus to our table */
			UPDATE #XML_Responses
			SET HTTPStatus = @HTTPStatus
			WHERE RowID = @i

			/* wait for retry */
			IF @HTTPStatus <> 200
				BEGIN
					WAITFOR DELAY '00:00:02'
				END
		END;

	/* output all responses */
	SELECT *
	FROM #XML_Responses

END TRY

BEGIN CATCH
	SELECT ERROR_NUMBER() ErrorCode
		 , ERROR_MESSAGE() ErrorMsg;	
END CATCH

You can filter out any responses that do not have a HTTPStatus of 200 so you won’t save any trash in your DB, but for this example I decided it makes sense to leave it in there.

And there you have it. 2 ways to import XML using TSQL into your database. I am sure there are other ways (using a linked server connection comes to mind *ugh*) but for now these 2 will have to do.

Feel free to leave any feedback in the comments below and look out for Part 3 where I’ll show you how to get XML into your SQL server using SSIS.

 

Getting started with XML in SQL Server

Part 1 – Parsing XML

Part 3 (not available yet) will use SSIS to achieve the same results as in Part 1 & 2

Part 4 (not available yet) will show you how to use XQuery and create XML files out of SQL Server

Getting started with XML in SQL Server – Part 1 – Parsing XML

A lot of people don’t like XML. It has a lot of overhead, it’s not easy to read for those unfamiliar with the structure and parsing it is a pain in the ass. Well, let’s see what SQL Server offers to load it anyways.

 

Let’s use this simple XML as an example:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<items>
  <item id="1">
    <name>Item 1</name>
    <comment>This is test item one</comment>
    <quantity>10</quantity>
  </item>
  <item id="2">
    <name>Item 2</name>
    <comment>This is the second test item</comment>
    <quantity>5</quantity>
  </item>
</items>

This example represents the data of 2 items. Each item has 3 data elements and one attribute. It is important to know the difference between elements and attributes as they are parsed differently. To keep it simple this example does not use a namespace.

One thing you will notice quickly when dealing with XML is that you need to know the structure as well as the datatypes, you will not be able to do much without that knowledge.

The code below assigns our sample data to a variable with the datatype XML, a SELECT statement displays the assigned value of the variable:

/* declare variable with XML datatype */ 
DECLARE @XML XML; 

/* assign sample XML to variable */ 
SET @XML = '<?xml version="1.0" encoding="UTF-8" standalone="no" ?> 
			<items> 
				<item id="1"> 
					<name>Item 1</name> 
					<comment>This is test item one</comment> 
					<quantity>10</quantity> 
				</item> 
				<item id="2"> 
					<name>Item 2</name> 
					<comment>This is the second test item</comment> 
					<quantity>5</quantity> 
				</item> 
			</items>'; 

/* display variable */ 
SELECT @XML XML;

 

XML output columns usually are represented as hyperlinks in SSMS (XML columns can – just like VARBINARY – hold quite a lot of data, so think twice before including them in your queries directly). If the XML is not too big you can click on it and a new tab will pop up with the XML code inside:

XML_in_SSMS

XML displayed in SSMS

 

Notice something? SQL Server got rid of the XML declaration. Keep this in mind when dealing with XML files, sometimes SQL Server refuses to convert a string to XML because of the declaration. In those cases you will have to get rid of it. A simple statement using STUFF in combination with PATINDEX will do the trick:

/* trim XML declaration */
SELECT @XML = STUFF(@String,1, PATINDEX('%?>%', @String) + 1, '')

 

No how do we parse our variable? SQL Server offers different ways to parse XML in SQL server, see 2 different approaches below.

 

1. Using CROSS APPLY and the value() method:

/* parse @XML using CROSS APPLY */
SELECT cb.item_node.value('./@id'						, 'INT')				AS id
	 , cb.item_node.value('(./name/text())[1]'			, 'VARCHAR(200)')		AS name
	 , cb.item_node.value('(./comment/text())[1]'		, 'VARCHAR(200)')		AS comment
	 , cb.item_node.value('(./quantity/text())[1]'		, 'INT')				AS quantity
	 , XML_Data.value('(/items/item/@id)[1]'			, 'INT')				AS id_without_crossapply
	 , XML_Data.value('(/items/item/name/text())[1]'	, 'VARCHAR(200)')		AS name_without_crossapply
	 , ca.root_node.value('(./item/name/text())[1]'		, 'VARCHAR(200)')		AS name_with_one_crossapply
FROM (VALUES (@XML)) xx(XML_Data)
CROSS APPLY xx.XML_Data.nodes('/items') ca(root_node)
CROSS APPLY ca.root_node.nodes('./item') cb(item_node);

If you execute this statement and have a look at the result you will hopefully see why more than one CROSS APPLY is needed to actually parse the data. Value() requires to specify an ordinal number to ensure it will return a singleton, this is done in square brackets. [1] will return the first available value matching the expression used, [2] the second and so on. To flatten the output a CROSS APPLY for each child element with multiple elements will do the trick, if only the root node is referenced the query will always return the first value taken from the first match for all results (take a look at the output on the columns id_without_crossaply / name_without_crossapply / name_with_one_crossapply).

As mentioned earlier attributes and elements are treated differently. To retrieve the value of an attribute you will add the @ sign to the name of the attribute. Elements will require the exact path, their value is retrieved using text().

 

2. Using OPENXML:

/* parse @XML using sp_xml_preparedocument */
DECLARE @iDoc int;

EXEC sp_xml_preparedocument @iDoc OUTPUT, @XML;

SELECT id, name, comment, quantity
FROM OPENXML (@iDoc, '/items/item',1)
WITH (
	  id			INT				'./@id',
	  name			VARCHAR(200)	'./name/text()',
	  comment		VARCHAR(200)	'./comment/text()',
	  quantity		INT				'./quantity/text()'
	 );


EXEC sp_xml_removedocument @iDoc;

This approach is quite different. The system stored procedure sp_xml_preparedocument is used to create an object, referenced by an integer that is stored in a variable. This integer is required when using OPENXML to reference the right rowset provider. To access the data a WITH clause is used to define the output columns, allowing the use of aliases or the given name of attributes and elements. If you chose to use the same name for attributes as used in the XML you do not need to specify the path to the attribute (‘./@id’ is not required in this case). Elements however require a path and text() is used once again to retrieve the value.

Note that you should always destroy the object using sp_xml_removedocument once the XML is no longer required to free system resources.

In this example we chose to directly use the /items/item path, which allows to flatten the XML directly. See how the result changes if you change the path to /items (adding /item to the path of each element in the WITH clause): only the first row will be displayed.

 

Which approach is better? I don’t know. Each XML will be different, these example will not work for every XML out there. But at least now you have the basic means to start tackling XML.

 

Getting started with XML in SQL Server

Part 2 – Import XML files into your SQL Server

Part 3 (not available yet) will use SSIS to achieve the same results as in Part 1 & 2

Part 4 (not available yet) will show you how to use XQuery and create XML files out of SQL Server