Identifying SSIS execution errors in SQL Server 2012+

First of, if you are not using the project deployment model in SSIS 2012+ this blog post is not for you – sorry. The techniques described below assume you are storing your packages in SSISDB. However, you don’t necessarily need to execute your packages through SQL Agent jobs.

Okay, let’s get right into it. After getting too frustrated with waiting for SSIS execution reports out of SSMS, which are not only pretty slow but also give you a lot of information you don’t really need, I figured there has to be a better way to retrieve relevant information directly. What better place to look than tapping directly into SSISDB, the system database which is created automatically when you create an Integration Services Catalog?

Even though this is far from a deep dive into the table structure of SSISDB, I’d like to give you a little more background information on the tables we care about for this script (note that most of the documentation refers to objects with the catalog schema (Microsoft Documentation), those are views that sit on top of the internal schema tables which we use here):

  • internal.executions: lots of helpful information in this table, in particular the [package_name], which account was used to execute the package ([executed_as_name]), additional information about the project the package is part of ([project_name]) and if an environment was used to schedule the package we get information about this as well ([environment_name])
  • internal.operations: this table will tell us when exactly the package was executed ([start_time] and [end_time])
  • internal.operation_messages: holds the error message ([message]) we came here for in the first place. This table holds a LOT of information, so we will have to limit our query to those lines that hold errors, we use the [message_type] column to identify what we want to bring back. It might also help to know the exact time the error occurred ([message_time])
  • internal.event_messages: tells us which step in the SSIS package failed, given you name each component in your package ([message_source_name]), otherwise you’ll see a lot of [Data Flow Task]’s and [Execute SQL Task]’s here
  • internal.execution_parameter_values: all package parameters are listed in this table ([parameter_name] & [parameter_value])
  • internal.event_message_context: last but not least the probably second most useful piece of information is right here, execution-time variable and connection string values ([context_source_name]/[package_path] & [property_value])

To tie it all together there are 2 key identifiers we have to use. Unfortunately Microsoft decided to use [execution_id] on some tables as the primary key and [operation_id] on others, just like you find [event_message_id] here and [operation_message_id] there. Confusing at times and doesn’t make it easy to get a grasp of the structure at first but I am sure they have good reasons for doing so…?!

Now lets have a look at the script (finally!) and I’ll walk you through it:

DECLARE @StartDate DATETIME, @Offset INT = -24;

SET @StartDate = DATEADD(HOUR, @Offset, GETDATE())

SELECT ex.execution_id
	 , ex.package_name
	 , CASE 
           WHEN (om.message_source_type = 10) THEN 'ISServerExec' 
           WHEN (om.message_source_type = 20) THEN 'Transact-SQL stored procedure'
           ELSE em.message_source_name
       END AS task_name
	 , CAST(om.message_time AS DATETIME2(0)) AS ssis_run_datetime
	 , CAST(DATEADD(SECOND, DATEDIFF(SECOND, o.start_time, o.end_time), 0) AS TIME(0)) AS ssis_run_duration
	 , CAST(om.message AS XML) AS ssis_error_message
	 , CAST(ob.ssis_properties AS XML) ssis_properties
	 , CAST(oa.ssis_parameters AS XML) ssis_parameters
	 , ex.executed_as_name
	 , ex.project_name
	 , ex.environment_name
FROM SSISDB.internal.event_messages em WITH (NOLOCK)
INNER JOIN SSISDB.internal.operation_messages om WITH (NOLOCK)
	ON em.event_message_id = om.operation_message_id
INNER JOIN SSISDB.internal.operations o WITH (NOLOCK) 
	ON em.operation_id = o.operation_id
INNER JOIN SSISDB.internal.executions ex WITH (NOLOCK)
	ON o.operation_id = ex.execution_id
OUTER APPLY (SELECT parameter_name AS [name]
			 	  , parameter_value AS [value]
			 FROM ssisdb.internal.execution_parameter_values WITH (NOLOCK)
			 WHERE execution_id = ex.execution_id
			 	 AND object_type < 50
			 	 AND parameter_name NOT LIKE 'CM.%'
			 	 AND sensitive = 0
			 FOR XML RAW('parameter'), ROOT('parameters')) oa(ssis_parameters)
OUTER APPLY (SELECT CASE context_type WHEN 70 THEN context_source_name ELSE REPLACE(STUFF(package_path,1,CHARINDEX('[',package_path) ,''),']','') END [name]
			 	  , CASE context_type WHEN 70 THEN 'Variable' ELSE 'Connection' END [type]
			 	  , property_value [value]
			 FROM ssisdb.internal.event_message_context WITH (NOLOCK)
			 WHERE operation_id = ex.execution_id
				 AND event_message_id = em.event_message_id
			 	 AND ((context_type = 70 AND context_depth = 1)
					OR (context_type = 80 AND property_name = 'ConnectionString'))
			 ORDER BY 2,1
			 FOR XML RAW('property'), ROOT('properties')) ob(ssis_properties)
WHERE CAST(om.message_time AS DATETIME) >= @StartDate
	AND om.message_type = 120
GROUP BY ex.execution_id
		, ex.project_name
		, ex.package_name
		, ex.environment_name
		, ex.executed_as_name
		, om.message_source_type
		, em.message_source_name
		, CAST(om.message_time AS DATETIME2(0))
		, CAST(DATEADD(SECOND, DATEDIFF(SECOND, o.start_time, o.end_time), 0) AS TIME(0))
		, om.message
		, oa.ssis_parameters
		, ob.ssis_properties 
ORDER BY CAST(om.message_time AS DATETIME2(0)) DESC
OPTION(RECOMPILE);

Since you might want to limit your output to a specific time window (I usually only care about the last ~3 hours) we have the variable @Offset to allow us to define how many hours you want to go back in time. The script itself is pretty straight forward, a few INNER JOINs to link all the tables together and two OUTER APPLYs to generate XML outputs which come in handy when you try to identify if the error is caused by a wrong connection string, incorrect package parameter or variable. The WHERE clause limits the output to anything within the specified time window based on [message_time] and [message_type] = 120 ensures that we only return errors. I like to see most recent errors on top, but feel free to remove the ORDER BY in case you don’t care. Just go ahead and remove it!

If you check the code within the OUTER APPLYs you might notice that I limit the output somewhat, for example I am not returning any sensitive package parameters as those will be scrambled anyways and won’t help you figuring out anything. Once again, feel free to add or remove WHERE clauses so it works best for you!

Since I am lazy I chose to cast the error message as XML so you can quickly click on it and see it in a new window for convenience. You might ask what in the devils name made me use WITH (NOLOCK) all over the place, but in a busy environment you might find yourself waiting for quite some time as SSISDB tables are often used heavily, the logging is quite verbose but of course it mostly depends on your SSIS usage. I also threw in an OPTION(RECOMPILE) for good measure as the number of records you want to retrieve can vary quite a bit and more often than not it does more good than harm for this specific query.

Now let’s not stop here. Say we also want to gather any information we can get about the SQL Agent job which was actually executing the SSIS package (you are executing packages that way, RIGHT?!). Back in the old days™ you were able to use the SQL Agent job history to troubleshoot SSIS errors, but nowadays all you get is a generic error message:

Executed as user: [USERACCOUNT HERE]. Microsoft (R) SQL Server Execute Package Utility Version xx.x.xxxx.x for 64-bit Copyright (C) Microsoft Corporation. All rights reserved. Started: xx:xx:xx PM Package execution on IS Server failed. Execution ID: 12345, Execution Status:4. To view the details for the execution, right-click on the Integration Services Catalog, and open the [All Executions] report Started: xx:xx:xx PM Finished: xx:xx:xx PM Elapsed: xx.xxx seconds. The package execution failed. The step failed.

They tell you to open the reports for more information. But we don’t want to do that anymore, so I am eternally grateful that the error message contains the [execution_id], we will use it to tie everything back to SSISDB (hooray!). All we need to do now is join a few system tables in msdb and as if by magic we know the job name and step along with the command to execute the package (or whatever else you might want to retrieve):

DECLARE @StartDateINT INT, @StartDate DATETIME, @Offset INT = -24;

SET @StartDate = DATEADD(HOUR, @Offset, GETDATE())
SET @StartDateINT = CAST(CONVERT(CHAR(8), DATEADD(HOUR, @Offset, GETDATE()), 112) AS INT);

IF OBJECT_ID('tempdb..#jobs') IS NOT NULL DROP TABLE #jobs;

CREATE TABLE #jobs(name sysname, step_name sysname, message VARCHAR(MAX), run_status VARCHAR(20), run_datetime DATETIME, execution_id INT, command VARCHAR(MAX))

INSERT INTO #jobs(name, step_name, message, run_status, run_datetime, execution_id, command)
SELECT jj.name
		, jh.step_name
		, jh.message
		, CASE jh.run_status
			WHEN 0 THEN 'Failed'
			WHEN 1 THEN 'Succeeded'
			WHEN 2 THEN 'Retry'
			WHEN 3 THEN 'Canceled'
		END run_status
		, msdb.dbo.agent_datetime(jh.run_date, jh.run_time) run_datetime
		, IIF(jh.message LIKE '%execution id:%'
				, CAST(STUFF(STUFF(jh.message, 1, PATINDEX('%Execution ID:%', jh.message) + 13, ''), PATINDEX('%[^0-9]%', STUFF(jh.message, 1, PATINDEX('%Execution ID:%', jh.message) + 13, '')), LEN(jh.message),'') AS BIGINT)
				, NULL
			) execution_id
		, js.command
FROM msdb.dbo.sysjobhistory jh
INNER JOIN msdb.dbo.sysjobs jj
	ON jh.job_id = jj.job_id
INNER JOIN msdb.dbo.sysjobsteps js
	ON jh.job_id = js.job_id
	AND jh.step_id = js.step_id
WHERE js.subsystem = 'SSIS'
	AND jh.run_date >= @StartDateINT;

SELECT ex.execution_id
	 , ex.package_name
	 , CASE 
           WHEN (om.message_source_type = 10) THEN 'ISServerExec' 
           WHEN (om.message_source_type = 20) THEN 'Transact-SQL stored procedure'
           ELSE em.message_source_name
       END AS task_name
	 , CAST(om.message_time AS DATETIME2(0)) AS ssis_run_datetime
	 , CAST(DATEADD(SECOND, DATEDIFF(SECOND, o.start_time, o.end_time), 0) AS TIME(0)) AS ssis_run_duration
	 , CAST(om.message AS XML) AS ssis_error_message
	 , CAST(ob.ssis_properties AS XML) ssis_properties
	 , CAST(oa.ssis_parameters AS XML) ssis_parameters
	 , ex.executed_as_name
	 , ex.project_name
	 , ex.environment_name
	 , jh.name AS job_name
	 , jh.step_name AS job_step_name
	 , jh.run_datetime AS job_run_datetime
	 , jh.run_status AS job_run_status
	 , jh.message AS job_message
	 , jh.command AS job_command
FROM SSISDB.internal.event_messages em WITH (NOLOCK)
INNER JOIN SSISDB.internal.operation_messages om WITH (NOLOCK)
	ON em.event_message_id = om.operation_message_id
INNER JOIN SSISDB.internal.operations o WITH (NOLOCK) 
	ON em.operation_id = o.operation_id
INNER JOIN SSISDB.internal.executions ex WITH (NOLOCK)
	ON o.operation_id = ex.execution_id
OUTER APPLY (SELECT parameter_name AS [name]
			 	  , parameter_value AS [value]
			 FROM ssisdb.internal.execution_parameter_values WITH (NOLOCK)
			 WHERE execution_id = ex.execution_id
			 	 AND object_type < 50
			 	 AND parameter_name NOT LIKE 'CM.%'
			 	 AND sensitive = 0
			 FOR XML RAW('parameter'), ROOT('parameters')) oa(ssis_parameters)
OUTER APPLY (SELECT CASE context_type WHEN 70 THEN context_source_name ELSE REPLACE(STUFF(package_path,1,CHARINDEX('[',package_path) ,''),']','') END [name]
			 	  , CASE context_type WHEN 70 THEN 'Variable' ELSE 'Connection' END [type]
			 	  , property_value [value]
			 FROM ssisdb.internal.event_message_context WITH (NOLOCK)
			 WHERE operation_id = ex.execution_id
				 AND event_message_id = em.event_message_id
			 	 AND ((context_type = 70 AND context_depth = 1)
					OR (context_type = 80 AND property_name = 'ConnectionString'))
			 ORDER BY 2,1
			 FOR XML RAW('property'), ROOT('properties')) ob(ssis_properties)
LEFT JOIN #jobs jh
	ON ex.execution_id = jh.execution_id
WHERE CAST(om.message_time AS DATETIME) >= @StartDate
	AND om.message_type = 120 
GROUP BY ex.execution_id
		, ex.project_name
		, ex.package_name
		, ex.environment_name
		, ex.executed_as_name
		, om.message_source_type
		, em.message_source_name
		, CAST(om.message_time AS DATETIME2(0))
		, CAST(DATEADD(SECOND, DATEDIFF(SECOND, o.start_time, o.end_time), 0) AS TIME(0))
		, om.message
		, oa.ssis_parameters
		, ob.ssis_properties
		, jh.name
	    , jh.step_name
	    , jh.run_datetime
	    , jh.run_status
	    , jh.message
	    , jh.command 
ORDER BY CAST(om.message_time AS DATETIME2(0)) DESC
OPTION(RECOMPILE);

And there you have it. Let me know if it helped you troubleshoot your painful SSIS issues and how you would improve it. Who knows, maybe I even got it wrong somewhere?

 

 

Okay, you are still here, ey? I got one more for you then. Here is a bonus index for you to improve query performance:

USE [SSISDB]
GO
SET ANSI_PADDING ON
GO
CREATE NONCLUSTERED INDEX [x1_message_type_message_time_i_operation_message_id_operation_id_message_source_type_message] ON [internal].[operation_messages]
(
	[message_type] ASC,
	[message_time] ASC
)
INCLUDE ([operation_message_id],
	[operation_id],
	[message_source_type],
	[message])

That’s all folks!

Leave a Reply

Your email address will not be published. Required fields are marked *