DWMaintenance Stalled Processes Kill the SCSM 2012 Data Warehouse

Anyone using SCSM 2012 has probably suffered through some type of challenge with the ETL data warehouse and cube processing jobs.  After a few days of troubleshooting, I think I’ve discovered why.

Symptoms
– All Data Warehouse jobs disabled.
– DWMaintenance job never finishes or runs a realllllllly long time.
– Reports don’t show updated data.
– Failed cube processing
– You tried Travis Wright’s awesome ETL running script and get an “Exiting since job is in an unexpected status.” (…if you haven’t tried this yet, you should. It’s the quickest way to tell you have a stalled process).
– Processing is just stalled
– DWMaintenance is stalled on the ManageCubePartitions/ManageCubeTranslations Step
– The data warehouse is just dead.

Root Cause

– The DWMaintenance job has a stalled process. It’s a statusID of 7 (waiting) instead of 3 (not started). Unless you reset this status, the Data Warehouse jobs won’t run and your report data is stale.

Solution –
Review the paragraph “DWMaintenance is Hung on the ManageCubePartitions/ManageCubeTranslations Step ” on Danny Chen’s blog exhaustive article detailing cube and SCDW troubleshooting. You can stop here or read on to see how I troubleshot the process to get to his solution.

Troubleshooting Steps – How I Solved It

Don’t try to troubleshoot via the SCSM console. 

Use PowerShell on your Data Warehouse server

Also, when the MPSync or DWMaintenance jobs start, they disable all of the ETL jobs. You will see the Is Enabled column set to False for each of the ETL jobs. This means that even if the ETL job status shows it is running, it actually is not running. When the MPSync or DWMaintenance job completes, the ETL jobs are automatically enabled and resume processing. Enabled and Not Running is a good thing. Means the job ran.

  1. Load the Datawarehouse cmdlets – Use one of these depending on your install path

import-module “C:\Program Files\Microsoft System Center 2012\Service Manager\Microsoft.EnterpriseManagement.Warehouse.Cmdlets”

or

Import-Module ‘%ProgramFiles%\Microsoft System Center 2012\Service Manager\Microsoft.EnterpriseManagement.Warehouse.Cmdlets.psd1’

  1. Run Get-SCDWJob and note the status of the jobs. Is DWMaintenance running. Stop it with Stop-SCDWJob –Jobname DWMaintenance. Do the same for MPSyncJob or any other running job.
  2. Check for Job Module errors. This is important to help us find stalled processes in the jobs. The easiest way to do this is to export the SCDW job modules to a text file – Get-SCDWJobModule >> c:\scripts\jobs.txt – When I exported mine I noticed ModuleID 6 was waiting with an error:

Job Name: DWMaintenance, Batch Id: 20154

ModuleID     Module Name                                                      Status                Error Message
——–           ———–             ——                                         ————-
18                   ResetFailedSystemDerivedMPWorkItems    Not Started
3                     ApplyUpdatesToCube                                        Not Started
6                     DisableProcesses                                                 Waiting <Errors><Error
7                     EnableProcesses                                                   Not Started

there were other modules too, but all were in a good status.

Now using Danny Chen’s blog, I ran these SQL scripts against the DWStagingAndConfig database.

select processId from infra.process where processname = ‘DWMaintenance’

Returns a value of 1. So, the ProcessID for the DWMaintenance is 1.

select batchid from infra.batch where processId = 1

Returns a batchID of 20161 and 20162. Starting with the first one, I updated the status to 3 to clear the stalled process. I also updated the second one.

select * from infra.workitem(nolock) where BatchId = 20161
select * from infra.workitem(nolock) where BatchId = 20162

Use the BatchID to determine the stalled process (moduleID 6 previously). You’ll get a long list of WorkItemID’s. Look for the one with a status of 7. This is the problem. In my case it was 7429423.

update infra.workitem set statusid = 3 where workitemId = 7429423

I then double-checked the status an the WAITING status of 7 was gone, replaced with statusid of 3.

  1. Once I finished this, I manually ran my MPSyncJob using powershell – start-SCDWJob -jobname MPSyncJob
  2. Follow the progress of each of your jobs with the Get-scdwjobmodule -jobname MPSyncJob
  3. Now follow this process for each of your ETL jobs after MPSyncJob finishes successfully. If MPSyncJob fails, there are a number of troubleshooting blogs to correct this.
  4. start-SCDWJob -jobname (Enter your name here) and start-SCDWJob -jobname (Enter your job’s name) – Wait for these to finish.
  5. start-SCDWJob –jobname Transform.Common – Wait for the job to finish.
  6. start-SCDWJob –jobname Load.Common

All of my ETL jobs finished in under 30-minutes, but yours might vary. Checking my reports showed updated data and all of my SCDW jobs were Enabled and displayed a status Not Running in the management console.

Good to go. Hope this helps and leave me a comment if you need more details or help.

My cubes, however, are still down and I’m tackling those next. More to follow…

DS

Advertisements

About Doug Sigmon

IT Helpdesk manager in southern California. Love technology, gadgets, and golf.
This entry was posted in SCDW, SCSM 2012 and tagged , . Bookmark the permalink.

7 Responses to DWMaintenance Stalled Processes Kill the SCSM 2012 Data Warehouse

  1. Geert says:

    Experiencing the very same problem. My workaround to temporarily solve the issue (works for X days)
    1. shut down the DW server
    2. stop/start analysis services on SQL Server on the SCSM AS DB
    3. connect to the Analysis Services DB (through eg SQL Mgt Studio) and process the DWDataMart from there
    4. restart the DW Server
    5. run the maintenance job
    6. run the ETL jobs
    7. run the cubes

  2. Pingback: SCSM Data Warehouse Job Pains: Coming to terms with those pains and fixing them! | Netivia Consulting

  3. It’s a pity you don’t have a donate button! I’d most certainly donate to this brilliant blog! I guess for now i’ll settle for bookmarking and adding your RSS feed to
    my Google account. I look forward to brand new updates and will share this blog
    with my Facebook group. Talk soon!

  4. Pingback: SCSM DW jobs query for stalled processes.

  5. Pingback: SCSM: ETL jobs stuck after upgrade to R2 | A Cloud Adventure

  6. Pingback: Monitors and Metrics | Useful Blog for SCSM DWH Troubleshooting

  7. Pingback: SCSM: ETL jobs stuck after upgrade to R2 - System Center PFE SA blog - Site Home - TechNet Blogs

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s