Although I could stabilize my cube processing through the SQL Server Analysis Server, I was not able to successfully stabilize the SCSM 2012 environment and see my nightly runs complete. I could probably get to this state if I re-installed the SCSM 2012 Data Warehouse, but this situation will happen again and reinstallation is not a long term solution for each outage.
After I corrected my nightly run schedule issue (discussed here), the DWMaintenance, MPSyncJob, and the ETL jobs and all but one cube ran as expected. The SystemConfigItem job failed with “Object reference not set to an instance of an object.”
Figure 1 Object reference not set to an instance of an object
Opening a ticket with Microsoft we reviewed these SCSM 2012 Data Warehouse server’s Operations Manager log errors:
Event ID 33526 Incorrect user information detected. The domain name, user name or password were not valid. Ignoring the given credentials and proceeding with the default workflow account:
User information detected:
Per the SCSM Engineers, this is not a problem and can be ignored since the default workflow account has the necessary rights.
Event ID 33566 – An Exception was encountered while trying to process a cube. Cube Name: SystemCenterConfigItemCube Exception Message: Object reference not set to an instance of an object. Stack Trace: at Microsoft.SystemCenter.Warehouse.Olap.OlapCube.GetPartitionsToProcess(Cube asCube) at Microsoft.SystemCenter.Warehouse.Olap.OlapCube.Process(ManagementPackCube mpCube).
The key to the error is the “GetPartitionsToProcess.” One of my ConfigItem Cube partitions was missing an “object.” To validate SCSM Partitions for a particular cube, I ran this query against DWDataMart
select * from etl.cubepartition where CubeName =‘SystemCenterConfigItemCube’
This listed all of the associated partitions. Reviewing each one, I noticed that all but one had FACTS for each month since June (installation date), except for ComputerHostNetworkAdapater monthly partition, which was missing September. I scripted out the missing FACT using one of the other existing month FACTs.
I then re-ran the cube job successfully via PowerShell.
After further research, my theory is that the DWMaintenance Job fails or stalls and is stopped. This would explain why some of the why I’m missing all the FACTs in a DIM. This happened to me again and I solved it by recreating the FACTs.
Event ID 33545 – MP Sync did not associate the lower version of the Management Pack
This was due to an out-of-sync third party MP that I corrected with the vendor.
Event ID 4000 – A monitoring host is unresponsive or has crashed. The status code for the host failure was 2164195371.
This was occurring every 25-minutes. Under the guidance of a Microsoft engineer, we ran C:\Program Files\Microsoft System Center 2012\Service Manager\Tools\SMTracing tool and I provided the resulting capture to the engineer after recording the event occurrence. If this is fixed, I’ll post the solution with a separate post. This is a very difficult issue to fix and I’m still working with Microsoft to find a solution.
Root cause – The System Center Orchestration Configuration Library never finishes. Check and verify the Deployment Status. It’s probably stuck in a running state.
The fix is to open a ticket with Microsoft Premier Support and ask them to fix the Deploy Sequence in the database. I don’t fully understand what was completed, so I don’t want to post anything until I understand. It’s a quick fix, but let the MS professionals walk you through it.
Event ID 11366 – The Microsoft Operations Manager Scheduler Data Source Module failed to initialize because some window has no day of the week set.
The fix for this is discussed here.
Here’s an overview of the process I followed to solve problems with my cubes. In a nutshell, I didn’t find a holistic solution to the failing process and each experience is probably different. My goal is to provide you quicker access to the varying solutions located throughout the web.
How I Measured Success – A Goal of Processed Cubes
- Able to run Travis Wright’s script successfully end-to-end (this can take over an hour depending on how long it’s been since a good run).
See the following via the \Data Warehouse\Data Warehouse\Data Warehouse Jobs display for each job:
- Enabled – Yes
- Status – Not Started
See the following via the \Data Warehouse\Data Warehouse\Cubes display for each cube:
- Status – PROCESSED
- Schema Pending Changes – NO
- Last Processed Date – Current or accurate with the time your processed the cube.
- The SCSM Data Warehouse Jobs that process the cubes successfully run each at the local time, as scheduled
Note: Criteria Item 1 is probably the most important as it provides a more realistic end-to-end run on what you’ll see from your automated processing.
Things You Should Know Before Troubleshooting SCSM 2012 Cubes (IMHO)
- Manually processing the cubes from the Analysis Server and then running the SCSM Data Warehouse jobs is not going to cure the problem for the long term and is not recommended as an approach by the experts I talked to.
- The Data Warehouse server PowerShell command (get-SCDWJobSchedule) reflects job schedules in GMT versus local time.
- Log on to the SCSM servers with the same service account you use as the data warehouse RUN AS account.
- Just because a Data Warehouse job is stopped doesn’t mean the module is. Verify the job status with get-scdwjobmodule –jobname “NAME OF JOB” (i.e. get-scdwjobmodule –jobname “Process.SystemCenterWorkItemsCube”). You Data Warehouse job will stall in a “running” state If the module is still running when you re-start the process.
- SCSM 2012 checks the version of your SQL Server Analysis Server. If it’s the Enterprise version, SCSM 2012 will process the cubes incrementally (PROCESS INCREMENTAL) if the cubes have already been processed fully (PROCESS FULL). This will keep your processed jobs from future failures. This functionality isn’t available in Standard.
- Read Danny Chen’s 8-part blog and become an expert on how this process works. 1, 2, 3, 4, 5, 6, 7, and 8.
- If a Cube job starts via the console and runs longer than an hour or two, this is not normal. Run the cube manually at the SSAS and you’ll probably find a bad DIM. Unprocess bad DIMs, then process them FULL.
Here are the paraphrased steps I followed to fix my cubes (details are in the section below) after they failed:
As with any code, you should run this in a test environment first.
Ensured I was running SQL 2008 Enterprise.
Looked at the Ops Manager event log for error ID 33573 – “The operation has been cancelled due to memory pressure.” If you see this on your SSAS, add more RAM. I have 32 GB of RAM, so I haven’t seen this issue, but there are number of posts about it.
Apply any pending schema changes (see below for details) via \Data Warehouse\Data Warehouse\Cubes — Apply Schema Changes
RUN-ETL.ps1 – Successful? Yes? Then your cubes should run as scheduled. FAIL? Follow the steps detailed in the below section “Manually Verifying SCSM 2012 Cubes”
- Let your jobs run overnight and verify everything is working as expected.
Manually verifying SCSM 2012 Cubes
How to ensure your cubes will even process.
1. Via the OLAP server, PROCESS FULL all the cubes (process all or just the failed cube)
Failures here were associated with dimensions missing key data. The error messages show the offending object and you can correct it this by processing the individual dimensions separately (see below). My failed DIMS show a state of Process Update so I always Unprocess, and then Process Full. Eventually, all your cubes will process.
2. Unprocess the cubes at the SSAS.
This places the cubes in a Process Full requirement for the next processing and if your watermark batchID is 0, SCSM will complete a full process updating the watermark. Check the properties of the cube for an Unprocessed state.
3. Confirm that DWDataMart Database watermarks are setup for a Process Full.
When you manually process the cubes and then un-process them before your nightly runs, the watermark batchids are usually reset. To confirm this:
select * from etl.cubepartition
This returns a result set showing either a batchID from a completed process or a batchID of 0. SCSM uses this watermark to determine the type of run for the next schedule event.
Anything other than a 0 will result in a incremental build, while a 0 runs a full process. To have all of your cubes reset, simply update the batchID to 0 after un-processing all your cubes manually.
update etl.CubePartition set WatermarkBatchId = 0 where CubeName = ‘SystemCenterCubeNameHereCube’
4. Start the appropriate cube processing job for the failed cube and wait for the results.
I’ve spent a lot of time working through this issue and may have left off a process or tip, so comment below if you need more guidance.