Troubleshoot data warehouse issues

Knowledge Checkpoint

A grasp of these concepts will help you understand this documentation better:

caution

Note that we do not cover SQL Optimization in this guide. Head over to Best practices to improve Holistics reporting performance for optimization tips.

This article provides general troubleshooting methods for customers who are encountering performance issues from bottlenecks in their data warehouses while using Holistics. It should be used in conjuction with Troubleshoot slow-loading reports for the best results.

Investigation Methods

Identify long-running jobs in Job Monitoring

See Cancel long-running jobs.

Check if there are many failed jobs with the same errors

Description

If you have multiple failed jobs within a short timeframe, it may be because there has been a systemic failure from a common cause, most likely in the data warehouse layer. Some common issues are:

DB fails to write Persisted Models
DB out of Memory
DB connection refused
DB connection timed out
...

Instructions

Open Job Monitoring Dashboard.
Check if there are many jobs failed with the same failures. For example, these jobs below failed because of issues from the Data Warehouse's side.

Check if CPU Utilization of your data warehouse is high

info

Not every data warehouse offers CPU Utilization monitoring tools. If yours does not support this feature, consider research on the Internet for workarounds on how to investigate this metric.

Description

If your database’s CPU Utilization is high, SQL queries may take a long time to process. Commands such as cancelling job are unable to be executed due to the database being unresponsive.

Common causes for high CPU Utilization in data warehouses are:

High number of logical reads due to:
- Queries that lack indexes
- Datawarehouse use outdated index statistics
- Inefficient queries
Increase in workload

If your data warehouse has high CPU Utilization usage, it is necessary to identify what queries are causing this issue.

Instructions

Depending on your data warehouse, you can select the appropriate guide to check the CPU Utilization metric:
- Use AWS CloudWatch to monitor CPU Utilization of AWS RDS
- Monitoring Warehouse Load in Snowflake
If from step 1, you frequently find this metric to be abnormally high, proceed to list out long-running queries or processes, especially during peak CPU usage.
Send these queries to Holistics support.

Run EXPLAIN ANALYZE on slow/suspected queries

Description

EXPLAIN ANALYZE on a SQL query reveals the query plan of that query. It is useful to identify the bottlenecks of the query, i.e which parts take up a long time to complete.

Instructions

General Instructions

In Holistics, get the generated SQL of your slow job
Go to Job Monitoring Dashboard and find job logs of slow-running jobs. You should be able to see the generated SQL. Consider copy this SQL to re-run it for troubleshooting.
Note that SQLs generated by Holistics jobs will have a /* - Job ID: xxxxxxxxxx* */ comment on the top.
Append EXPLAIN ANALYZE to the query and run it against your data warehouse
Copy the generated SQL from step 1 and append EXPLAIN ANALYZE to it. Then, run the whole query in your DB Engine. Send it to Holistics if you are not sure on how to interpret the result.
To visualize the query plan and see which parts take up the most time, you can paste the query plan into a visualization tool like explain.depesz.com.

Snowflake-specific Instructions

Follow the same instruction as step 1 of the General Instructions.
Then, find the full query in Snowflake UI.
- You can search for it in History Page using the SQL text parameters.
- Or you can re-run the SQL in Holistics Editor.
Then, go to Snowflake’s History Page and look for the latest query executed.
Get detailed information about your SQL.
info
If you are sending troubleshooting information to Holistics, please include a Query Detail and all Query Profile screenshots on your email.
Click on the SQL, which should take you to Query Detail view.
Then, navigate to the Query Profile tab, which will show you in detail what operations get executed for each step. There can be many profile tabs, so be sure to check out all of them.

Mitigation methods

Cancel long-running jobs in Holistics

See Cancel long-running jobs.

Manually cancel jobs in your data warehouse

Description

For the jobs that cannot be cancelled using Job Monitoring Dashboard (i.e field suggestion jobs), you can cancel them manually from within your data warehouse.

Instructions

Get the pid of the long-running processes. Please make sure to check the query column of the returned processes:

select * from pg_stat_activity
where query_start < now() - interval '1 hour'

Cancel the long-running processes

select pg_cancel_backend(the_pid_collected_from_step_1)

Improvement methods

Add table indexes/clustering keys where relevant

Description

Indices can greatly improve query speed and reduce high CPU Usage issue in your data warehouse.

Instructions

Add the indexes where necessary based on the Query Plan.
- Add indexes in PostgreSQL databases
- Add clustering keys for Snowflake
Run EXPLAIN ANALYZE again to see if the indexes have been properly applied.

Please consider carefully before adding indexes. It will boost READ operations speed, but slow down INSERT operations. There is also a need for additional disk space to store them. For more information, refer to this article.

Increase Data Warehouse computing resources

Depending on the Data Warehouse (DW) that you are using, there will be a variety of methods to scale the DW up and increase its computing resources (CPU, memory, etc.).

Here are a few of the methods:

Redshift:
- Resizing clusters
- Concurrency scaling

Investigation Methods​

Identify long-running jobs in Job Monitoring​

Check if there are many failed jobs with the same errors​

Description​

Instructions​

Check if CPU Utilization of your data warehouse is high​

Description​

Instructions​

Run EXPLAIN ANALYZE on slow/suspected queries​

Description​

Instructions​

General Instructions​

Snowflake-specific Instructions​

Mitigation methods​

Cancel long-running jobs in Holistics​

Manually cancel jobs in your data warehouse​

Description​

Instructions​

Improvement methods​

Add table indexes/clustering keys where relevant​

Description​

Instructions​

Increase Data Warehouse computing resources​

Investigation Methods

Identify long-running jobs in Job Monitoring

Check if there are many failed jobs with the same errors

Description

Instructions

Check if CPU Utilization of your data warehouse is high

Description

Instructions

Run EXPLAIN ANALYZE on slow/suspected queries

Description

Instructions

General Instructions

Snowflake-specific Instructions

Mitigation methods

Cancel long-running jobs in Holistics

Manually cancel jobs in your data warehouse

Description

Instructions

Improvement methods

Add table indexes/clustering keys where relevant

Description

Instructions

Increase Data Warehouse computing resources