Connect to Databricks
Introduction
This documentation provides a comprehensive guide on connecting Databricks to Holistics as a data source. In the following sections, we will walk you through the necessary steps:
- Creating a personal access token and granting permissions for Holistics to interact with Databricks.
- Establishing the connection between Databricks and Holistics as a data source.
Create Databricks personal access token for Holistics
Create a personal access token for a Databricks user by following the Databricks Personal Access Tokens documentation.
Assign specific privileges to your DATABRICKS_USERNAME as outlined below:
- Grant the USE CATALOG privilege on your catalog. This permits the use of a catalog. Refer to the Catalogs Guide for more information.
- Grant the USE SCHEMA privilege on your schema. This permits the use of a schema. Find additional details in the Schemas Guide.
- Grant the SELECT privilege on your tables. This is required for users to query a table. Refer to the Tables Guide for more information.
For further insights, we recommend reading about Databricks privileges.
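The grants above can be issued from a Databricks SQL editor. A minimal sketch, where `my_catalog`, `my_schema`, and `holistics_user@example.com` are placeholders for your own catalog, schema, and dedicated Holistics user:

```sql
-- Allow the Holistics user to use the catalog
GRANT USE CATALOG ON CATALOG my_catalog TO `holistics_user@example.com`;

-- Allow the user to use the schema inside that catalog
GRANT USE SCHEMA ON SCHEMA my_catalog.my_schema TO `holistics_user@example.com`;

-- Allow the user to query the tables in that schema
GRANT SELECT ON SCHEMA my_catalog.my_schema TO `holistics_user@example.com`;
```

Granting SELECT at the schema level covers all tables in the schema; you can instead grant it per table if you want tighter scoping.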
Connect Databricks to Holistics as a data source
From the Holistics app's header bar, open the Organization Settings menu -> Data Sources.
Click the New Data Source button and select Databricks as the database type. Then fill in the form; you can find these connection details by following Databricks's Retrieve Connection Details doc.
Display Name - Provide a name for the connection.
Host - Enter the Databricks hostname without the https:// prefix. For example, if your host is https://dbc-a1b2345c-d6e7.cloud.databricks.com, you should input dbc-a1b2345c-d6e7.cloud.databricks.com.
Port - The default is 443.
Username - Databricks username of the dedicated Holistics user.
Password - Databricks password of the dedicated Holistics user.
HTTP Path - The HTTP path of your Databricks compute resource.
Catalog - The Databricks catalog to connect to.
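For reference, a filled-in form typically looks like the sketch below. All values are placeholders; your actual host, HTTP path, and catalog come from your own workspace's connection details:

```
Display Name: Databricks Production
Host:         dbc-a1b2345c-d6e7.cloud.databricks.com
Port:         443
HTTP Path:    /sql/1.0/warehouses/1234567890abcdef
Catalog:      my_catalog
```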
Click on Test connection to ensure the connection works, and then Save to complete the process.
Now you can start using your Databricks data source within Holistics.
FAQs
Does Holistics support legacy Hive metastore?
While Holistics can execute queries on a data warehouse that uses a legacy Hive metastore, it does not support fetching the database schema from it.
Suggested Solutions/Workarounds:
- Databricks recommends migrating tables managed by the Hive metastore to the Unity Catalog metastore. Guidelines for this process can be found here.
- Although fetching the database schema is not possible, users can still build their models by creating Transform Models. However, creating Table Models will not be feasible without the database schema.
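One common migration pattern is to copy a Hive-managed table into Unity Catalog with CREATE TABLE AS SELECT. This is a sketch only; `my_catalog`, `my_schema`, and `users` are placeholder names, and Databricks' migration guide covers more robust options such as upgrading tables in place:

```sql
-- Copy a legacy Hive metastore table into a Unity Catalog schema
CREATE TABLE my_catalog.my_schema.users AS
SELECT * FROM hive_metastore.my_schema.users;
```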
How to query tables from multiple catalogs in Databricks?
Databricks supports querying across multiple catalogs using a three-level namespace that includes the catalog, schema, and table. However, Holistics Modeling primarily supports the schema and table levels, making direct querying of tables from multiple catalogs currently impossible.
Workaround:
Assuming your current catalog (set in the Data Source connection form) is catalog1 and you want to join with data in catalog2, create a Query Model (rather than a Table Model) that refers to tables in catalog2:
```
Model catalog2 {
  type: 'query'
  data_source_name: 'databricks'

  dimension field1 {
    type: 'number'
  }
  ...

  query: @sql select * from catalog2.public.users ;;
}
```
This model fetches data from catalog2.public.users (in catalog2) rather than the current catalog1. You can treat this model like a standard Table Model and create relationships to connect it with other models in catalog1.
Note: Ensure you have USE CATALOG permission on all catalogs.
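Connecting the cross-catalog model to an existing model then works like any other relationship. A sketch in AML, assuming a hypothetical catalog1 model named `orders` with a `user_id` field that joins to `field1` of the `catalog2` model above:

```
Dataset cross_catalog_example {
  data_source_name: 'databricks'
  models: [orders, catalog2]
  relationships: [
    relationship(orders.user_id > catalog2.field1, true)
  ]
}
```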
Why does my query/report occasionally take a very long time to respond?
There are some common reasons:
- Databricks has a feature to terminate the database/cluster after it goes idle. After termination, a new query may take a few minutes to restart the cluster before it actually gets executed.
- You can also check out the Common reasons that make your report job slow documentation.