Databricks Integration Guide
Guide to integrating with Databricks using Direct Connect, including best practices and optimization tips.
Steps to Integrate with Databricks
- Create a Serverless SQL Warehouse
  Begin by setting up a Serverless SQL Warehouse. Choose a size that matches your data volume.
- Establish a Service Principal
  Set up a Service Principal with these permissions:
  - SELECT Permission: Grant access to the schema containing the views for Kubit.
  - Warehouse Access: Allow execution of analyses.
  - Query History Access: Enable access to the Query History for troubleshooting.
Use a separate setup for development so that development work does not affect production.
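As an illustration of the SELECT permission only, the sketch below generates the grant statements you might run for the Service Principal. It assumes the workspace uses Unity Catalog and that the principal is referenced by its application ID. The catalog name, schema name, and ID are hypothetical placeholders. Warehouse access and Query History access are set outside SQL and are not covered here.

```python
def build_grants(catalog: str, schema: str, principal: str) -> list[str]:
    """Return GRANT statements giving read-only access to one schema.

    Assumes Unity Catalog; all names passed in are placeholders.
    """
    target = f"`{principal}`"  # principals are quoted with backticks
    return [
        # The principal must be able to reach the catalog and schema
        # before SELECT on the schema has any effect.
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {target}",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {target}",
        f"GRANT SELECT ON SCHEMA {catalog}.{schema} TO {target}",
    ]


# Hypothetical names, for illustration only.
for statement in build_grants(
    "analytics", "kubit_views", "00000000-0000-0000-0000-000000000000"
):
    print(statement + ";")
```

Granting at the schema level keeps the Service Principal limited to the views intended for Kubit rather than the whole catalog.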
Best Practices for Optimization
Partitioning
Utilize table partitions to enhance query speed and cost efficiency. For optimal performance, use an event date column as the partition key in your fact tables. Alternatively, you can use year, month, and day as partition keys without needing a separate date column.
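The two partitioning options can be sketched as table definitions. This is a minimal example that assumes a Delta table. The table and column names (`events`, `event_date`, `user_id`, `event_name`) are hypothetical, and the helper only builds the DDL text without running it.

```python
def partitioned_table_ddl(
    table: str, columns: dict[str, str], partition_keys: list[str]
) -> str:
    """Build a CREATE TABLE statement partitioned by the given columns."""
    missing = [key for key in partition_keys if key not in columns]
    if missing:
        raise ValueError(f"partition keys not in columns: {missing}")
    column_list = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns.items())
    key_list = ", ".join(partition_keys)
    return (
        f"CREATE TABLE {table} (\n  {column_list}\n)\n"
        f"USING DELTA\nPARTITIONED BY ({key_list})"
    )


# Option 1: a single event date column as the partition key.
by_date = partitioned_table_ddl(
    "events",
    {"user_id": "STRING", "event_name": "STRING", "event_date": "DATE"},
    ["event_date"],
)

# Option 2: year, month, and day as partition keys, with no date column.
by_parts = partitioned_table_ddl(
    "events",
    {"user_id": "STRING", "event_name": "STRING",
     "year": "INT", "month": "INT", "day": "INT"},
    ["year", "month", "day"],
)

print(by_date)
print(by_parts)
```

Either layout only helps when queries filter on the partition columns, so that partitions outside the requested date range can be skipped.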
Configuring Your Warehouse
Consider these four parameters when configuring your warehouse:
- Cluster Size: Start small and adjust based on Query History insights. If performance issues arise, use Query Profile to check for disk spills and increase size if necessary.
- Auto Stop: Set auto-stop to minimize idle-time costs. We recommend 5 minutes via the UI or 1 minute using the warehouse SQL API.
- Scaling: Begin with a minimum and maximum of 1. Scale up only if you frequently run 10 queries concurrently.
- Type: Opt for Serverless unless continuous 24/7 operation is required, in which case Pro might be more cost-effective.
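The four parameters above could be collected into a settings payload like the one below. This is a sketch, not a definitive request body: the field names follow our understanding of the Databricks SQL Warehouses REST API and should be checked against the current API reference. The warehouse name, the `X-Small` default, and the validation rules are assumptions made for illustration. Nothing is sent over the network.

```python
import json


def warehouse_settings(
    name: str,
    cluster_size: str = "X-Small",  # assumed starting size; adjust from Query History
    auto_stop_mins: int = 1,        # 1 minute, as recommended when using the API
    min_clusters: int = 1,
    max_clusters: int = 1,
    serverless: bool = True,
) -> dict:
    """Assemble warehouse settings following the recommendations above."""
    # These checks are this sketch's own guardrails, not API rules.
    if auto_stop_mins < 1:
        raise ValueError("auto_stop_mins must be at least 1 in this sketch")
    if min_clusters > max_clusters:
        raise ValueError("min_clusters cannot exceed max_clusters")
    return {
        "name": name,
        "cluster_size": cluster_size,
        "auto_stop_mins": auto_stop_mins,
        "min_num_clusters": min_clusters,
        "max_num_clusters": max_clusters,
        # Assumption: serverless is requested as a PRO warehouse with this flag.
        "enable_serverless_compute": serverless,
        "warehouse_type": "PRO",
    }


# Hypothetical warehouse name, for illustration only.
print(json.dumps(warehouse_settings("kubit-dev"), indent=2))
```

Keeping the minimum and maximum cluster count equal at 1 disables autoscaling until Query History shows sustained concurrency that justifies it.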
Monitoring and Cost Management
Leverage Databricks' monitoring tools to manage costs:
- Monitor a SQL warehouse
- Account console usage monitoring
- Cost Attribution Queries
- Query history reference
Important Note
A stopped Serverless Warehouse may start without a query in these cases:
- A connection from a JDBC/ODBC interface.
- Opening a dashboard linked to a dashboard-level warehouse.
For more details, see Start a SQL Warehouse.