Apache Superset is an open-source business intelligence and data visualization platform. It lets you connect to databases, write SQL, build charts, and assemble dashboards - all through a web UI. No coding required for end users, but it is deeply configurable for engineers.
Why Superset Exists
Most data lives in databases. Getting insights out of that data traditionally meant either writing raw SQL queries or paying for expensive BI tools like Tableau or Looker.
Superset fills the middle ground:
| Raw SQL | Tableau / Looker | Superset | |
|---|---|---|---|
| Cost | Free | $$$$ | Free (open-source) |
| Requires SQL knowledge | Yes | No | Optional |
| Connects to any DB | Yes | Partial | Yes (50+ connectors) |
| Shareable dashboards | No | Yes | Yes |
| Self-hosted | Yes | No | Yes |
| Custom charts | Yes | Limited | Yes |
Superset is also built on top of standard SQL - every chart is backed by a real SQL query you can inspect and edit. There is no proprietary query language to learn.
Alternatives
- Metabase: simpler, easier to set up, less powerful
- Grafana: excellent for time-series/metrics data, weaker for business analytics
- Redash: SQL-first, simpler dashboards
- Tableau: enterprise, expensive, most polished
- Looker: enterprise, expensive, semantic layer focus
Core Concepts
Superset
├── Databases <- connection to a DB (Postgres, MySQL, BigQuery, etc.)
├── Datasets <- a table or custom SQL query from a Database
├── Charts <- a visualization built on a Dataset
└── Dashboards <- collection of Charts arranged on a canvas
Database: a connection string to a data source. One Superset instance can connect to many databases.
Dataset: a pointer to data inside a Database. Either a physical table or a virtual dataset (custom SQL). This is where you define metrics and calculated columns.
Chart: a single visualization (bar chart, line chart, pie, table, map, etc.) built from a Dataset.
Dashboard: a canvas where you arrange multiple Charts, add filters, and share with others.
Installation
Option 1: Docker Compose (recommended)
git clone https://github.com/apache/superset.git
cd superset
docker compose -f docker-compose-image-tag.yml up
Open http://localhost:8088. Default credentials: admin / admin.
Option 2: pip (for development)
Requirements: Python 3.9+, pip
# Create virtual environment
python3 -m venv venv
source venv/bin/activate
# Install Superset
pip install apache-superset
# Initialize the database
superset db upgrade
# Create admin user
superset fab create-admin \
--username admin \
--firstname Admin \
--lastname User \
--email admin@example.com \
--password admin
# Load example data (optional)
superset load_examples
# Initialize default roles
superset init
# Run dev server
superset run -p 8088 --with-threads --reload --debugger
Option 3: Production with Docker
For production, Superset needs:
- PostgreSQL or MySQL as its metadata database (stores users, charts, dashboards)
- Redis as a cache and Celery broker
- Celery workers for async queries
# docker-compose.yml (simplified production setup)
version: "3"
services:
superset:
image: apache/superset:latest
ports:
- "8088:8088"
environment:
- DATABASE_URL=postgresql+psycopg2://superset:password@db/superset
- REDIS_URL=redis://redis:6379/0
- SECRET_KEY=your-secret-key-here
depends_on:
- db
- redis
db:
image: postgres:16
environment:
POSTGRES_USER: superset
POSTGRES_PASSWORD: password
POSTGRES_DB: superset
redis:
image: redis:7
Configuration
Superset is configured via a Python file. Create superset_config.py:
# superset_config.py
# Secret key - change this in production
SECRET_KEY = 'your-very-secret-key-change-in-production'
# Metadata database (where Superset stores dashboards, users, etc.)
SQLALCHEMY_DATABASE_URI = 'postgresql+psycopg2://user:pass@localhost/superset'
# Cache (Redis recommended)
CACHE_CONFIG = {
'CACHE_TYPE': 'RedisCache',
'CACHE_DEFAULT_TIMEOUT': 300,
'CACHE_KEY_PREFIX': 'superset_',
'CACHE_REDIS_URL': 'redis://localhost:6379/0',
}
# Celery for async queries
class CeleryConfig:
broker_url = 'redis://localhost:6379/0'
imports = ('superset.sql_lab',)
result_backend = 'redis://localhost:6379/0'
worker_prefetch_multiplier = 1
task_acks_late = False
CELERY_CONFIG = CeleryConfig
# Feature flags
FEATURE_FLAGS = {
'ENABLE_TEMPLATE_PROCESSING': True, # Jinja templates in SQL
'DASHBOARD_NATIVE_FILTERS': True, # Better filter UX
'ALERT_REPORTS': True, # Scheduled reports
}
# Allow embedding dashboards
ENABLE_CORS = True
Point Superset to this file:
export SUPERSET_CONFIG_PATH=/path/to/superset_config.py
Connecting a Database
Go to Settings > Database Connections > + Database.
Superset uses SQLAlchemy connection strings:
# PostgreSQL
postgresql+psycopg2://user:password@host:5432/dbname
# MySQL
mysql+mysqlconnector://user:password@host:3306/dbname
# SQLite
sqlite:////path/to/file.db
# BigQuery
bigquery://project-id
# Snowflake
snowflake://user:password@account/database/schema
# Amazon Redshift
redshift+redshift_connector://user:password@host:5439/database
# ClickHouse
clickhousedb://user:password@host:8123/database
You can also install additional database drivers:
# MySQL
pip install mysqlclient
# BigQuery
pip install sqlalchemy-bigquery
# Snowflake
pip install snowflake-sqlalchemy
# ClickHouse
pip install clickhouse-connect
Datasets
A dataset is a table or SQL query that charts are built from.
Physical Dataset
Go to Datasets > + Dataset, select a database and table. Superset reads the schema automatically.
Virtual Dataset (Custom SQL)
A virtual dataset wraps a custom SQL query. Useful for pre-joining tables or defining business logic:
SELECT
u.id,
u.name,
u.email,
COUNT(p.id) AS post_count,
MAX(p.created_at) AS last_post_at
FROM users u
LEFT JOIN posts p ON p.author_id = u.id
GROUP BY u.id, u.name, u.email
Metrics and Calculated Columns
Inside a dataset you can define reusable metrics and calculated columns so chart builders do not need to write SQL.
Metric example:
-- Name: total_revenue
-- Expression:
SUM(amount)
Calculated column example:
-- Name: full_name
-- Expression:
first_name || ' ' || last_name
These appear as drag-and-drop fields in the chart builder.
SQL Lab
SQL Lab is Superset’s built-in SQL editor. Go to SQL > SQL Lab.
Features:
- Connect to any registered database
- Write and run SQL queries
- View results in a table
- Save queries for reuse
- Export results to CSV
- Create a dataset from a query result (one click)
- Async query execution for long-running queries
-- Example: find top 10 users by post count
SELECT
u.name,
COUNT(p.id) AS posts,
MAX(p.created_at) AS last_active
FROM users u
JOIN posts p ON p.author_id = u.id
WHERE p.published = TRUE
GROUP BY u.name
ORDER BY posts DESC
LIMIT 10;
Jinja Templating in SQL Lab
When ENABLE_TEMPLATE_PROCESSING is true, you can use Jinja2 in SQL:
-- Dynamic date filter
SELECT * FROM events
WHERE event_date >= '{{ since }}' AND event_date < '{{ until }}'
-- Current user filter
SELECT * FROM orders
WHERE sales_rep = '{{ current_username() }}'
Charts
Go to Charts > + Chart, select a dataset, then pick a chart type.
Common Chart Types
| Chart | Use case |
|---|---|
| Table | Raw data with sorting and pagination |
| Bar Chart | Compare values across categories |
| Line Chart | Trends over time |
| Area Chart | Volume/trends over time |
| Pie / Donut | Part-to-whole proportions |
| Scatter Plot | Correlation between two metrics |
| Heatmap | Two-dimensional value distribution |
| Big Number | Single KPI metric |
| Funnel | Conversion rates through stages |
| Map (deck.gl) | Geographic data |
| Pivot Table | Multi-dimensional aggregation |
Chart Builder
The chart builder has three sections:
Query section:
- Dimensions (Group By): categorical columns to group data
- Metrics: aggregated values (COUNT, SUM, AVG, or custom)
- Filters: WHERE conditions
- Time column + range: for time-series charts
- Sort: how to order results
- Row limit: max rows to fetch
Customize section:
- Colors, labels, legend placement, axis formatting
Annotations:
- Mark specific events on time-series charts (deployments, incidents)
Dashboards
Go to Dashboards > + Dashboard.
Layout
Drag charts from the right panel onto the canvas. Resize by dragging corners. Add:
- Charts: visualizations
- Headers: text headings
- Markdown: rich text blocks
- Dividers: visual separators
- Tabs: organize charts into tabs
Filters
Dashboard-level filters let viewers slice all charts simultaneously without touching individual chart settings.
Go to Filters > + Filter, configure:
- Filter type: value list, time range, time grain, numerical range
- Target charts: which charts the filter applies to
- Default value
Native filters connect directly to dataset columns. When a user picks a value, Superset adds a WHERE clause to all targeted chart queries.
Sharing and Embedding
Share link: generate a URL with the current filter state.
Embedding: embed dashboards in other apps using an iframe. Requires configuring CORS and optionally guest tokens for auth.
# superset_config.py
FEATURE_FLAGS = {
'EMBEDDABLE_CHARTS': True,
'DASHBOARD_RBAC': True,
}
# Allow embedding from your domain
TALISMAN_ENABLED = False # or configure CSP headers properly
Row-Level Security (RLS)
RLS restricts which rows a user can see in a dataset, based on their role or username.
Go to Settings > Row Level Security > + Rule.
-- Rule clause (appended as a WHERE condition to every query on this dataset)
-- This restricts results to only the current user's own data:
user_id = {{ current_user_id() }}
-- Restrict by role:
department = '{{ current_user().department }}'
-- Restrict by a user attribute stored in Superset:
region IN ({{ ",".join(["'%s'" % r for r in current_user().regions]) }})
Assign the rule to specific roles and datasets. Users in those roles only ever see filtered data, even in SQL Lab.
Alerts and Reports
Superset can send scheduled reports and threshold-based alerts via email or Slack. Requires Celery + a mail server or Slack webhook.
# superset_config.py
FEATURE_FLAGS = {'ALERT_REPORTS': True}
SMTP_HOST = 'smtp.gmail.com'
SMTP_PORT = 587
SMTP_USER = 'your@email.com'
SMTP_PASSWORD = 'app-password'
SMTP_MAIL_FROM = 'superset@yourcompany.com'
SLACK_API_TOKEN = 'xoxb-your-slack-token'
Report: send a dashboard screenshot or CSV data on a cron schedule.
Alert: send a notification when a metric crosses a threshold (e.g., error rate > 5%).
How Superset Works Under the Hood
Architecture
Browser
|
Superset Web Server (Flask / Gunicorn)
| |
| Celery Workers (async query execution)
| |
SQLAlchemy (database abstraction layer)
|
Data Databases (PostgreSQL, MySQL, BigQuery, etc.)
Superset Metadata DB (PostgreSQL/MySQL) <- stores users, charts, dashboards
Redis <- cache + Celery broker
Request Flow for a Chart Load
- Browser requests chart data
- Flask receives the request, checks auth and permissions
- Superset builds a SQL query from the chart config (metrics, dimensions, filters, time range)
- SQLAlchemy executes the query against the target database
- Results are optionally cached in Redis
- Response is JSON, rendered by the React frontend
For async queries (long-running SQL):
- Flask submits the query to a Celery worker via Redis
- Worker executes the query, stores result in Redis
- Browser polls for the result
Query Generation
When you configure a chart with dimensions, metrics, and filters, Superset generates SQL automatically. You can always click “View query” in any chart to see the exact SQL it runs:
-- Generated for a bar chart: posts per user, last 30 days
SELECT
users.name AS name,
COUNT(*) AS count
FROM posts
JOIN users ON posts.author_id = users.id
WHERE posts.created_at >= '2026-04-01'
AND posts.published = TRUE
GROUP BY users.name
ORDER BY count DESC
LIMIT 100;
Caching
Superset caches query results to avoid re-running expensive queries on every page load. Cache duration is configurable per dataset.
# Cache all queries for 5 minutes by default
CACHE_DEFAULT_TIMEOUT = 300
# Per-dataset cache timeout set in Dataset settings UI
Force-refresh a chart by holding Shift while reloading the dashboard.
Metadata Store
Superset stores all configuration (databases, datasets, charts, dashboards, users, roles, RLS rules) in its own metadata database (separate from your data databases). Back this up regularly. It is the source of truth for everything in Superset.
Roles and Permissions
Superset has a role-based access control system.
Built-in roles:
- Admin: full access to everything
- Alpha: can create and edit charts/dashboards, cannot manage users or database connections
- Gamma: view-only, sees only what is explicitly shared with them
- sql_lab: access to SQL Lab
- Public: unauthenticated access (if enabled)
Custom roles let you grant fine-grained permissions:
- Access to specific databases
- Access to specific datasets
- Access to specific dashboards
- Row-level security rules
Best Practices
1. Use Virtual Datasets for Complex Joins
Do not join tables inside the chart builder. Create a virtual dataset with the joins pre-done. Charts built on it stay simple and the SQL is easier to audit.
2. Define Metrics at the Dataset Level
Reusable metrics (Total Revenue, MAU, Churn Rate) belong in the dataset, not in individual charts. Every chart on that dataset can reuse them without rewriting the SQL.
3. Enable Caching
Without caching, every chart load hits the database. Set sensible cache timeouts per dataset based on how frequently the data updates.
Real-time data -> 30-60 seconds
Hourly aggregates -> 1 hour
Daily reports -> 24 hours
4. Use a Dedicated Read Replica
Point Superset at a read replica, not the primary database. BI queries are often large scans that would interfere with production traffic.
5. Back Up the Metadata Database
Everything (charts, dashboards, user configs) lives in the metadata DB. Set up automated backups. Losing it means rebuilding all dashboards from scratch.
6. Export Dashboards as JSON
Superset lets you export dashboards as JSON for version control:
Dashboard > ... > Export
Import them on other Superset instances. Keep these exports in a git repo.
7. Set Row Limits
Always configure row limits on charts and datasets to prevent users from accidentally running full table scans. Default is 10,000 rows in the chart builder.
8. Use RLS for Multi-Tenant Data
If multiple teams or customers share one Superset instance and one database, use Row-Level Security to ensure each group sees only their own data.
9. Use Async Execution for Long Queries
Enable Celery workers and set queries above a threshold to run asynchronously. This prevents long queries from blocking the web server.
# superset_config.py
SQLLAB_ASYNC_TIME_LIMIT_SEC = 60 * 60 * 6 # 6 hour max
SQLLAB_TIMEOUT = 30 # switch to async after 30 seconds
10. Separate Superset from Production DBs with a Firewall Rule
Superset only needs SELECT access to your data databases. Create a read-only user and restrict its network access to the Superset server IP only.
-- PostgreSQL: create read-only user for Superset
CREATE USER superset_reader WITH PASSWORD 'password';
GRANT CONNECT ON DATABASE mydb TO superset_reader;
GRANT USAGE ON SCHEMA public TO superset_reader;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO superset_reader;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO superset_reader;