Apache Superset - Complete Guide

Apache Superset is an open-source business intelligence and data visualization platform. It lets you connect to databases, write SQL, build charts, and assemble dashboards - all through a web UI. No coding required for end users, but it is deeply configurable for engineers.

Why Superset Exists

Most data lives in databases. Getting insights out of that data traditionally meant either writing raw SQL queries or paying for expensive BI tools like Tableau or Looker.

Superset fills the middle ground:

	Raw SQL	Tableau / Looker	Superset
Cost	Free	$$$$	Free (open-source)
Requires SQL knowledge	Yes	No	Optional
Connects to any DB	Yes	Partial	Yes (50+ connectors)
Shareable dashboards	No	Yes	Yes
Self-hosted	Yes	No	Yes
Custom charts	Yes	Limited	Yes

Superset is also built on top of standard SQL - every chart is backed by a real SQL query you can inspect and edit. There is no proprietary query language to learn.

Alternatives

Metabase: simpler, easier to set up, less powerful
Grafana: excellent for time-series/metrics data, weaker for business analytics
Redash: SQL-first, simpler dashboards
Tableau: enterprise, expensive, most polished
Looker: enterprise, expensive, semantic layer focus

Core Concepts

Superset
├── Databases      <- connection to a DB (Postgres, MySQL, BigQuery, etc.)
├── Datasets       <- a table or custom SQL query from a Database
├── Charts         <- a visualization built on a Dataset
└── Dashboards     <- collection of Charts arranged on a canvas

Database: a connection string to a data source. One Superset instance can connect to many databases.

Dataset: a pointer to data inside a Database. Either a physical table or a virtual dataset (custom SQL). This is where you define metrics and calculated columns.

Chart: a single visualization (bar chart, line chart, pie, table, map, etc.) built from a Dataset.

Dashboard: a canvas where you arrange multiple Charts, add filters, and share with others.

Installation

Option 1: Docker Compose (recommended)

git clone https://github.com/apache/superset.git
cd superset
docker compose -f docker-compose-image-tag.yml up

Open http://localhost:8088. Default credentials: admin / admin.

Option 2: pip (for development)

Requirements: Python 3.9+, pip

# Create virtual environment
python3 -m venv venv
source venv/bin/activate

# Install Superset
pip install apache-superset

# Initialize the database
superset db upgrade

# Create admin user
superset fab create-admin \
  --username admin \
  --firstname Admin \
  --lastname User \
  --email admin@example.com \
  --password admin

# Load example data (optional)
superset load_examples

# Initialize default roles
superset init

# Run dev server
superset run -p 8088 --with-threads --reload --debugger

Option 3: Production with Docker

For production, Superset needs:

PostgreSQL or MySQL as its metadata database (stores users, charts, dashboards)
Redis as a cache and Celery broker
Celery workers for async queries

# docker-compose.yml (simplified production setup)
version: "3"
services:
  superset:
    image: apache/superset:latest
    ports:
      - "8088:8088"
    environment:
      - DATABASE_URL=postgresql+psycopg2://superset:password@db/superset
      - REDIS_URL=redis://redis:6379/0
      - SECRET_KEY=your-secret-key-here
    depends_on:
      - db
      - redis

  db:
    image: postgres:16
    environment:
      POSTGRES_USER: superset
      POSTGRES_PASSWORD: password
      POSTGRES_DB: superset

  redis:
    image: redis:7

Configuration

Superset is configured via a Python file. Create superset_config.py:

# superset_config.py

# Secret key - change this in production
SECRET_KEY = 'your-very-secret-key-change-in-production'

# Metadata database (where Superset stores dashboards, users, etc.)
SQLALCHEMY_DATABASE_URI = 'postgresql+psycopg2://user:pass@localhost/superset'

# Cache (Redis recommended)
CACHE_CONFIG = {
    'CACHE_TYPE': 'RedisCache',
    'CACHE_DEFAULT_TIMEOUT': 300,
    'CACHE_KEY_PREFIX': 'superset_',
    'CACHE_REDIS_URL': 'redis://localhost:6379/0',
}

# Celery for async queries
class CeleryConfig:
    broker_url = 'redis://localhost:6379/0'
    imports = ('superset.sql_lab',)
    result_backend = 'redis://localhost:6379/0'
    worker_prefetch_multiplier = 1
    task_acks_late = False

CELERY_CONFIG = CeleryConfig

# Feature flags
FEATURE_FLAGS = {
    'ENABLE_TEMPLATE_PROCESSING': True,   # Jinja templates in SQL
    'DASHBOARD_NATIVE_FILTERS': True,     # Better filter UX
    'ALERT_REPORTS': True,                # Scheduled reports
}

# Allow embedding dashboards
ENABLE_CORS = True

Point Superset to this file:

export SUPERSET_CONFIG_PATH=/path/to/superset_config.py

Connecting a Database

Go to Settings > Database Connections > + Database.

Superset uses SQLAlchemy connection strings:

# PostgreSQL
postgresql+psycopg2://user:password@host:5432/dbname

# MySQL
mysql+mysqlconnector://user:password@host:3306/dbname

# SQLite
sqlite:////path/to/file.db

# BigQuery
bigquery://project-id

# Snowflake
snowflake://user:password@account/database/schema

# Amazon Redshift
redshift+redshift_connector://user:password@host:5439/database

# ClickHouse
clickhousedb://user:password@host:8123/database

You can also install additional database drivers:

# MySQL
pip install mysqlclient

# BigQuery
pip install sqlalchemy-bigquery

# Snowflake
pip install snowflake-sqlalchemy

# ClickHouse
pip install clickhouse-connect

Datasets

A dataset is a table or SQL query that charts are built from.

Physical Dataset

Go to Datasets > + Dataset, select a database and table. Superset reads the schema automatically.

Virtual Dataset (Custom SQL)

A virtual dataset wraps a custom SQL query. Useful for pre-joining tables or defining business logic:

SELECT
  u.id,
  u.name,
  u.email,
  COUNT(p.id) AS post_count,
  MAX(p.created_at) AS last_post_at
FROM users u
LEFT JOIN posts p ON p.author_id = u.id
GROUP BY u.id, u.name, u.email

Metrics and Calculated Columns

Inside a dataset you can define reusable metrics and calculated columns so chart builders do not need to write SQL.

Metric example:

-- Name: total_revenue
-- Expression:
SUM(amount)

Calculated column example:

-- Name: full_name
-- Expression:
first_name || ' ' || last_name

These appear as drag-and-drop fields in the chart builder.

SQL Lab

SQL Lab is Superset’s built-in SQL editor. Go to SQL > SQL Lab.

Features:

Connect to any registered database
Write and run SQL queries
View results in a table
Save queries for reuse
Export results to CSV
Create a dataset from a query result (one click)
Async query execution for long-running queries

-- Example: find top 10 users by post count
SELECT
  u.name,
  COUNT(p.id) AS posts,
  MAX(p.created_at) AS last_active
FROM users u
JOIN posts p ON p.author_id = u.id
WHERE p.published = TRUE
GROUP BY u.name
ORDER BY posts DESC
LIMIT 10;

Jinja Templating in SQL Lab

When ENABLE_TEMPLATE_PROCESSING is true, you can use Jinja2 in SQL:

-- Dynamic date filter
SELECT * FROM events
WHERE event_date >= '{{ since }}' AND event_date < '{{ until }}'

-- Current user filter
SELECT * FROM orders
WHERE sales_rep = '{{ current_username() }}'

Charts

Go to Charts > + Chart, select a dataset, then pick a chart type.

Common Chart Types

Chart	Use case
Table	Raw data with sorting and pagination
Bar Chart	Compare values across categories
Line Chart	Trends over time
Area Chart	Volume/trends over time
Pie / Donut	Part-to-whole proportions
Scatter Plot	Correlation between two metrics
Heatmap	Two-dimensional value distribution
Big Number	Single KPI metric
Funnel	Conversion rates through stages
Map (deck.gl)	Geographic data
Pivot Table	Multi-dimensional aggregation

Chart Builder

The chart builder has three sections:

Query section:

Dimensions (Group By): categorical columns to group data
Metrics: aggregated values (COUNT, SUM, AVG, or custom)
Filters: WHERE conditions
Time column + range: for time-series charts
Sort: how to order results
Row limit: max rows to fetch

Customize section:

Colors, labels, legend placement, axis formatting

Annotations:

Mark specific events on time-series charts (deployments, incidents)

Dashboards

Go to Dashboards > + Dashboard.

Layout

Drag charts from the right panel onto the canvas. Resize by dragging corners. Add:

Charts: visualizations
Headers: text headings
Markdown: rich text blocks
Dividers: visual separators
Tabs: organize charts into tabs

Filters

Dashboard-level filters let viewers slice all charts simultaneously without touching individual chart settings.

Go to Filters > + Filter, configure:

Filter type: value list, time range, time grain, numerical range
Target charts: which charts the filter applies to
Default value

Native filters connect directly to dataset columns. When a user picks a value, Superset adds a WHERE clause to all targeted chart queries.

Share link: generate a URL with the current filter state.

Embedding: embed dashboards in other apps using an iframe. Requires configuring CORS and optionally guest tokens for auth.

# superset_config.py
FEATURE_FLAGS = {
    'EMBEDDABLE_CHARTS': True,
    'DASHBOARD_RBAC': True,
}

# Allow embedding from your domain
TALISMAN_ENABLED = False  # or configure CSP headers properly

Row-Level Security (RLS)

RLS restricts which rows a user can see in a dataset, based on their role or username.

Go to Settings > Row Level Security > + Rule.

-- Rule clause (appended as a WHERE condition to every query on this dataset)
-- This restricts results to only the current user's own data:
user_id = {{ current_user_id() }}

-- Restrict by role:
department = '{{ current_user().department }}'

-- Restrict by a user attribute stored in Superset:
region IN ({{ ",".join(["'%s'" % r for r in current_user().regions]) }})

Assign the rule to specific roles and datasets. Users in those roles only ever see filtered data, even in SQL Lab.

Alerts and Reports

Superset can send scheduled reports and threshold-based alerts via email or Slack. Requires Celery + a mail server or Slack webhook.

# superset_config.py
FEATURE_FLAGS = {'ALERT_REPORTS': True}

SMTP_HOST = 'smtp.gmail.com'
SMTP_PORT = 587
SMTP_USER = 'your@email.com'
SMTP_PASSWORD = 'app-password'
SMTP_MAIL_FROM = 'superset@yourcompany.com'

SLACK_API_TOKEN = 'xoxb-your-slack-token'

Report: send a dashboard screenshot or CSV data on a cron schedule.

Alert: send a notification when a metric crosses a threshold (e.g., error rate > 5%).

How Superset Works Under the Hood

Architecture

Browser
   |
Superset Web Server (Flask / Gunicorn)
   |          |
   |     Celery Workers (async query execution)
   |          |
SQLAlchemy (database abstraction layer)
   |
Data Databases (PostgreSQL, MySQL, BigQuery, etc.)

Superset Metadata DB (PostgreSQL/MySQL) <- stores users, charts, dashboards
Redis <- cache + Celery broker

Request Flow for a Chart Load

Browser requests chart data
Flask receives the request, checks auth and permissions
Superset builds a SQL query from the chart config (metrics, dimensions, filters, time range)
SQLAlchemy executes the query against the target database
Results are optionally cached in Redis
Response is JSON, rendered by the React frontend

For async queries (long-running SQL):

Flask submits the query to a Celery worker via Redis
Worker executes the query, stores result in Redis
Browser polls for the result

Query Generation

When you configure a chart with dimensions, metrics, and filters, Superset generates SQL automatically. You can always click “View query” in any chart to see the exact SQL it runs:

-- Generated for a bar chart: posts per user, last 30 days
SELECT
  users.name AS name,
  COUNT(*) AS count
FROM posts
JOIN users ON posts.author_id = users.id
WHERE posts.created_at >= '2026-04-01'
  AND posts.published = TRUE
GROUP BY users.name
ORDER BY count DESC
LIMIT 100;

Caching

Superset caches query results to avoid re-running expensive queries on every page load. Cache duration is configurable per dataset.

# Cache all queries for 5 minutes by default
CACHE_DEFAULT_TIMEOUT = 300

# Per-dataset cache timeout set in Dataset settings UI

Force-refresh a chart by holding Shift while reloading the dashboard.

Metadata Store

Superset stores all configuration (databases, datasets, charts, dashboards, users, roles, RLS rules) in its own metadata database (separate from your data databases). Back this up regularly. It is the source of truth for everything in Superset.

Roles and Permissions

Superset has a role-based access control system.

Built-in roles:

Admin: full access to everything
Alpha: can create and edit charts/dashboards, cannot manage users or database connections
Gamma: view-only, sees only what is explicitly shared with them
sql_lab: access to SQL Lab
Public: unauthenticated access (if enabled)

Custom roles let you grant fine-grained permissions:

Access to specific databases
Access to specific datasets
Access to specific dashboards
Row-level security rules

Best Practices

1. Use Virtual Datasets for Complex Joins

Do not join tables inside the chart builder. Create a virtual dataset with the joins pre-done. Charts built on it stay simple and the SQL is easier to audit.

2. Define Metrics at the Dataset Level

Reusable metrics (Total Revenue, MAU, Churn Rate) belong in the dataset, not in individual charts. Every chart on that dataset can reuse them without rewriting the SQL.

3. Enable Caching

Without caching, every chart load hits the database. Set sensible cache timeouts per dataset based on how frequently the data updates.

Real-time data    -> 30-60 seconds
Hourly aggregates -> 1 hour
Daily reports     -> 24 hours

4. Use a Dedicated Read Replica

Point Superset at a read replica, not the primary database. BI queries are often large scans that would interfere with production traffic.

5. Back Up the Metadata Database

Everything (charts, dashboards, user configs) lives in the metadata DB. Set up automated backups. Losing it means rebuilding all dashboards from scratch.

6. Export Dashboards as JSON

Superset lets you export dashboards as JSON for version control:

Dashboard > ... > Export

Import them on other Superset instances. Keep these exports in a git repo.

7. Set Row Limits

Always configure row limits on charts and datasets to prevent users from accidentally running full table scans. Default is 10,000 rows in the chart builder.

8. Use RLS for Multi-Tenant Data

If multiple teams or customers share one Superset instance and one database, use Row-Level Security to ensure each group sees only their own data.

9. Use Async Execution for Long Queries

Enable Celery workers and set queries above a threshold to run asynchronously. This prevents long queries from blocking the web server.

# superset_config.py
SQLLAB_ASYNC_TIME_LIMIT_SEC = 60 * 60 * 6  # 6 hour max
SQLLAB_TIMEOUT = 30  # switch to async after 30 seconds

10. Separate Superset from Production DBs with a Firewall Rule

Superset only needs SELECT access to your data databases. Create a read-only user and restrict its network access to the Superset server IP only.

-- PostgreSQL: create read-only user for Superset
CREATE USER superset_reader WITH PASSWORD 'password';
GRANT CONNECT ON DATABASE mydb TO superset_reader;
GRANT USAGE ON SCHEMA public TO superset_reader;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO superset_reader;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO superset_reader;