PostgreSQL is an open-source relational database. It stores data in tables with rows and columns, enforces strict types and constraints, and lets you query that data with SQL. It has been in active development since 1986 and is one of the most feature-complete databases available.
Why PostgreSQL Over the Alternatives
SQLite
SQLite is a file-based database. Great for local development and mobile apps. Not designed for multiple concurrent writers or network access. Not suitable for production web servers.
MySQL / MariaDB
MySQL is fast for simple read-heavy workloads. PostgreSQL beats it on:
- Standards compliance (PostgreSQL follows SQL standards closely)
- Advanced data types (JSONB, arrays, ranges, custom types)
- Complex queries (CTEs, window functions, lateral joins)
- MVCC implementation (MySQL’s is less robust under heavy concurrency)
NoSQL (MongoDB, etc.)
NoSQL gives you flexibility: no schema, nested documents. The tradeoff is you lose joins, transactions, and referential integrity. Most apps that start with NoSQL for “flexibility” end up reimplementing what relational databases give you for free.
PostgreSQL supports JSONB columns, so you can store schemaless data inside a relational database when you genuinely need it.
Installation
macOS
brew install postgresql@16
brew services start postgresql@16
Ubuntu / Debian
sudo apt install postgresql postgresql-contrib
sudo systemctl start postgresql
sudo systemctl enable postgresql
Docker (recommended for dev)
docker run --name pg-dev \
-e POSTGRES_USER=myuser \
-e POSTGRES_PASSWORD=mypassword \
-e POSTGRES_DB=mydb \
-p 5432:5432 \
-d postgres:16
Connect
# Connect as postgres superuser
psql -U postgres
# Connect to specific DB
psql -U myuser -d mydb -h localhost
Core Concepts
Databases, Schemas, Tables
PostgreSQL Server
└── Database (mydb)
└── Schema (public) <- default schema
├── Table (users)
├── Table (posts)
└── Table (comments)
A single PostgreSQL server can host many databases. Each database has schemas (namespaces). The default schema is public. Tables live inside schemas.
Creating a Database
CREATE DATABASE mydb;
\c mydb -- connect to it in psql
Data Types
Common Types
| Type | Use case |
|---|---|
INTEGER / INT | Whole numbers |
BIGINT | Large whole numbers (IDs at scale) |
SERIAL | Auto-incrementing integer (shorthand) |
BIGSERIAL | Auto-incrementing bigint |
NUMERIC(p, s) | Exact decimal (money, measurements) |
FLOAT / REAL | Approximate decimal (scientific data) |
VARCHAR(n) | Variable-length string, max n chars |
TEXT | Unlimited-length string |
BOOLEAN | true / false |
DATE | Date only (2025-01-15) |
TIMESTAMP | Date + time, no timezone |
TIMESTAMPTZ | Date + time + timezone (store everything as this) |
UUID | 128-bit unique identifier |
JSONB | Binary JSON (indexed, queryable) |
JSON | Raw JSON text (no indexing) |
ARRAY | Array of any type |
Type Notes
Use TIMESTAMPTZ over TIMESTAMP. It stores UTC internally and converts to local time on read. TIMESTAMP has no timezone info, which causes bugs.
Use TEXT over VARCHAR(255). PostgreSQL stores them identically. VARCHAR without a limit is the same as TEXT. The limit only makes sense if you genuinely need to enforce a max length.
Use NUMERIC for money, never FLOAT. Floats have rounding errors:
SELECT 0.1 + 0.2; -- 0.30000000000000004 (FLOAT)
SELECT 0.1::NUMERIC + 0.2; -- 0.3 (NUMERIC)
Tables
Creating Tables
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
email TEXT NOT NULL UNIQUE,
name TEXT,
role TEXT NOT NULL DEFAULT 'user',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE TABLE posts (
id BIGSERIAL PRIMARY KEY,
title TEXT NOT NULL,
content TEXT,
published BOOLEAN NOT NULL DEFAULT FALSE,
author_id BIGINT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
Constraints
-- NOT NULL: column must have a value
email TEXT NOT NULL
-- UNIQUE: no two rows can have the same value
email TEXT UNIQUE
-- PRIMARY KEY: NOT NULL + UNIQUE, identifies each row
id BIGSERIAL PRIMARY KEY
-- FOREIGN KEY: must reference an existing row in another table
author_id BIGINT REFERENCES users(id)
-- CHECK: custom condition must be true
age INT CHECK (age >= 0)
price NUMERIC CHECK (price > 0)
-- Table-level constraints (more readable for multi-column)
CONSTRAINT unique_user_email UNIQUE (email)
CONSTRAINT valid_price CHECK (price > 0 AND price < 100000)
ON DELETE behavior for foreign keys
-- CASCADE: delete posts when user is deleted
author_id BIGINT REFERENCES users(id) ON DELETE CASCADE
-- SET NULL: set author_id to NULL when user is deleted
author_id BIGINT REFERENCES users(id) ON DELETE SET NULL
-- RESTRICT: prevent deleting user if they have posts (default)
author_id BIGINT REFERENCES users(id) ON DELETE RESTRICT
Modifying Tables
-- Add column
ALTER TABLE users ADD COLUMN avatar TEXT;
-- Drop column
ALTER TABLE users DROP COLUMN avatar;
-- Rename column
ALTER TABLE users RENAME COLUMN name TO full_name;
-- Change type
ALTER TABLE users ALTER COLUMN age TYPE BIGINT;
-- Add constraint
ALTER TABLE users ADD CONSTRAINT check_age CHECK (age >= 0);
-- Drop table
DROP TABLE users;
DROP TABLE IF EXISTS users; -- no error if doesn't exist
CRUD Operations
INSERT
-- Single row
INSERT INTO users (email, name) VALUES ('alice@example.com', 'Alice');
-- Multiple rows
INSERT INTO users (email, name) VALUES
('bob@example.com', 'Bob'),
('carol@example.com', 'Carol');
-- Return the created row (very useful)
INSERT INTO users (email, name)
VALUES ('dave@example.com', 'Dave')
RETURNING id, email, created_at;
-- Upsert (insert or update on conflict)
INSERT INTO users (email, name)
VALUES ('alice@example.com', 'Alice Updated')
ON CONFLICT (email)
DO UPDATE SET name = EXCLUDED.name, updated_at = NOW();
-- Ignore duplicates
INSERT INTO users (email, name)
VALUES ('alice@example.com', 'Alice')
ON CONFLICT (email) DO NOTHING;
SELECT
-- All columns
SELECT * FROM users;
-- Specific columns
SELECT id, email, name FROM users;
-- With alias
SELECT id, email AS user_email FROM users;
-- Filter
SELECT * FROM users WHERE role = 'admin';
SELECT * FROM users WHERE created_at > '2025-01-01';
SELECT * FROM users WHERE name IS NULL;
SELECT * FROM users WHERE name IS NOT NULL;
SELECT * FROM users WHERE id IN (1, 2, 3);
SELECT * FROM users WHERE email LIKE '%@example.com';
SELECT * FROM users WHERE email ILIKE '%example%'; -- case-insensitive
-- Sort
SELECT * FROM users ORDER BY created_at DESC;
SELECT * FROM users ORDER BY name ASC, created_at DESC;
-- Limit and offset (pagination)
SELECT * FROM users ORDER BY id LIMIT 10 OFFSET 20;
-- Distinct
SELECT DISTINCT role FROM users;
UPDATE
-- Update one field
UPDATE users SET name = 'Alice Smith' WHERE id = 1;
-- Update multiple fields
UPDATE users
SET name = 'Alice Smith', updated_at = NOW()
WHERE id = 1;
-- Update with condition based on another table
UPDATE posts
SET published = TRUE
WHERE author_id IN (SELECT id FROM users WHERE role = 'admin');
-- Return updated rows
UPDATE users SET name = 'Alice Smith' WHERE id = 1
RETURNING id, name, updated_at;
DELETE
DELETE FROM users WHERE id = 1;
DELETE FROM posts WHERE published = FALSE;
-- Return deleted rows
DELETE FROM users WHERE id = 1 RETURNING *;
-- Delete all rows (faster: TRUNCATE)
TRUNCATE TABLE users;
TRUNCATE TABLE users RESTART IDENTITY; -- also resets SERIAL counter
Joins
-- INNER JOIN: rows that match in both tables
SELECT u.name, p.title
FROM users u
INNER JOIN posts p ON p.author_id = u.id;
-- LEFT JOIN: all users, including those with no posts
SELECT u.name, p.title
FROM users u
LEFT JOIN posts p ON p.author_id = u.id;
-- Multiple joins
SELECT u.name, p.title, c.content AS comment
FROM users u
JOIN posts p ON p.author_id = u.id
JOIN comments c ON c.post_id = p.id
WHERE p.published = TRUE;
-- Self join (e.g., employees and their managers)
SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;
Aggregations
-- COUNT, SUM, AVG, MIN, MAX
SELECT COUNT(*) FROM users;
SELECT COUNT(*) FROM users WHERE role = 'admin';
SELECT AVG(price) FROM products;
SELECT SUM(amount) FROM orders WHERE status = 'paid';
-- GROUP BY
SELECT role, COUNT(*) AS total
FROM users
GROUP BY role;
-- HAVING (filter after grouping, like WHERE but for aggregates)
SELECT author_id, COUNT(*) AS post_count
FROM posts
GROUP BY author_id
HAVING COUNT(*) > 5;
-- GROUP BY with JOIN
SELECT u.name, COUNT(p.id) AS post_count
FROM users u
LEFT JOIN posts p ON p.author_id = u.id
GROUP BY u.id, u.name
ORDER BY post_count DESC;
Indexes
Indexes speed up reads by letting PostgreSQL find rows without scanning the whole table. The tradeoff is slower writes and extra disk space.
Creating Indexes
-- Basic index (B-tree, the default)
CREATE INDEX idx_posts_author_id ON posts(author_id);
-- Unique index (also enforces uniqueness)
CREATE UNIQUE INDEX idx_users_email ON users(email);
-- Composite index (useful when filtering by multiple columns together)
CREATE INDEX idx_posts_author_published ON posts(author_id, published);
-- Partial index (index only a subset of rows)
CREATE INDEX idx_posts_published ON posts(created_at) WHERE published = TRUE;
-- Index on expression
CREATE INDEX idx_users_lower_email ON users(LOWER(email));
When to Add Indexes
Add indexes on:
- Foreign key columns (
author_id,user_id, etc.) - PostgreSQL does NOT auto-index these - Columns you filter on frequently (
WHERE status = ...,WHERE email = ...) - Columns you sort on frequently (
ORDER BY created_at) - Columns used in joins
Do not index:
- Small tables (full scan is faster)
- Columns with very low cardinality (e.g., a boolean column - not worth it usually)
- Columns rarely used in queries
Checking Index Usage
-- See all indexes on a table
\d posts
-- Check if a query uses an index
EXPLAIN ANALYZE SELECT * FROM posts WHERE author_id = 1;
Look for Index Scan in the output. Seq Scan means no index was used.
Transactions
A transaction groups multiple SQL statements into one atomic unit. Either all succeed or all fail.
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT; -- apply changes
-- or
ROLLBACK; -- undo all changes
ACID Properties
- Atomicity: all statements in a transaction succeed or none do
- Consistency: database stays in a valid state before and after
- Isolation: concurrent transactions do not interfere with each other
- Durability: committed data survives crashes
Savepoints
BEGIN;
INSERT INTO orders (user_id, total) VALUES (1, 99.99);
SAVEPOINT after_order;
INSERT INTO order_items (order_id, product_id) VALUES (1, 999);
-- this fails - product 999 doesn't exist
ROLLBACK TO after_order; -- undo only the failed part
INSERT INTO order_items (order_id, product_id) VALUES (1, 5);
COMMIT;
Advanced Features
CTEs (Common Table Expressions)
CTEs let you break complex queries into named parts. Think of them as temporary named result sets.
-- Basic CTE
WITH active_users AS (
SELECT id, name
FROM users
WHERE last_login > NOW() - INTERVAL '30 days'
)
SELECT * FROM active_users WHERE name ILIKE 'a%';
-- Multiple CTEs
WITH
published_posts AS (
SELECT author_id, COUNT(*) AS count
FROM posts
WHERE published = TRUE
GROUP BY author_id
),
prolific_authors AS (
SELECT author_id FROM published_posts WHERE count > 10
)
SELECT u.name, pp.count
FROM users u
JOIN published_posts pp ON pp.author_id = u.id
WHERE u.id IN (SELECT author_id FROM prolific_authors);
Window Functions
Window functions compute values across a set of rows related to the current row, without collapsing them into groups like GROUP BY does.
-- Row number within each group
SELECT
name,
department,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rank
FROM employees;
-- Running total
SELECT
order_date,
amount,
SUM(amount) OVER (ORDER BY order_date) AS running_total
FROM orders;
-- Lag/lead (access previous/next row)
SELECT
date,
revenue,
LAG(revenue) OVER (ORDER BY date) AS prev_revenue,
revenue - LAG(revenue) OVER (ORDER BY date) AS change
FROM daily_revenue;
JSONB
JSONB stores JSON as binary and supports indexing and querying inside the document.
-- Store JSON
CREATE TABLE events (
id BIGSERIAL PRIMARY KEY,
data JSONB NOT NULL
);
INSERT INTO events (data) VALUES
('{"type": "click", "user_id": 1, "element": "button"}'),
('{"type": "view", "user_id": 2, "page": "/home"}');
-- Access fields
SELECT data->>'type' AS event_type FROM events;
SELECT data->>'user_id' AS user_id FROM events WHERE data->>'type' = 'click';
-- Nested access
SELECT data->'meta'->>'source' FROM events;
-- Filter by JSON field
SELECT * FROM events WHERE data->>'type' = 'click';
SELECT * FROM events WHERE (data->>'user_id')::INT = 1;
-- Index a JSONB field
CREATE INDEX idx_events_type ON events((data->>'type'));
-- GIN index for all keys (slower writes, fast for any-key lookups)
CREATE INDEX idx_events_data ON events USING GIN(data);
Arrays
CREATE TABLE posts (
id BIGSERIAL PRIMARY KEY,
tags TEXT[]
);
INSERT INTO posts (tags) VALUES ('{"javascript", "web", "tutorial"}');
-- Query array contains
SELECT * FROM posts WHERE 'javascript' = ANY(tags);
-- Array overlap
SELECT * FROM posts WHERE tags && ARRAY['javascript', 'python'];
-- Unnest array into rows
SELECT id, UNNEST(tags) AS tag FROM posts;
Full-Text Search
-- Basic text search
SELECT * FROM posts
WHERE to_tsvector('english', title || ' ' || content) @@ to_tsquery('english', 'postgresql & indexing');
-- With index (much faster)
ALTER TABLE posts ADD COLUMN search_vector TSVECTOR;
UPDATE posts SET search_vector =
to_tsvector('english', title || ' ' || COALESCE(content, ''));
CREATE INDEX idx_posts_search ON posts USING GIN(search_vector);
SELECT title FROM posts WHERE search_vector @@ to_tsquery('english', 'postgresql');
How PostgreSQL Works Under the Hood
Storage: Pages and Tuples
PostgreSQL stores all data in 8KB pages on disk. Each table is a file of pages. Each row is called a tuple. Pages also store indexes, free space maps, and visibility maps.
Table file on disk:
[ Page 0 (8KB) ] [ Page 1 (8KB) ] [ Page 2 (8KB) ] ...
|
v
[ Header | Item pointers | ... free space ... | Tuples ]
MVCC (Multi-Version Concurrency Control)
MVCC is how PostgreSQL lets multiple transactions read and write concurrently without blocking each other.
When you UPDATE a row, PostgreSQL does not overwrite the old row. It:
- Marks the old row as “deleted” (sets
xmaxto the current transaction ID) - Inserts a new row with the updated values (sets
xminto the current transaction ID)
Before UPDATE:
Row: { id: 1, name: "Alice", xmin: 100, xmax: 0 }
After UPDATE name to "Alice Smith" in transaction 200:
Row: { id: 1, name: "Alice", xmin: 100, xmax: 200 } <- old version
Row: { id: 1, name: "Alice Smith", xmin: 200, xmax: 0 } <- new version
Readers see different versions depending on when their transaction started. A reader that started before transaction 200 still sees “Alice”. A reader that started after sees “Alice Smith”. No locks needed for reads.
This is why reads never block writes and writes never block reads in PostgreSQL.
Dead Tuples and VACUUM
Because old row versions stick around after updates and deletes, tables accumulate dead tuples. VACUUM cleans them up so disk space can be reused.
-- Manual vacuum (usually automatic)
VACUUM posts;
-- Vacuum + reclaim disk space (locks table briefly)
VACUUM FULL posts;
-- Check dead tuple count
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
autovacuum runs in the background automatically. Do not disable it.
WAL (Write-Ahead Log)
Before PostgreSQL writes any change to the actual data files, it writes a record of the change to the WAL (a sequential log file). If the server crashes mid-write, PostgreSQL replays the WAL on restart to recover to a consistent state.
WAL is also the mechanism behind replication. Replicas stream the WAL from the primary and apply the same changes locally.
Write path:
Client INSERT -> WAL entry written -> ACK to client
-> Data page written (async)
The WAL write is fast (sequential). The data page write happens later. This is why PostgreSQL can be fast and durable at the same time.
Query Planner
When you run a query, PostgreSQL does not execute it directly. It:
- Parses the SQL into a query tree
- Rewrites the query (views, rules expansion)
- Plans multiple possible execution strategies and estimates their cost
- Executes the cheapest plan
-- See the query plan
EXPLAIN SELECT * FROM posts WHERE author_id = 1;
-- See plan + actual runtime stats
EXPLAIN ANALYZE SELECT * FROM posts WHERE author_id = 1;
The planner uses table statistics (row counts, column value distributions) to estimate costs. You can update statistics manually:
ANALYZE posts;
PostgreSQL runs ANALYZE automatically via autovacuum, but after large bulk inserts it can help to run it manually.
Connection Model
PostgreSQL uses a process-per-connection model. Each client connection spawns a separate OS process. This is heavier than thread-per-connection (like MySQL). The practical limit is around 100-300 connections before performance degrades from context switching and memory overhead.
For production, put a connection pooler (PgBouncer) in front of PostgreSQL. PgBouncer maintains a small pool of actual DB connections and multiplexes many application connections onto them.
Useful psql Commands
\l -- list databases
\c mydb -- connect to database
\dt -- list tables
\d users -- describe table (columns, indexes, constraints)
\di -- list indexes
\du -- list users/roles
\timing -- toggle query timing
\e -- open query in editor
\q -- quit
-- Run a file
\i /path/to/file.sql
Best Practices
1. Always Use TIMESTAMPTZ
Store all timestamps with timezone. Always store in UTC, display in local time in the application layer.
2. Use BIGSERIAL for IDs, Not SERIAL
SERIAL is a 32-bit integer (max ~2 billion rows). BIGSERIAL is 64-bit. Start with BIGSERIAL and you will never hit the limit.
Or use UUIDs for distributed systems where IDs are generated across multiple servers:
CREATE EXTENSION IF NOT EXISTS "pgcrypto";
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
...
);
3. Index Foreign Keys
PostgreSQL does not auto-create indexes on foreign key columns. Add them manually or every join and cascade operation does a full table scan.
CREATE INDEX idx_posts_author_id ON posts(author_id);
CREATE INDEX idx_comments_post_id ON comments(post_id);
4. Use RETURNING Instead of a Second Query
-- Instead of INSERT then SELECT
INSERT INTO users (email) VALUES ('x@x.com') RETURNING id;
UPDATE users SET name = 'Bob' WHERE id = 1 RETURNING *;
5. Use Connection Pooling in Production
Do not let your app connect to PostgreSQL directly with hundreds of connections. Use PgBouncer or a managed pooler (like Supabase’s PgBouncer or Neon’s connection pooling).
6. Avoid SELECT *
Fetch only what you need. SELECT * wastes bandwidth and breaks code when columns are added or reordered.
7. Use Transactions for Multi-Step Writes
Any operation that touches multiple tables should be wrapped in a transaction. Otherwise a crash halfway through leaves the database in a partially updated state.
8. EXPLAIN ANALYZE Slow Queries
When a query is slow, always check the plan first before adding indexes blindly.
EXPLAIN ANALYZE SELECT * FROM posts WHERE published = TRUE ORDER BY created_at DESC LIMIT 10;
Look for: Seq Scan on large tables, high actual rows vs estimated rows mismatch, sort operations without an index.
9. Avoid Long-Running Transactions
Long transactions hold locks and prevent VACUUM from cleaning dead tuples. Keep transactions short. Do not do external API calls inside a transaction.
10. Use CHECK Constraints for Data Integrity
Put invariants in the database, not just in application code:
ALTER TABLE products ADD CONSTRAINT price_positive CHECK (price > 0);
ALTER TABLE users ADD CONSTRAINT valid_email CHECK (email LIKE '%@%.%');