What is a Database?
A database is software that imposes structure and rules on raw, messy disk storage so data can be accessed efficiently and reliably.
Databases are modular systems consisting of multiple software components:
- Transport layer: Accepts requests
- Query processor: Determines the most efficient way to run queries
- Execution engine: Carries out the operations
- Storage engine: Manages data persistence
Client/Server Model
- Database instances (servers): Long-running processes (postgres, mysqld, mongod) that include:
- CPU threads
- Memory (buffer cache)
- Open files (data files, WAL, indexes)
- Network listeners (ports like 5432, 3306)
- Application instances (clients): Request data from the database (Node.js APIs, mobile apps, etc.)
Architectural Layers of Database instances

The layered architecture of a database management system, showing how queries flow from transport through query processor, execution engine, to storage engine.
Transport / Communication Subsystem
- Handles connection protocol (e.g., TCP/IP, Unix Sockets)
- Manages handshake and receives the raw SQL string
- Receives queries from clients
- Hands off to the query processor
- Performs access control after parsing, ensuring security based on the interpreted request
Query Processor
- Query Parser:
- Syntax analysis: Checks grammar (e.g., missing semicolons, misspelled SELECT)
- Semantic analysis: Checks validity (e.g., does the table exist? Does the column belong to it?)
- Authorization check: Checks permissions (e.g., does the user have READ access?)
- Query Optimizer:
- Statistics analysis: Examines data distribution and index information
- Plan generation: Decides the most efficient execution strategy
- Should we use an index or scan the table?
- In what order should we join tables?
- Redundancy elimination: Removes impossible and redundant query parts
- Optimization techniques:
- Index ordering selection
- Cardinality estimation
- Access method selection
- Cost-based decision making (considers transfer costs across cluster nodes)
- Output: Physical execution plan (sequence of operations)
Execution Engine
-
Receives the physical execution plan from the Query Optimizer
-
Executes operators in the plan:
- Table scan: Sequentially read all tuples from a table
- Index scan: Use an index to find specific tuples
- Join: Combine data from multiple tables (Nested Loop / Hash / Merge)
- Filter: Apply WHERE conditions
- Projection: Select specific columns
- Aggregation: Compute GROUP BY, SUM, COUNT, etc.
-
Controls execution flow:
- Tuple-by-tuple execution: Process one row at a time (traditional iterator model)
- Batch-based execution: Process multiple rows together for better performance
-
Requests data from the Storage & Transaction Subsystem
-
Coordinates with:
- Transaction manager: Ensures proper isolation levels
- Lock manager: Manages concurrent access to data
-
Streams result tuples upward to the communication subsystem
Storage & Transaction Subsystem
- Stores, retrieves, and manages data in memory and on disk
- Acts as the persistent, long-term memory of each node
- Provides a simple API for creating, updating, deleting, and retrieving records
- DBMSs build on storage engines to add features like schemas, query languages, indexing, and transactions
- Transaction/Lock Managers:
- Ensure ACID properties (Atomicity, Consistency, Isolation, Durability)
- Handle concurrency so two people don't overwrite the same data simultaneously
- Buffer Manager:
- Keeps frequently accessed data in RAM for performance, reducing slow disk reads
- Access Methods:
- Code that knows how to read specific file formats (like B-Trees or Heap files)
- Recovery Manager:
- Maintains logs and restores system state
Query Flow
Client
|
| SQL Query
v
Transport / Communication Subsystem
|
v
Query Parser
| - Syntax analysis
| - Semantic analysis
| - Authorization check
v
Query Optimizer
| - Statistics analysis
| - Cost-based planning
| - Physical plan generation
v
Execution Engine (Execution Layer)
| - Receives physical plan
| - Executes operators (scan, join, filter, etc.)
| - Coordinates with transaction/lock managers
| - Requests data from the Storage & Transaction Subsystem
|
v
Storage & Transaction Subsystem
| - Transaction Manager
| - Lock Manager
| - Access Methods
| - Buffer Manager
| - Recovery Manager
|
| (data pages/tuples)
v
Execution Engine
| - Aggregates results
| - Streams result tuples upward
v
Transport Subsystem
|
v
Client