Matrix Multiplication: Standard to Strassen

Standard (Naive) Matrix Multiplication

The Problem Setup

We are multiplying two $n \times n$ matrices, $A$ and $B$ , to produce a result matrix $C$ .

Input Matrices:

$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \quad B = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}$

Result Matrix:

$C = \begin{pmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{pmatrix}$

Calculation of Elements (Dot Product)

To find each element of matrix $C$ , we calculate the dot product of the corresponding row from $A$ and column from $B$ .

$c_{11}$ (Row 1 of A · Col 1 of B): $c_{11} = a_{11} \cdot b_{11} + a_{12} \cdot b_{21}$

$c_{12}$ (Row 1 of A · Col 2 of B): $c_{12} = a_{11} \cdot b_{12} + a_{12} \cdot b_{22}$

$c_{21}$ (Row 2 of A · Col 1 of B): $c_{21} = a_{21} \cdot b_{11} + a_{22} \cdot b_{21}$

$c_{22}$ (Row 2 of A · Col 2 of B): $c_{22} = a_{21} \cdot b_{12} + a_{22} \cdot b_{22}$

The General Formula

Mathematically, the value of any cell $c_{ij}$ is the summation of the products:

$c_{ij} = \sum_{k=1}^{n} a_{ik} \cdot b_{kj}$

Algorithm Implementation

// Standard Matrix Multiplication - O(n³) Time Complexity
for (i = 0; i < n; i++) {
    // Iterate through columns of B
    for (j = 0; j < n; j++) {
        C[i][j] = 0; // Initialize cell
        // Iterate through common dimension
        for (k = 0; k < n; k++) {
            C[i][j] += A[i][k] * B[k][j];
        }
    }
}

Time Complexity Analysis

Outer loop (i): Runs $n$ times
Middle loop (j): Runs $n$ times
Inner loop (k): Runs $n$ times
Total Operations: $n \times n \times n$ multiplications

Complexity: $O(n^3)$

Divide & Conquer Strategy (Naive Approach)

Instead of iterating row-by-row, we divide matrices $A$ and $B$ into four $\frac{n}{2} \times \frac{n}{2}$ sub-matrices.

Structure

$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \quad B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}, \quad C = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}$

The Algorithm `MM(A, B, n)`

Base Case: If $n = 1$ , use standard scalar multiplication
Recursive Step: Calculate the 4 quadrants of result matrix $C$ by making 8 recursive calls:

$C_{11} = MM(A_{11}, B_{11}, n/2) + MM(A_{12}, B_{21}, n/2)$

$C_{12} = MM(A_{11}, B_{12}, n/2) + MM(A_{12}, B_{22}, n/2)$

$C_{21} = MM(A_{21}, B_{11}, n/2) + MM(A_{22}, B_{21}, n/2)$

$C_{22} = MM(A_{21}, B_{12}, n/2) + MM(A_{22}, B_{22}, n/2)$

Complexity Analysis (Naive Recursive)

Using the Master Theorem on the recurrence relation:

$T(n) = 8T(n/2) + O(n^2)$

$a = 8$ (number of subproblems)
$b = 2$ (division factor)
$d = 2$ (cost of combining solutions)
Result: $T(n) = O(n^3)$ (Same as standard loop method)

Strassen’s Matrix Multiplication

Introduction

Strassen’s algorithm optimizes the Divide and Conquer approach by reducing the number of recursive multiplications from 8 to 7. This reduction significantly improves the time complexity.

While the naive divide-and-conquer approach is $O(n^3)$ , Strassen’s approach achieves:

$T(n) = O(n^{\log_2 7}) \approx O(n^{2.81})$

The 7 Intermediate Matrices (M₁ to M₇)

Instead of calculating the 4 quadrants of $C$ directly, Strassen computes these 7 intermediate values first:

$M_1 = (A_{11} + A_{22})(B_{11} + B_{22})$

$M_2 = (A_{21} + A_{22})B_{11}$

$M_3 = A_{11}(B_{12} - B_{22})$

$M_4 = A_{22}(B_{21} - B_{11})$

$M_5 = (A_{11} + A_{12})B_{22}$

$M_6 = (A_{21} - A_{11})(B_{11} + B_{12})$

$M_7 = (A_{12} - A_{22})(B_{21} + B_{22})$

Operations Count: 7 Matrix Multiplications, 18 Additions/Subtractions

Calculating the Final Result (C)

The result matrix is derived from these 7 products:

$C_{11} = M_1 + M_4 - M_5 + M_7$

$C_{12} = M_3 + M_5$

$C_{21} = M_2 + M_4$

$C_{22} = M_1 - M_2 + M_3 + M_6$

Complexity Analysis (Strassen)

Using the Master Theorem on the recurrence relation:

$T(n) = 7T(n/2) + O(n^2)$

$a = 7$ (number of subproblems)
$b = 2$ (division factor)
$d = 2$ (cost of combining solutions)
Result: $T(n) = O(n^{\log_2 7}) \approx O(n^{2.81})$

Complexity Comparison

Algorithm	Time Complexity	Matrix Multiplications
Standard (Naive Loop)	$O(n^3)$	1 (iterative)
Divide & Conquer (Naive)	$O(n^3)$	8
Strassen	$O(n^{2.81})$	7

Drawbacks & Practical Considerations

While Strassen is theoretically superior, real-world implementation reveals significant challenges:

Performance Issues

Recursion Overhead: Requires additional stack storage for deep recursion trees, leading to increased memory usage and function call overhead
Cache Inefficiency: The irregular memory access patterns cause frequent cache misses, which can actually hurt performance compared to optimized loop-tiling approaches in standard multiplication
Thread Efficiency: Constant factor overhead reduces thread efficiency, making parallelization less effective
Numerical Stability: Floating-point errors accumulate faster due to the numerous subtractions in computing intermediate matrices

The Crossover Point

Strassen’s overhead makes it impractical for small matrices. The optimal strategy depends on matrix size:

Matrix Size	Recommended Algorithm	Reason
< 512 × 512	Standard (Naive)	Low overhead, better cache locality, less numerical error
512 - 2048	Depends on implementation	Crossover point varies by hardware and optimization
> 2048 × 2048	Strassen	Theoretical speedup outweighs overhead

Optimization Strategy

The critical parameter for optimization is the recursion truncation point—the threshold below which we switch from Strassen to the standard method. For example:

def strassen(A, B, n, threshold=512):
    if n <= threshold:
        return standard_multiply(A, B)  # Use naive method
    else:
        return strassen_recursive(A, B, n)  # Use Strassen

This hybrid approach balances theoretical advantages with practical efficiency.

When to Use Each Algorithm

Use Standard Multiplication When

Matrices are small (typically < 512×512)
Simplicity and readability are priorities
Memory usage is constrained
Numerical stability is critical
Working with sparse matrices

Use Divide & Conquer (Naive) When

You need to understand the recursive decomposition principle
Teaching or learning recursive algorithms
Matrix size is moderate and you want a cleaner algorithmic structure

Use Strassen’s Algorithm When

Multiplying large matrices (> 2048×2048)
Time efficiency is the primary concern
You can afford extra space complexity and have sufficient memory
Cache-aware optimizations can be applied (combining with loop tiling)

Master Theorem Quick Reference

For analyzing recursive divide-and-conquer algorithms like matrix multiplication:

$T(n) = aT(n/b) + O(n^d)$

Three Cases:

If $\log_b a > d$ : $T(n) = O(n^{\log_b a})$
If $\log_b a = d$ : $T(n) = O(n^d \log n)$
If $\log_b a < d$ : $T(n) = O(n^d)$

Application to Our Algorithms:

Algorithm	$a$	$b$	$d$	$\log_b a$	Result
Naive D&C	8	2	2	3	$O(n^3)$
Strassen	7	2	2	2.81	$O(n^{2.81})$