Skip to main content
Bytes & Beyond

Matrix Multiplication: Standard to Strassen

Learn matrix multiplication starting from the naive O(n³) approach and progressing to Strassen's optimized O(n²·⁸¹) algorithm.

Standard (Naive) Matrix Multiplication

The Problem Setup

We are multiplying two n×nn \times n matrices, AA and BB, to produce a result matrix CC.

Input Matrices:

A=(a11a12a21a22),B=(b11b12b21b22)A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \quad B = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}

Result Matrix:

C=(c11c12c21c22)C = \begin{pmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{pmatrix}

Calculation of Elements (Dot Product)

To find each element of matrix CC, we calculate the dot product of the corresponding row from AA and column from BB.

c11c_{11} (Row 1 of A · Col 1 of B): c11=a11b11+a12b21c_{11} = a_{11} \cdot b_{11} + a_{12} \cdot b_{21}

c12c_{12} (Row 1 of A · Col 2 of B): c12=a11b12+a12b22c_{12} = a_{11} \cdot b_{12} + a_{12} \cdot b_{22}

c21c_{21} (Row 2 of A · Col 1 of B): c21=a21b11+a22b21c_{21} = a_{21} \cdot b_{11} + a_{22} \cdot b_{21}

c22c_{22} (Row 2 of A · Col 2 of B): c22=a21b12+a22b22c_{22} = a_{21} \cdot b_{12} + a_{22} \cdot b_{22}

The General Formula

Mathematically, the value of any cell cijc_{ij} is the summation of the products:

cij=k=1naikbkjc_{ij} = \sum_{k=1}^{n} a_{ik} \cdot b_{kj}

Algorithm Implementation

// Standard Matrix Multiplication - O(n³) Time Complexity
for (i = 0; i < n; i++) {
    // Iterate through columns of B
    for (j = 0; j < n; j++) {
        C[i][j] = 0; // Initialize cell
        // Iterate through common dimension
        for (k = 0; k < n; k++) {
            C[i][j] += A[i][k] * B[k][j];
        }
    }
}

Time Complexity Analysis

  • Outer loop (i): Runs nn times
  • Middle loop (j): Runs nn times
  • Inner loop (k): Runs nn times
  • Total Operations: n×n×nn \times n \times n multiplications

Complexity: O(n3)O(n^3)


Divide & Conquer Strategy (Naive Approach)

Instead of iterating row-by-row, we divide matrices AA and BB into four n2×n2\frac{n}{2} \times \frac{n}{2} sub-matrices.

Structure

A=(A11A12A21A22),B=(B11B12B21B22),C=(C11C12C21C22)A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \quad B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}, \quad C = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}

The Algorithm MM(A, B, n)

  • Base Case: If n=1n = 1, use standard scalar multiplication
  • Recursive Step: Calculate the 4 quadrants of result matrix CC by making 8 recursive calls:

C11=MM(A11,B11,n/2)+MM(A12,B21,n/2)C_{11} = MM(A_{11}, B_{11}, n/2) + MM(A_{12}, B_{21}, n/2)

C12=MM(A11,B12,n/2)+MM(A12,B22,n/2)C_{12} = MM(A_{11}, B_{12}, n/2) + MM(A_{12}, B_{22}, n/2)

C21=MM(A21,B11,n/2)+MM(A22,B21,n/2)C_{21} = MM(A_{21}, B_{11}, n/2) + MM(A_{22}, B_{21}, n/2)

C22=MM(A21,B12,n/2)+MM(A22,B22,n/2)C_{22} = MM(A_{21}, B_{12}, n/2) + MM(A_{22}, B_{22}, n/2)

Complexity Analysis (Naive Recursive)

Using the Master Theorem on the recurrence relation:

T(n)=8T(n/2)+O(n2)T(n) = 8T(n/2) + O(n^2)

  • a=8a = 8 (number of subproblems)
  • b=2b = 2 (division factor)
  • d=2d = 2 (cost of combining solutions)
  • Result: T(n)=O(n3)T(n) = O(n^3) (Same as standard loop method)

Strassen’s Matrix Multiplication

Introduction

Strassen’s algorithm optimizes the Divide and Conquer approach by reducing the number of recursive multiplications from 8 to 7. This reduction significantly improves the time complexity.

While the naive divide-and-conquer approach is O(n3)O(n^3), Strassen’s approach achieves:

T(n)=O(nlog27)O(n2.81)T(n) = O(n^{\log_2 7}) \approx O(n^{2.81})

The 7 Intermediate Matrices (M₁ to M₇)

Instead of calculating the 4 quadrants of CC directly, Strassen computes these 7 intermediate values first:

M1=(A11+A22)(B11+B22)M_1 = (A_{11} + A_{22})(B_{11} + B_{22})

M2=(A21+A22)B11M_2 = (A_{21} + A_{22})B_{11}

M3=A11(B12B22)M_3 = A_{11}(B_{12} - B_{22})

M4=A22(B21B11)M_4 = A_{22}(B_{21} - B_{11})

M5=(A11+A12)B22M_5 = (A_{11} + A_{12})B_{22}

M6=(A21A11)(B11+B12)M_6 = (A_{21} - A_{11})(B_{11} + B_{12})

M7=(A12A22)(B21+B22)M_7 = (A_{12} - A_{22})(B_{21} + B_{22})

Operations Count: 7 Matrix Multiplications, 18 Additions/Subtractions

Calculating the Final Result (C)

The result matrix is derived from these 7 products:

C11=M1+M4M5+M7C_{11} = M_1 + M_4 - M_5 + M_7

C12=M3+M5C_{12} = M_3 + M_5

C21=M2+M4C_{21} = M_2 + M_4

C22=M1M2+M3+M6C_{22} = M_1 - M_2 + M_3 + M_6

Complexity Analysis (Strassen)

Using the Master Theorem on the recurrence relation:

T(n)=7T(n/2)+O(n2)T(n) = 7T(n/2) + O(n^2)

  • a=7a = 7 (number of subproblems)
  • b=2b = 2 (division factor)
  • d=2d = 2 (cost of combining solutions)
  • Result: T(n)=O(nlog27)O(n2.81)T(n) = O(n^{\log_2 7}) \approx O(n^{2.81})

Complexity Comparison

AlgorithmTime ComplexityMatrix Multiplications
Standard (Naive Loop)O(n3)O(n^3)1 (iterative)
Divide & Conquer (Naive)O(n3)O(n^3)8
StrassenO(n2.81)O(n^{2.81})7

Drawbacks & Practical Considerations

While Strassen is theoretically superior, real-world implementation reveals significant challenges:

Performance Issues

  1. Recursion Overhead: Requires additional stack storage for deep recursion trees, leading to increased memory usage and function call overhead

  2. Cache Inefficiency: The irregular memory access patterns cause frequent cache misses, which can actually hurt performance compared to optimized loop-tiling approaches in standard multiplication

  3. Thread Efficiency: Constant factor overhead reduces thread efficiency, making parallelization less effective

  4. Numerical Stability: Floating-point errors accumulate faster due to the numerous subtractions in computing intermediate matrices

The Crossover Point

Strassen’s overhead makes it impractical for small matrices. The optimal strategy depends on matrix size:

Matrix SizeRecommended AlgorithmReason
< 512 × 512Standard (Naive)Low overhead, better cache locality, less numerical error
512 - 2048Depends on implementationCrossover point varies by hardware and optimization
> 2048 × 2048StrassenTheoretical speedup outweighs overhead

Optimization Strategy

The critical parameter for optimization is the recursion truncation point—the threshold below which we switch from Strassen to the standard method. For example:

def strassen(A, B, n, threshold=512):
    if n <= threshold:
        return standard_multiply(A, B)  # Use naive method
    else:
        return strassen_recursive(A, B, n)  # Use Strassen

This hybrid approach balances theoretical advantages with practical efficiency.


When to Use Each Algorithm

Use Standard Multiplication When

  • Matrices are small (typically < 512×512)
  • Simplicity and readability are priorities
  • Memory usage is constrained
  • Numerical stability is critical
  • Working with sparse matrices

Use Divide & Conquer (Naive) When

  • You need to understand the recursive decomposition principle
  • Teaching or learning recursive algorithms
  • Matrix size is moderate and you want a cleaner algorithmic structure

Use Strassen’s Algorithm When

  • Multiplying large matrices (> 2048×2048)
  • Time efficiency is the primary concern
  • You can afford extra space complexity and have sufficient memory
  • Cache-aware optimizations can be applied (combining with loop tiling)

Master Theorem Quick Reference

For analyzing recursive divide-and-conquer algorithms like matrix multiplication:

T(n)=aT(n/b)+O(nd)T(n) = aT(n/b) + O(n^d)

Three Cases:

  1. If logba>d\log_b a > d: T(n)=O(nlogba)T(n) = O(n^{\log_b a})
  2. If logba=d\log_b a = d: T(n)=O(ndlogn)T(n) = O(n^d \log n)
  3. If logba<d\log_b a < d: T(n)=O(nd)T(n) = O(n^d)

Application to Our Algorithms:

Algorithmaabbddlogba\log_b aResult
Naive D&C8223O(n3)O(n^3)
Strassen7222.81O(n2.81)O(n^{2.81})

Further Reading