1.11 — Eigendecomposition & Diagonalization
Date: 2026-03-02 | Block: 1 — Linear Algebra
The idea in plain English
If a matrix has a full set of n linearly independent eigenvectors, you can completely rewrite it in terms of those eigenvectors and eigenvalues. This reveals that even a complicated-looking matrix is doing something surprisingly simple: just scaling each eigenvector direction independently.
The intuition
From Topic 1.10: a matrix A has eigenvectors v₁, v₂, ... and eigenvalues λ₁, λ₂, ...
Now here's the insight: in the eigenvector coordinate system, the matrix is trivially simple — it just multiplies each axis by its eigenvalue. All the complexity comes from the change of coordinates.
Think of it like this: a complicated 2D transformation looks messy in x-y coordinates. But if you rotate your coordinate axes to align with the eigenvectors, suddenly the transformation is just "scale the first axis by λ₁, scale the second by λ₂." Simple independent scaling.
Eigendecomposition literally writes the matrix as: "change to eigenvector coordinates → do simple scaling → change back."
A = V · Λ · V⁻¹
V⁻¹: change coordinates into the eigenvector basis
Λ: scale independently along each axis (trivially simple — it's diagonal)
V: change back to standard coordinates
(These basis changes are true rotations only when the eigenvectors are orthogonal, e.g. for symmetric A.)
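This three-step pipeline can be checked numerically. A minimal sketch with a hypothetical 2×2 matrix (chosen to have distinct eigenvalues, so it is guaranteed diagonalizable):

```python
import numpy as np

# Hypothetical 2x2 matrix with distinct eigenvalues, so it is diagonalizable
A = np.array([[2.0, 1.0],
              [0.0, 0.5]])

eigvals, V = np.linalg.eig(A)           # columns of V are the eigenvectors
Lam = np.diag(eigvals)                  # Λ: diagonal matrix of eigenvalues

# change basis -> scale -> change back
A_rebuilt = V @ Lam @ np.linalg.inv(V)
print(np.allclose(A, A_rebuilt))        # True
```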
The math
The decomposition:
A = V · Λ · V⁻¹
Where:
- V = matrix with eigenvectors as columns: [v₁ | v₂ | ... | vₙ]
- Λ = diagonal matrix of eigenvalues: diag(λ₁, λ₂, ..., λₙ)
Derivation: write out A·V column by column:
A·V = [Av₁|Av₂|...|Avₙ] = [λ₁v₁|λ₂v₂|...|λₙvₙ] = V·Λ
→ A = V·Λ·V⁻¹ (multiply both sides by V⁻¹ on the right)
For symmetric matrices (like covariance matrices), the eigenvectors can be chosen orthonormal, so V⁻¹ = Vᵀ:
A = Q · Λ · Qᵀ ← cleaner, more stable, no explicit inverse needed
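For the symmetric case, NumPy's `np.linalg.eigh` returns the orthonormal eigenvectors directly. A sketch with a hypothetical symmetric matrix:

```python
import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 2.0]])              # hypothetical symmetric matrix

w, Q = np.linalg.eigh(S)                # eigh: specialized for symmetric/Hermitian

print(np.allclose(Q.T @ Q, np.eye(2)))       # True: orthonormal columns, Q⁻¹ = Qᵀ
print(np.allclose(S, Q @ np.diag(w) @ Q.T))  # True: A = Q·Λ·Qᵀ, no inverse needed
```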
Rank-1 expansion (another way to write it, for symmetric A with orthonormal eigenvectors):
A = λ₁·v₁v₁ᵀ + λ₂·v₂v₂ᵀ + ... + λₙ·vₙvₙᵀ
A is a weighted sum of projections onto each eigenvector. Large λ = important direction.
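The rank-1 expansion can be verified by summing the outer products directly (again with a hypothetical symmetric matrix):

```python
import numpy as np

S = np.array([[3.0, 1.0],
              [1.0, 3.0]])              # hypothetical symmetric matrix
w, Q = np.linalg.eigh(S)

# Weighted sum of rank-1 projections: sum of λᵢ · vᵢvᵢᵀ over all i
S_sum = sum(w[i] * np.outer(Q[:, i], Q[:, i]) for i in range(len(w)))
print(np.allclose(S, S_sum))            # True
```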
A worked example — matrix powers
The real payoff: A^k = V·Λ^k·V⁻¹, and raising a diagonal matrix to a power is trivial:
Λ^k = diag(λ₁^k, λ₂^k, ..., λₙ^k)
For A with eigenvalues λ₁=2, λ₂=0.5:
- A^10 in the first eigenvector direction: 2^10 = 1024 — massively amplified
- A^10 in the second eigenvector direction: 0.5^10 = 1/1024 ≈ 0.001 — effectively vanished
This happens without doing 10 matrix multiplications — just n scalar powers for Λ^10 and two matrix multiplications (V·Λ^10, then the result times V⁻¹).
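The payoff in code, using a hypothetical symmetric matrix constructed to have exactly the eigenvalues 2 and 0.5 from the example:

```python
import numpy as np

A = np.array([[1.25, 0.75],
              [0.75, 1.25]])            # eigenvalues are exactly 2 and 0.5

w, V = np.linalg.eig(A)
A10 = V @ np.diag(w ** 10) @ np.linalg.inv(V)   # Λ^10 is just elementwise powers

# Agrees with 10 repeated matrix multiplications
print(np.allclose(A10, np.linalg.matrix_power(A, 10)))   # True
```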
Why this matters for ML
RNN long-term dependencies: after t time steps (in a linear RNN, ignoring nonlinearities), hₜ = Wᵗ·h₀ = V·Λᵗ·V⁻¹·h₀. The eigenvalues of W determine what happens to information over time. Components along eigenvectors with |λ|>1 explode; those with |λ|<1 vanish. The network can only remember the directions with |λ|≈1. This is the mathematical root of why vanilla RNNs struggle with long sequences, and why gated architectures like LSTMs were introduced to give information a path to survive.
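A toy linear "RNN" makes the explode/vanish behavior concrete. W here is a hypothetical weight matrix, chosen diagonal so the eigenvector directions are simply the axes:

```python
import numpy as np

W = np.diag([1.1, 0.9])     # hypothetical W: eigenvalues 1.1 (>1) and 0.9 (<1)
h = np.array([1.0, 1.0])    # equal signal in both eigenvector directions

for _ in range(50):         # 50 "time steps" of h_t = W @ h_{t-1}
    h = W @ h

print(h)  # first component has exploded, second has all but vanished
```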
PCA: the covariance matrix C = QΛQᵀ. The eigendecomposition directly gives you the principal components (columns of Q) and the variance in each direction (diagonal of Λ). Drop the small eigenvalue directions → dimensionality reduction.
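A minimal PCA sketch via the covariance eigendecomposition, on synthetic data (the stretch factors 3 and 0.5 are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) * np.array([3.0, 0.5])  # std 3 along x, 0.5 along y
X = X - X.mean(axis=0)                                # center the data

C = X.T @ X / (len(X) - 1)     # sample covariance matrix
w, Q = np.linalg.eigh(C)       # eigh returns eigenvalues in ascending order
order = np.argsort(w)[::-1]    # sort descending: largest variance first
w, Q = w[order], Q[:, order]

X_reduced = X @ Q[:, :1]       # keep only the top principal component
print(w)                       # roughly [9, 0.25]: variances 3^2 and 0.5^2
```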
Spectral methods in Graph Neural Networks: the eigendecomposition of the graph Laplacian defines the "frequencies" of the graph. Spectral GNNs convolve in this eigenvector basis — exactly the same idea as Fourier transforms but on graph structure.
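A tiny graph Laplacian example (a 4-node path graph, chosen purely for illustration): the Laplacian's eigenvalues play the role of graph "frequencies", and the smallest is always 0, with the constant vector as its eigenvector.

```python
import numpy as np

# Adjacency matrix of a 4-node path graph: 0-1-2-3
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A   # graph Laplacian: L = D - A

w, Q = np.linalg.eigh(L)         # L is symmetric, so eigh applies
print(w)                         # ascending "frequencies"; the smallest is 0
```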
The one thing to remember
A = VΛV⁻¹ says: every transformation is just scaling along its natural axes (the eigenvectors), wrapped in a change of coordinates. Eigendecomposition makes matrix powers and long-run behavior trivial to compute.