1.11 — Eigendecomposition & Diagonalization
Date: 2026-03-02 | Block: 1 — Linear Algebra
The idea in plain English
If a matrix has a full set of n linearly independent eigenvectors, you can completely rewrite it in terms of those eigenvectors and eigenvalues. This reveals that even a complicated-looking matrix is doing something surprisingly simple: just scaling each eigenvector direction independently.
The intuition
From Topic 1.10: a matrix A has eigenvectors v₁, v₂, ... and eigenvalues λ₁, λ₂, ...
Now here's the insight: in the eigenvector coordinate system, the matrix is trivially simple — it just multiplies each axis by its eigenvalue. All the complexity comes from the change of coordinates.
Think of it like this: a complicated 2D transformation looks messy in x-y coordinates. But if you rotate your coordinate axes to align with the eigenvectors, suddenly the transformation is just "scale the first axis by λ₁, scale the second by λ₂." Simple independent scaling.
Eigendecomposition literally writes the matrix as: "change to eigenvector coordinates → do simple scaling → change back."
A = V · Λ · V⁻¹
V⁻¹: change coordinates into the eigenvector basis
Λ: scale independently along each axis (trivially simple — it's diagonal)
V: change back to standard coordinates
(These basis changes are true rotations only when the eigenvectors are orthogonal, e.g. for symmetric A.)
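This three-step pipeline can be checked numerically. A minimal sketch with a hypothetical 2×2 matrix (chosen to have distinct eigenvalues, so it is guaranteed diagonalizable):

```python
import numpy as np

# Hypothetical 2x2 matrix with distinct eigenvalues, so it is diagonalizable
A = np.array([[2.0, 1.0],
              [0.0, 0.5]])

eigvals, V = np.linalg.eig(A)           # columns of V are the eigenvectors
Lam = np.diag(eigvals)                  # Λ: diagonal matrix of eigenvalues

# change basis -> scale -> change back
A_rebuilt = V @ Lam @ np.linalg.inv(V)
print(np.allclose(A, A_rebuilt))        # True
```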
The math
The decomposition:
A = V · Λ · V⁻¹
Where:
- V = matrix with eigenvectors as columns: [v₁ | v₂ | ... | vₙ]
- Λ = diagonal matrix of eigenvalues: diag(λ₁, λ₂, ..., λₙ)
Derivation: write out A·V column by column:
A·V = [Av₁|Av₂|...|Avₙ] = [λ₁v₁|λ₂v₂|...|λₙvₙ] = V·Λ
→ A = V·Λ·V⁻¹ (multiply both sides by V⁻¹ on the right)
For symmetric matrices (like covariance matrices), the eigenvectors can be chosen orthonormal, so V⁻¹ = Vᵀ:
A = Q · Λ · Qᵀ ← cleaner, more stable, no explicit inverse needed
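For the symmetric case, NumPy's `np.linalg.eigh` returns the orthonormal eigenvectors directly. A sketch with a hypothetical symmetric matrix:

```python
import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 2.0]])              # hypothetical symmetric matrix

w, Q = np.linalg.eigh(S)                # eigh: specialized for symmetric/Hermitian

print(np.allclose(Q.T @ Q, np.eye(2)))       # True: orthonormal columns, Q⁻¹ = Qᵀ
print(np.allclose(S, Q @ np.diag(w) @ Q.T))  # True: A = Q·Λ·Qᵀ, no inverse needed
```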
Rank-1 expansion (another way to write it, for symmetric A with orthonormal eigenvectors):
A = λ₁·v₁v₁ᵀ + λ₂·v₂v₂ᵀ + ... + λₙ·vₙvₙᵀ
A is a weighted sum of projections onto each eigenvector. Large λ = important direction.
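The rank-1 expansion can be verified by summing the outer products directly (again with a hypothetical symmetric matrix):

```python
import numpy as np

S = np.array([[3.0, 1.0],
              [1.0, 3.0]])              # hypothetical symmetric matrix
w, Q = np.linalg.eigh(S)

# Weighted sum of rank-1 projections: sum of λᵢ · vᵢvᵢᵀ over all i
S_sum = sum(w[i] * np.outer(Q[:, i], Q[:, i]) for i in range(len(w)))
print(np.allclose(S, S_sum))            # True
```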
A worked example — matrix powers
The real payoff: A^k = V·Λ^k·V⁻¹, and raising a diagonal matrix to a power is trivial:
Λ^k = diag(λ₁^k, λ₂^k, ..., λₙ^k)
For A with eigenvalues λ₁=2, λ₂=0.5:
- A^10 in the first eigenvector direction: 2^10 = 1024 — massively amplified
- A^10 in the second eigenvector direction: 0.5^10 = 1/1024 ≈ 0.001 — effectively vanished
This happens without doing 10 matrix multiplications — just n scalar powers for Λ^10 and two matrix multiplications (V·Λ^10, then the result times V⁻¹).
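The payoff in code, using a hypothetical symmetric matrix constructed to have exactly the eigenvalues 2 and 0.5 from the example:

```python
import numpy as np

A = np.array([[1.25, 0.75],
              [0.75, 1.25]])            # eigenvalues are exactly 2 and 0.5

w, V = np.linalg.eig(A)
A10 = V @ np.diag(w ** 10) @ np.linalg.inv(V)   # Λ^10 is just elementwise powers

# Agrees with 10 repeated matrix multiplications
print(np.allclose(A10, np.linalg.matrix_power(A, 10)))   # True
```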
Why this matters for ML
RNN long-term dependencies: after t time steps (in a linear RNN, ignoring nonlinearities), hₜ = Wᵗ·h₀ = V·Λᵗ·V⁻¹·h₀. The eigenvalues of W determine what happens to information over time. Components along eigenvectors with |λ|>1 explode; those with |λ|<1 vanish. The network can only remember the directions with |λ|≈1. This is the mathematical root of why vanilla RNNs struggle with long sequences, and why gated architectures like LSTMs were introduced to give information a path to survive.
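A toy linear "RNN" makes the explode/vanish behavior concrete. W here is a hypothetical weight matrix, chosen diagonal so the eigenvector directions are simply the axes:

```python
import numpy as np

W = np.diag([1.1, 0.9])     # hypothetical W: eigenvalues 1.1 (>1) and 0.9 (<1)
h = np.array([1.0, 1.0])    # equal signal in both eigenvector directions

for _ in range(50):         # 50 "time steps" of h_t = W @ h_{t-1}
    h = W @ h

print(h)  # first component has exploded, second has all but vanished
```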
PCA: the covariance matrix C = QΛQᵀ. The eigendecomposition directly gives you the principal components (columns of Q) and the variance in each direction (diagonal of Λ). Drop the small eigenvalue directions → dimensionality reduction.
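A minimal PCA sketch via the covariance eigendecomposition, on synthetic data (the stretch factors 3 and 0.5 are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) * np.array([3.0, 0.5])  # std 3 along x, 0.5 along y
X = X - X.mean(axis=0)                                # center the data

C = X.T @ X / (len(X) - 1)     # sample covariance matrix
w, Q = np.linalg.eigh(C)       # eigh returns eigenvalues in ascending order
order = np.argsort(w)[::-1]    # sort descending: largest variance first
w, Q = w[order], Q[:, order]

X_reduced = X @ Q[:, :1]       # keep only the top principal component
print(w)                       # roughly [9, 0.25]: variances 3^2 and 0.5^2
```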
Spectral methods in Graph Neural Networks: the eigendecomposition of the graph Laplacian defines the "frequencies" of the graph. Spectral GNNs convolve in this eigenvector basis — exactly the same idea as Fourier transforms but on graph structure.
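A tiny graph Laplacian example (a 4-node path graph, chosen purely for illustration): the Laplacian's eigenvalues play the role of graph "frequencies", and the smallest is always 0, with the constant vector as its eigenvector.

```python
import numpy as np

# Adjacency matrix of a 4-node path graph: 0-1-2-3
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A   # graph Laplacian: L = D - A

w, Q = np.linalg.eigh(L)         # L is symmetric, so eigh applies
print(w)                         # ascending "frequencies"; the smallest is 0
```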
The one thing to remember
A = VΛV⁻¹ says: every transformation is just scaling along its natural axes (the eigenvectors), wrapped in a change of coordinates. Eigendecomposition makes matrix powers and long-run behavior trivial to compute.