
1.9 — Projections and Orthogonality

Date: 2026-03-02 | Block: 1 — Linear Algebra

The idea in plain English

A projection is the "shadow" of one vector cast along another direction. If you shine a light straight down on an arrow, the shadow on the floor is the projection onto the horizontal. This simple geometric idea turns out to be the hidden geometry behind linear regression, PCA, and least-squares problems.

The intuition

You have a vector b and a direction a. You want the closest point to b that lies on the line through a. That closest point is the projection.

                b
               /|
              / |  ← this gap is the residual (error)
             /  |    it is always perpendicular to a
            /   |
───────────*────*──────────→  direction a
                ↑
           projection of b onto a
           (the "shadow")

The residual (the gap between b and its shadow) is always orthogonal to the direction a. This is what makes it the closest point — if it weren't perpendicular, you could get even closer by sliding along a.

This picture is hiding inside linear regression. The predicted values ŷ = Xw are the projection of y onto the column space of X. The residuals y − ŷ are always perpendicular to every feature column.

The math

Derivation: we want p = c·a (lies on a) such that (b − p) ⊥ a:

a · (b − c·a) = 0
a·b − c(a·a) = 0
c = (a·b) / ‖a‖²          ← projection coefficient (how far along a, in units of a)

Vector projection:

proj_a(b) = c · a = (a·b / ‖a‖²) · a

Projection matrix (turns projection into a matrix operation):

P = a·aᵀ / ‖a‖²     →     proj_a(b) = P·b

Idempotent property: P² = P. Projecting twice = projecting once — once you're on the line, you stay there.
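Both forms of the 1-D projection, and the idempotent property, can be checked numerically. A minimal NumPy sketch (the vectors here are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(3)           # any nonzero direction
b = rng.standard_normal(3)

# Vector projection: proj_a(b) = (a·b / ‖a‖²) · a
proj = (a @ b) / (a @ a) * a

# Projection matrix: P = a·aᵀ / ‖a‖²
P = np.outer(a, a) / (a @ a)

assert np.allclose(P @ b, proj)      # matrix form agrees with the formula
assert np.allclose(P @ P, P)         # idempotent: projecting twice = once
```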

Projection onto a subspace (column space of A):

P = A(AᵀA)⁻¹Aᵀ     (assuming the columns of A are independent, so AᵀA is invertible)

The residual (b − Pb) is orthogonal to every column of A.
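The subspace version can be verified the same way. A sketch with an illustrative 3×2 matrix A (solving AᵀA x = Aᵀb instead of forming the inverse explicitly, which is numerically safer):

```python
import numpy as np

# Two independent columns spanning a plane in R³ (illustrative choice)
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])

# P = A (AᵀA)⁻¹ Aᵀ, computed via solve rather than an explicit inverse
P = A @ np.linalg.solve(A.T @ A, A.T)

residual = b - P @ b
assert np.allclose(A.T @ residual, 0)   # residual ⊥ every column of A
assert np.allclose(P @ P, P)            # still idempotent
```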

A worked example

a = [2, 1],   b = [3, 3]

c = (a·b) / ‖a‖² = (2·3 + 1·3) / (4+1) = 9/5

projection = (9/5)·[2,1] = [3.6, 1.8]
residual   = [3-3.6, 3-1.8] = [-0.6, 1.2]

Check orthogonality: a · residual = 2·(-0.6) + 1·(1.2) = 0 ✓
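The same worked example, reproduced in NumPy:

```python
import numpy as np

a = np.array([2.0, 1.0])
b = np.array([3.0, 3.0])

c = (a @ b) / (a @ a)        # (2·3 + 1·3) / (4+1) = 9/5
proj = c * a                 # [3.6, 1.8]
residual = b - proj          # [-0.6, 1.2]

assert np.isclose(c, 9 / 5)
assert np.allclose(proj, [3.6, 1.8])
assert np.isclose(a @ residual, 0)   # residual ⊥ a, as expected
```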

Why this matters for ML

Linear regression IS a projection. You want Xw = y, but y usually does not lie in the column space of X (no exact fit). The best you can do is project y onto the column space of X. That projection is ŷ = X(XᵀX)⁻¹Xᵀy; requiring the residual to be orthogonal to the columns of X gives the normal equations XᵀXw = Xᵀy, whose solution is w = (XᵀX)⁻¹Xᵀy. This is not a coincidence — regression is projection, derived from geometry, no calculus needed.

Residuals are always perpendicular to features. In any linear regression, (y − Xw) ⊥ X — the errors are orthogonal to every input feature. This is the optimality condition — you can't reduce the error further without changing the model.
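This orthogonality holds for any least-squares fit, not just hand-picked examples. A sketch on synthetic random data (shapes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))     # 50 samples, 3 features (synthetic)
y = rng.standard_normal(50)

# Least-squares w: the solution of the normal equations XᵀXw = Xᵀy
w, *_ = np.linalg.lstsq(X, y, rcond=None)
residual = y - X @ w

# Residuals are orthogonal to every feature column
assert np.allclose(X.T @ residual, 0)
```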

PCA uses projection. When you "project data onto the first 2 principal components," you are doing exactly Z = X̃·Vₖ and reconstructing with Z·Vₖᵀ — a projection onto the principal subspace, exactly the formula from today.
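The PCA step can be sketched the same way. Here Vₖ comes from the SVD of centered data (the data and k = 2 are illustrative); because Vₖ has orthonormal columns, VₖVₖᵀ is exactly a projection matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 5))
Xc = X - X.mean(axis=0)              # center the data (the X̃ above)

# Principal directions from the SVD of the centered data
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Vk = Vt[:2].T                        # first k = 2 components, shape (5, 2)

Z = Xc @ Vk                          # project onto the principal subspace
Xhat = Z @ Vk.T                      # reconstruct in the original space

# Xhat is the orthogonal projection of Xc onto span(Vk):
P = Vk @ Vk.T                        # orthonormal columns ⇒ P = Vk Vkᵀ
assert np.allclose(Xhat, Xc @ P)
assert np.allclose(P @ P, P)         # idempotent, as any projection must be
```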

The one thing to remember

The projection is the closest point on a line (or subspace) to your target. The residual is always perpendicular to the line. This geometry IS linear regression.
