1.7 — Inverse Matrices and When They Exist
Date: 2026-03-01 | Block: 1 — Linear Algebra
The idea in plain English
The inverse of a matrix is the "undo" transformation. If A rotates space 90° clockwise, then A⁻¹ rotates it 90° counterclockwise. Applying A then A⁻¹ gets you back exactly where you started — a net effect of doing nothing.
The intuition
Every invertible transformation can be reversed. You rotate → you can un-rotate. You stretch → you can un-stretch. But what if you squash the entire plane down to a single line? You've thrown away information — you can't reconstruct a plane from a line. No inverse exists.
This is the key: an inverse exists if and only if no information was destroyed.
Invertible:       plane → plane   (reversible)
Not invertible:   plane → line    (information lost — can't go back)
The math
Definition: A⁻¹ is the unique matrix satisfying:
A · A⁻¹ = I and A⁻¹ · A = I
Formula for 2×2:
A = [ a b ]        A⁻¹ = 1/det(A) · [  d  -b ]
    [ c d ]                         [ -c   a ]
Steps: (1) swap diagonal elements, (2) negate off-diagonal, (3) divide by det(A).
If det(A) = 0, this formula blows up (division by zero) → no inverse exists.
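The three-step recipe translates directly into code. A minimal sketch (the helper name `inverse_2x2` is mine, not a NumPy function):

```python
import numpy as np

def inverse_2x2(A):
    """Invert a 2x2 matrix via the swap / negate / divide-by-det recipe."""
    a, b = A[0]
    c, d = A[1]
    det = a * d - b * c
    if det == 0:
        raise ValueError("det(A) = 0: matrix is singular, no inverse exists")
    # (1) swap diagonal elements, (2) negate off-diagonal, (3) divide by det
    return np.array([[d, -b], [-c, a]]) / det
```

The explicit `det == 0` check is the code version of "the formula blows up": the recipe fails exactly when the determinant vanishes.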
When the inverse exists — all five say the same thing:
A is invertible
⟺ det(A) ≠ 0
⟺ rank(A) = n
⟺ null space = {0}
⟺ columns are linearly independent
⟺ no dimension is destroyed
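These conditions can be checked numerically. A quick NumPy sanity check on one invertible and one singular matrix (the matrices are illustrative, chosen so the second has a dependent row):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [2.0, 1.0]])   # invertible: det = 1, rank = 2
B = np.array([[1.0, 2.0],
              [2.0, 4.0]])   # second row = 2 * first row -> singular

for M in (A, B):
    det = np.linalg.det(M)
    rank = np.linalg.matrix_rank(M)
    print(f"det = {det:.3f}, rank = {rank}, "
          f"invertible = {not np.isclose(det, 0)}")
```

Note that `det ≠ 0` and `rank = n` flip together: B squashes the plane onto the line spanned by (1, 2), so its rank drops to 1 and its determinant is 0.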
A worked example
A = [ 3 1 ]    det = 3·1 − 1·2 = 1
    [ 2 1 ]

A⁻¹ = (1/1) · [  1  -1 ] = [  1  -1 ]
              [ -2   3 ]   [ -2   3 ]

Verify A·A⁻¹:

[ 3 1 ] · [  1  -1 ] = [ 3−2   −3+3 ] = [ 1 0 ] ✓
[ 2 1 ]   [ -2   3 ]   [ 2−2   −2+3 ]   [ 0 1 ]
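The same worked example checks out in NumPy:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [2.0, 1.0]])

A_inv = np.linalg.inv(A)
print(A_inv)        # [[ 1. -1.]
                    #  [-2.  3.]]
print(A @ A_inv)    # the 2x2 identity, up to floating-point rounding
```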
Why this matters for ML
Linear regression normal equations: the closed form w = (XᵀX)⁻¹Xᵀy requires XᵀX to be invertible. When features are exactly linearly dependent (one column is a linear combination of others), XᵀX is singular and has no inverse — the formula breaks down. When features are merely highly correlated, XᵀX is nearly singular and the inverse is numerically unreliable.
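A small demonstration of the failure mode, using a synthetic design matrix where the second feature is an exact multiple of the first (data generated here purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 1))
X = np.hstack([x, 2 * x])          # second column = 2 * first column

XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))  # 1, not 2 -> XtX is singular
# np.linalg.inv(XtX) fails here: there is no inverse to compute
```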
Never explicitly compute the inverse in code. inv(A) @ b is slower and numerically less stable than np.linalg.solve(A, b). Always use a solver — it computes A⁻¹b without ever forming A⁻¹.
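Side by side, on the worked example from above (for this tiny, well-conditioned matrix both give the same answer; the stability gap shows up on large or ill-conditioned systems):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [2.0, 1.0]])
b = np.array([5.0, 4.0])

x_bad  = np.linalg.inv(A) @ b   # forms A^-1 explicitly: slower, less stable
x_good = np.linalg.solve(A, b)  # factorizes A, never forms A^-1
print(x_good)                   # solves 3x + y = 5, 2x + y = 4
```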
Ridge regression adds λI to fix invertibility: w = (XᵀX + λI)⁻¹Xᵀy. XᵀX is positive semidefinite (all eigenvalues ≥ 0), so adding λI shifts every eigenvalue up by λ; for any λ > 0 every eigenvalue is positive, hence det ≠ 0 and the inverse exists. This is both a numerical fix and (it turns out) a Bayesian prior on the weights.
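A sketch of the ridge fix on the same collinear design matrix (synthetic data and the choice λ = 0.01 are mine, for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(50, 1))
X = np.hstack([x, 2 * x])                  # collinear: XtX alone is singular
y = X @ np.array([1.0, 0.5]) + 0.1 * rng.normal(size=50)

lam = 1e-2
# XtX + lam*I is positive definite for lam > 0, so solve always succeeds
w = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
print(w)
```

Where the plain normal equations had no solution at all, the ridge system always has exactly one.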
Pseudoinverse handles non-square matrices: A⁺ = VΣ⁺Uᵀ (via SVD). This finds the best approximate solution when an exact inverse doesn't exist — which is exactly what least-squares regression computes.
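NumPy exposes this directly as np.linalg.pinv. A quick check that the pseudoinverse of a tall (3×2) matrix recovers the least-squares solution (the matrix and right-hand side are illustrative):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])        # 3x2: no exact inverse exists
b = np.array([1.0, 2.0, 4.0])

A_pinv = np.linalg.pinv(A)        # A+ = V Sigma+ U^T, computed via SVD
x = A_pinv @ b                    # best approximate solution to Ax = b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x, x_lstsq))    # pinv and least-squares agree
```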
The one thing to remember
The inverse "undoes" a transformation. It exists if and only if nothing was squashed — when det ≠ 0 and rank = n.