1.7 — Inverse Matrices and When They Exist
Date: 2026-03-01 | Block: 1 — Linear Algebra
The idea in plain English
The inverse of a matrix is the "undo" transformation. If A rotates space 90° clockwise, then A⁻¹ rotates it 90° counterclockwise. Applying A then A⁻¹ gets you back exactly where you started — a net effect of doing nothing.
The intuition
Every invertible transformation can be reversed. You rotate → you can un-rotate. You stretch → you can un-stretch. But what if you squash the entire plane down to a single line? You've thrown away information — you can't reconstruct a plane from a line. No inverse exists.
This is the key: an inverse exists if and only if no information was destroyed.
Invertible:       plane → plane   (reversible)
Not invertible:   plane → line    (information lost — can't go back)
The math
Definition: A⁻¹ is the unique matrix satisfying:
A · A⁻¹ = I and A⁻¹ · A = I
Formula for 2×2:
A = [ a b ]        A⁻¹ = 1/det(A) · [  d  -b ]
    [ c d ]                         [ -c   a ]
Steps: (1) swap diagonal elements, (2) negate off-diagonal, (3) divide by det(A).
If det(A) = 0, this formula blows up (division by zero) → no inverse exists.
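The three-step recipe translates directly into code. A minimal sketch (the helper name `inverse_2x2` is mine, not a NumPy function):

```python
import numpy as np

def inverse_2x2(A):
    """Invert a 2x2 matrix via the swap / negate / divide-by-det recipe."""
    a, b = A[0]
    c, d = A[1]
    det = a * d - b * c
    if det == 0:
        raise ValueError("det(A) = 0: matrix is singular, no inverse exists")
    # (1) swap diagonal elements, (2) negate off-diagonal, (3) divide by det
    return np.array([[d, -b], [-c, a]]) / det
```

The explicit `det == 0` check is the code version of "the formula blows up": the recipe fails exactly when the determinant vanishes.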
When the inverse exists — all five say the same thing:
A is invertible
⟺ det(A) ≠ 0
⟺ rank(A) = n
⟺ null space = {0}
⟺ columns are linearly independent
⟺ no dimension is destroyed
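These conditions can be checked numerically. A quick NumPy sanity check on one invertible and one singular matrix (the matrices are illustrative, chosen so the second has a dependent row):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [2.0, 1.0]])   # invertible: det = 1, rank = 2
B = np.array([[1.0, 2.0],
              [2.0, 4.0]])   # second row = 2 * first row -> singular

for M in (A, B):
    det = np.linalg.det(M)
    rank = np.linalg.matrix_rank(M)
    print(f"det = {det:.3f}, rank = {rank}, "
          f"invertible = {not np.isclose(det, 0)}")
```

Note that `det ≠ 0` and `rank = n` flip together: B squashes the plane onto the line spanned by (1, 2), so its rank drops to 1 and its determinant is 0.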
A worked example
A = [ 3 1 ]    det = 3·1 − 1·2 = 1
    [ 2 1 ]

A⁻¹ = (1/1) · [  1  -1 ] = [  1  -1 ]
              [ -2   3 ]   [ -2   3 ]

Verify A·A⁻¹:

[ 3 1 ] · [  1  -1 ] = [ 3−2   −3+3 ] = [ 1 0 ] ✓
[ 2 1 ]   [ -2   3 ]   [ 2−2   −2+3 ]   [ 0 1 ]
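The same worked example checks out in NumPy:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [2.0, 1.0]])

A_inv = np.linalg.inv(A)
print(A_inv)        # [[ 1. -1.]
                    #  [-2.  3.]]
print(A @ A_inv)    # the 2x2 identity, up to floating-point rounding
```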
Why this matters for ML
Linear regression normal equations: the closed form w = (XᵀX)⁻¹Xᵀy requires XᵀX to be invertible. When features are exactly linearly dependent (one column is a linear combination of others), XᵀX is singular and has no inverse — the formula breaks down. When features are merely highly correlated, XᵀX is nearly singular and the inverse is numerically unreliable.
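A small demonstration of the failure mode, using a synthetic design matrix where the second feature is an exact multiple of the first (data generated here purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 1))
X = np.hstack([x, 2 * x])          # second column = 2 * first column

XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))  # 1, not 2 -> XtX is singular
# np.linalg.inv(XtX) fails here: there is no inverse to compute
```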
Never explicitly compute the inverse in code. inv(A) @ b is slower and numerically less stable than np.linalg.solve(A, b). Always use a solver — it computes A⁻¹b without ever forming A⁻¹.
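Side by side, on the worked example from above (for this tiny, well-conditioned matrix both give the same answer; the stability gap shows up on large or ill-conditioned systems):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [2.0, 1.0]])
b = np.array([5.0, 4.0])

x_bad  = np.linalg.inv(A) @ b   # forms A^-1 explicitly: slower, less stable
x_good = np.linalg.solve(A, b)  # factorizes A, never forms A^-1
print(x_good)                   # solves 3x + y = 5, 2x + y = 4
```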
Ridge regression adds λI to fix invertibility: w = (XᵀX + λI)⁻¹Xᵀy. XᵀX is positive semidefinite (all eigenvalues ≥ 0), so adding λI shifts every eigenvalue up by λ; for any λ > 0 every eigenvalue is positive, hence det ≠ 0 and the inverse exists. This is both a numerical fix and (it turns out) a Bayesian prior on the weights.
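A sketch of the ridge fix on the same collinear design matrix (synthetic data and the choice λ = 0.01 are mine, for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(50, 1))
X = np.hstack([x, 2 * x])                  # collinear: XtX alone is singular
y = X @ np.array([1.0, 0.5]) + 0.1 * rng.normal(size=50)

lam = 1e-2
# XtX + lam*I is positive definite for lam > 0, so solve always succeeds
w = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
print(w)
```

Where the plain normal equations had no solution at all, the ridge system always has exactly one.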
Pseudoinverse handles non-square matrices: A⁺ = VΣ⁺Uᵀ (via SVD). This finds the best approximate solution when an exact inverse doesn't exist — which is exactly what least-squares regression computes.
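NumPy exposes this directly as np.linalg.pinv. A quick check that the pseudoinverse of a tall (3×2) matrix recovers the least-squares solution (the matrix and right-hand side are illustrative):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])        # 3x2: no exact inverse exists
b = np.array([1.0, 2.0, 4.0])

A_pinv = np.linalg.pinv(A)        # A+ = V Sigma+ U^T, computed via SVD
x = A_pinv @ b                    # best approximate solution to Ax = b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x, x_lstsq))    # pinv and least-squares agree
```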
The one thing to remember
The inverse "undoes" a transformation. It exists if and only if nothing was squashed — when det ≠ 0 and rank = n.