---
share: true
toc: true
math: true
categories:
- Mathematics
- Algebra
path: _posts/mathematics
tags:
- math
- algebra
title: Matrix Multiplication as $(1, 1)$-Tensors
date: 2026-01-02
github_title: 2026-01-02-matmul-as-tensors
image:
  path: /assets/img/posts/mathematics/algebra/matmul-tensor.png
---

## Introduction

Matrices have various applications throughout many fields of science. However, when we first learn about matrices and their operations, the least intuitive concept is the **multiplication** of matrices.

Compared to addition and *scalar* multiplication, which are very intuitive, matrix multiplication has a rather strange definition.

> **Definition.** (Matrix Multiplication) For $A = (a _ {ij}) _ {n\times n}$ and $B = (b _ {ij}) _ {n\times n}$, their product is defined as $C = AB = (c _ {ij}) _ {n\times n}$, where
>
> $$
> c _ {ij} = \sum _ {k=1}^n a _ {ik}b _ {kj}.
> $$

This is often interpreted as "row times column", with the following diagram:

$$
\begin{aligned}
& \begin{bmatrix} & b _ {1j} & & & \\
& b _ {2j} & & & \\
& \vdots & & & \\
& b _ {nj} & & & \\
\end{bmatrix} \\
\begin{bmatrix} \\
a _ {i1} & a _ {i2} & \cdots & a _ {in} \\
\\ \\
\end{bmatrix} & \begin{bmatrix}
& & & & \\
& c _ {ij} & & & \\
& & & & \\
& & & &
\end{bmatrix}
\end{aligned}
$$
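
To make the definition concrete, here is a minimal sketch (in Python with NumPy; the post itself contains no code, so this is purely illustrative) that computes $c _ {ij} = \sum _ k a _ {ik} b _ {kj}$ with explicit loops and checks it against the built-in matrix product.

```python
import numpy as np

# c_ij = sum_k a_ik * b_kj, computed with explicit loops
n = 3
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

C = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        for k in range(n):
            C[i, j] += A[i, k] * B[k, j]  # row i of A against column j of B

assert np.allclose(C, A @ B)
```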

But why is matrix multiplication defined this way, rather than element-wise like addition? What is the meaning behind this strange "row times column"? It is well-known that matrix multiplication represents the **composition of linear transformations**, and one can certainly derive the above formula so that this property holds. Still, why does it have this specific structure?

In this article, we explore the deep reason behind the definition of matrix multiplication using **tensors**.

We assume that all vector spaces are over a field $K$ and are *finite-dimensional*.
## Linearity

### Linear Maps

> **Definition.** Let $(V, \oplus, \odot)$ and $(W, \boxplus, \boxdot)$ be vector spaces over $K$. A map $f : V \to W$ is **linear** if for all $v _ 1, v _ 2 \in V$ and $\lambda \in K$,
>
> $$
> f \paren{(\lambda \odot v _ 1) \oplus v _ 2} = (\lambda \boxdot f(v _ 1)) \boxplus f(v _ 2).
> $$

We will drop the special notation for addition and scalar multiplication and just write $+$ and $\cdot$; the meaning should be clear from context. Then we can write the above definition in a familiar way.

> For all $v _ 1, v _ 2 \in V$ and $\lambda \in K$,
>
> $$
> f(\lambda v _ 1 + v _ 2) = \lambda f(v _ 1) + f(v _ 2).
> $$

Some definitions are in order.

> **Definition.** (Isomorphism) A bijective linear map is an **isomorphism** of vector spaces. If vector spaces $V$ and $W$ are isomorphic, we write $V \approx W$.

> **Definition.** The set of linear maps from $V$ to $W$ is denoted by $\rm{Hom}(V, W)$.

> **Definition.** (Endomorphism) An **endomorphism** of $V$ is a linear map from $V$ to itself. We write $\rm{End}(V) = \rm{Hom}(V, V)$.
### Bilinearity

A map is *bilinear* if it is linear in each argument separately. Formally,

> **Definition.** (Bilinearity) Let $V, W, Z$ be vector spaces over $K$. A map $f : V \times W \to Z$ is **bilinear** if
> 1. For any $w \in W$, $f(\lambda v _ 1 + v _ 2, w) = \lambda f(v _ 1, w) + f(v _ 2, w)$ for all $v _ 1, v _ 2 \in V$ and $\lambda \in K$.
> 2. For any $v \in V$, $f(v, \lambda w _ 1 + w _ 2) = \lambda f(v, w _ 1) + f(v, w _ 2)$ for all $w _ 1, w _ 2 \in W$ and $\lambda \in K$.

In other words,

1. The map $v \mapsto f(v, w)$ is linear for fixed $w \in W$.
2. The map $w \mapsto f(v, w)$ is linear for fixed $v \in V$.

Note that this is *not* the same as a linear map $V \times W \to Z$, since that would mean

$$
f\paren{\lambda(v _ 1, w _ 1) + (v _ 2, w _ 2)} = \lambda f(v _ 1, w _ 1) + f(v _ 2, w _ 2)
$$

for all $v _ 1, v _ 2 \in V$, $w _ 1, w _ 2 \in W$ and $\lambda \in K$. For instance, with $V = W = Z = \R$, one can check that $(x, y) \mapsto x + y$ is linear but not bilinear, while $(x, y) \mapsto xy$ is bilinear but not linear.
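
Here is a quick numeric check of this claim (a sketch; the particular numbers are arbitrary):

```python
# (x, y) -> x + y is linear but not bilinear;
# (x, y) -> x * y is bilinear but not linear.
lam, x1, y1, x2, y2 = 2.0, 3.0, 5.0, 7.0, 11.0

add = lambda x, y: x + y
mul = lambda x, y: x * y

# Linearity on the product space:
# f(lam*(x1, y1) + (x2, y2)) =? lam*f(x1, y1) + f(x2, y2)
assert add(lam * x1 + x2, lam * y1 + y2) == lam * add(x1, y1) + add(x2, y2)
assert mul(lam * x1 + x2, lam * y1 + y2) != lam * mul(x1, y1) + mul(x2, y2)

# Bilinearity in the first argument (second argument fixed):
# f(lam*x1 + x2, y) =? lam*f(x1, y) + f(x2, y)
assert mul(lam * x1 + x2, y1) == lam * mul(x1, y1) + mul(x2, y1)
assert add(lam * x1 + x2, y1) != lam * add(x1, y1) + add(x2, y1)
```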

**Example.** Some famous examples of bilinear maps.

- The standard inner product $\span{\cdot, \cdot} : \R^n \times \R^n \to \R$, where $\span{v, w} = v\trans w$.
- Determinants of $2 \times 2$ matrices, $\det _ 2 : \R^2 \times \R^2 \to \R$, where

$$\rm{det} _ 2 \paren{\begin{bmatrix} a \\ c\end{bmatrix}, \begin{bmatrix} b \\ d\end{bmatrix}} = \det \paren{\begin{bmatrix} a & b \\ c & d \end{bmatrix}} = ad - bc.$$
## Dual Spaces

### Dual Vector Space

To define tensors, we need *dual spaces*.

> **Definition.** (Dual Space) The **dual vector space** of $V$ is defined as
>
> $$
> V^\ast = \rm{Hom}(V, K)
> $$
>
> where $K$ is considered as a vector space over itself.

The dual vector space is the *vector space of linear maps from $V$ to the underlying field $K$*. Its elements are called **linear functionals**, **covectors**, or **one-forms** on $V$.

We can also consider the *double dual* of a vector space.

> **Definition.** The double dual vector space of $V$ is defined as
>
> $$
> V^{\ast\ast} = \rm{Hom}(V^\ast, K).
> $$

We could continue taking duals, but it is unnecessary due to the following result: $V \approx V^{\ast\ast}$.

> **Theorem.** $V$ and $V^{\ast\ast}$ are *naturally isomorphic*.

*Proof*. Define $\psi : V \ra V^{\ast\ast}$ as

$$
\psi(v)(f) = f(v)
$$

for $v \in V$ and $f \in V^\ast$. One checks that $\psi$ is linear and bijective (bijectivity uses finite-dimensionality). Since $\psi$ is defined without any choice of basis on $V$, it is a *natural* isomorphism.
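
The evaluation map is easy to play with computationally. Here is a sketch with vectors as NumPy arrays and covectors as plain Python callables (the names `psi` and `f` mirror the proof; everything else is illustrative):

```python
import numpy as np

def psi(v):
    # psi(v) is an element of V**: it eats a covector f and returns f(v)
    return lambda f: f(v)

v = np.array([1.0, 2.0, 3.0])
f = lambda u: u[0] - 4.0 * u[2]  # an arbitrary linear functional on R^3

assert psi(v)(f) == f(v)  # both evaluate to -11.0
```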

Furthermore, $V \approx V^\ast$ in the finite-dimensional case, but the two are *not* naturally isomorphic, since an isomorphism $V \ra V^\ast$ requires a choice of basis. More on this in another article.
### Dual Basis

Since $V$, $V^\ast$ and $V^{\ast\ast}$ are all isomorphic, they all have the same dimension.

> **Lemma.** $\dim V = \dim V^\ast = \dim V^{\ast\ast}$.

Therefore, the bases of $V$ and $V^\ast$ have the same number of elements. We can construct a basis of $V^\ast$ from a basis of $V$.

> **Definition.** (Dual Basis) Let $V$ be a $d$-dimensional vector space with basis $\mc{B} = \braces{e _ 1, \dots, e _ d}$. The **dual basis** is the unique basis
>
> $$
> \mc{B}' = \braces{f^1, \dots, f^d} \subseteq V^\ast
> $$
>
> such that
>
> $$
> f^i (e _ j) = \delta^i _ j = \begin{cases} 1 & (i = j) \\ 0 & (i \neq j) \end{cases}.
> $$

Note that the superscripts are not exponents.
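
Numerically, the dual basis can be computed by inverting a matrix. A sketch assuming $V = \R^3$: if the basis vectors $e _ j$ are the columns of a matrix $E$, then the rows of $E^{-1}$ represent the dual basis covectors, since $(E^{-1}E) _ {ij} = \delta _ {ij}$.

```python
import numpy as np

E = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])  # basis vectors e_j as columns
F = np.linalg.inv(E)             # dual basis covectors f^i as rows

# f^i(e_j) = delta^i_j: apply each covector to each basis vector
assert np.allclose(F @ E, np.eye(3))
```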

## Tensor Spaces

Tensors are not just high-dimensional arrays of numbers. We approach tensors from an algebraic perspective.

### Tensors

> **Definition.** (Tensor) Let $V$ be a vector space over $K$. A **$(p, q)$-tensor** $T$ on $V$ is a *multi-linear map*
>
> $$
> T : \underbrace{V^\ast \times \cdots \times V^\ast} _ {p \text{ copies}} \times \underbrace{V \times \cdots \times V} _ {q \text{ copies}} \to K.
> $$
>
> We write the set of $(p, q)$-tensors on $V$ as
>
> $$
> T _ q^p V = \underbrace{V \otimes \cdots \otimes V} _ {p\text{ copies}} \otimes \underbrace{V^\ast \otimes \cdots \otimes V^\ast} _ {q\text{ copies}}.
> $$

Thus, **tensors** are **multi-linear maps**.

**Remark.** The order of $V$ and $V^\ast$ is swapped between the definition of a tensor and the set of tensors. Although this is just a matter of notation, it can be understood as follows.

A $(p, q)$-tensor $T$ eats $p$ covectors and $q$ vectors and maps them to a scalar. Notice that

- An element of $V \approx V^{\ast\ast} = \rm{Hom}(V^\ast,K)$ eats a covector and maps it to a scalar.
- An element of $V^\ast = \rm{Hom}(V, K)$ eats a vector and maps it to a scalar.

Thus, a $(p, q)$-tensor $T$ behaves like an element built from $p$ copies of $V$ ($\approx V^{\ast\ast}$) and $q$ copies of $V^\ast$, which justifies the notation $T _ q^p V$.
### Tensor Spaces

The set $T _ q^pV$ can be equipped with a $K$-vector space structure by defining addition and scalar multiplication pointwise, which is possible since tensors are maps into $K$. Now we have a **tensor space** as a $K$-vector space.

**Example.** Some famous examples of tensor spaces.

- $T _ 1^0 V = V^\ast$ is the set of linear maps $T : V \ra K$, which agrees with the definition of $V^\ast$.
- $T _ 1^1 V = V \otimes V^\ast$ is the set of bilinear maps $T : V^\ast \times V \to K$.
### Tensor Products

> **Definition.** (Tensor Product) Let $T \in T _ q^pV$ and $S \in T _ s^r V$. The **tensor product** of $T$ and $S$ is the tensor $T \otimes S \in T _ {q+s}^{p+r} V$ defined as
>
> $$
> \begin{aligned}
> &(T\otimes S)(w _ 1, \dots, w _ p, w _ {p+1}, \dots, w _ {p+r}, v _ 1, \dots, v _ q, v _ {q+1}, \dots, v _ {q+s}) \\
> &\quad= T(w _ 1, \dots, w _ p, v _ 1, \dots, v _ q) \cdot S(w _ {p+1}, \dots, w _ {p+r}, v _ {q+1}, \dots, v _ {q+s})
> \end{aligned}
> $$
>
> for $w _ i \in V^\ast$ and $v _ i \in V$.

The definition seems complicated but is actually very natural: $T \otimes S$ should eat $p+r$ covectors and $q+s$ vectors. Considering the arguments of $T$ and $S$, give the first $p$ covectors and the first $q$ vectors to $T$ and the remaining ones to $S$, then multiply the two resulting scalars in $K$.

Note that the $\otimes$ here is different from the $\otimes$ used in the definition of $T _ q^p V$.
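
Since tensors here are just multilinear maps, the tensor product can be sketched generically with tensors as Python callables that take their covector arguments first. The helper `tensor_product` and the sample tensors are our own illustrative constructions:

```python
import numpy as np

def tensor_product(T, p, q, S, r, s):
    """Tensor product of T (a (p, q)-tensor) and S (an (r, s)-tensor)."""
    def TS(*args):
        ws, vs = args[:p + r], args[p + r:]  # p+r covectors, then q+s vectors
        # feed the first p covectors and first q vectors to T, the rest to S
        return T(*ws[:p], *vs[:q]) * S(*ws[p:], *vs[q:])
    return TS

f = lambda v: v[0] + v[1]              # a (0, 1)-tensor: a covector
e = lambda w: w(np.array([1.0, 0.0]))  # a (1, 0)-tensor: a vector, seen as V**

TS = tensor_product(f, 0, 1, e, 1, 0)  # their product is a (1, 1)-tensor
w = lambda u: 2.0 * u[0]               # some covector
v = np.array([3.0, 4.0])               # some vector
assert TS(w, v) == f(v) * e(w)         # 7.0 * 2.0
```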

### Tensors as Components

With these tools, we can completely determine a tensor by its *components*, just like how linear maps are completely determined by their values on the basis vectors.

> **Definition.** Let $V$ be a finite-dimensional $K$-vector space with basis $\mc{B} = \braces{e _ 1, \dots, e _ d}$ and dual basis $\mc{B}' = \braces{f^1, \dots, f^{d}}$.
>
> The **components** of $T \in T _ q^p V$ are defined to be the numbers
>
> $$
> T^{a _ 1 \dots a _ p} _ {b _ 1\dots b _ q} = T(f^{a _ 1}, \dots, f^{a _ p}, e _ {b _ 1}, \dots, e _ {b _ q}) \in K
> $$
>
> where $1 \leq a _ i, b _ j \leq d = \dim V$.

Notice that $a _ i, b _ j$ range from $1$ to $\dim V$. Since we have $p$ copies of $V^\ast$ and $q$ copies of $V$, the values of $T$ at *every possible combination of basis vectors* are the components.

We can reconstruct a tensor from its components by taking tensor products of vectors and covectors from the basis and the dual basis:

$$
T = \underbrace{\sum _ {a _ 1=1}^{\dim V} \cdots \sum _ {b _ q = 1}^{\dim V}} _ {p+q \text{ sums}} T^{a _ 1 \dots a _ p} _ {b _ 1\dots b _ q} e _ {a _ 1} \otimes \cdots \otimes e _ {a _ p} \otimes f^{b _ 1} \otimes \cdots \otimes f^{b _ q}.
$$

Here, $e _ {a _ i}$ are considered as elements of $T _ 0^1 V \approx V$ and $f^{b _ j}$ as elements of $T _ {1}^0 V \approx V^\ast$.
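
Here is a sketch of extracting components and reconstructing a $(1, 1)$-tensor numerically, assuming $V = \R^d$ with the standard basis, so that the dual basis is given by the standard row vectors:

```python
import numpy as np

d = 3
rng = np.random.default_rng(1)
A = rng.standard_normal((d, d))
T = lambda w, v: w @ A @ v  # a (1, 1)-tensor, with covectors given as rows

E = np.eye(d)  # columns are e_j; the rows double as the dual basis f^i

# components: T^i_j = T(f^i, e_j)
comp = np.array([[T(E[i], E[:, j]) for j in range(d)] for i in range(d)])

# reconstruction: T(w, v) = sum_{i,j} T^i_j * e_i(w) * f^j(v)
#                         = sum_{i,j} T^i_j * w_i * v^j
w, v = rng.standard_normal(d), rng.standard_normal(d)
assert np.isclose(T(w, v), np.einsum('ij,i,j->', comp, w, v))
```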

## Vectors as Tensors

Now that we have components for tensors in $T _ q^p V$, using a basis of $V$ and the dual basis of $V^\ast$, we can write vectors and matrices as tensors.

### Vectors

It is well-known that any vector can be represented as a linear combination of the basis vectors, i.e.,

$$
v = v^1 e _ 1 + \cdots + v^d e _ d \in V
$$

where $v^j$ are the components of the vector $v$ with respect to each basis vector $e _ j$. We observe that vectors are indeed tensors in $T _ 0^1 V = V$.

Conventionally, we often write vectors as *columns* of numbers.

$$
v = \sum v^i e _ i \quad \longleftrightarrow \quad v = \begin{bmatrix} v^1 \\ \vdots \\ v^d \end{bmatrix}
$$
### Covectors

As for covectors $w \in V^\ast$, we can also represent them as a linear combination, i.e.,

$$
w = w _ 1 f^1 + \cdots + w _ d f^d \in V^\ast
$$

where $w _ i$ are the components of the covector $w$ with respect to each dual basis vector $f^i$. We also observe that covectors are tensors in $T _ 1^0 V = V^\ast$.

Similarly, we often write covectors as *rows* of numbers.

$$
w = \sum w _ i f^i \quad \longleftrightarrow \quad w = \begin{bmatrix} w _ 1 & \dots & w _ d \end{bmatrix}
$$
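
With these conventions, a covector acting on a vector is literally a row times a column, which already hints at "row times column". A tiny sketch:

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])   # vector: a column of numbers
w = np.array([4.0, 0.0, -1.0])  # covector: a row of numbers

# w(v) = sum_i w_i v^i, a 1x1 "row times column" product
assert w @ v == 4.0 * 1.0 + 0.0 * 2.0 + (-1.0) * 3.0
```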

## Matrices as Tensors

As for matrices, we need to prove a few things beforehand. We limit the discussion to square matrices.

First, we know that **matrices are linear transformations**, so every square matrix can be considered as a linear map $V \to V$, thus an element of $\rm{End}(V)$.

> **Lemma.** $T _ 1^1 V = V \otimes V^\ast \approx \rm{End}(V^\ast)$.

*Proof*. We must construct an invertible linear correspondence between tensors $T \in V \otimes V^\ast$ and linear maps $f : V^\ast \to V^\ast$.

Given $T$, simply define $f(w) = T(w, \cdot)$ for $w \in V^\ast$. Then $f(w)$ is a map from $V$ to $K$, and both $f$ and $f(w)$ can be shown to be linear using the bilinearity of $T$. Thus $f(w) \in V^\ast$ and $f \in \rm{End}(V^\ast)$.

This correspondence is invertible: given a linear map $f : V^\ast \to V^\ast$, we can define $T$ by $T(w, v) = f(w)(v)$ for $w \in V^\ast$ and $v \in V$. Then $f(w) \in V^\ast$ and $f(w)(v) \in K$, so $T : V^\ast \times V \to K$. Bilinearity of $T$ follows directly from the linearity of $f$ and of each $f(w)$.
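
Here is a numeric sketch of this correspondence, with covectors as row vectors: the tensor $T(w, v) = w A v$ corresponds to the map $f(w) = T(w, \cdot)$, which in coordinates sends the row $w$ to the row $wA$.

```python
import numpy as np

d = 3
rng = np.random.default_rng(2)
A = rng.standard_normal((d, d))

T = lambda w, v: w @ A @ v  # a (1, 1)-tensor
f = lambda w: w @ A         # the corresponding element of End(V*)

w, v = rng.standard_normal(d), rng.standard_normal(d)
assert np.isclose(T(w, v), f(w) @ v)  # f(w)(v) = T(w, v)
```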

> **Corollary.** For a finite-dimensional $K$-vector space $V$, $T _ 1^1 V \approx \rm{End}(V)$.

This follows directly from the fact that $V \approx V^\ast$.
### Matrices as $(1, 1)$-Tensors

Finally, since matrices are elements of $\rm{End}(V)$, we can conclude that **square matrices are $(1, 1)$-tensors**. Therefore, we can write a tensor $\phi \in T _ 1^1 V$ with respect to the chosen basis as

$$
\phi = \sum _ {i=1}^{\dim V} \sum _ {j=1}^{\dim V} \phi^i _ j \; e _ i \otimes f^j,
$$

where $\phi^i _ j = \phi(f^i, e _ j)$. Now it is *very tempting* to think of $\phi^i _ j$ as a **square array of numbers**.

$$
\phi = \sum _ {i=1}^{\dim V} \sum _ {j=1}^{\dim V} \phi^i _ j \; e _ i \otimes f^j \quad \longleftrightarrow \quad \phi =
\begin{bmatrix}
\phi _ 1^1 & \phi _ 2^1 & \cdots & \phi _ d^1 \\
\phi _ 1^2 & \phi _ 2^2 & \cdots & \phi _ d^2 \\
\vdots & \vdots & \ddots & \vdots \\
\phi _ 1^d & \phi _ 2^d & \cdots & \phi _ d^d
\end{bmatrix}
$$

The convention is to consider the top index $i$ as a *row* index and the bottom index $j$ as a *column* index.
### Matrix Multiplication

However, the above *arrangement* is a pure convention.

Consider $\phi \in \rm{End}(V)$ also as a tensor $\phi \in T _ 1^1 V$. Abusing notation, we can write

$$
\phi(w, v) = w(\phi(v))
$$

for $v \in V$ and $w \in V^\ast$. For clarity: on the left-hand side, $\phi : V^\ast \times V \to K$; on the right-hand side, $\phi : V \to V$, and $w$ eats the vector $\phi(v) \in V$ and maps it to $K$, as $w$ is an element of $V^\ast$.

Thus, the components of $\phi \in \rm{End}(V)$ are $\phi _ j^i = \phi(f^i, e _ j) = f^i(\phi(e _ j))$.
Since **matrix multiplication represents the composition of linear transformations**, for $\phi, \psi \in \rm{End}(V)$, let us compute the components of $\phi \circ \psi$ as a $(1, 1)$-tensor. We have

$$
\begin{aligned}
(\phi \circ \psi) _ j^i &= (\phi \circ \psi)(f^i, e _ j) \\
&= f^i\big( (\phi\circ\psi)(e _ j) \big) \\
&= f^i\big( \phi(\psi (e _ j)) \big) \\
&\overset{(\ast)}{=} f^i\paren{ \phi\paren{\sum _ k \psi^k _ j e _ k} } \\
&\overset{(\star)}{=} \sum _ k \psi _ j^k f^i \big( \phi(e _ k) \big) \\
&= \sum _ k \psi _ j^k \phi _ k^i.
\end{aligned}
$$

- $(\ast)$ follows from $\psi(e _ j) = \sum _ k \psi _ j^k e _ k$. This holds since for $w \in V^\ast$,

$$
\begin{aligned}
\psi(w, e _ j) &= \sum _ {k, l} \psi _ l^k \; (e _ k \otimes f^l)(w, e _ j) \\
&= \sum _ {k, l} \psi _ l^k \; e _ k(w) \cdot f^l(e _ j) \\
&= \sum _ {k} \psi _ j^k e _ k (w)
\end{aligned}
$$

because $f^l (e _ j) = \delta _ {j}^l$, leaving only the terms with $l = j$.
- $(\star)$ follows from the linearity of $f^i$ and $\phi$.

Doesn't this expression look familiar? Since multiplication in the field $K$ is commutative,

$$
\sum _ {k} \psi _ j^k \phi _ k^i = \sum _ k \phi^i _ k \psi _ j^k,
$$

which is exactly the formula from the definition: the $(i, j)$-th component of the $(1, 1)$-tensor $\phi \circ \psi$ matches the $(i, j)$-th entry of the matrix product $\phi \psi$.
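
As a final check, here is a numeric sketch confirming the derivation: the components $f^i\big(\phi(\psi(e _ j))\big)$ of the $(1, 1)$-tensor $\phi \circ \psi$ coincide with the entries of the matrix product, assuming the standard basis of $\R^d$.

```python
import numpy as np

d = 4
rng = np.random.default_rng(3)
phi = rng.standard_normal((d, d))
psi = rng.standard_normal((d, d))

E = np.eye(d)  # columns are the basis e_j; rows are the dual basis f^i

# components of the composition: f^i(phi(psi(e_j)))
comp = np.array([[E[i] @ (phi @ (psi @ E[:, j])) for j in range(d)]
                 for i in range(d)])

assert np.allclose(comp, np.einsum('ik,kj->ij', phi, psi))  # sum_k phi^i_k psi^k_j
assert np.allclose(comp, phi @ psi)                         # the matrix product
```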

Now we know why matrix multiplication is defined as "row times column": tensor spaces, tensor products and dual spaces were behind it all along.

## Notes

- The above argument should be generalizable to rectangular matrices, although I haven't checked yet.
- We now know why matrix multiplication is defined the way it is, but the question becomes why tensors are defined in such a strange way, involving vector spaces and their duals.
- (Not sure) Matrix transpose should be interpretable with tensors?

## References

- [Geometrical Anatomy of Theoretical Physics](https://youtube.com/playlist?list=PLPH7f_7ZlzxTi6kS4vCmv4ZKm9u8g5yic&si=dJj6nmoAc944YOgl), Lecture 8