---
share: true
toc: true
math: true
categories:
- Mathematics
- Algebra
path: _posts/mathematics
tags:
- math
- algebra
title: Matrix Multiplication as $(1, 1)$-Tensors
date: 2026-01-02
github_title: 2026-01-02-matmul-as-tensors
image:
  path: /assets/img/posts/mathematics/algebra/matmul-tensor.png
---

## Introduction

Matrices have various applications throughout many fields of science. However, when we first learn about matrices and their operations, the least intuitive concept is the **multiplication** of matrices.

Compared to addition and *scalar* multiplication, which are very intuitive, matrix multiplication has a rather strange definition.

> **Definition.** (Matrix Multiplication) For $A = (a _ {ij}) _ {n\times n}$ and $B = (b _ {ij}) _ {n\times n}$, their product is defined as $C = AB = (c _ {ij}) _ {n\times n}$, where
>
> $$
> c _ {ij} = \sum _ {k=1}^n a _ {ik}b _ {kj}.
> $$

This is often interpreted as "row times column", with the following diagram:

$$
\begin{aligned}
& \begin{bmatrix} & b _ {1j} & & & \\
& b _ {2j} & & & \\
& \vdots & & & \\
& b _ {nj} & & & \\
\end{bmatrix} \\
\begin{bmatrix} \\
a _ {i1} & a _ {i2} & \cdots & a _ {in} \\
\\ \\
\end{bmatrix} & \begin{bmatrix}
& & & & \\
& c _ {ij} & & & \\
& & & & \\
& & & &
\end{bmatrix}
\end{aligned}
$$
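
To make the definition concrete, here is a minimal sketch (in Python with NumPy; the post itself contains no code, so this is purely illustrative) that computes $c _ {ij} = \sum _ k a _ {ik} b _ {kj}$ with explicit loops and checks it against the built-in matrix product.

```python
import numpy as np

# c_ij = sum_k a_ik * b_kj, computed with explicit loops
n = 3
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

C = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        for k in range(n):
            C[i, j] += A[i, k] * B[k, j]  # row i of A against column j of B

assert np.allclose(C, A @ B)
```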

But why is matrix multiplication defined this way, rather than element-wise like addition? What is the meaning behind this strange "row times column"? It is well-known that matrix multiplication represents the **composition of linear transformations**, and one can certainly derive the above formula so that this property holds. Still, why does it have this specific structure?

In this article, we explore the deep reason behind the definition of matrix multiplication using **tensors**.

We assume that all vector spaces are over a field $K$ and are *finite-dimensional*.
## Linearity

### Linear Maps

> **Definition.** Let $(V, \oplus, \odot)$ and $(W, \boxplus, \boxdot)$ be vector spaces over $K$. A map $f : V \to W$ is **linear** if for all $v _ 1, v _ 2 \in V$ and $\lambda \in K$,
>
> $$
> f \paren{(\lambda \odot v _ 1) \oplus v _ 2} = (\lambda \boxdot f(v _ 1)) \boxplus f(v _ 2).
> $$

We will drop the special notation for addition and scalar multiplication and just write $+$ and $\cdot$; the meaning should be clear from context. Then we can write the above definition in a familiar way.

> For all $v _ 1, v _ 2 \in V$ and $\lambda \in K$,
>
> $$
> f(\lambda v _ 1 + v _ 2) = \lambda f(v _ 1) + f(v _ 2).
> $$

Some definitions are in order.

> **Definition.** (Isomorphism) A bijective linear map is an **isomorphism** of vector spaces. If vector spaces $V$ and $W$ are isomorphic, we write $V \approx W$.

> **Definition.** The set of linear maps from $V$ to $W$ is denoted by $\rm{Hom}(V, W)$.

> **Definition.** (Endomorphism) An **endomorphism** of $V$ is a linear map from $V$ to itself. We write $\rm{End}(V) = \rm{Hom}(V, V)$.
### Bilinearity

A map is *bilinear* if it is linear in each argument separately. Formally,

> **Definition.** (Bilinearity) Let $V, W, Z$ be vector spaces over $K$. A map $f : V \times W \to Z$ is **bilinear** if
> 1. For any $w \in W$, $f(\lambda v _ 1 + v _ 2, w) = \lambda f(v _ 1, w) + f(v _ 2, w)$ for all $v _ 1, v _ 2 \in V$ and $\lambda \in K$.
> 2. For any $v \in V$, $f(v, \lambda w _ 1 + w _ 2) = \lambda f(v, w _ 1) + f(v, w _ 2)$ for all $w _ 1, w _ 2 \in W$ and $\lambda \in K$.

In other words,

1. The map $v \mapsto f(v, w)$ is linear for fixed $w \in W$.
2. The map $w \mapsto f(v, w)$ is linear for fixed $v \in V$.

Note that this is *not* the same as a linear map $V \times W \to Z$, since that would mean

$$
f\paren{\lambda(v _ 1, w _ 1) + (v _ 2, w _ 2)} = \lambda f(v _ 1, w _ 1) + f(v _ 2, w _ 2)
$$

for all $v _ 1, v _ 2 \in V$, $w _ 1, w _ 2 \in W$ and $\lambda \in K$. For instance, with $V = W = Z = \R$, one can check that $(x, y) \mapsto x + y$ is linear but not bilinear, while $(x, y) \mapsto xy$ is bilinear but not linear.
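
Here is a quick numeric check of this claim (a sketch; the particular numbers are arbitrary):

```python
# (x, y) -> x + y is linear but not bilinear;
# (x, y) -> x * y is bilinear but not linear.
lam, x1, y1, x2, y2 = 2.0, 3.0, 5.0, 7.0, 11.0

add = lambda x, y: x + y
mul = lambda x, y: x * y

# Linearity on the product space:
# f(lam*(x1, y1) + (x2, y2)) =? lam*f(x1, y1) + f(x2, y2)
assert add(lam * x1 + x2, lam * y1 + y2) == lam * add(x1, y1) + add(x2, y2)
assert mul(lam * x1 + x2, lam * y1 + y2) != lam * mul(x1, y1) + mul(x2, y2)

# Bilinearity in the first argument (second argument fixed):
# f(lam*x1 + x2, y) =? lam*f(x1, y) + f(x2, y)
assert mul(lam * x1 + x2, y1) == lam * mul(x1, y1) + mul(x2, y1)
assert add(lam * x1 + x2, y1) != lam * add(x1, y1) + add(x2, y1)
```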

**Example.** Some famous examples of bilinear maps.

- The standard inner product $\span{\cdot, \cdot} : \R^n \times \R^n \to \R$, where $\span{v, w} = v\trans w$.
- Determinants of $2 \times 2$ matrices, $\det _ 2 : \R^2 \times \R^2 \to \R$, where

$$\rm{det} _ 2 \paren{\begin{bmatrix} a \\ c\end{bmatrix}, \begin{bmatrix} b \\ d\end{bmatrix}} = \det \paren{\begin{bmatrix} a & b \\ c & d \end{bmatrix}} = ad - bc.$$
## Dual Spaces

### Dual Vector Space

To define tensors, we need *dual spaces*.

> **Definition.** (Dual Space) The **dual vector space** of $V$ is defined as
>
> $$
> V^\ast = \rm{Hom}(V, K)
> $$
>
> where $K$ is considered as a vector space over itself.

The dual vector space is the *vector space of linear maps from $V$ to the underlying field $K$*. Its elements are called **linear functionals**, **covectors**, or **one-forms** on $V$.

We can also consider the *double dual* of a vector space.

> **Definition.** The double dual vector space of $V$ is defined as
>
> $$
> V^{\ast\ast} = \rm{Hom}(V^\ast, K).
> $$

We could continue taking duals, but it is unnecessary due to the following result: $V \approx V^{\ast\ast}$.

> **Theorem.** $V$ and $V^{\ast\ast}$ are *naturally isomorphic*.

*Proof*. Define $\psi : V \ra V^{\ast\ast}$ as

$$
\psi(v)(f) = f(v)
$$

for $v \in V$ and $f \in V^\ast$. One checks that $\psi$ is linear and bijective (bijectivity uses finite-dimensionality). Since $\psi$ is defined without any choice of basis on $V$, it is a *natural* isomorphism.
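
The evaluation map is easy to play with computationally. Here is a sketch with vectors as NumPy arrays and covectors as plain Python callables (the names `psi` and `f` mirror the proof; everything else is illustrative):

```python
import numpy as np

def psi(v):
    # psi(v) is an element of V**: it eats a covector f and returns f(v)
    return lambda f: f(v)

v = np.array([1.0, 2.0, 3.0])
f = lambda u: u[0] - 4.0 * u[2]  # an arbitrary linear functional on R^3

assert psi(v)(f) == f(v)  # both evaluate to -11.0
```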

Furthermore, $V \approx V^\ast$ in the finite-dimensional case, but the two are *not* naturally isomorphic, since an isomorphism $V \ra V^\ast$ requires a choice of basis. More on this in another article.
### Dual Basis

Since $V$, $V^\ast$ and $V^{\ast\ast}$ are all isomorphic, they all have the same dimension.

> **Lemma.** $\dim V = \dim V^\ast = \dim V^{\ast\ast}$.

Therefore, the bases of $V$ and $V^\ast$ have the same number of elements. We can construct a basis of $V^\ast$ from a basis of $V$.

> **Definition.** (Dual Basis) Let $V$ be a $d$-dimensional vector space with basis $\mc{B} = \braces{e _ 1, \dots, e _ d}$. The **dual basis** is the unique basis
>
> $$
> \mc{B}' = \braces{f^1, \dots, f^d} \subseteq V^\ast
> $$
>
> such that
>
> $$
> f^i (e _ j) = \delta^i _ j = \begin{cases} 1 & (i = j) \\ 0 & (i \neq j) \end{cases}.
> $$

Note that the superscripts are not exponents.
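
Numerically, the dual basis can be computed by inverting a matrix. A sketch assuming $V = \R^3$: if the basis vectors $e _ j$ are the columns of a matrix $E$, then the rows of $E^{-1}$ represent the dual basis covectors, since $(E^{-1}E) _ {ij} = \delta _ {ij}$.

```python
import numpy as np

E = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])  # basis vectors e_j as columns
F = np.linalg.inv(E)             # dual basis covectors f^i as rows

# f^i(e_j) = delta^i_j: apply each covector to each basis vector
assert np.allclose(F @ E, np.eye(3))
```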

## Tensor Spaces

Tensors are not just high-dimensional arrays of numbers. We approach tensors from an algebraic perspective.

### Tensors

> **Definition.** (Tensor) Let $V$ be a vector space over $K$. A **$(p, q)$-tensor** $T$ on $V$ is a *multi-linear map*
>
> $$
> T : \underbrace{V^\ast \times \cdots \times V^\ast} _ {p \text{ copies}} \times \underbrace{V \times \cdots \times V} _ {q \text{ copies}} \to K.
> $$
>
> We write the set of $(p, q)$-tensors on $V$ as
>
> $$
> T _ q^p V = \underbrace{V \otimes \cdots \otimes V} _ {p\text{ copies}} \otimes \underbrace{V^\ast \otimes \cdots \otimes V^\ast} _ {q\text{ copies}}.
> $$

Thus, **tensors** are **multi-linear maps**.

**Remark.** The order of $V$ and $V^\ast$ is swapped between the definition of a tensor and the set of tensors. Although this is just a matter of notation, it can be understood as follows.

A $(p, q)$-tensor $T$ eats $p$ covectors and $q$ vectors and maps them to a scalar. Notice that

- An element of $V \approx V^{\ast\ast} = \rm{Hom}(V^\ast,K)$ eats a covector and maps it to a scalar.
- An element of $V^\ast = \rm{Hom}(V, K)$ eats a vector and maps it to a scalar.

Thus, a $(p, q)$-tensor $T$ behaves like an element built from $p$ copies of $V$ ($\approx V^{\ast\ast}$) and $q$ copies of $V^\ast$, which justifies the notation $T _ q^p V$.
### Tensor Spaces

The set $T _ q^pV$ can be equipped with a $K$-vector space structure by defining addition and scalar multiplication pointwise, which is possible since tensors are maps into $K$. Now we have a **tensor space** as a $K$-vector space.

**Example.** Some famous examples of tensor spaces.

- $T _ 1^0 V = V^\ast$ is the set of linear maps $T : V \ra K$, which agrees with the definition of $V^\ast$.
- $T _ 1^1 V = V \otimes V^\ast$ is the set of bilinear maps $T : V^\ast \times V \to K$.
### Tensor Products

> **Definition.** (Tensor Product) Let $T \in T _ q^pV$ and $S \in T _ s^r V$. The **tensor product** of $T$ and $S$ is the tensor $T \otimes S \in T _ {q+s}^{p+r} V$ defined as
>
> $$
> \begin{aligned}
> &(T\otimes S)(w _ 1, \dots, w _ p, w _ {p+1}, \dots, w _ {p+r}, v _ 1, \dots, v _ q, v _ {q+1}, \dots, v _ {q+s}) \\
> &\quad= T(w _ 1, \dots, w _ p, v _ 1, \dots, v _ q) \cdot S(w _ {p+1}, \dots, w _ {p+r}, v _ {q+1}, \dots, v _ {q+s})
> \end{aligned}
> $$
>
> for $w _ i \in V^\ast$ and $v _ i \in V$.

The definition seems complicated but is actually very natural: $T \otimes S$ should eat $p+r$ covectors and $q+s$ vectors. Considering the arguments of $T$ and $S$, give the first $p$ covectors and the first $q$ vectors to $T$ and the remaining ones to $S$, then multiply the two resulting scalars in $K$.

Note that the $\otimes$ here is different from the $\otimes$ used in the definition of $T _ q^p V$.
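
Since tensors here are just multilinear maps, the tensor product can be sketched generically with tensors as Python callables that take their covector arguments first. The helper `tensor_product` and the sample tensors are our own illustrative constructions:

```python
import numpy as np

def tensor_product(T, p, q, S, r, s):
    """Tensor product of T (a (p, q)-tensor) and S (an (r, s)-tensor)."""
    def TS(*args):
        ws, vs = args[:p + r], args[p + r:]  # p+r covectors, then q+s vectors
        # feed the first p covectors and first q vectors to T, the rest to S
        return T(*ws[:p], *vs[:q]) * S(*ws[p:], *vs[q:])
    return TS

f = lambda v: v[0] + v[1]              # a (0, 1)-tensor: a covector
e = lambda w: w(np.array([1.0, 0.0]))  # a (1, 0)-tensor: a vector, seen as V**

TS = tensor_product(f, 0, 1, e, 1, 0)  # their product is a (1, 1)-tensor
w = lambda u: 2.0 * u[0]               # some covector
v = np.array([3.0, 4.0])               # some vector
assert TS(w, v) == f(v) * e(w)         # 7.0 * 2.0
```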

### Tensors as Components

With these tools, we can completely determine a tensor by its *components*, just like how linear maps are completely determined by their values on the basis vectors.

> **Definition.** Let $V$ be a finite-dimensional $K$-vector space with basis $\mc{B} = \braces{e _ 1, \dots, e _ d}$ and dual basis $\mc{B}' = \braces{f^1, \dots, f^{d}}$.
>
> The **components** of $T \in T _ q^p V$ are defined to be the numbers
>
> $$
> T^{a _ 1 \dots a _ p} _ {b _ 1\dots b _ q} = T(f^{a _ 1}, \dots, f^{a _ p}, e _ {b _ 1}, \dots, e _ {b _ q}) \in K
> $$
>
> where $1 \leq a _ i, b _ j \leq d = \dim V$.

Notice that $a _ i, b _ j$ range from $1$ to $\dim V$. Since we have $p$ copies of $V^\ast$ and $q$ copies of $V$, the values of $T$ at *every possible combination of basis vectors* are the components.

We can reconstruct a tensor from its components by taking tensor products of vectors and covectors from the basis and the dual basis:

$$
T = \underbrace{\sum _ {a _ 1=1}^{\dim V} \cdots \sum _ {b _ q = 1}^{\dim V}} _ {p+q \text{ sums}} T^{a _ 1 \dots a _ p} _ {b _ 1\dots b _ q} e _ {a _ 1} \otimes \cdots \otimes e _ {a _ p} \otimes f^{b _ 1} \otimes \cdots \otimes f^{b _ q}.
$$

Here, $e _ {a _ i}$ are considered as elements of $T _ 0^1 V \approx V$ and $f^{b _ j}$ as elements of $T _ {1}^0 V \approx V^\ast$.
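
Here is a sketch of extracting components and reconstructing a $(1, 1)$-tensor numerically, assuming $V = \R^d$ with the standard basis, so that the dual basis is given by the standard row vectors:

```python
import numpy as np

d = 3
rng = np.random.default_rng(1)
A = rng.standard_normal((d, d))
T = lambda w, v: w @ A @ v  # a (1, 1)-tensor, with covectors given as rows

E = np.eye(d)  # columns are e_j; the rows double as the dual basis f^i

# components: T^i_j = T(f^i, e_j)
comp = np.array([[T(E[i], E[:, j]) for j in range(d)] for i in range(d)])

# reconstruction: T(w, v) = sum_{i,j} T^i_j * e_i(w) * f^j(v)
#                         = sum_{i,j} T^i_j * w_i * v^j
w, v = rng.standard_normal(d), rng.standard_normal(d)
assert np.isclose(T(w, v), np.einsum('ij,i,j->', comp, w, v))
```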

## Vectors as Tensors

Now that we have components for tensors in $T _ q^p V$, using a basis of $V$ and the dual basis of $V^\ast$, we can write vectors and matrices as tensors.

### Vectors

It is well-known that any vector can be represented as a linear combination of the basis vectors, i.e.,

$$
v = v^1 e _ 1 + \cdots + v^d e _ d \in V
$$

where $v^j$ are the components of the vector $v$ with respect to each basis vector $e _ j$. We observe that vectors are indeed tensors in $T _ 0^1 V = V$.

Conventionally, we often write vectors as *columns* of numbers.

$$
v = \sum v^i e _ i \quad \longleftrightarrow \quad v = \begin{bmatrix} v^1 \\ \vdots \\ v^d \end{bmatrix}
$$
### Covectors

As for covectors $w \in V^\ast$, we can also represent them as a linear combination, i.e.,

$$
w = w _ 1 f^1 + \cdots + w _ d f^d \in V^\ast
$$

where $w _ i$ are the components of the covector $w$ with respect to each dual basis vector $f^i$. We also observe that covectors are tensors in $T _ 1^0 V = V^\ast$.

Similarly, we often write covectors as *rows* of numbers.

$$
w = \sum w _ i f^i \quad \longleftrightarrow \quad w = \begin{bmatrix} w _ 1 & \dots & w _ d \end{bmatrix}
$$
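
With these conventions, a covector acting on a vector is literally a row times a column, which already hints at "row times column". A tiny sketch:

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])   # vector: a column of numbers
w = np.array([4.0, 0.0, -1.0])  # covector: a row of numbers

# w(v) = sum_i w_i v^i, a 1x1 "row times column" product
assert w @ v == 4.0 * 1.0 + 0.0 * 2.0 + (-1.0) * 3.0
```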

## Matrices as Tensors

As for matrices, we need to prove a few things beforehand. We limit the discussion to square matrices.

First, we know that **matrices are linear transformations**, so every square matrix can be considered as a linear map $V \to V$, thus an element of $\rm{End}(V)$.

> **Lemma.** $T _ 1^1 V = V \otimes V^\ast \approx \rm{End}(V^\ast)$.

*Proof*. We must construct an invertible linear correspondence between tensors $T \in V \otimes V^\ast$ and linear maps $f : V^\ast \to V^\ast$.

Given $T$, simply define $f(w) = T(w, \cdot)$ for $w \in V^\ast$. Then $f(w)$ is a map from $V$ to $K$, and both $f$ and $f(w)$ can be shown to be linear using the bilinearity of $T$. Thus $f(w) \in V^\ast$ and $f \in \rm{End}(V^\ast)$.

This correspondence is invertible: given a linear map $f : V^\ast \to V^\ast$, we can define $T$ by $T(w, v) = f(w)(v)$ for $w \in V^\ast$ and $v \in V$. Then $f(w) \in V^\ast$ and $f(w)(v) \in K$, so $T : V^\ast \times V \to K$. Bilinearity of $T$ follows directly from the linearity of $f$ and of each $f(w)$.
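
Here is a numeric sketch of this correspondence, with covectors as row vectors: the tensor $T(w, v) = w A v$ corresponds to the map $f(w) = T(w, \cdot)$, which in coordinates sends the row $w$ to the row $wA$.

```python
import numpy as np

d = 3
rng = np.random.default_rng(2)
A = rng.standard_normal((d, d))

T = lambda w, v: w @ A @ v  # a (1, 1)-tensor
f = lambda w: w @ A         # the corresponding element of End(V*)

w, v = rng.standard_normal(d), rng.standard_normal(d)
assert np.isclose(T(w, v), f(w) @ v)  # f(w)(v) = T(w, v)
```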

> **Corollary.** For a finite-dimensional $K$-vector space $V$, $T _ 1^1 V \approx \rm{End}(V)$.

This follows directly from the fact that $V \approx V^\ast$.
### Matrices as $(1, 1)$-Tensors

Finally, since matrices are elements of $\rm{End}(V)$, we can conclude that **square matrices are $(1, 1)$-tensors**. Therefore, we can write a tensor $\phi \in T _ 1^1 V$ with respect to the chosen basis as

$$
\phi = \sum _ {i=1}^{\dim V} \sum _ {j=1}^{\dim V} \phi^i _ j \; e _ i \otimes f^j,
$$

where $\phi^i _ j = \phi(f^i, e _ j)$. Now it is *very tempting* to think of $\phi^i _ j$ as a **square array of numbers**.

$$
\phi = \sum _ {i=1}^{\dim V} \sum _ {j=1}^{\dim V} \phi^i _ j \; e _ i \otimes f^j \quad \longleftrightarrow \quad \phi =
\begin{bmatrix}
\phi _ 1^1 & \phi _ 2^1 & \cdots & \phi _ d^1 \\
\phi _ 1^2 & \phi _ 2^2 & \cdots & \phi _ d^2 \\
\vdots & \vdots & \ddots & \vdots \\
\phi _ 1^d & \phi _ 2^d & \cdots & \phi _ d^d
\end{bmatrix}
$$

The convention is to consider the top index $i$ as a *row* index and the bottom index $j$ as a *column* index.
### Matrix Multiplication

However, the above *arrangement* is a pure convention.

Consider $\phi \in \rm{End}(V)$ also as a tensor $\phi \in T _ 1^1 V$. Abusing notation, we can write

$$
\phi(w, v) = w(\phi(v))
$$

for $v \in V$ and $w \in V^\ast$. For clarity: on the left-hand side, $\phi : V^\ast \times V \to K$; on the right-hand side, $\phi : V \to V$, and $w$ eats the vector $\phi(v) \in V$ and maps it to $K$, as $w$ is an element of $V^\ast$.

Thus, the components of $\phi \in \rm{End}(V)$ are $\phi _ j^i = \phi(f^i, e _ j) = f^i(\phi(e _ j))$.
Since **matrix multiplication represents the composition of linear transformations**, for $\phi, \psi \in \rm{End}(V)$, let us compute the components of $\phi \circ \psi$ as a $(1, 1)$-tensor. We have

$$
\begin{aligned}
(\phi \circ \psi) _ j^i &= (\phi \circ \psi)(f^i, e _ j) \\
&= f^i\big( (\phi\circ\psi)(e _ j) \big) \\
&= f^i\big( \phi(\psi (e _ j)) \big) \\
&\overset{(\ast)}{=} f^i\paren{ \phi\paren{\sum _ k \psi^k _ j e _ k} } \\
&\overset{(\star)}{=} \sum _ k \psi _ j^k f^i \big( \phi(e _ k) \big) \\
&= \sum _ k \psi _ j^k \phi _ k^i.
\end{aligned}
$$

- $(\ast)$ follows from $\psi(e _ j) = \sum _ k \psi _ j^k e _ k$. This holds since for $w \in V^\ast$,

$$
\begin{aligned}
\psi(w, e _ j) &= \sum _ {k, l} \psi _ l^k \; (e _ k \otimes f^l)(w, e _ j) \\
&= \sum _ {k, l} \psi _ l^k \; e _ k(w) \cdot f^l(e _ j) \\
&= \sum _ {k} \psi _ j^k e _ k (w)
\end{aligned}
$$

because $f^l (e _ j) = \delta _ {j}^l$, leaving only the terms with $l = j$.
- $(\star)$ follows from the linearity of $f^i$ and $\phi$.

Doesn't this expression look familiar? Since multiplication in the field $K$ is commutative,

$$
\sum _ {k} \psi _ j^k \phi _ k^i = \sum _ k \phi^i _ k \psi _ j^k,
$$

which is exactly the formula from the definition: the $(i, j)$-th component of the $(1, 1)$-tensor $\phi \circ \psi$ matches the $(i, j)$-th entry of the matrix product $\phi \psi$.
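
As a final check, here is a numeric sketch confirming the derivation: the components $f^i\big(\phi(\psi(e _ j))\big)$ of the $(1, 1)$-tensor $\phi \circ \psi$ coincide with the entries of the matrix product, assuming the standard basis of $\R^d$.

```python
import numpy as np

d = 4
rng = np.random.default_rng(3)
phi = rng.standard_normal((d, d))
psi = rng.standard_normal((d, d))

E = np.eye(d)  # columns are the basis e_j; rows are the dual basis f^i

# components of the composition: f^i(phi(psi(e_j)))
comp = np.array([[E[i] @ (phi @ (psi @ E[:, j])) for j in range(d)]
                 for i in range(d)])

assert np.allclose(comp, np.einsum('ik,kj->ij', phi, psi))  # sum_k phi^i_k psi^k_j
assert np.allclose(comp, phi @ psi)                         # the matrix product
```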

Now we know why matrix multiplication is defined as "row times column": tensor spaces, tensor products and dual spaces were behind it all along.

## Notes

- The above argument should be generalizable to rectangular matrices, although I haven't checked yet.
- We now know why matrix multiplication is defined the way it is, but the question becomes why tensors are defined in such a strange way, involving vector spaces and their duals.
- (Not sure) Matrix transpose should be interpretable with tensors?

## References

- [Geometrical Anatomy of Theoretical Physics](https://youtube.com/playlist?list=PLPH7f_7ZlzxTi6kS4vCmv4ZKm9u8g5yic&si=dJj6nmoAc944YOgl), Lecture 8