[PUBLISHER] upload files #139

[PUBLISHER] upload files #138
[PUBLISHER] upload files #137
2026-02-05 09:28:02 +00:00 · 2024-01-17 18:20:54 +09:00 · 2024-01-17 18:19:48 +09:00 · 2024-01-17 18:15:11 +09:00 · 2024-01-17 18:14:26 +09:00 · 2024-01-17 18:14:03 +09:00
9 changed files with 1294 additions and 0 deletions
--- a/Cryptography/2023-10-05-number-theory.md
+++ b/Cryptography/2023-10-05-number-theory.md
@@ -0,0 +1,257 @@
 ---
 share: true
 toc: true
 math: true
 categories:
  - Lecture Notes
  - Modern Cryptography
 tags:
  - lecture-note
  - cryptography
  - number-theory
  - security
 title: 8. Number Theory
 date: 2023-10-05
 github_title: 2023-10-05-number-theory
 ---
 ## Background
 ### Number Theory
 Let $n$ be a positive integer and let $p$ be prime.
 > **Notation.** Let $\mathbb{Z}$ denote the set of integers. We will write $\mathbb{Z}_n = \left\lbrace 0, 1, \dots, n - 1 \right\rbrace$.
 > **Definition.** Let $x, y \in \mathbb{Z}$. $\gcd(x, y)$ is the **greatest common divisor** of $x, y$. $x$ and $y$ are relatively prime if $\gcd(x, y) = 1$.
 > **Definition.** The **multiplicative inverse** of $x \in \mathbb{Z}_n$ is an element $y \in \mathbb{Z}_n$ such that $xy = 1$ in $\mathbb{Z}_n$.
 > **Lemma.** $x \in \mathbb{Z}_n$ has a multiplicative inverse if and only if $\gcd(x, n) = 1$.
 > **Definition.** $\mathbb{Z}_n^\ast$ is the set of invertible elements in $\mathbb{Z}_n$, i.e., $\mathbb{Z}_n^\ast = \left\lbrace x \in \mathbb{Z}_n : \gcd(x, n) = 1 \right\rbrace$.
 > **Lemma.** (Extended Euclidean Algorithm) For $x, y \in \mathbb{Z}$, there exist $a, b \in \mathbb{Z}$ such that $ax + by = \gcd(x, y)$.
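
The lemma is constructive. A minimal Python sketch of the extended Euclidean algorithm, together with the multiplicative inverse it yields in $\mathbb{Z}_n$, might look as follows. The function names `extended_gcd` and `mod_inverse` are illustrative and not part of the notes, and nonnegative inputs are assumed.

```python
def extended_gcd(x, y):
    """Return (d, a, b) with a*x + b*y == d == gcd(x, y) for x, y >= 0."""
    old_r, r = x, y
    old_a, a = 1, 0
    old_b, b = 0, 1
    while r != 0:
        k = old_r // r
        # Invariant: old_r == old_a*x + old_b*y and r == a*x + b*y.
        old_r, r = r, old_r - k * r
        old_a, a = a, old_a - k * a
        old_b, b = b, old_b - k * b
    return old_r, old_a, old_b


def mod_inverse(x, n):
    """Return the inverse of x in Z_n, which exists iff gcd(x, n) == 1."""
    d, a, _ = extended_gcd(x % n, n)
    if d != 1:
        raise ValueError("x is not invertible modulo n")
    return a % n
```

If $ax + bn = 1$, then reducing modulo $n$ gives $ax = 1$ in $\mathbb{Z}_n$, which is why the coefficient $a$ is the inverse.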
 ### Group Theory
 > **Definition.** A **group** is a set $G$ with a binary operation $* : G \times G \rightarrow G$, satisfying the following properties.
 > 
 > - $(\mathsf{G1})$ (Associative) $(a * b) * c = a * (b * c)$ for all $a, b, c \in G$.
 > - $(\mathsf{G2})$ (Identity) $\exists e \in G$ such that for all $a\in G$, $e * a = a * e = a$.
 > - $(\mathsf{G3})$ (Inverse) For each $a \in G$, $\exists x \in G$ such that $a * x = x * a = e$. In this case, $x = a^{-1}$.
 > **Definition.** A group is **commutative** if $a * b = b * a$ for all $a, b \in G$.
 > **Definition.** The **order** of a group is the number of elements in $G$, denoted as $\left\lvert G \right\rvert$.
 > **Definition.** A set $H \subseteq G$ is a **subgroup** of $G$ if $H$ is itself a group under the operation of $G$. We write $H \leq G$.
 > **Theorem.** (Lagrange) Let $G$ be a finite group and $H \leq G$. Then $\left\lvert H \right\rvert \mid \left\lvert G \right\rvert$.
 *Proof*. All left cosets of $H$ have the same number of elements, since a bijection can be constructed between any two cosets. Cosets partition $G$, so $\left\lvert G \right\rvert$ is equal to the number of left cosets multiplied by $\left\lvert H \right\rvert$.
 Let $G$ be a group.
 > **Definition.** Let $g \in G$. The set $\left\langle g \right\rangle = \left\lbrace g^n : n \in \mathbb{Z} \right\rbrace$ is called the **cyclic subgroup generated by $g$**. The **order** of $g$ is the number of elements in $\left\langle g \right\rangle$, denoted as $\left\lvert g \right\rvert$.
 > **Definition.** $G$ is **cyclic** if there exists $g \in G$ such that $G = \left\langle g \right\rangle$.
 > **Theorem.** $\mathbb{Z}_p^\ast$ is cyclic.
 *Proof*. $\mathbb{Z}_p$ is a finite field, and the multiplicative group of a finite field is cyclic, so $\mathbb{Z}_p^\ast = \mathbb{Z}_p \setminus \left\lbrace 0 \right\rbrace$ is cyclic.
 > **Theorem.** If $G$ is a finite group, then $g^{\left\lvert G \right\rvert} = 1$ for all $g \in G$, i.e., $\left\lvert g \right\rvert \mid \left\lvert G \right\rvert$.
 *Proof*. Consider $\left\langle g \right\rangle \leq G$, then the result follows from Lagrange's theorem.
 > **Corollary.** (Fermat's Little Theorem) If $x \in \mathbb{Z}_p^\ast$, $x^{p-1} = 1$.
 *Proof*. $\mathbb{Z}_p^\ast$ has $p-1$ elements.
 > **Corollary.** (Euler's Generalization) If $x \in \mathbb{Z}_n^\ast$, $x^{\phi(n)} = 1$.
 *Proof*. $\mathbb{Z}_n^\ast$ has $\phi(n)$ elements, where $\phi(n)$ is Euler's totient function.
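
Both corollaries are easy to check numerically for small moduli. The sketch below enumerates $\mathbb{Z}_n^\ast$ by brute force, which is only sensible for toy values of $n$; the helper names `units` and `phi` are illustrative.

```python
from math import gcd


def units(n):
    """Elements of Z_n^*: all x in Z_n with gcd(x, n) == 1."""
    return [x for x in range(n) if gcd(x, n) == 1]


def phi(n):
    """Euler's totient, computed naively as the size of Z_n^*."""
    return len(units(n))


# Euler's generalization for the composite modulus n = 15.
n = 15
euler_holds = all(pow(x, phi(n), n) == 1 for x in units(n))

# Fermat's little theorem for the prime p = 13.
p = 13
fermat_holds = all(pow(x, p - 1, p) == 1 for x in units(p))
```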
 ---
 Schemes such as Diffie-Hellman rely on the hardness of the DLP. So, *how hard is it*? How does one compute the discrete logarithm?
 There are group-specific algorithms that exploit the algebraic features of the group, but we only cover generic algorithms, which work on any cyclic group. A trivial example is exhaustive search: if $\left\lvert G \right\rvert = n$ and a generator $g \in G$ is given, find the discrete logarithm of $h \in G$ by computing $g^i$ for $i = 0, 1, \dots, n - 1$ until $g^i = h$. This has running time $\mathcal{O}(n)$. We can do better than this.
 ## Baby Step Giant Step Method (BSGS)
 Let $G = \left\langle g \right\rangle$, where $g \in G$ has order $q$. $q$ need not be prime for this method. We are given $u = g^\alpha$, $g$, and $q$. Our task is to find $\alpha \in \mathbb{Z}_q$.
 Set $m = \left\lceil \sqrt{q} \right\rceil$. $\alpha$ is currently unknown, but by the division algorithm, there exist integers $i,j$ such that $\alpha = i \cdot m + j$ and $0\leq i, j < m$. Then $u = g^\alpha = g^{i\cdot m + j} = g^{im} \cdot g^j$. Therefore,
 $$
 u(g^{-m})^i = g^j.
 $$
 Now, we compute the values of $g^j$ for $j = 0, 1,\dots, m - 1$ and keep a table of $(j, g^j)$ pairs. Next, compute $g^{-m}$ and for each $i$, compute $u(g^{-m})^{i}$ and check if this value is in the table. If a value is found, then we found $(i, j)$ such that $i \cdot m + j = \alpha$.
 We see that this algorithm takes $2\sqrt{q}$ group operations on $G$ in the worst case, so the time complexity is $\mathcal{O}(\sqrt{q})$. However, to store the values of $(j, g^j)$ pairs, a lot of memory is required. The table must be large enough to contain $\sqrt{q}$ group elements, so the space complexity is also $\mathcal{O}(\sqrt{q})$.
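
The procedure above can be sketched in Python for the concrete case where $G$ is a subgroup of $\mathbb{Z}_p^\ast$. The choice of $\mathbb{Z}_p^\ast$ as the ambient group and the function name `bsgs` are assumptions made for illustration; `pow` with a negative exponent needs Python 3.8 or later.

```python
from math import isqrt


def bsgs(g, u, q, p):
    """Find alpha in [0, q) with g^alpha = u (mod p), where g has order q."""
    m = isqrt(q - 1) + 1              # m = ceil(sqrt(q)) for q >= 1
    # Baby steps: table of g^j -> j for j = 0, ..., m - 1.
    table = {}
    cur = 1
    for j in range(m):
        table.setdefault(cur, j)
        cur = cur * g % p
    # Giant steps: compare u * (g^-m)^i against the table.
    factor = pow(g, -m, p)
    gamma = u % p
    for i in range(m):
        if gamma in table:
            return i * m + table[gamma]
        gamma = gamma * factor % p
    return None                       # u is not in the subgroup generated by g
```

For example, $2$ generates $\mathbb{Z}_{101}^\ast$, so `bsgs(2, pow(2, 57, 101), 100, 101)` recovers the exponent `57` using a table of only $10$ entries.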
 To get around this, we can build a smaller table by choosing a smaller $m$. Then $j$ still ranges over $0 \leq j < m$, but $i$ must be checked for around $q/m$ values, so memory is traded for time.
 There is actually an algorithm using constant space. **Pollard's Rho** algorithm takes $\mathcal{O}(\sqrt{q})$ time and $\mathcal{O}(1)$ space.
 ## Groups of Composite Order
 In Diffie-Hellman, we only used large primes. There is a reason for using groups with prime order. We study what would happen if we used composite numbers.
 Let $G$ be a cyclic group of composite order $n$. First, we start with a simple case.
 ### Prime Power Case: Order $n = q^e$
 Let $G = \left\langle g \right\rangle$ be a cyclic group of order $q^e$.[^1] ($q > 1$, $e \geq 1$) We are given $g,q, e$ and $u = g^\alpha$ and we will find $\alpha$. ($0 \leq \alpha < q^e$)
 For each $f = 0, \dots, e$, define $g_f = g^{(q^f)}$. Then
 $$
 (g_f)^{(q^{e-f})} = g^{(q^f) \cdot (q^{e-f})} = g^{(q^e)} = 1.
 $$
 So $g_f$ generates a cyclic subgroup of order $q^{e-f}$. In particular, $g_{e-1}$ generates a cyclic subgroup of order $q$. Using this fact, we will reduce the given problem into a discrete logarithm problem on a group having smaller order $q$.
 We proceed with recursion on $e$. If $e = 1$, then $\alpha \in \mathbb{Z}_q$, so we have nothing to do. Suppose $e > 1$. Choose $f$ so that $1 \leq f \leq e-1$. We can write $\alpha = i\cdot q^f + j$, where $0 \leq i < q^{e-f}$ and $0 \leq j < q^f$. Then
 $$
 u = g^\alpha = g^{i \cdot q^f + j} = (g_f)^i \cdot g^j.
 $$
 Since $g_f$ has order $q^{e-f}$, exponentiate both sides by $q^{e-f}$ to get
 $$
 u^{(q^{e-f})} = (g_f)^{q^{e-f} \cdot i} \cdot g^{q^{e-f} \cdot j} = (g_{e-f})^j.
 $$
 Now the problem has been reduced to a discrete logarithm problem with base $g_{e-f}$, which has order $q^f$. We can compute $j$ using algorithms for discrete logarithms.
 After finding $j$, we have
 $$
 u/g^j = (g_f)^i
 $$
 which is also a discrete logarithm problem with base $g_f$, which has order $q^{e-f}$. We can compute $i$ that satisfies this equation. Finally, we can compute $\alpha = i \cdot q^f + j$. We have reduced a discrete logarithm problem into two smaller discrete logarithm problems.
 To get the best running time, choose $f \approx e/2$. Let $T(e)$ be the running time, then
 $$
 T(e) = 2T\left( \frac{e}{2} \right) + \mathcal{O}(e\log q).
 $$
 The $\mathcal{O}(e\log q)$ term comes from exponentiating both sides by $q^{e-f}$. Solving this recurrence gives
 $$
 T(e) = \mathcal{O}(e \cdot T_{\mathrm{base}} + e\log e \log q),
 $$
 where $T_\mathrm{base}$ is the complexity of the algorithm for the base case $e = 1$. $T_\mathrm{base}$ is usually the dominant term, since the best known algorithm takes $\mathcal{O}(\sqrt{q})$.
 Thus, computing the discrete logarithm in $G$ is only as hard as computing it in the subgroup of prime order.
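
The recursion can be sketched as follows for a subgroup of $\mathbb{Z}_p^\ast$ of order $q^e$, with $f = \lfloor e/2 \rfloor$. The base case uses exhaustive search purely to keep the example short; BSGS or Pollard's Rho would be used in practice. Function names are illustrative.

```python
def dlog_bruteforce(g, u, q, p):
    """Exhaustive search for alpha in [0, q) with g^alpha = u (mod p)."""
    cur = 1
    for alpha in range(q):
        if cur == u:
            return alpha
        cur = cur * g % p
    raise ValueError("no discrete logarithm found")


def dlog_prime_power(g, u, q, e, p):
    """Find alpha in [0, q^e) with g^alpha = u (mod p), where g has order q^e."""
    if e == 1:
        return dlog_bruteforce(g, u % p, q, p)
    f = e // 2                        # 1 <= f <= e - 1 whenever e >= 2
    # Raising to q^(e-f) kills the (g_f)^i factor: u^(q^(e-f)) = (g_(e-f))^j.
    g_ef = pow(g, q ** (e - f), p)    # base of order q^f
    j = dlog_prime_power(g_ef, pow(u, q ** (e - f), p), q, f, p)
    # Divide out g^j, leaving u / g^j = (g_f)^i.
    g_f = pow(g, q ** f, p)           # base of order q^(e-f)
    rest = u * pow(g, -j, p) % p
    i = dlog_prime_power(g_f, rest, q, e - f, p)
    return i * q ** f + j
```

For instance, $3$ generates $\mathbb{Z}_{257}^\ast$, which has order $2^8$, so every base case is a search over only two elements.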
 ### General Case: Pohlig-Hellman Algorithm
 Let $G = \left\langle g \right\rangle$ be a cyclic group of order $n = q_1^{e_1}\cdots q_r^{e_r}$, where the factorization of $n$ into distinct primes $q_i$ is given. We want to find $\alpha$ such that $g^\alpha = u$.
 For $i = 1, \dots, r$, define $q_i^\ast = n / q_i^{e_i}$. Then $u^{q_i^\ast} = (g^{q_i^\ast})^\alpha$, where $g^{q_i^\ast}$ will have order $q_i^{e_i}$ in $G$. Now compute $\alpha_i$ using the algorithm for the prime power case.
 Then for all $i$, we have $\alpha \equiv \alpha_i \pmod{q_i^{e_i}}$. We can now use the Chinese remainder theorem to recover $\alpha$. Let $q_r$ be the largest prime, then the running time is bounded by
 $$
 \sum_{i=1}^r \mathcal{O}(e_i T(q_i) + e_i \log e_i \log q_i) = \mathcal{O}(T(q_r) \log n + \log n \log \log n)
 $$
 group operations. Thus, we can conclude the following.
 > The difficulty of computing discrete logarithms in a cyclic group of order $n$ is determined by the size of the largest prime factor.
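
A compact sketch of the whole algorithm for a subgroup of $\mathbb{Z}_p^\ast$ is given below. Two simplifications are assumed here and are not taken from the notes: the prime power step peels off one base-$q$ digit of $\alpha_i$ at a time (instead of halving $e$), and each digit is found by exhaustive search. All function names are illustrative.

```python
def dlog_small(g, u, q, p):
    """Exhaustive search in a subgroup of small prime order q."""
    cur = 1
    for a in range(q):
        if cur == u:
            return a
        cur = cur * g % p
    raise ValueError("no discrete logarithm found")


def dlog_prime_power(g, u, q, e, p):
    """Digit-by-digit discrete log for a base g of order q^e."""
    gamma = pow(g, q ** (e - 1), p)            # element of order q
    x = 0
    for k in range(e):
        # u * g^-x has exponent divisible by q^k; isolate the next digit.
        h = pow(u * pow(g, -x, p) % p, q ** (e - 1 - k), p)
        x += dlog_small(gamma, h, q, p) * q ** k
    return x


def crt(residues, moduli):
    """Combine x = r_i (mod n_i) for pairwise coprime moduli n_i."""
    x, m = 0, 1
    for r, n in zip(residues, moduli):
        t = (r - x) * pow(m, -1, n) % n        # solve x + m*t = r (mod n)
        x += m * t
        m *= n
    return x % m


def pohlig_hellman(g, u, factors, p):
    """factors maps each prime q_i to e_i, where g has order prod q_i^e_i."""
    n = 1
    for q, e in factors.items():
        n *= q ** e
    residues, moduli = [], []
    for q, e in factors.items():
        cofactor = n // q ** e                 # q_i^* in the text
        g_i, u_i = pow(g, cofactor, p), pow(u, cofactor, p)
        residues.append(dlog_prime_power(g_i, u_i, q, e, p))
        moduli.append(q ** e)
    return crt(residues, moduli)
```

With $p = 101$ and $g = 2$ of order $100 = 2^2 \cdot 5^2$, the call `pohlig_hellman(2, u, {2: 2, 5: 2}, 101)` only ever searches subgroups of order $2$ and $5$.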
 ### Consequences
 - For a group with order $n = 2^k$, the Pohlig-Hellman algorithm will easily compute the discrete logarithm, since the largest prime factor is $2$. The DL assumption is false for this group.
 - For primes of the form $p = 2^k + 1$, the group $\mathbb{Z}_p^\ast$ has order $2^k$, so the DL assumption is also false for these primes.
 - In general, $\left\lvert G \right\rvert$ must have at least one large prime factor for the DL assumption to be true.
 - By the Pohlig-Hellman algorithm, discrete logarithms in a group of composite order are no harder than in its subgroup of largest prime order, so the smaller prime factors add no security. So we often use a prime order group.
 ## Information Leakage in Groups of Composite Order
 Let $G = \left\langle g \right\rangle$ be a cyclic group of composite order $n$. We suppose that $n = n_1n_2$, where $n_1$ is a small prime factor.
 By the Pohlig-Hellman algorithm, the adversary can compute $\alpha_1 \equiv \alpha \pmod {n_1}$ by computing the discrete logarithm of $u^{n_2}$ with base $g^{n_2}$.
 Consider $n_1 = 2$. Then the adversary knows whether $\alpha$ is even or not.
 > **Lemma.** $\alpha$ is even if and only if $u^{n/2} = 1$.
 *Proof*. If $\alpha$ is even, then $u^{n/2} = g^{\alpha n/2} = (g^{\alpha/2})^n = 1$, since the group has order $n$. Conversely, if $u^{n/2} = g^{\alpha n/2} = 1$, then the order of $g$ must divide $\alpha n/2$, so $n \mid (\alpha n /2)$ and $\alpha$ is even.
 This lemma can be used to break the DDH assumption.
 > **Lemma.** Given $u = g^\alpha$ and $v = g^\beta$, $\alpha\beta \in \mathbb{Z}_n$ is even if and only if $u^{n/2} = 1$ or $v^{n/2} = 1$.
 *Proof*. $\alpha\beta$ is even if and only if either $\alpha$ or $\beta$ is even. By the above lemma, this is equivalent to $u^{n/2} = 1$ or $v^{n/2} = 1$.
 Now we describe an attack for the DDH problem.
 > 1. The adversary is given $(g^\alpha, g^\beta, g^\gamma)$.
 > 2. The adversary computes the parity of $\gamma$ and $\alpha\beta$ and compares them.
 > 3. The adversary outputs $\texttt{accept}$ if the parities match, otherwise outputs $\texttt{reject}$.
 If $\gamma$ was chosen uniformly, then the parities match with probability $1/2$, so the adversary outputs $\texttt{accept}$ with probability $1/2$. But if $\gamma = \alpha\beta$, the adversary always outputs $\texttt{accept}$, so the adversary has DDH advantage $1/2$.
 The above process can be generalized to any group whose order has a small prime factor. See Exercise 16.2.[^2] Thus, this is another reason we use groups of prime order.
 - DDH assumption does not hold in $\mathbb{Z}_p^\ast$, since its order $p-1$ is always even.
 - Instead, we use a prime order subgroup of $\mathbb{Z}_p^\ast$ or prime order elliptic curve group.
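
The parity attack is short enough to write out. The sketch below works in $\mathbb{Z}_p^\ast$ with a generator $g$ of even order $n = p - 1$; the toy values $p = 101$, $g = 2$ and the function names are assumptions chosen for illustration.

```python
def is_even_exponent(u, n, p):
    """For g of even order n and u = g^alpha: alpha is even iff u^(n/2) = 1."""
    return pow(u, n // 2, p) == 1


def ddh_distinguisher(u, v, w, n, p):
    """Accept iff the parity of the exponent of w matches that of alpha*beta."""
    product_even = is_even_exponent(u, n, p) or is_even_exponent(v, n, p)
    return is_even_exponent(w, n, p) == product_even


# Toy parameters: 2 generates Z_101^*, which has even order 100.
p, g, n = 101, 2, 100
alpha, beta = 37, 58
u, v = pow(g, alpha, p), pow(g, beta, p)

# A genuine DH triple is always accepted ...
accepts_real = ddh_distinguisher(u, v, pow(g, alpha * beta, p), n, p)
# ... while exactly half of all possible gamma values are accepted.
accepted_random = sum(ddh_distinguisher(u, v, pow(g, gamma, p), n, p)
                      for gamma in range(n))
```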
 ## Summary of Discrete Logarithm Algorithms
 |Name|Time Complexity|Space Complexity|
 |:-:|:-:|:-:|
 |BSGS|$\mathcal{O}(\sqrt{q})$|$\mathcal{O}(\sqrt{q})$|
 |Pohlig-Hellman|$\mathcal{O}(\sqrt{q_\mathrm{max}})$|$\mathcal{O}(1)$|
 |Pollard's Rho|$\mathcal{O}(\sqrt{q})$|$\mathcal{O}(1)$|
 - In generic groups, solving the DLP requires $\Omega(\sqrt{q})$ operations.
 	- By *generic groups*, we mean that only group operations and equality checks are allowed. Algebraic properties are not used.
 - Thus, we use a large prime $q$ such that $\sqrt{q}$ is large enough.
 ## Candidates of Discrete Logarithm Groups
 We need groups of prime order, and we cannot use $\mathbb{Z}_p^\ast$ itself. We have two candidates.
 - Use a subgroup of $\mathbb{Z}_p^\ast$ having prime order $q$ such that $q \mid (p-1)$ as in Diffie-Hellman.
 - Elliptic curve group modulo $p$.
 ### Reduced Residue Class $\mathbb{Z}_p^\ast$
 There are many specific algorithms for discrete logarithms on $\mathbb{Z}_p^\ast$.
 - Index-calculus
 - Elliptic-curve method
 - Special number-field sieve (SNFS)
 - **General number-field sieve** (GNFS)
 GNFS running time is dominated by the term $\exp(\sqrt[3]{\ln p})$. If we let $p$ be an $n$-bit prime, then the complexity is $\exp(\sqrt[3]{n})$. Suppose that GNFS runs in time $T$ for prime $p$. Since $\sqrt[3]{2} \approx 1.26$, doubling the number of bits will increase the running time of GNFS to $T^{1.26}$.
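
As a rough check of this exponent, assume (as a simplification that ignores lower-order factors) that the running time for an $n$-bit prime is $T(n) = \exp(c\sqrt[3]{n})$ for some constant $c > 0$. The constant $c$ is introduced here only for illustration. Then

$$
T(2n) = \exp\left(c\sqrt[3]{2n}\right) = \exp\left(\sqrt[3]{2} \cdot c\sqrt[3]{n}\right) = T(n)^{\sqrt[3]{2}} \approx T(n)^{1.26}.
$$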
 Compare this with symmetric ciphers such as AES, where doubling the key size squares the amount of work required.[^3] NIST and Lenstra recommend sizes of primes that give a level of security similar to that of symmetric ciphers.
 |Symmetric key length|Size of prime (NIST)|Size of prime (Lenstra)|
 |:-:|:-:|:-:|
 |80|1024|1329|
 |128|3072|4440|
 |256|15360|26268|
 All sizes are in bits. Thus we need a very large prime, for example $p > 2^{2048}$, for security these days.
 ### Elliptic Curve Group over $\mathbb{Z}_p$
 Currently, the best-known attacks are generic attacks, so we can use much smaller parameters than $\mathbb{Z}_p^\ast$. Often the groups have sizes about $2^{256}$, $2^{384}$, $2^{512}$.
 [^1]: We didn't require $q$ to be prime!
 [^2]: A Graduate Course in Applied Cryptography
 [^3]: Recall that the best known attack was only 4 times faster than brute-force search.
--- a/Cryptography/2023-10-19-public-key-encryption.md
+++ b/Cryptography/2023-10-19-public-key-encryption.md
@@ -0,0 +1,457 @@
 ---
 share: true
 toc: true
 math: true
 categories:
  - Lecture Notes
  - Modern Cryptography
 tags:
  - lecture-note
  - cryptography
  - security
 title: 9. Public Key Encryption
 date: 2023-10-19
 github_title: 2023-10-19-public-key-encryption
 image:
  path: assets/img/posts/Lecture Notes/Modern Cryptography/mc-09-ss-pke.png
 attachment:
  folder: assets/img/posts/Lecture Notes/Modern Cryptography
 ---
 In symmetric encryption, we assumed that the two parties had a shared key in advance. If the two parties do not have a shared key, **public-key encryption** can be used to encrypt messages.
 ## Public Key Encryption
 > **Definition.** A **public key encryption scheme** $\mc{E} = (G, E, D)$ is a triple of efficient algorithms: a **key generation algorithm** $G$, an **encryption algorithm** $E$, and a **decryption algorithm** $D$.
 > 
 > - $G$ generates a key pair as $(pk, sk) \la G()$. $pk$ is called a **public key** and $sk$ is called a **secret key**.
 > - $E$ takes a public key $pk$ and a message $m$ and outputs ciphertext $c \la E(pk, m)$.
 > - $D$ takes a secret key $sk$ and a ciphertext $c$ and outputs plaintext $m \la D(sk, c)$ or a special $\texttt{reject}$ value $\bot$.
 > 
 > We say that $\mc{E} = (G, E, D)$ is defined over $(\mc{M}, \mc{C})$.
 $G$ and $E$ may be probabilistic, but $D$ must be deterministic. A correctness condition is also required: for any $(pk, sk)$ output by $G$ and any $m \in \mc{M}$,
 $$
 \Pr[D(sk, E(pk, m)) = m] = 1.
 $$
 Public key $pk$ will be publicized. After Alice obtains $pk$, she can use it to encrypt any message and send it to Bob. This is the only interaction required. The public key can be used multiple times, and others besides Alice can use it too. Finally, $sk$ should be hard to compute from $pk$, obviously for security.
 ## CPA Security for Public Key Encryption
 ### Semantic Security
 The following notion of security is only for an eavesdropping adversary.
 ![mc-09-ss-pke.png](../../../assets/img/posts/Lecture%20Notes/Modern%20Cryptography/mc-09-ss-pke.png)
 > **Definition.** Let $\mc{E} = (G, E, D)$ be a public key encryption scheme defined over $(\mc{M}, \mc{C})$. For an adversary $\mc{A}$, we define two experiments.
 > 
 > **Experiment** $b$.
 > 1. The challenger computes $(pk, sk) \la G()$ and sends $pk$ to the adversary.
 > 2. The adversary chooses $m_0, m_1 \in \mc{M}$ of the same length, and sends them to the challenger.
 > 3. The challenger computes $c \la E(pk, m_b)$ and sends $c$ to the adversary.
 > 4. $\mc{A}$ outputs a bit $b' \in \braces{0, 1}$.
 > 
 > Let $W_b$ be the event that $\mc{A}$ outputs $1$ in experiment $b$. The **advantage** of $\mc{A}$ with respect to $\mc{E}$ is defined as
 > 
 > $$
 > \Adv[SS]{\mc{A}, \mc{E}} = \abs{\Pr[W_0] - \Pr[W_1]}.
 > $$
 > 
 > $\mc{E}$ is **semantically secure** if $\rm{Adv}_{\rm{SS}}[\mc{A}, \mc{E}]$ is negligible for any efficient $\mc{A}$.
 Note that $pk$ is sent to the adversary, so the adversary can encrypt any message! Thus, encryption must be randomized. Otherwise, the adversary can compute $E(pk, m_b)$ for each $b$ and compare with $c$ given from the challenger.
 ### Semantic Security $\implies$ CPA
 For symmetric ciphers, semantic security (one-time) did not guarantee CPA security (many-time). But in public key encryption, semantic security implies CPA security. This is because *the attacker can encrypt any message using the public key*.
 First, we check the definition of CPA security for public key encryption. It is similar to that of symmetric ciphers, compare with [CPA Security for symmetric key encryption (Modern Cryptography)](./2023-09-19-symmetric-key-encryption.md#cpa-security).
 > **Definition.** For a given public-key encryption scheme $\mc{E} = (G, E, D)$ defined over $(\mc{M}, \mc{C})$ and given an adversary $\mc{A}$, define experiments 0 and 1.
 > 
 > **Experiment $b$.**
 > 1. The challenger computes $(pk, sk) \la G()$ and sends $pk$ to the adversary.
 > 2. The adversary submits a sequence of queries to the challenger:
 > 	- The $i$-th query is a pair of messages $m_{i, 0}, m_{i, 1} \in \mc{M}$ of the same length.
 > 3. The challenger computes $c_i = E(pk, m_{i, b})$ and sends $c_i$ to the adversary.
 > 4. The adversary computes and outputs a bit $b' \in \braces{0, 1}$.
 > 
 > Let $W_b$ be the event that $\mc{A}$ outputs $1$ in experiment $b$. Then the **CPA advantage with respect to $\mc{E}$** is defined as
 > 
 > $$
 > \Adv[CPA]{\mc{A}, \mc{E}} = \abs{\Pr[W_0] - \Pr[W_1]}.
 > $$
 > 
 > If the CPA advantage is negligible for all efficient adversaries $\mc{A}$, then $\mc{E}$ is **semantically secure against chosen plaintext attack**, or simply **CPA secure**.
 We formally prove the following theorem.
 > **Theorem.** If a public-key encryption scheme $\mc{E}$ is semantically secure, then it is also CPA secure.
 > 
 > For any $q$-query CPA adversary $\mc{A}$, there exists an SS adversary $\mc{B}$ such that
 > 
 > $$
 > \rm{Adv}_{\rm{CPA}}[\mc{A}, \mc{E}] = q \cdot \rm{Adv}_{\rm{SS}}[\mc{B}, \mc{E}].
 > $$
 *Proof*. The proof uses a hybrid argument. For $j = 0, \dots, q$, the *hybrid game* $j$ is played between $\mc{A}$ and a challenger that responds to the $q$ queries as follows:
 - On the $i$-th query $(m_{i,0}, m_{i, 1})$, respond with $c_i$ where
 	- $c_i \la E(pk, m_{i, 1})$ if $i \leq j$.
 	- $c_i \la E(pk, m_{i, 0})$ otherwise.
 So, the challenger in hybrid game $j$ encrypts $m_{i, 1}$ in the first $j$ queries, and encrypts $m_{i, 0}$ for the rest of the queries. If we define $p_j$ to be the probability that $\mc{A}$ outputs $1$ in hybrid game $j$, we have
 $$
 \Adv[CPA]{\mc{A}, \mc{E}} = \abs{p_q - p_0}
 $$
 since hybrid $q$ is precisely experiment $1$, hybrid $0$ is experiment $0$. With $\mc{A}$, we define $\mc{B}$ as follows.
 1. $\mc{B}$ randomly chooses $\omega \la \braces{1, \dots, q}$.
 2. $\mc{B}$ obtains $pk$ from the challenger, and forwards it to $\mc{A}$.
 3. For the $i$-th query $(m_{i, 0}, m_{i, 1})$ from $\mc{A}$, $\mc{B}$ responds as follows.
 	- If $i < \omega$, $c_i \la E(pk, m_{i, 1})$.
 	- If $i = \omega$, forward query to the challenger and forward its response to $\mc{A}$.
 	- Otherwise, $c_i \la E(pk, m_{i, 0})$.
 4. $\mc{B}$ outputs whatever $\mc{A}$ outputs.
 Note that $\mc{B}$ can encrypt queries on its own, since the public key is given. Define $W_b$ as the event that $\mc{B}$ outputs $1$ in experiment $b$ in the semantic security game. For $j = 1, \dots, q$, we have that
 $$
 \Pr[W_0 \mid \omega = j] = p_{j - 1}, \quad \Pr[W_1 \mid \omega = j] = p_j.
 $$
 In experiment $0$ with $\omega = j$, $\mc{A}$ receives encryptions of $m_{i, 1}$ in the first $j - 1$ queries and receives encryptions of $m_{i, 0}$ for the rest of the queries, which is exactly hybrid game $j - 1$. The second equation follows similarly.
 Then the SS advantage can be calculated as
 $$
 \begin{aligned}
 \Adv[SS]{\mc{B}, \mc{E}} &= \abs{\Pr[W_0] - \Pr[W_1]} \\
 &= \frac{1}{q} \abs{\sum_{j=1}^q \Pr[W_0 \mid \omega = j] - \sum_{j = 1}^q \Pr[W_1 \mid \omega = j]} \\
 &= \frac{1}{q} \abs{\sum_{j=1}^q (p_{j-1} - p_j)} \\
 &= \frac{1}{q} \Adv[CPA]{\mc{A}, \mc{E}}.
 \end{aligned}
 $$
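
Two steps in this chain are compressed. Spelled out with the quantities already defined, they are the law of total probability over the uniform choice of $\omega$ and a telescoping sum:

$$
\Pr[W_b] = \sum_{j=1}^q \Pr[\omega = j] \cdot \Pr[W_b \mid \omega = j] = \frac{1}{q} \sum_{j=1}^q \Pr[W_b \mid \omega = j],
$$

$$
\sum_{j=1}^q (p_{j-1} - p_j) = (p_0 - p_1) + (p_1 - p_2) + \cdots + (p_{q-1} - p_q) = p_0 - p_q.
$$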
 ## CCA Security for Public Key Encryption
 We also define CCA security for public key encryption, which models a wide spectrum of real-world attacks. The definition is also very similar to that of symmetric ciphers, compare with [CCA security for symmetric ciphers (Modern Cryptography)](./2023-09-26-cca-security-authenticated-encryption.md#cca-security).
 > **Definition.** Let $\mc{E} = (G, E, D)$ be a public-key encryption scheme over $(\mc{M}, \mc{C})$. Given an adversary $\mc{A}$, define experiments $0$ and $1$.
 > 
 > **Experiment $b$.**
 > 1. The challenger computes $(pk, sk) \la G()$ and sends $pk$ to the adversary.
 > 2. $\mc{A}$ makes a series of queries to the challenger, which is one of the following two types.
 > 	- *Encryption*: Send $(m_{i,0}, m_{i, 1})$ and receive $c'_i \la E(pk, m_{i, b})$.
 > 	- *Decryption*: Send $c_i$ and receive $m'_i \la D(sk, c_i)$.
 > 	- Note that $\mc{A}$ is not allowed to make a decryption query for any $c_i'$.
 > 3. $\mc{A}$ outputs a pair of messages $(m_0^ * , m_1^*)$.
 > 4. The challenger generates $c^* \la E(pk, m_b^*)$ and gives it to $\mc{A}$.
 > 5. $\mc{A}$ is allowed to keep making queries, but not allowed to make a decryption query for $c^*$.
 > 6. The adversary computes and outputs a bit $b' \in \left\lbrace 0, 1 \right\rbrace$.
 > 
 > Let $W_b$ be the event that $\mc{A}$ outputs $1$ in experiment $b$. Then the **CCA advantage with respect to $\mc{E}$** is defined as
 > 
 > $$
 > \rm{Adv}_{\rm{CCA}}[\mc{A}, \mc{E}] = \left\lvert \Pr[W_0] - \Pr[W_1] \right\rvert.
 > $$
 > 
 > If the CCA advantage is negligible for all efficient adversaries $\mc{A}$, then $\mc{E}$ is **semantically secure against a chosen ciphertext attack**, or simply **CCA secure**.
 Note that encryption queries are not strictly required, since in public-key schemes, the adversary can encrypt any messages on its own. We can consider a restricted security game, where an adversary makes only a single encryption query.
 > **Definition.** If $\mc{A}$ is restricted to making a single encryption query, we denote its advantage by $\Adv[1CCA]{\mc{A}, \mc{E}}$. A public-key encryption scheme $\mc{E}$ is **one-time semantically secure against chosen ciphertext attack**, or simply **1CCA** secure if $\Adv[1CCA]{\mc{A}, \mc{E}}$ is negligible for all efficient adversaries $\mc{A}$.
 Similarly, 1CCA security implies CCA security, as in the above theorem. So to show CCA security for public-key schemes, *it suffices to show that the scheme is 1CCA secure*.
 > **Theorem.** If a public-key encryption scheme $\mc{E}$ is 1CCA secure, then it is also CCA secure.
 *Proof*. Same as the proof of the theorem above, using a hybrid argument.
 ### Active Adversaries in Symmetric vs Public Key
 In symmetric key encryption, we studied [authenticated encryption (AE)](./2023-09-26-cca-security-authenticated-encryption.md#authenticated-encryption-ae), which required the scheme to be CPA secure and provide ciphertext integrity. In symmetric key settings, AE implied CCA.
 However in public-key schemes, adversaries can always create new ciphertexts using the public key, which makes the original definition of ciphertext integrity unusable. Thus we directly require CCA security.
 ## Hybrid Encryption and Key Encapsulation Mechanism
 Symmetric key encryptions are significantly faster than public key encryption, so we use public-key encryption for sharing the key, and then the key is used for symmetric key encryption.
 Generate $(pk, sk)$ for the public key encryption, and generate a symmetric key $k$. For the message $m$, encrypt it as
 $$
 (c, c_S) \la \big( E(pk, k), E_S(k, m) \big)
 $$
 where $E_S$ is the symmetric encryption algorithm, $E$ is the public-key encryption algorithm. The receiver decrypts $c$ and recovers $k$ that can be used for decrypting $c_S$. This is a form of **hybrid encryption**. We are *encapsulating* the key $k$ inside a ciphertext, so we call this **key encapsulation mechanism** (KEM).
 We can use public-key schemes for KEM, but there are dedicated constructions for KEM which are more efficient. The dedicated algorithms do the key generation and encryption in one shot.
 > **Definition.** A KEM $\mc{E}_\rm{KEM}$ consists of a triple of algorithms $(G, E_\rm{KEM}, D_\rm{KEM})$.
 > 
 > - The key generation algorithm generates $(pk, sk) \la G()$.
 > - The encapsulation algorithm generates $(k, c_\rm{KEM}) \la E_\rm{KEM}(pk)$.
 > - The decapsulation algorithm generates $k \la D_\rm{KEM}(sk, c_\rm{KEM})$.
 Note that $E_\rm{KEM}$ only takes the public key as a parameter. The correctness condition is that for any $(pk, sk) \la G()$ and any $(k, c_\rm{KEM}) \la E_\rm{KEM}(pk)$, we must have $D_\rm{KEM}(sk, c_\rm{KEM}) = k$.
 Using the KEM, the symmetric key is automatically encapsulated during the encryption process.
 > **Definition.** A KEM scheme is secure if no efficient adversary can distinguish between $(c_\rm{KEM}, k_0)$ and $(c_\rm{KEM}, k_1)$, where $(k_0, c_\rm{KEM})$ is generated by $E_\rm{KEM}(pk)$, and $k_1$ is chosen randomly from $\mc{K}$.
 Read more about this in Exercise 11.9.[^1]
 ## The ElGamal Encryption
 We introduce a public-key encryption scheme based on the hardness of discrete logarithms.
 > **Definition.** Suppose we have two parties Alice and Bob. Let $G = \left\langle g \right\rangle$ be a cyclic group of prime order $q$, let $\mc{E}_S = (E_S, D_S)$ be a symmetric cipher.
 > 
 > 1. Alice chooses $sk = \alpha \la \Z_q$, computes $pk = h = g^\alpha$ and sends $pk$ to Bob.
 > 2. Bob also chooses $\beta \la \Z_q$ and computes $k = h^\beta = g^{\alpha\beta}$.
 > 3. Bob sends $\big( g^\beta, E_S(k, m) \big)$ to Alice.
 > 4. Alice computes $k = g^{\alpha\beta} = (g^\beta)^\alpha$ using $\alpha$ and recovers $m$ by decrypting $E_S(k, m)$.
 As a concrete example, set $E_S(k, m) = k \cdot m$ and $D_S(k, c) = k^{-1} \cdot c$. The correctness property automatically holds. Therefore,
 - $G$ outputs $sk = \alpha \la \Z_q$, $pk = h = g^\alpha$.
 - $E(pk, m) = (c_1, c_2) \la (g^\beta, h^\beta \cdot m)$ where $\beta \la \Z_q$.
 - $D(sk, c) = c_2 \cdot (c_1)^{-\alpha} = m$.
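
The three algorithms translate almost line by line into code. The sketch below uses the order-$1019$ subgroup of $\mathbb{Z}_{2039}^\ast$ generated by $4$; these toy parameters are assumptions chosen for illustration and are far too small for real use. Messages are assumed to be elements of $G$. `pow` with a negative exponent needs Python 3.8 or later.

```python
import secrets

# Toy parameters (illustrative only): P = 2Q + 1 with P, Q prime,
# and GEN = 4 generates the subgroup of order Q in Z_P^*.
P, Q, GEN = 2039, 1019, 4


def keygen():
    alpha = secrets.randbelow(Q)
    return pow(GEN, alpha, P), alpha            # (pk, sk) = (h, alpha)


def encrypt(pk, m):
    beta = secrets.randbelow(Q)                 # fresh randomness per message
    return pow(GEN, beta, P), pow(pk, beta, P) * m % P


def decrypt(sk, c):
    c1, c2 = c
    return c2 * pow(c1, -sk, P) % P             # c2 * c1^(-alpha)


pk, sk = keygen()
m = pow(GEN, 42, P)                             # a message encoded as a group element
recovered = decrypt(sk, encrypt(pk, m))
```

Encrypting the same `m` twice gives different ciphertexts with high probability, because `beta` is sampled anew on every call.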
 ### Security of ElGamal Encryption
 > **Theorem.** If the DDH assumption holds on $G$, and the symmetric cipher $\mc{E}_S = (E_S, D_S)$ is semantically secure, then the ElGamal encryption scheme $\mc{E}_\rm{EG}$ is semantically secure.
 > 
 > For any SS adversary $\mc{A}$ of $\mc{E}_\rm{EG}$, there exist a DDH adversary $\mc{B}$, and an SS adversary $\mc{C}$ for $\mc{E}_S$ such that
 > 
 > $$
 > \Adv[SS]{\mc{A}, \mc{E}_\rm{EG}} \leq 2 \cdot \Adv[DDH]{\mc{B}, G} + \Adv[SS]{\mc{C}, \mc{E}_S}.
 > $$
 *Proof Idea*. For any $m_0, m_1 \in G$ and random $\gamma \la \Z_q$,
 $$
 E_S(g^{\alpha\beta}, m_0) \approx_c E_S(g^{\gamma}, m_0) \approx_c E_S(g^\gamma, m_1) \approx_c E_S(g^{\alpha\beta}, m_1).
 $$
 The first two and last two ciphertexts are computationally indistinguishable since the DDH problem is hard. The second and third ciphertexts are also indistinguishable since $\mc{E}_S$ is semantically secure.
 *Proof*. Full proof in Theorem 11.5.[^1]
 Note that $\beta \la \Z_q$ must be chosen differently for each encrypted message. This is the randomness part of the encryption, since $pk = g^\alpha, sk =\alpha$ are fixed.
 ### Hashed ElGamal Encryption
 **Hashed ElGamal encryption** scheme is a variant of the original ElGamal scheme, where we use a hash function $H : G \ra \mc{K}$, where $\mc{K}$ is the key space of $\mc{E}_S$.
 The only difference is that we use $H(g^{\alpha\beta})$ as the key.[^2]
 > 1. Alice chooses $sk = \alpha \la \Z_q$, computes $pk = h = g^\alpha$ and sends $pk$ to Bob.
 > 2. Bob also chooses $\beta \la \Z_q$ and computes $h^\beta = g^{\alpha\beta}$**, and sets $k = H(g^{\alpha\beta})$.**
 > 3. Bob sends $\big( g^\beta, E_S(k, m) \big)$ to Alice.
 > 4. Alice computes $g^{\alpha\beta} = (g^\beta)^\alpha$ using $\alpha$, **computes $k = H(g^{\alpha\beta})$** and recovers $m$ by decrypting $E_S(k, m)$.
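
A sketch of the hashed variant follows. Several pieces are assumptions made for illustration and are not specified in the notes: SHA-256 plays the role of $H$, the symmetric cipher $\mc{E}_S$ is a one-time XOR with the $32$-byte key (so messages are limited to $32$ bytes), and the group is the same toy subgroup of $\mathbb{Z}_{2039}^\ast$.

```python
import hashlib
import secrets

P, Q, GEN = 2039, 1019, 4                       # toy group, illustrative only


def H(elem):
    """Hash a group element (an integer below P < 2^16) to a 32-byte key."""
    return hashlib.sha256(elem.to_bytes(2, "big")).digest()


def xor(key, data):
    """Toy one-time symmetric cipher: XOR data with the key bytes."""
    return bytes(k ^ d for k, d in zip(key, data))


def keygen():
    alpha = secrets.randbelow(Q)
    return pow(GEN, alpha, P), alpha


def encrypt(pk, m):
    if len(m) > 32:
        raise ValueError("toy cipher handles at most 32 bytes")
    beta = secrets.randbelow(Q)
    k = H(pow(pk, beta, P))                     # k = H(g^(alpha*beta))
    return pow(GEN, beta, P), xor(k, m)


def decrypt(sk, c):
    c1, c_s = c
    k = H(pow(c1, sk, P))                       # same key from (g^beta)^alpha
    return xor(k, c_s)
```

Unlike the multiplicative version, the message here is an arbitrary byte string and never has to be encoded as a group element.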
 This is also semantically secure, under the random oracle model.
 > **Theorem.** Let $H : G \ra \mc{K}$ be modeled as a random oracle. If the CDH assumption holds on $G$ and $\mc{E}_S$ is semantically secure, then the hashed ElGamal scheme $\mc{E}_\rm{HEG}$ is semantically secure.
 *Proof Idea*. Given a ciphertext $\big( g^\beta, E_S(k, m) \big)$ with $k = H(g^{\alpha\beta})$, the adversary learns nothing about $k$ unless it constructs $g^{\alpha\beta}$. This is because we modeled $H$ as a random oracle. If the adversary learns about $k$, then this adversary breaks the CDH assumption for $G$. Thus, if the CDH assumption holds, $k$ looks completely random to the adversary, so the hashed ElGamal scheme is secure by the semantic security of $\mc{E}_S$.
 *Proof*. Refer to Theorem 11.4.[^1]
 Since the hashed ElGamal scheme is semantically secure, it is automatically CPA secure. But this is not CCA secure, and we need a stronger assumption.
 ### Interactive Computational Diffie-Hellman Problem (ICDH)
 > **Definition.** Let $G = \left\langle g \right\rangle$ be a cyclic group of prime order $q$. Let $\mc{A}$ be a given adversary.
 > 
 > 1. The challenger chooses $\alpha, \beta \la \Z_q$ and sends $g^\alpha, g^\beta$ to the adversary.
 > 2. The adversary makes a sequence of **DH-decision oracle queries** to the challenger.
 > 	- Each query has the form $(v, w) \in G^2$, challenger replies with $1$ if $v^\alpha = w$, replies $0$ otherwise.
 > 3. The adversary calculates and outputs some $w \in G$.
 > 
 > We define the **advantage in solving the interactive computational Diffie-Hellman problem for $G$** as
 > 
 > $$
 > \Adv[ICDH]{\mc{A}, G} = \Pr[w = g^{\alpha\beta}].
 > $$
 > 
 > We say that the **interactive computational Diffie-Hellman (ICDH) assumption** holds for $G$ if for any efficient adversary $\mc{A}$, $\Adv[ICDH]{\mc{A}, G}$ is negligible.
 This is also known as **gap-CDH**. Intuitively, it says that even if we have a DDH solver, CDH is still hard.
 ### CCA Security of Hashed ElGamal
 > **Theorem.** If the gap-CDH assumption holds on $G$ and $\mc{E}_S$ provides AE and $H : G \ra \mc{K}$ is a random oracle, then the hashed ElGamal scheme is CCA secure.
 *Proof*. See Theorem 12.4.[^1] (very long)
 ## The RSA Encryption
 The RSA scheme was originally designed by Rivest, Shamir and Adleman in 1977.[^3] The RSA trapdoor permutation is used in many places such as SSL/TLS, both for encryption and digital signatures.
 ### Textbook RSA Encryption
 The "textbook RSA" is done as follows.
 - Key generation algorithm $G$ outputs $(pk, sk)$.
 	- Sample two large random primes $p, q$ and set $N = pq$.
 	- Choose $e \in \Z$ such that $\gcd(e, \phi(N)) = 1$, compute $d = e^{-1} \bmod{\phi(N)}$.
 	- Output $pk = (N, e)$, $sk = (N, d)$.
 - Encryption $E(pk, m) = m^e \bmod N$.
 - Decryption $D(sk, c) = c^d \bmod N$ .
 Correctness holds by **Fermat's little theorem**. $ed = 1 \bmod \phi(N)$, so
 $$
 D(sk, (E(pk, m))) = m^{ed} = m^{1 + k(p-1)(q-1)} \bmod N.
 $$
 Since $m^{p-1} = 1 \bmod p$ for $p \nmid m$, we have $m^{ed} = m \bmod p$ (this holds trivially if $p \mid m$). A similar argument holds for modulus $q$, so by the Chinese remainder theorem we have $m^{ed} = m \bmod N$.
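The correctness argument can be checked numerically. The following is a minimal sketch of textbook RSA with toy parameters; the function names and the values $p = 61$, $q = 53$, $e = 17$ are assumptions made for illustration and are far too small for real use.

```python
from math import gcd

def rsa_keygen(p: int, q: int, e: int):
    """Textbook RSA key generation from two given primes (toy sizes only)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be invertible modulo phi(N)")
    d = pow(e, -1, phi)  # modular inverse (Python 3.8+)
    return (n, e), (n, d)

def rsa_encrypt(pk, m: int) -> int:
    n, e = pk
    return pow(m, e, n)

def rsa_decrypt(sk, c: int) -> int:
    n, d = sk
    return pow(c, d, n)

# Hypothetical toy parameters: N = 61 * 53 = 3233.
pk, sk = rsa_keygen(61, 53, 17)

# Decryption inverts encryption for every m in Z_N,
# including the multiples of p and q.
roundtrip_ok = all(rsa_decrypt(sk, rsa_encrypt(pk, m)) == m for m in range(pk[0]))
```

The exhaustive loop over $\Z_N$ is only feasible because $N$ is tiny; it covers the case $p \mid m$ that the proof treats separately.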
 ### Attacks on Textbook RSA Encryption
 But this scheme is not CPA secure, since encryption is deterministic. For instance, the adversary can choose the two messages $1$ and $2$, encrypt both under $pk$ by itself, and compare the results with the challenge ciphertext.
 Also, the ciphertext is malleable by the **homomorphic property**. If $c_1 = m_1^e \bmod N$ and $c_2 = m_2^e \bmod N$, then $c = c_1c_2 = (m_1m_2)^e \bmod N$ is an encryption of $m_1m_2$.
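Both weaknesses are easy to reproduce. This is a small sketch under the assumption of a toy public key $N = 3233$, $e = 17$; the helper name `encrypt` is hypothetical.

```python
# Hypothetical toy public key: N = 61 * 53, e = 17.
N, e = 3233, 17

def encrypt(m: int) -> int:
    return pow(m, e, N)

# Determinism: anyone can re-encrypt candidate messages and compare.
challenge = encrypt(2)
guess = 1 if challenge == encrypt(1) else 2

# Malleability: multiplying ciphertexts multiplies the plaintexts.
m1, m2 = 42, 55
c = encrypt(m1) * encrypt(m2) % N
is_encryption_of_product = c == encrypt(m1 * m2 % N)
```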
 #### Attack on KEM
 Assume that the textbook RSA is used as a KEM. Suppose that $k$ is $128$ bits, and the attacker sees $c = k^e \bmod N$. With non-negligible probability (roughly $20\%$), $k = k_1 \cdot k_2$ for some $k_1, k_2 < 2^{64}$. Using the homomorphic property, $c = k_1^e k_2^e \bmod N$, so the following attack is possible.
 1. Build a table of $c\cdot k_2^{-e} \bmod N$ for $1 \leq k_2 < 2^{64}$.
 2. For each $1 \leq k_1 < 2^{64}$, compute $k_1^e \bmod N$ and check if it is in the table.
 3. Output a match $(k_1, k_2)$.
 The attack has complexity $\mc{O}(2^{n/2})$ where $n$ is the key length.
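The three steps above can be sketched as a meet-in-the-middle search. The instance below is an assumption chosen so that it runs instantly: a $16$-bit key that splits into two $8$-bit factors, and a toy modulus $N = 10007 \cdot 10009$. The function name `mitm_recover` is hypothetical.

```python
def mitm_recover(c: int, e: int, n: int, bound: int):
    """Search for k1, k2 < bound with (k1 * k2)^e = c mod n."""
    # Step 1: table of c * k2^(-e) mod n, mapping each value back to k2.
    table = {c * pow(k2, -e, n) % n: k2 for k2 in range(1, bound)}
    # Step 2: look up k1^e mod n for each candidate k1.
    for k1 in range(1, bound):
        k2 = table.get(pow(k1, e, n))
        if k2 is not None:
            return k1, k2  # Step 3: output a match
    return None

# Hypothetical toy instance: 16-bit key made of two 8-bit factors.
n, e = 10007 * 10009, 65537
k = 241 * 251
c = pow(k, e, n)
k1, k2 = mitm_recover(c, e, n, bound=2**8)
```

The search performs about $2 \cdot 2^8$ exponentiations instead of $2^{16}$, which is the square-root saving stated above, at the cost of storing the table.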
 ## Trapdoor Functions
 Textbook RSA is not secure, but it is a **one-way trapdoor function**.
 A **one-way function** is a function that is computationally hard to invert. But we sometimes need to invert the function, so we need functions that have a **trapdoor**. A trapdoor is a secret that allows efficient inversion, but without the trapdoor, the function must still be hard to invert.
 > **Definition.** Let $\mc{X}$ and $\mc{Y}$ be finite sets. A **trapdoor function scheme** $\mc{T} = (G, F, I)$ defined over $(\mc{X}, \mc{Y})$ is a triple of algorithms.
 > 
 > - $G$ is a probabilistic key generation algorithm that outputs $(pk, sk)$, where $pk$ is the public key and $sk$ is the secret key.
 > - $F$ is a deterministic algorithm that outputs $y \la F(pk, x)$ for $x \in \mc{X}$.
 > - $I$ is a deterministic algorithm that outputs $x \la I(sk, y)$ for $y \in \mc{Y}$.
 The correctness property says that for any $(pk, sk) \la G()$ and $x \in \mc{X}$, $I(sk, F(pk, x)) = x$. So $sk$ is the trapdoor that inverts this function.
 One-wayness is defined as a security game.
 > **Definition.** Given a trapdoor function scheme $\mc{T} = (G, F, I)$ and an adversary $\mc{A}$, define a security game as follows.
 > 
 > 1. The challenger computes $(pk, sk) \la G()$, $x \la \mc{X}$ and $y \la F(pk, x)$.
 > 2. The challenger sends $pk$ and $y$ to the adversary.
 > 3. The adversary computes and outputs $x' \in \mc{X}$.
 > 
 > $\mc{A}$ wins if $\mc{A}$ inverts the function. The advantage is defined as
 > 
 > $$
 > \Adv[OW]{\mc{A}, \mc{T}} = \Pr[x = x'].
 > $$
 > 
 > If the advantage is negligible for any efficient adversary $\mc{A}$, then $\mc{T}$ is **one-way**.
 A one-way trapdoor function is not an encryption scheme. The algorithm is deterministic, so it is not CPA secure. Never encrypt a message directly with a trapdoor function.
 ### Textbook RSA as a Trapdoor Function
 It is easy to see that the textbook RSA is a trapdoor function.
 - Key generation algorithm $G$ chooses random primes $p, q$ and sets $N = pq$.
 	- Then chooses integer $e$ such that $\gcd(e, \phi(N)) = 1$.
 	- Set $d = e^{-1} \bmod \phi(N)$.
 - Then $F(pk, x) = x^e \bmod N$, and $I(sk, y) = y^d \bmod N$.
 - The correctness property holds by the above proof.
 But is RSA a *secure* trapdoor function? Is it one-way?
 - If $d$ is known, it is obviously not one-way.
 - If $\phi(N)$ is known, it is not one-way.
 	- One can find $d = e^{-1} \bmod \phi(N)$.
 - If $p$ and $q$ are known, it is not one-way.
 	- $\phi(N) = (p-1)(q-1)$.
 Thus, if factoring is easy, RSA is not one-way. Equivalently, if RSA is a secure trapdoor function, then factoring must be hard. How about the converse? There is no known proof that hardness of factoring implies one-wayness of RSA, but it seems reasonable to assume.
 ## The RSA Assumption
 The RSA assumption says that the RSA problem is hard, which implies that RSA is a **one-way** trapdoor function.
 ### The RSA Problem
 > **Definition.** Let $\mc{T}_\rm{RSA} = (G, F, I)$ be the RSA trapdoor function scheme. Given an adversary $\mc{A}$,
 > 
 > 1. The challenger chooses $(pk, sk) \la G()$ and $x \la \Z_N$.
 > 	- $pk = (N, e)$, $sk = (N, d)$.
 > 2. The challenger computes $y \la x^e \bmod N$ and sends $pk$ and $y$ to the adversary.
 > 3. The adversary computes and outputs $x' \in \Z_N$.
 > 
 > The adversary wins if $x = x'$. The advantage is defined as
 > 
 > $$
 > \rm{Adv}_{\rm{RSA}}[\mc{A}, \mc{T}_\rm{RSA}] = \Pr[x = x'].
 > $$
 > 
 > We say that the **RSA assumption** holds if the advantage is negligible for any efficient $\mc{A}$.
 ## RSA Public Key Encryption (ISO Standard)
 - Let $(E_S, D_S)$ be a symmetric encryption scheme over $(\mc{K}, \mc{M}, \mc{C})$ that provides AE.
 - Let $H : \Z_N^{\ast} \ra \mc{K}$ be a hash function.
 The RSA public key encryption is done as follows.
 - Key generation is the same.
 - Encryption
 	1. Choose random $x \la \Z_N^{\ast}$ and let $y = x^e \bmod N$.
 	2. Compute $c \la E_S(H(x), m)$.
 	3. Output $c' = (y, c)$.
 - Decryption
 	- Output $D_S(H(y^d), c)$.
 This works because $x = y^d \bmod N$ and $H(y^d) = H(x)$. In short, this uses the RSA trapdoor function as a **key encapsulation mechanism** (KEM), and the actual encryption is done by symmetric encryption.
 It is known that under the RSA assumption, with $H$ modeled as a random oracle, this scheme is CPA secure.
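The flow of the scheme can be sketched as follows. Everything concrete here is an assumption for illustration: the toy modulus built from the Mersenne primes $2^{31} - 1$ and $2^{61} - 1$, SHA-256 as a stand-in for $H$, and a hand-rolled encrypt-then-MAC cipher as a stand-in for $(E_S, D_S)$. It is not the ISO standard's concrete instantiation.

```python
import hashlib
import hmac
import secrets
from math import gcd

# Hypothetical toy key: two Mersenne primes, far too small for real use.
P, Q, E = 2**31 - 1, 2**61 - 1, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))

def H(x: int) -> bytes:
    """Stand-in for the random oracle H : Z_N^* -> K."""
    return hashlib.sha256(str(x).encode()).digest()

def _keystream(key: bytes, length: int) -> bytes:
    blocks = (length + 31) // 32
    stream = b"".join(
        hashlib.sha256(key + b"enc" + i.to_bytes(4, "big")).digest()
        for i in range(blocks)
    )
    return stream[:length]

def sym_encrypt(key: bytes, msg: bytes) -> bytes:
    """Toy encrypt-then-MAC cipher standing in for E_S (one-time keys only)."""
    body = bytes(a ^ b for a, b in zip(msg, _keystream(key, len(msg))))
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return body + tag

def sym_decrypt(key: bytes, ct: bytes):
    body, tag = ct[:-32], ct[-32:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None  # reject
    return bytes(a ^ b for a, b in zip(body, _keystream(key, len(body))))

def encrypt(msg: bytes):
    while True:  # sample x from Z_N^*
        x = secrets.randbelow(N)
        if gcd(x, N) == 1:
            break
    y = pow(x, E, N)
    return y, sym_encrypt(H(x), msg)

def decrypt(ct):
    y, c = ct
    return sym_decrypt(H(pow(y, D, N)), c)
```

For example, `decrypt(encrypt(b"attack at dawn"))` returns the original message, while flipping a bit of the symmetric part makes `decrypt` return `None`. RSA is only ever applied to the random $x$, never to the message itself.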
 ### Optimizations for RSA
 The computation time depends on the exponents $e, d$.
 - To speed up RSA, choose a small public exponent $e$.
 	- $e = 65537 = 2^{16} + 1$ is often used, which only takes $17$ multiplications.
 - But $d$ cannot be too small.
 	- RSA is insecure for $d < N^{0.25}$. (Wiener'87)
 	- RSA is insecure for $d < N^{0.292}$. (BD'98)
 	- Is RSA secure for $d < N^{0.5}$? (open problem)
 - Often, encryption is fast, but decryption is slow.
 	- ElGamal takes approximately the same time for both.[^4]
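The multiplication count for $e = 65537$ can be checked with a left-to-right square-and-multiply sketch. The function name `modexp_count` is hypothetical, and real libraries use more elaborate (and constant-time) exponentiation.

```python
def modexp_count(base: int, exp: int, mod: int):
    """Square-and-multiply for exp >= 1; returns (result, number of multiplications)."""
    result = base % mod
    mults = 0
    for bit in bin(exp)[3:]:  # skip '0b' and the leading 1 bit
        result = result * result % mod  # one squaring per remaining bit
        mults += 1
        if bit == "1":
            result = result * base % mod  # extra multiplication for a 1 bit
            mults += 1
    return result, mults

# 65537 = 2^16 + 1: sixteen squarings plus one multiplication.
value, count_sparse = modexp_count(5, 65537, 3233)

# A dense 17-bit exponent needs almost twice as many.
_, count_dense = modexp_count(5, 2**17 - 1, 3233)
```

The count depends on the number of $1$ bits in the exponent, which is also why a naive implementation leaks information about $d$ through its running time.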
 ## Attacks on RSA Implementation
 - Timing Attack
 	- Time to compute $c^d \bmod N$ exposes $d$.
 	- More $1$'s in the binary representation of $d$ leads to more multiplications.
 - Power Attack
 	- The power consumption of a smartcard during the computation of $c^d \bmod N$ exposes $d$.
 - Fault Attack
 	- An error during the computation of $c^d \bmod N$ exposes $d$.
 - Poor Randomness
 	- With poor entropy at initialization, the same $p$ may be generated on multiple devices.
 	- Collect the moduli $N$ from many public keys; the $\gcd$ of two moduli that share a prime is $p$.
 	- *PRG must be properly seeded when generating keys.*
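The poor randomness attack needs nothing more than a $\gcd$. Below is a minimal sketch with hypothetical toy moduli; the function name is an assumption, and the pairwise loop is quadratic in the number of keys, so it only illustrates the idea.

```python
from math import gcd

def shared_prime_factors(moduli):
    """Return {N: (p, q)} for every modulus that shares a prime with another one."""
    factored = {}
    for i, n1 in enumerate(moduli):
        for n2 in moduli[i + 1:]:
            g = gcd(n1, n2)
            if 1 < g < n1 and g < n2:  # proper common factor
                factored[n1] = (g, n1 // g)
                factored[n2] = (g, n2 // g)
    return factored

# Hypothetical toy moduli: the first two share the prime 101.
moduli = [101 * 103, 101 * 107, 109 * 113]
factored = shared_prime_factors(moduli)
```

Here the first two keys are fully factored, while the third, which shares no prime with the others, is unaffected.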
 [^1]: A Graduate Course in Applied Cryptography.
 [^2]: There is another variant that uses $H : G^2 \ra \mc{K}$ and sets $H(g^\beta, g^{\alpha\beta})$ as the key. This one is also semantically secure, and gives further security properties than the one in the text.
 [^3]: This predates the ElGamal scheme, which was published in 1985.
 [^4]: Discrete logarithms have the same complexity for average case and worst case, but this is not the case for RSA. (Source?)
--- a/Cryptography/2023-10-26-digital-signatures.md
+++ b/Cryptography/2023-10-26-digital-signatures.md
@@ -0,0 +1,245 @@
 ---
 share: true
 toc: true
 math: true
 categories:
  - Lecture Notes
  - Modern Cryptography
 tags:
  - lecture-note
  - cryptography
  - security
 title: 10. Digital Signatures
 date: 2023-10-26
 github_title: 2023-10-26-digital-signatures
 image:
  path: assets/img/posts/Lecture Notes/Modern Cryptography/mc-10-dsig-security.png
 attachment:
  folder: assets/img/posts/Lecture Notes/Modern Cryptography
 ---
 ## Digital Signatures
 > **Definition.** A **signature scheme** $\mc{S} = (G, S, V)$ is a triple of efficient algorithms, where $G$ is a **key generation** algorithm, $S$ is a **signing** algorithm, and $V$ is a **verification** algorithm.
 > 
 > - A probabilistic algorithm $G$ outputs a pair $(pk, sk)$, where $sk$ is called a secret **signing key**, and $pk$ is a public **verification key**.
 > - Given $sk$ and a message $m$, a probabilistic algorithm $S$ outputs a **signature** $\sigma \la S(sk, m)$.
 > - $V$ is a deterministic algorithm that outputs either $\texttt{accept}$ or $\texttt{reject}$ for $V(pk, m, \sigma)$.
 The correctness property requires that every signature generated by $S$ is accepted by $V$. For all $(pk, sk) \la G()$ and $m \in \mc{M}$,
 $$
 \Pr[V(pk, m, S(sk, m)) = \texttt{accept}] = 1.
 $$
 ### Properties of Digital Signatures
 - Digital signatures can be verified by anyone, whereas MACs can be verified only by the parties sharing the secret key.
 	- No need to share a key for digital signatures.
 - **Non-repudiation**: the signer cannot deny having created the signature.
 	- Signatures can only be created by people having the secret key.
 	- If the secret key is leaked, we no longer have non-repudiation.
 	- In MACs, the secret key is shared by two parties, so we don't have non-repudiation.
 - Must trust the identity of the public key.
 	- How do you trust that this public key is Alice's?
 	- We need **public key infrastructure** (PKI).
 ### Applications
 - Electronic document signing
 - HTTPS/TLS certificates
 - Software installation
 - Authenticated email (DKIM)
 - Bitcoins
 ## Secure Digital Signatures
 The definition is similar to the [secure MAC](./2023-09-21-macs.md#secure-mac-unforgeability). The adversary can perform a **chosen message attack**, but cannot create an **existential forgery**.
 ![mc-10-dsig-security.png](../../../assets/img/posts/Lecture%20Notes/Modern%20Cryptography/mc-10-dsig-security.png)
 > **Definition.** Let $\mc{S} = (G, S, V)$ be a signature scheme defined over $(\mc{M}, \Sigma)$. Given an adversary $\mc{A}$, the game goes as follows.
 > 
 > 1. The challenger generates $(pk, sk) \la G()$ and sends $pk$ to $\mc{A}$.
 > 2. $\mc{A}$ makes a series of *signing queries* to the challenger.
 > 	- Each query is a message $m_i \in \mc{M}$, the challenger responds with $\sigma_i \la S(sk, m_i)$.
 > 3. $\mc{A}$ computes and outputs a candidate forgery pair $(m, \sigma) \in \mc{M} \times \Sigma$.
 > 	- $m \notin \left\lbrace m_1, \dots, m_q \right\rbrace$.
 > 	- $(m, \sigma) \notin \left\lbrace (m_1, \sigma_1), \dots, (m_q, \sigma_q) \right\rbrace$. (strong)
 > 
 > $\mc{A}$ wins if $V(pk, m, \sigma) = \texttt{accept}$, let this event be $W$. The advantage of $\mc{A}$ with respect to $\mc{S}$ is defined as
 > 
 > $$
 > \rm{Adv}_{\rm{SIG}}[\mc{A}, \mc{S}] = \Pr[W].
 > $$
 > 
 > If the advantage is negligible for all efficient adversaries $\mc{A}$, the signature scheme $\mc{S}$ is (strongly) **secure**. We also say that $\mc{S}$ is **existentially unforgeable under a chosen message attack**.
 - We do not make verification queries, since the adversary can always check any signature.
 - The normal definition of security is sufficient. Secure signature schemes can be converted into strongly secure signature schemes. See Exercise 14.10.[^1]
 ### Message Confusion
 Two different messages $m, m'$ can produce the same signature $\sigma$. In this case, the scheme is vulnerable to **message confusion**. See Exercise 13.3.[^1]
 In common implementations, we consider $m$, $m'$ both to be valid. But there may be situations where this is undesirable. In those cases, a signature should be a *binding commitment* to the message, so that there is no confusion.
 ### Signer Confusion
 Suppose that $(m, \sigma)$ is a valid pair with $pk$, i.e, $V(pk, m, \sigma) = \texttt{accept}$. If an attacker can generate $pk'$ different from $pk$ such that $V(pk', m, \sigma) = \texttt{accept}$, we have **signer confusion**, since both key owners can claim to have signed $m$. See Exercise 13.4.[^1]
 ### Strongly Binding Signatures
 **Strongly binding signatures** prevent both message confusion and signer confusion.
 Any signature scheme can be made strongly binding by appending a collision resistant hash of $(pk, m)$ to the signature. See Exercise 13.5.[^1]
 ## Extending the Message Space
 We can extend the message space of a secure digital signature scheme, [as we did for MACs](./2023-09-28-hash-functions.md#mac-domain-extension). Let $\mc{S} = (G, S, V)$ be a signature scheme defined over $(\mc{M}, \Sigma)$ and let $H : \mc{M}' \ra \mc{M}$ be a hash function with $\left\lvert \mc{M}' \right\lvert \geq \left\lvert \mc{M} \right\lvert$.
 Define a new signature scheme $\mc{S}' = (G, S', V')$ over $(\mc{M}', \Sigma)$ as
 $$
 S'(sk, m) = S(sk, H(m)), \qquad V'(pk, m, \sigma) = V(pk, H(m), \sigma).
 $$
 This is often called the **hash-and-sign paradigm**, and the new signature scheme is also secure.
 > **Theorem.** Suppose that $\mc{S}$ is a secure signature scheme and $H$ is a collision resistant hash function. Then $\mc{S}'$ is a secure signature scheme.
 > 
 > If $\mc{A}$ is an adversary attacking $\mc{S}'$, then there exist an adversary $\mc{B}_\mc{S}$ attacking $\mc{S}$ and an adversary $\mc{B}_H$ attacking $H$ such that
 > 
 > $$
 > \rm{Adv}_{\rm{SIG}}[\mc{A}, \mc{S}'] \leq \rm{Adv}_{\rm{SIG}}[\mc{B}_\mc{S}, \mc{S}] + \rm{Adv}_{\rm{CR}}[\mc{B}_H, H].
 > $$
 *Proof*. The proof is identical to the theorem for MACs.
 ## Digital Signature Constructions
 We can build secure signature schemes from hash functions, trapdoor permutations, or from discrete logarithms.
 ### Textbook RSA Signatures
 This is the signature scheme based on the textbook RSA. It is also insecure.
 - Key generation: $pk = (N, e)$ and $sk = (N, d)$ are chosen to satisfy $d = e^{-1} \bmod \phi(N)$ for $N = pq$.
 - Sign: $S(sk, m) = m^d \bmod N$.
 - Verify: $V(pk, m, \sigma)$ returns $\texttt{accept}$ if and only if $\sigma^e = m \bmod N$.
 Here are some possible attacks.
 - No message attack
 	- Just return $(\sigma^e, \sigma)$ for some $\sigma$. Then it passes verification.
 - Attack using the homomorphic property.
 	- Suppose we want to forge a signature on a message $m$.
 	- Pick $m_1 \in \Z_N^{\ast}$ and set $m_2 = m\cdot m_1^{-1} \bmod N$.
 	- Query signatures for $m_1$ and $m_2$ and multiply the responses.
 		- $\sigma = \sigma_1 \cdot \sigma_2 = m_1^d \cdot m^d \cdot m_1^{-d} = m^d \bmod N$.
 	- Then $(m, \sigma)$ is a valid pair.
 Because of the second attack, the textbook RSA signature is **universally forgeable**. This property is used to create **blind signatures**, where the signer creates a signature without any knowledge about the message. See Exercise 13.15.[^1]
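Both attacks can be replayed on a toy key. This is a sketch under the assumption $N = 3233$, $e = 17$, $d = 2753$; the `sign` function plays the role of the signing oracle, and all names are hypothetical.

```python
# Hypothetical toy key: N = 61 * 53, e = 17, d = e^{-1} mod phi(N).
N, e, d = 3233, 17, 2753

def sign(m: int) -> int:
    return pow(m, d, N)

def verify(m: int, sigma: int) -> bool:
    return pow(sigma, e, N) == m % N

# No-message attack: pick sigma first, then derive the message it signs.
sigma0 = 1234
m0 = pow(sigma0, e, N)
no_message_forgery_ok = verify(m0, sigma0)

# Homomorphic attack: forge a signature on m from signatures on m1 and m2.
m = 2024
m1 = 7
m2 = m * pow(m1, -1, N) % N
sigma = sign(m1) * sign(m2) % N  # two signing queries, neither of them on m
homomorphic_forgery_ok = verify(m, sigma)
```

In the first attack the forger has no control over the message; in the second the target message $m$ is chosen freely, which is what makes the scheme universally forgeable.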
 ### RSA Full Domain Hash Signature Scheme
 Given a hash function $H : \mc{M} \ra \mc{Y}$, the **RSA full domain hash** signature scheme $\mc{S}_\rm{RSA-FDH}$ is defined as follows.
 - Key generation: $pk = (N, e)$ and $sk = (N, d)$ are chosen to satisfy $d = e^{-1} \bmod \phi(N)$ for $N = pq$.
 - Sign: $S(sk, m) = H(m)^d \bmod N$.
 - Verify: $V(pk, m, \sigma)$ returns $\texttt{accept}$ if and only if $\sigma^e = H(m) \bmod N$.
 This scheme is now secure.
 > **Theorem.** If the hash function $H$ is modeled as a random oracle, and the RSA assumption holds, then $\mc{S}_\rm{RSA-FDH}$ is a secure signature scheme.
 > 
 > For any $q$-query adversary $\mc{A}$ against hashed RSA, there exists an adversary $\mc{B}$ solving the RSA problem such that
 > 
 > $$
 > \rm{Adv}_{\rm{SIG}}[\mc{A}, \mc{S}_\rm{RSA-FDH}] \leq q \cdot \rm{Adv}_{\rm{RSA}}[\mc{B}].
 > $$
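A minimal sketch of RSA-FDH signing and verification follows; verification raises $\sigma$ to the public exponent $e$ and compares with $H(m)$. The toy modulus (two Mersenne primes) and the use of SHA-256 reduced modulo $N$ as the hash are assumptions for illustration. A real full domain hash must cover $\Z_N$ almost uniformly, which a single $256$-bit digest would not do for a realistic modulus.

```python
import hashlib

# Hypothetical toy key: two Mersenne primes, far too small for real use.
P, Q, E = 2**31 - 1, 2**61 - 1, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))

def H(m: bytes) -> int:
    """Stand-in full domain hash into Z_N (an assumption of this sketch)."""
    return int.from_bytes(hashlib.sha256(m).digest(), "big") % N

def sign(m: bytes) -> int:
    return pow(H(m), D, N)

def verify(m: bytes, sigma: int) -> bool:
    return pow(sigma, E, N) == H(m)

sigma = sign(b"transfer 10 coins")
```

The homomorphic forgery on textbook RSA no longer applies directly, since a product of signatures is a signature on some $m$ only if $H(m) = H(m_1)H(m_2) \bmod N$, and the adversary does not control the hash values.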
 ### Full Domain Hash Signature Scheme
 The following is a description of a **full domain hash** scheme $\mc{S}_\rm{FDH}$, constructed from trapdoor permutation scheme $\mc{T} = (G, F, I)$.
 - Key generation: $(pk, sk) \la G()$.
 - Sign: $S(sk, m)$ returns $\sigma \la I(sk, H(m))$.
 - Verify: $V(pk, m, \sigma)$ returns $\texttt{accept}$ if and only if $F(pk, \sigma) = H(m)$.
 This scheme $\mc{S}_\rm{FDH} = (G, S, V)$ is secure if $\mc{T}$ is a **one-way trapdoor permutation** and $H$ is a random oracle.
 > **Theorem.** Let $\mc{T} = (G,F,I)$ be a one-way trapdoor permutation defined over $\mc{X}$. Let $H : \mc{M} \ra \mc{X}$ be a hash function, modeled as a random oracle. Then the derived FDH signature scheme $\mc{S}_\rm{FDH}$ is a secure signature scheme.
 *Proof*. See Theorem 13.3.[^1]
 ## Schnorr Digital Signature Scheme
 This one uses discrete logarithms.
 ### The Schnorr Identification Protocol
 This scheme is derived from the **Schnorr identification protocol**.
 Let $G = \left\langle g \right\rangle$ be a cyclic group of prime order $q$. We consider an interaction between two parties, a prover $P$ and a verifier $V$. The prover has a secret $\alpha \in \Z_q$ and the verification key is $u = g^\alpha$. **$P$ wants to convince $V$ that he knows $\alpha$, but does not want to reveal $\alpha$**.
 ![mc-10-schnorr-identification.png](../../../assets/img/posts/Lecture%20Notes/Modern%20Cryptography/mc-10-schnorr-identification.png)
 The protocol $\mc{I}_\rm{sch} = (G, P, V)$ works as follows.
 > 1. A **secret key** $\alpha \la \Z_q$ and **verification key** $u \la g^\alpha$ is generated. The prover $P$ has $\alpha$ and the verifier $V$ has $u$.
 > 2. $P$ chooses a random $\alpha_t \la \Z_q$, and sends $u_t \la g^{\alpha_t}$ to $V$.
 > 3. $V$ chooses a random $c \la \Z_q$ and sends it to $P$.
 > 4. $P$ computes $\alpha_z \la \alpha_t + \alpha c \in \Z_q$ and sends it to $V$.
 > 5. $V$ checks if $g^{\alpha_z} = u_t \cdot u^c$. Accept if and only if it is equal.
 - $u_t$ is the **commitment** sent to the verifier.
 - $c$ is the **challenge** sent to the prover.
 	- If $P$ can predict the challenge, $P$ can choose $\alpha_t$ and $\alpha_z$ so that verifier accepts it.
 - $\alpha_z$ is the **response** sent to the verifier.
 We must check a few things.
 - **Correctness**: If $P$ has the correct $\alpha$, then $g^{\alpha_z} = g^{\alpha_t} \cdot (g^\alpha)^c = u_t \cdot u^c$.
 - **Soundness**: If $P$ does not know the correct $\alpha$, it is rejected with probability $1 - \frac{1}{q}$.
 	- If we repeat this $n$ times, the probability of rejection is $1 - \frac{1}{q^n} \ra 1$.
 	- Thus $q$ (the size of the challenge space) must be large.
 - **Zero-knowledge**: $V$ learns no information about $\alpha$ from the conversation.
 	- This will be revisited later. See [here](2023-11-07-sigma-protocols.md#the-schnorr-identification-protocol-revisited).
 > **Theorem.** The Schnorr identification protocol is secure if the DL problem is hard, and the challenge space $\mc{C}$ is large.
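One round of the protocol can be sketched as follows, together with the cheating strategy that works when the prover can predict the challenge. The toy group (the order-$1019$ subgroup of $\Z_{2039}^\ast$ generated by $4$) and all function names are assumptions for illustration.

```python
import secrets

# Hypothetical toy group: p = 2q + 1, and g = 4 generates the subgroup of prime order q.
p, q, g = 2039, 1019, 4

def keygen():
    alpha = secrets.randbelow(q)
    return alpha, pow(g, alpha, p)

def prover_commit():
    alpha_t = secrets.randbelow(q)
    return alpha_t, pow(g, alpha_t, p)  # (kept secret, sent as commitment u_t)

def prover_respond(alpha: int, alpha_t: int, c: int) -> int:
    return (alpha_t + alpha * c) % q

def verifier_check(u: int, u_t: int, c: int, alpha_z: int) -> bool:
    return pow(g, alpha_z, p) == u_t * pow(u, c, p) % p

def cheat_with_known_challenge(u: int, c: int):
    """Without alpha: pick alpha_z first, then solve for the commitment u_t."""
    alpha_z = secrets.randbelow(q)
    u_t = pow(g, alpha_z, p) * pow(u, -c, p) % p
    return u_t, alpha_z
```

An honest run is `alpha_t, u_t = prover_commit()`, then a random `c` from the verifier, then `verifier_check(u, u_t, c, prover_respond(alpha, alpha_t, c))`. The cheating prover passes only for the one challenge it prepared for, which is why the challenge must be chosen after the commitment and from a large space.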
 ### Schnorr Digital Signature Scheme
 We *transform* the above protocol to a signature scheme.[^2] We need a hash function $H : \mc{M} \times G \ra \mc{C}$, modeled as a random oracle. The protocol originally involves interaction between two parties, but a signature is computed by a single party. Intuitively, $H$ will play the role of the verifier.
 The **Schnorr signature scheme** $\mc{S}_\rm{sch} = (G, S, V)$ is defined as follows.
 - Key generation: a **secret key** $sk = \alpha \la \Z_q$ and **public key** $pk = u \la g^\alpha$ is generated.
 - Sign: $S(sk, m)$ outputs $\sigma = (u_t, \alpha_z)$ where
 	- Choose random $\alpha_t \la \Z_q$ and set $u_t \la g^{\alpha_t}$.
 	- **Compute $c \la H(m, u_t)$** and set $\alpha_z \la \alpha_t + \alpha c$.
 - Verify: $V(pk, m, \sigma)$ outputs $\texttt{accept}$ if and only if $g^{\alpha_z} = u_t \cdot u^c$.
 	- $c \la H(m, u_t)$ can be computed and $u$ is known.
 Since $H$ is being modeled as a random oracle, the signer cannot predict the value of the challenge $c$. Also, $c$ must take both $m$ and $u_t$ as input. Without $m$, the signature is not related to $m$ (the signature has no $m$ term inside it). On the other hand, without $u_t$, the scheme is insecure since the Schnorr identification protocol is HVZK. See Exercise 19.12.[^1]
 > **Theorem.** If $H$ is modeled as a random oracle and Schnorr's identification protocol is secure, then Schnorr's signature scheme is also secure.
 *Proof*. See Theorem 19.7.[^1]
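 Note that the nonce $\alpha_t \la \Z_q$ must be chosen freshly at random for every signature. If the same $\alpha_t$ is used for two signatures with challenges $c \neq c'$, then $\alpha = (\alpha_z - \alpha_z')/(c - c')$ in $\Z_q$, which reveals the secret key.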
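The signing and verification steps can be sketched as below. The toy group, SHA-256 reduced modulo $q$ as a stand-in for $H$, and the optional `alpha_t` parameter are assumptions for illustration; the parameter exists only to demonstrate what goes wrong when the per-signature randomness $\alpha_t$ is reused.

```python
import hashlib
import secrets

# Hypothetical toy group: order-q subgroup of Z_p^* with p = 2q + 1, generated by g = 4.
p, q, g = 2039, 1019, 4

def H(m: bytes, u_t: int) -> int:
    """Stand-in for the random oracle H : M x G -> C = Z_q."""
    digest = hashlib.sha256(u_t.to_bytes(2, "big") + m).digest()
    return int.from_bytes(digest, "big") % q

def keygen():
    alpha = secrets.randbelow(q)
    return alpha, pow(g, alpha, p)

def sign(alpha: int, m: bytes, alpha_t=None):
    if alpha_t is None:  # normal operation: fresh randomness per signature
        alpha_t = secrets.randbelow(q)
    u_t = pow(g, alpha_t, p)
    c = H(m, u_t)
    return u_t, (alpha_t + alpha * c) % q

def verify(u: int, m: bytes, sigma) -> bool:
    u_t, alpha_z = sigma
    c = H(m, u_t)  # the verifier recomputes the challenge
    return pow(g, alpha_z, p) == u_t * pow(u, c, p) % p

def recover_key_from_reused_nonce(m1, sig1, m2, sig2):
    """Two signatures sharing u_t give two linear equations in alpha."""
    (u_t, z1), (_, z2) = sig1, sig2
    c1, c2 = H(m1, u_t), H(m2, u_t)
    if c1 == c2:
        return None  # no information from this pair
    return (z1 - z2) * pow((c1 - c2) % q, -1, q) % q
```

With $\alpha_z = \alpha_t + \alpha c$ and $\alpha_z' = \alpha_t + \alpha c'$, subtracting the two equations eliminates $\alpha_t$, which is exactly what `recover_key_from_reused_nonce` computes.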
 ## Digital Signature Algorithm
 Schnorr's scheme was protected by a patent, so NIST opted for an ad hoc signature scheme based on a prime order subgroup of $\Z_p^{\ast}$. This algorithm eventually became the **Digital Signature Algorithm** (DSA). The standard was updated to support elliptic curve groups over a finite field, resulting in **ECDSA**.
 ## Public Key Infrastructure
 How would you trust public keys? We introduce **digital certificates** for this.
 Read more in [public key infrastructure (Internet Security)](../Internet%20Security/2023-10-16-pki.md).
 [^1]: A Graduate Course in Applied Cryptography
 [^2]: By using the [Fiat-Shamir transform](2023-11-07-sigma-protocols.md#the-fiat-shamir-transform).
--- a/Cryptography/2023-10-31-advanced-topics.md
+++ b/Cryptography/2023-10-31-advanced-topics.md
@@ -0,0 +1,222 @@
 ---
 share: true
 toc: true
 math: true
 categories:
  - Lecture Notes
  - Modern Cryptography
 tags:
  - lecture-note
  - cryptography
  - security
 title: 11. Advanced Topics
 date: 2023-10-31
 github_title: 2023-10-31-advanced-topics
 ---
 ## Ciphertext Indistinguishability
 - By **Shafi Goldwasser** and **Silvio Micali**
 	- Turing Award in 2012
 An adversary should not be able to...
 - **(Semantic Security)** gain any partial information about a secret.
 - **(Ciphertext Indistinguishability)** distinguish pairs of ciphertexts based on the chosen messages.
 They showed that
 - These two definitions are equivalent under chosen-plaintext attack.
 - Encryption schemes must be randomized.
 > **Definition.** A symmetric key encryption scheme $E$ is **semantically secure** if for any efficient adversary $\mc{A}$, there exists an efficient $\mc{A}'$ such that for any efficiently computable functions $f$ and $h$,
 > 
 > $$
 > \bigg\lvert \Pr\left[ \mc{A}\big( E(k, m), h(m) \big) = f(m) \right] - \Pr\left[ \mc{A}'\big( h(m) \big) = f(m) \right] \bigg\lvert
 > $$
 > 
 > is negligible.
 ## Commitment Schemes
 A commitment scheme is for committing to a value and opening it later. Once committed, the value cannot be changed.
 > **Definition.** A **commitment scheme** for a finite message space $\mc{M}$ is a pair of efficient algorithms $\mc{C} = (C, V)$ satisfying the following.
 > 
 > - For a message $m \in \mc{M}$ to be committed, $(c, o) \la C(m)$, where $c$ is the **commitment string**, and $o$ is an **opening string**.
 > - $V$ is a deterministic algorithm such that $V(m, c, o)$ is either $\texttt{accept}$ or $\texttt{reject}$.
 > - **Correctness**: for all $m \in \mc{M}$, if $(c, o) \la C(m)$ then $V(m, c, o) = \texttt{accept}$.
 Suppose Alice wants to commit to a message $m$. She computes $(c, o) \la C(m)$, sends the commitment string $c$ to Bob, and keeps the opening string $o$ to herself. After some time, Alice sends $m$ and the opening string $o$ to open the commitment, and Bob verifies it by computing $V(m, c, o)$.
 ### Secure Commitment Schemes
 The scheme must satisfy the following properties. First, the commitment must open to a single message. This is called the **binding** property. Next, the commitment must not reveal any information about the message. This is called the **hiding** property.
 > **Definition.** A commitment scheme $\mc{C} = (C, V)$ is **binding** if for every efficient adversary $\mc{A}$ that outputs a $5$-tuple $(c, m_1, o_1, m_2, o_2)$, the probability
 > 
 > $$
 > \Pr[m_1 \neq m_2 \land V(m_1, c, o_1) = V(m_2, c, o_2) = \texttt{accept}]
 > $$
 > 
 > is negligible.
 The hiding property is defined as a security game.
 > **Definition.** Let $\mc{C} = (C, V)$ be a commitment scheme. Given an adversary $\mc{A}$, define two experiments.
 > 
 > **Experiment $b$**.
 > 1. $\mc{A}$ sends $m_0, m_1 \in \mc{M}$ to the challenger.
 > 2. The challenger computes $(c, o) \la C(m_b)$ and sends $c$ to $\mc{A}$.
 > 3. $\mc{A}$ computes and outputs $b' \in \braces{0, 1}$.
 > 
 > Let $W_b$ be the event that $\mc{A}$ outputs $1$ in experiment $b$. The **advantage** of $\mc{A}$ with respect to $\mc{C}$ is defined as
 > 
 > $$
 > \Adv{\mc{A}, \mc{C}} = \abs{\Pr[W_0] - \Pr[W_1]}.
 > $$
 > 
 > If the advantage is negligible for all efficient adversaries $\mc{A}$, then the commitment scheme $\mc{C}$ has the **hiding** property.
 Next, the definition of secure commitment schemes.
 > **Definition.** A commitment scheme $\mc{C} = (C, V)$ is **secure** if it is both hiding and binding.
 ### Non-binding Encryption Schemes
 A semantically secure cipher does not always yield a secure commitment scheme. One might be tempted to use a secure cipher $(E, D)$ as follows.
 - For $m \in \mc{M}$, choose $k \la \mc{K}$ and set $\big( E(k, m), k \big) \la C(m)$.
 - $V(m, c, k)$ accepts if and only if $D(k, c) = m$.
 However, it may be feasible to find another key $k' \in \mc{K}$ such that $D(k, c) \neq D(k', c)$, so the same commitment opens to two different messages. As an example, consider the one-time pad, where it is easy for the committer to change the message. Since $c = m \oplus k$, the committer can later present $k' = k \oplus m \oplus m'$ as the opening string. Then $c \oplus k' = m'$, a different message.
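The one-time pad example takes only a few lines to reproduce. The messages below are hypothetical and the helper `xor` is an assumption of this sketch.

```python
import secrets

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# Commit to m with a one-time pad key k.
m = b"pay Bob 10"
k = secrets.token_bytes(len(m))
c = xor(m, k)

# Later, open the same c to a different message of the same length.
m_prime = b"pay Bob 99"
k_prime = xor(xor(k, m), m_prime)
opened = xor(c, k_prime)
```

Both $(m, k)$ and $(m', k')$ pass the check $D(\cdot, c) = \cdot$, so the commitment is not binding even though the one-time pad is perfectly hiding.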
 ## Constructions of Commitment Schemes
 ### Commitment from Secure PRGs
 To commit a bit, we can use a secure PRG. The following is due to Naor.
 > Let $G : \mc{S} \ra \mc{R}$ be a secure PRG where $\left\lvert \mc{R} \right\lvert \geq \left\lvert \mc{S} \right\lvert^3$ and $\mc{R} = \braces{0, 1}^n$. Suppose that Bob wants to commit a bit $b_0 \in \braces{0, 1}$.
 > 
 > 1. Alice chooses a random $r \in \mc{R}$ and sends it to Bob.
 > 2. Bob chooses a random $s \in \mc{S}$ and computes $c \la C(s, r, b_0)$, where
 > 
 > 	$$
 > 	C(s, r, b_0) = \begin{cases} G(s) & (b_0 = 0) \\ G(s) \oplus r & (b_0 = 1). \end{cases}
 > 	$$
 > 
 > 	Then Bob outputs $(c, s)$ as the commitment and the opening string.
 > 3. During opening, Bob sends $(b_0, s)$ to Alice.
 > 4. Alice accepts if and only if $C(s, r, b_0) = c$.
 Correctness is obvious, since Alice recomputes $C(s, r, b_0)$.
 The hiding property follows since $G(s)$ and $G(s) \oplus r$ are indistinguishable if $G$ is a secure PRG.
 The binding property follows if $1 / \left\lvert \mc{S} \right\lvert$ is negligible. For Bob to open $c$ as both $0$ and $1$, he must find two seeds $s_0, s_1 \in \mc{S}$ such that $c = G(s_0) = G(s_1) \oplus r$. Then $r = G(s_0) \oplus G(s_1)$. There are at most $\left\lvert \mc{S} \right\lvert^2$ values of $r \in \mc{R}$ for which this can happen. The probability that Alice chooses such an $r$ is
 $$
 \left\lvert \mc{S} \right\lvert^2 / \left\lvert \mc{R} \right\lvert \leq \left\lvert \mc{S} \right\lvert^2 / \left\lvert \mc{S} \right\lvert^3 = 1 / \left\lvert \mc{S} \right\lvert
 $$
 by assumption.
 The downside of the above protocol is that it has to be interactive.
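The message flow of the protocol can be sketched as follows. The PRG here is truncated SHA-256, which is only a stand-in assumed for illustration, with $32$-bit seeds and $96$-bit outputs so that $\left\lvert \mc{R} \right\lvert = \left\lvert \mc{S} \right\lvert^3$.

```python
import hashlib
import secrets

SEED_BYTES, OUT_BYTES = 4, 12  # |S| = 2^32 and |R| = 2^96 = |S|^3

def G(s: bytes) -> bytes:
    """Stand-in PRG (an assumption of this sketch): truncated SHA-256 of the seed."""
    return hashlib.sha256(b"naor-prg" + s).digest()[:OUT_BYTES]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def C(s: bytes, r: bytes, b0: int) -> bytes:
    return G(s) if b0 == 0 else xor(G(s), r)

# Step 1: Alice sends a random r.
r = secrets.token_bytes(OUT_BYTES)
# Step 2: Bob commits to b0 = 1 with a fresh seed s and sends c.
s = secrets.token_bytes(SEED_BYTES)
c = C(s, r, 1)
# Steps 3-4: Bob reveals (b0, s); Alice recomputes and compares.
alice_accepts = C(s, r, 1) == c
```

The interaction is visible in the code: Bob cannot compute $c$ before receiving $r$ from Alice.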
 #### Coin Flipping Protocol
 A bit commitment scheme can be used for a **coin flipping protocol**. Suppose that Alice and Bob are flipping coins, when they are physically distant from each other.
 > 1. Bob chooses a random bit $b_0 \la \braces{0, 1}$.
 > 2. Execute the commitment protocol.
 > 	- Alice obtains a commitment string $c$ of $b_0$.
 > 	- Bob keeps an opening string $o$.
 > 3. Alice chooses a random bit $b_1 \la \braces{0, 1}$, and sends it to Bob.
 > 4. Bob reveals $b_0$ and $o$ to Alice, and she verifies that they open $c$.
 > 5. The final outcome is $b = b_0 \oplus b_1$.
 After step $2$, Alice has no information about $b_0$ because of the hiding property, so she cannot choose $b_1$ to bias the final outcome. Next, in step $4$, Bob cannot change $b_0$ because of the binding property.
 Thus, $b_0$ and $b_1$ are both random, so $b$ is either $0$ or $1$ each with probability $1/2$.[^1]
 ### Commitment Scheme from Hashing
 > Let $H : \mc{X} \ra \mc{Y}$ be a collision resistant hash function, where $\mc{X} = \mc{M} \times \mc{R}$. $\mc{M}$ is the message space, and $\mc{R}$ is a finite nonce space. For $m \in \mc{M}$, the derived commitment scheme $\mc{C}_H = (C, V)$ is defined as follows.
 > 
 > - $C(m)$: choose random $o \la \mc{R}$, set $c = H(m, o)$ and output $(c, o)$.
 > - $V(m, c, o)$: output $\texttt{accept}$ if and only if $c = H(m, o)$.
 Correctness is obvious.
 The binding property follows since $H$ is collision resistant. If it is easy to find a $5$-tuple $(c, m_1, o_1, m_2, o_2)$ such that $c = H(m_1, o_1) = H(m_2, o_2)$, $H$ is not collision resistant.
 The hiding property follows if $H$ is modeled as a random oracle, or has a property called **input hiding**. For adversarially chosen $m_1, m_2 \in \mc{M}$ and random $o \la \mc{R}$, the distributions of $H(m_1, o)$ and $H(m_2, o)$ are computationally indistinguishable.
 Additionally, this scheme is **non-malleable** if $H$ is modeled as a random oracle and $\mc{Y}$ is sufficiently large.[^2]
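A minimal sketch of $\mc{C}_H$ with SHA-256 standing in for $H$ and a $256$-bit nonce space, both assumptions of this example. As a usage example, the same commitment is plugged into the coin flipping protocol described earlier, with both parties simulated in one function.

```python
import hashlib
import secrets

NONCE_BYTES = 32  # R = {0,1}^256 (an assumption of this sketch)

def commit(m: bytes):
    o = secrets.token_bytes(NONCE_BYTES)
    # The fixed-length nonce comes first, so the encoding of (m, o) is unambiguous.
    c = hashlib.sha256(o + m).digest()
    return c, o

def verify(m: bytes, c: bytes, o: bytes) -> bool:
    return len(o) == NONCE_BYTES and hashlib.sha256(o + m).digest() == c

def coin_flip() -> int:
    """Coin flipping with both parties simulated locally."""
    b0 = secrets.randbelow(2)            # Bob's bit
    c, o = commit(bytes([b0]))           # Bob -> Alice: c
    b1 = secrets.randbelow(2)            # Alice -> Bob: b1
    if not verify(bytes([b0]), c, o):    # Bob -> Alice: (b0, o), Alice checks
        raise ValueError("invalid opening")
    return b0 ^ b1
```

Unlike the PRG-based construction, committing here requires no message from the receiver, so the scheme is non-interactive.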
 ### Commitment Scheme from Discrete Logarithms
 > Let $G = \left\langle g \right\rangle$ be a cyclic group of prime order $q$. Let $h$ be chosen randomly from $G$.
 > 
 > - $C(m)$: choose random $o \la \mathbb{Z}_q$, set $c \la g^m h^o$ and return $(c, o)$.
 > - $V(m, c, o)$: output $\texttt{accept}$ if and only if $c = g^m h^o$.
 Correctness is obvious.
 The binding property follows from the DL assumption. If an adversary finds $m_1 \neq m_2$ and $o_1, o_2$ such that $c = g^{m_1} h^{o_1} = g^{m_2} h^{o_2}$, then $o_1 \neq o_2$ and $h = g^{(m_2 - m_1)/(o_1 - o_2)}$, solving the discrete logarithm problem for $h$.
 The hiding property follows since $h$ is uniform in $G$ and $o$ is also uniform in $\mathbb{Z}_q$. Then $g^m h^o$ is uniform in $G$, not revealing any information.
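The binding argument can be made concrete: whoever knows $x = \log_g h$ can open a commitment to any message, and conversely two openings of the same $c$ reveal $x$. The sketch below assumes a toy group (the order-$1019$ subgroup of $\Z_{2039}^\ast$ generated by $4$) and messages in $\Z_q$; all function names are hypothetical.

```python
import secrets

# Hypothetical toy group: order-q subgroup of Z_p^* with p = 2q + 1, generated by g = 4.
p, q, g = 2039, 1019, 4

def commit(h: int, m: int):
    o = secrets.randbelow(q)
    return pow(g, m, p) * pow(h, o, p) % p, o

def verify(h: int, m: int, c: int, o: int) -> bool:
    return c == pow(g, m, p) * pow(h, o, p) % p

def equivocate(m: int, o: int, m_new: int, x: int) -> int:
    """With the trapdoor x = log_g h, find o_new such that (m_new, o_new) opens the same c."""
    # Solve m + x * o = m_new + x * o_new (mod q) for o_new.
    return (o + (m - m_new) * pow(x, -1, q)) % q

def dlog_from_double_opening(m1: int, o1: int, m2: int, o2: int) -> int:
    """Two openings of one commitment with m1 != m2 give log_g h."""
    return (m2 - m1) * pow((o1 - o2) % q, -1, q) % q

# Demo with a known trapdoor; in a real setup nobody may know x.
x = 77
h = pow(g, x, p)
c, o1 = commit(h, 5)
o2 = equivocate(5, o1, 9, x)
extracted = dlog_from_double_opening(5, o1, 9, o2)
```

This is why $h$ must be generated so that the committer does not know its discrete logarithm.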
 ## Post Quantum Cryptography
 Quantum computers use **qubits** and **quantum gates** for computation. A **qubit** is a *quantum bit*, a **superposition** of two states $\ket{0}$ and $\ket{1}$.
 $$
 \ket{\psi} = \alpha \ket{0} + \beta \ket{1}
 $$
 where $\alpha, \beta \in \mathbb{C}$ and $\left\lvert \alpha \right\lvert^2 + \left\lvert \beta \right\lvert^2 = 1$. The quantum gates are unitary matrices.
 The *superposition* may give the false impression that a quantum computer tries all possible solutions in parallel, but the actual magic comes from **complex amplitudes**.
 Quantum computers use **quantum interference**: computations are carefully choreographed so that the amplitudes of wrong answers *cancel out*, while those of correct answers combine. This process increases the probability of measuring correct results. Naturally, only a few special problems allow this kind of choreography.
 A scheme is **post-quantum secure** if it is secure against an adversary who has access to a quantum computer. Post-quantum cryptography is about classical algorithms that are believed to withstand quantum attacks.
 AES is probably safe, since Grover's algorithm still takes $\mc{O}(2^{n/2})$ time to find an $n$-bit key, so doubling the key length restores the security level. Lattice-based cryptography is a candidate for post-quantum public key schemes.
 ## Shor's Algorithm
 But factorization and discrete logarithms are not safe. The core idea is that a quantum computer is very good at detecting periodicity. This is done by using the **quantum Fourier transform** (QFT).
 ### Quantum Factorization
 Let $n \in \mathbb{Z}$ and $g \in \mathbb{Z}_n^\ast$. Let $\gamma_g : \mathbb{Z} \ra \mathbb{Z}_n$ be defined as $\gamma_g(\alpha) = g^\alpha$. This function is periodic, since $g^{\phi(n)} = 1$ by Euler's theorem. Also, the order of $g$ divides every period.
 Thus, find a period $p$, and let $t$ be the smallest positive integer such that $g^{p/2^t} \neq 1$. Then $\gcd(n, g^{p/2^t} - 1)$ is a non-trivial factor of $n$ with probability about $1/2$ over the choice of $g$. See Exercise 16.10.[^3]
 Shor's algorithm factors $n$ in $\mc{O}(\log^3 n)$ time. RSA is not a secure one-way trapdoor function for quantum computers.
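Only the period finding step needs a quantum computer; turning a period into a factor is classical. The sketch below replaces the quantum step by brute force, which is an assumption that only works for toy moduli, and then applies the $\gcd$ step described above. Function names are hypothetical.

```python
from math import gcd

def find_period(g: int, n: int) -> int:
    """Brute-force order of g modulo n (the step a quantum computer would replace)."""
    r, x = 1, g % n
    while x != 1:
        x = x * g % n
        r += 1
    return r

def factor_from_period(n: int, g: int, period: int):
    """Halve the period until g^(period / 2^t) != 1, then take a gcd."""
    e = period
    while e % 2 == 0:
        e //= 2
        y = pow(g, e, n)
        if y != 1:
            d = gcd(n, y - 1)
            return d if 1 < d < n else None  # fails when y = -1 mod n
    return None  # odd period: this g does not help

def factor(n: int):
    for g in range(2, n):
        if gcd(g, n) > 1:
            return gcd(g, n)  # lucky guess, g already shares a factor
        f = factor_from_period(n, g, find_period(g, n))
        if f is not None:
            return f
    return None
```

For $n = 15$ and $g = 7$ the period is $4$, and $\gcd(15, 7^2 - 1) = 3$. For $g = 14 = -1 \bmod 15$ the attempt fails, which illustrates why success is only guaranteed with constant probability over the choice of $g$.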
 ### Quantum Discrete Logarithms
 Let $G = \left\langle g \right\rangle$ be a cyclic group of prime order $q$. Let $u = g^\alpha$. Consider the function $f : \mathbb{Z}^2 \ra G$ defined as
 $$
 f(\gamma, \delta) = g^\gamma \cdot u^\delta.
 $$
 The period of this function is $(\alpha, -1)$, since for all $(\gamma, \delta) \in \mathbb{Z}^2$,
 $$
 f(\gamma + \alpha, \delta - 1) = g^{\gamma} \cdot g^\alpha \cdot u^\delta \cdot u^{-1} = g^\gamma \cdot u^\delta = f(\gamma, \delta).
 $$
 This period can be found in $\mc{O}(\log^3 q)$ time. The DL assumption is false for quantum computers.
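 The periodicity claim can be checked numerically in a toy group. The sketch below assumes the order-$11$ subgroup of $\mathbb{Z}_{23}^\ast$ generated by $g = 2$; these parameters and the helper names are chosen purely for illustration. It also uses the fact that any period $(a, b)$ satisfies $g^a u^b = 1$, i.e. $a + \alpha b \equiv 0 \pmod q$, so $\alpha$ can be read off whenever $b \not\equiv 0 \pmod q$.

```python
P_MOD = 23  # toy modulus (assumption for illustration)
Q = 11      # prime order of the subgroup generated by G
G = 2       # 2^11 = 2048 = 1 (mod 23)


def f(gamma, delta, u):
    """f(gamma, delta) = g^gamma * u^delta; exponents are reduced mod Q."""
    return (pow(G, gamma % Q, P_MOD) * pow(u, delta % Q, P_MOD)) % P_MOD


def has_period(alpha):
    """Check f(gamma + alpha, delta - 1) == f(gamma, delta) on a grid of inputs."""
    u = pow(G, alpha, P_MOD)
    grid = range(-Q, 2 * Q)
    return all(f(c + alpha, d - 1, u) == f(c, d, u) for c in grid for d in grid)


def recover_alpha(a, b):
    """Given a period (a, b) with b != 0 mod Q, solve a + alpha * b = 0 mod Q."""
    b_inv = pow(b, Q - 2, Q)  # inverse of b modulo the prime Q (Fermat)
    return (-a * b_inv) % Q


print(has_period(7))
print(recover_alpha(7, -1))  # the period (alpha, -1) from the text
print(recover_alpha(3, 9))   # another period of the same function: 3 + 7*9 = 66 = 0 mod 11
```

 The quantum part of the algorithm is what produces such a period efficiently; the final step of recovering $\alpha$ is ordinary modular arithmetic.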
 (Detailed explanation to be added...)
 [^1]: There is one caveat. Bob gets to know the final result before Alice. If the outcome is not what he desired, he could abort the protocol in some way, like sending an invalid $c$, and go over the whole process again.
 [^2]: A commitment scheme is **malleable** if a commitment $c = (c_1, c_2)$ of a message $m$ can be transformed into a commitment $c' = (c_1, c_2 + \delta)$ of a message $m + \delta$.
 [^3]: A Graduate Course in Applied Cryptography.
--- a/Cryptography/2023-11-02-zkp-intro.md
+++ b/Cryptography/2023-11-02-zkp-intro.md
@@ -0,0 +1,113 @@
 ---
 share: true
 toc: true
 math: true
 categories:
  - Lecture Notes
  - Modern Cryptography
 tags:
  - lecture-note
  - cryptography
  - security
 title: 12. Zero-Knowledge Proof (Introduction)
 date: 2023-11-02
 github_title: 2023-11-02-zkp-intro
 image:
  path: assets/img/posts/Lecture Notes/Modern Cryptography/mc-12-id-protocol.png
 attachment:
  folder: assets/img/posts/Lecture Notes/Modern Cryptography
 ---
 - In the 1980s, the notion of *zero knowledge* was proposed by Shafi Goldwasser, Silvio Micali and Charles Rackoff.
 - **Interactive proof systems**: a **prover** tries to convince the **verifier** that some statement is true, by exchanging messages.
 	- What if the prover is trying to trick the verifier?
 	- What if the verifier is an adversary that tries to obtain more information?
 - These proof systems are harder to build in the digital world.
 	- This is because it is easy to copy data in the digital world.
 ## Identification Protocol
 ![mc-12-id-protocol.png](../../../assets/img/posts/Lecture%20Notes/Modern%20Cryptography/mc-12-id-protocol.png)
 > **Definition.** An **identification protocol** is a triple of algorithms $\mc{I} = (G, P, V)$ satisfying the following.
 > 
 > - $G$ is a probabilistic **key generation** algorithm that outputs $(vk, sk) \leftarrow G()$. $vk$ is the **verification key** and $sk$ is the **secret key**.
 > - $P$ is an interactive protocol algorithm called the **prover**, which takes the secret key $sk$ as an input.
 > - $V$ is an interactive protocol algorithm called the **verifier**, which takes the verification key $vk$ as an input and outputs $\texttt{accept}$ or $\texttt{reject}$.
 > 
 > For all possible outputs $(vk, sk)$ of $G$, at the end of the interaction between $P(sk)$ and $V(vk)$, $V$ outputs $\texttt{accept}$ with probability $1$.
 ### Password Authentication
 A client trying to log in must prove its identity to the server. But the client cannot trust the server (verifier), so the client must prove its identity without revealing the secret, which is the password in this case. The login is a *proof* that the client is who it claims to be. What should the verification key be? Setting $vk = sk$ certainly works, but then the server learns the password, so this should not be used.
 Instead, we could set $vk = H(sk)$ using a hash function $H$. The client then sends the password, and the server computes its hash and checks whether it equals $vk$. This method still reveals the plaintext password to the server.
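 A minimal sketch of the $vk = H(sk)$ approach, assuming SHA-256 as $H$ (the notes do not fix a hash function). The function names are hypothetical. Note that `verify` receives the plaintext password as an argument, which is exactly the weakness described above.

```python
import hashlib
import hmac


def keygen(password: str):
    """G: derive (vk, sk) from a password, with vk = H(sk). SHA-256 is an assumption."""
    sk = password.encode("utf-8")
    vk = hashlib.sha256(sk).digest()
    return vk, sk


def verify(vk: bytes, received: bytes) -> bool:
    """V: hash what the client sent and compare it with the stored vk.

    The server sees `received` in the clear, so the secret is revealed to it.
    """
    return hmac.compare_digest(hashlib.sha256(received).digest(), vk)


vk, sk = keygen("correct horse battery staple")
print(verify(vk, sk))
print(verify(vk, b"wrong password"))
```

 Storing only $vk$ protects the password if the server's database leaks, but it does not protect the password from the server during login.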
 ## Example: 3-Coloring
 Suppose we are given a graph $G = (V, E)$ whose vertices we want to color with at most $3$ colors, so that no two adjacent vertices have the same color. Deciding whether such a coloring exists is an NP-complete problem.
 Bob has a graph $G$ and he is trying to $3$-color the graph. Alice shows up and claims that there is a way to $3$-color $G$. If the coloring is valid, Bob is willing to buy the solution, but he cannot trust Alice. Bob won't pay until he is convinced that Alice has a solution, and Alice won't give the solution until she receives the money. How can Alice and Bob settle this problem?
 ### Protocol
 > 1. Bob gives Alice the graph $G = (V, E)$.
 > 2. Alice shuffles the colors and colors the graph. The coloring is hidden from Bob.
 > 3. Bob randomly picks a single edge $(u, v) \in E$ of this graph.
 > 4. Alice reveals the colors of $u$ and $v$.
 - If $u$ and $v$ have the same color, Alice is lying to Bob.
 - If they have different colors, Alice *might be* telling the truth.
 - What if Alice just sends two random colors in step $4$?
 	- We can use **commitment schemes** so that Alice cannot manipulate the colors after Bob's query.
 	- Specifically, send the colors of each $v$ using a commitment scheme.
 	- For Bob's query $(u, v)$, send the opening strings of $u$ and $v$.
 - What if Alice doesn't have a solution, but Bob picks an edge with different colors just by luck?
 	- We can repeat the protocol many times.
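 	- In each protocol instance, an invalid coloring has at least one edge whose endpoints share a color, so Bob catches it with probability at least $\frac{1}{\abs{E}}$. An invalid solution passes with probability at most $p = 1 - \frac{1}{\abs{E}}$.
 	- Repeating the protocol $n$ times gives $p^n \rightarrow 0$, so invalid solutions pass with negligible probability.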
 - Does Bob's query reveal anything about the solution?
 	- No, Alice randomizes colors for every protocol instance.
 	- Need formal definition and proof for this.[^1]
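 One round of the protocol, with commitments added, can be sketched as below. This is a toy simulation under stated assumptions: the commitment is a SHA-256 hash of a random nonce and the color (the notes do not specify a commitment scheme), the names `commit`, `check_opening` and `run_round` are hypothetical, and both parties run inside one function for brevity.

```python
import hashlib
import random
import secrets


def commit(color: int):
    """Hash-based commitment (an assumption): returns (commitment, opening nonce)."""
    nonce = secrets.token_bytes(16)
    return hashlib.sha256(nonce + bytes([color])).digest(), nonce


def check_opening(commitment: bytes, color: int, nonce: bytes) -> bool:
    return hashlib.sha256(nonce + bytes([color])).digest() == commitment


def run_round(edges, coloring, rng) -> bool:
    """Run one protocol instance; returns True if Bob accepts this round."""
    # Alice: shuffle the three colors and commit to every vertex color.
    # (A real prover would use a cryptographic RNG for the shuffle.)
    perm = [0, 1, 2]
    rng.shuffle(perm)
    shuffled = {v: perm[c] for v, c in coloring.items()}
    commitments, nonces = {}, {}
    for v, c in shuffled.items():
        commitments[v], nonces[v] = commit(c)
    # Bob: pick a single random edge.
    u, v = rng.choice(edges)
    # Alice: open only the two queried commitments.
    cu, cv = shuffled[u], shuffled[v]
    # Bob: check both openings, then check that the colors differ.
    opened_ok = (check_opening(commitments[u], cu, nonces[u])
                 and check_opening(commitments[v], cv, nonces[v]))
    return opened_ok and cu != cv


TRIANGLE_EDGES = [(0, 1), (1, 2), (0, 2)]
valid = {0: 0, 1: 1, 2: 2}
invalid = {0: 0, 1: 0, 2: 1}  # edge (0, 1) is monochromatic

rng = random.Random(0)
print(all(run_round(TRIANGLE_EDGES, valid, rng) for _ in range(50)))
print(all(run_round(TRIANGLE_EDGES, invalid, rng) for _ in range(200)))
```

 With the valid coloring every round is accepted. With the invalid one, Bob hits the bad edge in some round with overwhelming probability, so the repeated protocol rejects.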
 ## Zero Knowledge Proof (ZKP)
 We need three properties for a **zero-knowledge proof** (ZKP).
 - (**Completeness**) If the statement is true, an honest verifier must accept the proof given by an honest prover.
 - (**Soundness**) If the statement is false, no cheating prover can convince an honest verifier, except with some small probability.
 - (**Zero Knowledge**) If the statement is true, no verifier (honest or cheating) learns anything other than the truth of the statement. The proof does not reveal anything about the prover's secret.
 We define these formally.
 > **Definition.** Let $\mc{R} \subset \mc{X} \times \mc{Y}$ be a relation. A statement $y \in \mc{Y}$ is **true** if $(x, y) \in \mc{R}$ for some $x \in \mc{X}$. The set of true statements
 > 
 > $$
 > L_\mc{R} = \braces{y \in \mc{Y} : \exists x \in \mc{X},\; (x, y) \in \mc{R}}
 > $$
 > 
 > is called the **language** defined by $\mc{R}$.
 > **Definition.** A **zero-knowledge proof** is a protocol between a prover $P(x, y)$ and a verifier $V(y)$. At the end of the protocol, the verifier either accepts or rejects.
 In the above definition, $y$ is the statement to prove, and $x$ is the proof (witness) of that statement, which the prover wants to hide. The prover and the verifier exchange messages during the protocol, and the collection of these messages is called the **view** (or conversation, transcript).
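 To make the relation and its language concrete, the sketch below instantiates $\mc{R}$ with the 3-coloring example: the statement $y$ is a graph and the witness $x$ is a coloring. The function names and the graph encoding (a pair of vertex list and edge list) are assumptions for illustration, and the membership test is a brute-force search that is only feasible for tiny graphs.

```python
from itertools import product


def in_relation(coloring, graph) -> bool:
    """(x, y) in R: the coloring x is a valid 3-coloring of the graph y."""
    vertices, edges = graph
    return (all(coloring.get(v) in (0, 1, 2) for v in vertices)
            and all(coloring[u] != coloring[v] for u, v in edges))


def is_true_statement(graph) -> bool:
    """y in L_R: some witness exists. Exponential-time brute force."""
    vertices, _ = graph
    return any(in_relation(dict(zip(vertices, colors)), graph)
               for colors in product(range(3), repeat=len(vertices)))


triangle = ([0, 1, 2], [(0, 1), (1, 2), (0, 2)])
k4 = ([0, 1, 2, 3], [(a, b) for a in range(4) for b in range(a + 1, 4)])

print(in_relation({0: 0, 1: 1, 2: 2}, triangle))  # a witness for the triangle
print(is_true_statement(triangle))                # the triangle is 3-colorable
print(is_true_statement(k4))                      # the complete graph K4 is not
```

 Checking a given pair $(x, y)$ is easy, while deciding $y \in L_\mc{R}$ without a witness is the hard part. This gap is what makes the witness worth hiding.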
 > **Definition.**
 > 
 > - (**Completeness**) If $(x, y) \in \mc{R}$, then an honest verifier accepts with very high probability.
 > - (**Soundness**) If $y \notin L_\mc{R}$, then an honest verifier accepts with negligible probability, no matter what the prover does.
 But how do we define *zero knowledge*? What is *knowledge*? If the verifier learns something, the verifier obtains something that he couldn't have computed without interacting with the prover. Thus, we define zero knowledge as the following.
 > **Definition.** We say that a protocol is **honest verifier zero knowledge** (HVZK) if there exists an efficient algorithm $\rm{Sim}$ (simulator) that, on input the statement $y$, produces an output whose distribution is indistinguishable from the distribution of the honest verifier's view.
 > 
 > $$
 > \rm{Sim}(y) \approx \rm{View}_V[P(x, y) \lra V(y)]
 > $$
 > $$
 More generally, the protocol is **zero knowledge** if for every efficient verifier $V^{\ast}$, possibly dishonest, there exists an efficient simulator $\rm{Sim}$ such that $\rm{Sim}(y)$ is indistinguishable from the verifier's view $\rm{View}_{V^{\ast}}[P(x, y) \leftrightarrow V^{\ast}(y)]$.
 If the proof is *zero knowledge*, the adversary can simulate conversations on his own without knowing the secret. This means that the adversary learns nothing from the conversation.
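 For the 3-coloring protocol, a simulator for an honest verifier might look like the sketch below. It is only an informal illustration, not a proof: it assumes the hash-based commitment used here hides the unopened colors, and the names `commit` and `simulate_view` are hypothetical. The key point is that the simulator takes only the graph (the public statement) as input and never sees a coloring.

```python
import hashlib
import random
import secrets


def commit(color: int):
    """Hash-based commitment (an assumption): returns (commitment, opening nonce)."""
    nonce = secrets.token_bytes(16)
    return hashlib.sha256(nonce + bytes([color])).digest(), nonce


def simulate_view(vertices, edges, rng):
    """Produce a fake transcript for one round without knowing any coloring."""
    # Guess the honest verifier's query: a uniformly random edge.
    u, v = rng.choice(edges)
    # Give its endpoints two distinct random colors. In a real round the
    # opened pair is also a uniformly random pair of distinct colors.
    cu, cv = rng.sample([0, 1, 2], 2)
    # All other vertices get a dummy color; their commitments are never opened.
    colors = {w: 0 for w in vertices}
    colors[u], colors[v] = cu, cv
    commitments, nonces = {}, {}
    for w in vertices:
        commitments[w], nonces[w] = commit(colors[w])
    return {
        "commitments": commitments,
        "edge": (u, v),
        "openings": {u: (cu, nonces[u]), v: (cv, nonces[v])},
    }


view = simulate_view([0, 1, 2, 3], [(0, 1), (1, 2), (2, 3), (0, 3)], random.Random(0))
print(view["edge"], {w: c for w, (c, _) in view["openings"].items()})
```

 The simulated transcript has the same shape as a real one: commitments for every vertex, one queried edge, and valid openings showing two different colors. A dishonest verifier that chooses its edge after seeing the commitments needs a more careful simulator.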
 [^1]: How to give a formal proof for HVZK...?
--- a/Cryptography/mc-09-ss-pke.png
+++ b/Cryptography/mc-09-ss-pke.png
--- a/Cryptography/mc-10-dsig-security.png
+++ b/Cryptography/mc-10-dsig-security.png
--- a/Cryptography/mc-10-schnorr-identification.png
+++ b/Cryptography/mc-10-schnorr-identification.png
--- a/Cryptography/mc-12-id-protocol.png
+++ b/Cryptography/mc-12-id-protocol.png
Author	SHA1	Message	Date
Sungchan Yi	3c4ace39b2	[PUBLISHER] upload files #139	2024-01-17 18:20:54 +09:00
Sungchan Yi	a76e7f9f43	[PUBLISHER] upload files #138	2024-01-17 18:19:48 +09:00
Sungchan Yi	8173c1033d	[PUBLISHER] upload files #137 * PUSH NOTE : 12. Zero-Knowledge Proofs (Introduction).md * PUSH ATTACHMENT : mc-12-id-protocol.png	2024-01-17 18:15:11 +09:00
Sungchan Yi	e3a721332b	[PUBLISHER] upload files #136	2024-01-17 18:14:26 +09:00
Sungchan Yi	ef4d4258f7	[PUBLISHER] upload files #135 * PUSH NOTE : 10. Digital Signatures.md * PUSH ATTACHMENT : mc-10-dsig-security.png * PUSH ATTACHMENT : mc-10-schnorr-identification.png	2024-01-17 18:14:03 +09:00
Sungchan Yi	69211e0742	[PUBLISHER] upload files #134 * PUSH NOTE : 9. Public Key Encryption.md * PUSH ATTACHMENT : mc-09-ss-pke.png	2024-01-17 18:12:44 +09:00
Sungchan Yi	7dc99f1ef5	[PUBLISHER] upload files #133	2024-01-17 18:11:34 +09:00