mirror of
https://github.com/calofmijuck/blog.git
synced 2025-12-06 22:53:51 +00:00
feat: Modern Cryptography Midterm Posts (#113)
* [PUBLISHER] upload files #108
* PUSH NOTE : 6. Hash Functions.md
* PUSH ATTACHMENT : mc-06-merkle-damgard.png
* PUSH ATTACHMENT : mc-06-davies-meyer.png
* PUSH ATTACHMENT : mc-06-hmac.png
* [PUBLISHER] upload files #109
* PUSH NOTE : 7. Key Exchange.md
* PUSH ATTACHMENT : mc-07-dhke.png
* PUSH ATTACHMENT : mc-07-dhke-mitm.png
* PUSH ATTACHMENT : mc-07-merkle-puzzles.png
* [PUBLISHER] upload files #110
* PUSH NOTE : 6. Hash Functions.md
* PUSH ATTACHMENT : mc-06-merkle-damgard.png
* PUSH ATTACHMENT : mc-06-davies-meyer.png
* PUSH ATTACHMENT : mc-06-hmac.png
* [PUBLISHER] upload files #111
* PUSH NOTE : 7. Key Exchange.md
* PUSH ATTACHMENT : mc-07-dhke.png
* PUSH ATTACHMENT : mc-07-dhke-mitm.png
* PUSH ATTACHMENT : mc-07-merkle-puzzles.png
* [PUBLISHER] upload files #112
* PUSH NOTE : 7. Key Exchange.md
* PUSH ATTACHMENT : mc-07-dhke.png
* PUSH ATTACHMENT : mc-07-dhke-mitm.png
* PUSH ATTACHMENT : mc-07-merkle-puzzles.png
* fix: fixed links to other posts
@@ -0,0 +1,261 @@
|
|||||||
|
---
|
||||||
|
share: true
|
||||||
|
toc: true
|
||||||
|
math: true
|
||||||
|
categories:
|
||||||
|
- Lecture Notes
|
||||||
|
- Modern Cryptography
|
||||||
|
tags:
|
||||||
|
- lecture-note
|
||||||
|
- cryptography
|
||||||
|
- security
|
||||||
|
title: 6. Hash Functions
|
||||||
|
date: 2023-09-28
|
||||||
|
github_title: 2023-09-28-hash-functions
|
||||||
|
image:
|
||||||
|
path: assets/img/posts/Lecture Notes/Modern Cryptography/mc-06-merkle-damgard.png
|
||||||
|
attachment:
|
||||||
|
folder: assets/img/posts/Lecture Notes/Modern Cryptography
|
||||||
|
---
|
||||||
|
|
||||||
|
Hash functions are functions that take some input and compress it to produce an output of fixed size, usually just called a *hash* or *digest*. A desired property of a hash function is **collision resistance**.
|
||||||
|
|
||||||
|
Hash functions are also used in hash table data structures, and for data structures, it isn't a huge problem if there is a collision. Although the search time may be affected, there are ways to handle conflicting hashes for each data structure.
|
||||||
|
|
||||||
|
But *cryptographic hash functions* are different. They should *avoid* collisions, since some adversary will attack in order to find collisions and break the system. Thus cryptographic hash functions are much harder to design.
|
||||||
|
|
||||||
|
## Collision Resistance
|
||||||
|
|
||||||
|
Intuitively, a function $H$ is collision resistant if it is computationally infeasible to find a collision for $H$. Formally, this can also be defined in the form of a security game.
|
||||||
|
|
||||||
|
> **Definition.** Let $H$ be a hash function defined over $(\mathcal{M}, \mathcal{T})$. Given an adversary $\mathcal{A}$, the adversary outputs two messages $m_0, m_1 \in \mathcal{M}$.
|
||||||
|
>
|
||||||
|
> $\mathcal{A}$ wins the game if $H(m_0) = H(m_1)$ and $m_0 \neq m_1$. The **advantage** of $\mathcal{A}$ with respect to $H$ is defined as the probability that $\mathcal{A}$ wins the game.
|
||||||
|
>
|
||||||
|
> $$
|
||||||
|
> \mathrm{Adv}_{\mathrm{CR}}[\mathcal{A}, H] = \Pr[H(m_0) = H(m_1) \wedge m_0 \neq m_1].
|
||||||
|
> $$
|
||||||
|
>
|
||||||
|
> If the advantage is negligible for any efficient adversary $\mathcal{A}$, then the hash function $H$ is **collision resistant**.
|
||||||
|
|
||||||
|
With a collision resistant hash function, we can do many things. Password hashing is a very common example. Instead of storing the plaintext password, the plaintext is hashed, and the hash is stored. One of the reasons for doing this is privacy. Even the developers who can access the database shouldn't be able to obtain the plaintext password, and the plaintext password will be safe even if the database is leaked.
|
||||||
|
|
||||||
|
When the user logs in, the entered password is hashed and compared with the hash stored on the server. We need collision resistant hashes here, since otherwise a malicious user could log in using a different password that collides with the real one.
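The store-and-compare flow described above can be sketched as follows. This is only an illustration: the function names `hash_password` and `check_login` are hypothetical, SHA-256 from Python's `hashlib` stands in for $H$, and real systems typically add a per-user salt and a deliberately slow hash, which these notes do not cover.

```python
import hashlib
import hmac


def hash_password(password: str) -> bytes:
    """Hash a password with SHA-256 (unsalted, for illustration only)."""
    return hashlib.sha256(password.encode("utf-8")).digest()


def check_login(entered: str, stored_hash: bytes) -> bool:
    """Hash the entered password and compare it with the stored hash."""
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(hash_password(entered), stored_hash)


# The server stores only the hash, never the plaintext.
stored = hash_password("correct horse battery staple")
```

A login attempt then amounts to calling `check_login(entered, stored)`.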
|
||||||
|
|
||||||
|
Another desirable property would be the **one-wayness** of $H$, that it should be hard to find the preimage of any hash. It can be shown that collision resistance implies one-wayness.[^1]
|
||||||
|
|
||||||
|
## MAC Domain Extension
|
||||||
|
|
||||||
|
One possible use of hash functions is extending the domain of MACs. A MAC scheme is usually defined for a fixed block size, so for longer messages, we need other constructions. This is where hash functions can come in.
|
||||||
|
|
||||||
|
Let $\Pi = (S, V)$ be a MAC scheme defined over $(\mathcal{K}, \mathcal{M}, \mathcal{T})$, and let $H : \mathcal{M}' \rightarrow \mathcal{M}$ be a hash function, where $\mathcal{M}'$ is usually larger than $\mathcal{M}$. A naive way to construct a MAC would be to apply the hash first to compress the message and then sign it. It turns out that this new construction is a secure MAC if $\Pi$ is secure and $H$ is collision resistant.
|
||||||
|
|
||||||
|
> **Theorem.** Let $\Pi' = (S', V')$ be a MAC defined over $(\mathcal{K}, \mathcal{M}', \mathcal{T})$. Let
|
||||||
|
>
|
||||||
|
> $$
|
||||||
|
> S'(k, m) = S(k, H(m)), \quad V'(k, m, t) = V(k, H(m), t).
|
||||||
|
> $$
|
||||||
|
>
|
||||||
|
> If $\Pi$ is a secure MAC and $H$ is collision resistant, then $\Pi'$ is a secure MAC.
|
||||||
|
>
|
||||||
|
> For any efficient adversary $\mathcal{A}$ attacking $\Pi'$, there exist a MAC adversary $\mathcal{B} _ \mathrm{MAC}$ attacking $\Pi$ and an adversary $\mathcal{B} _ \mathrm{CR}$ attacking $H$ such that
|
||||||
|
>
|
||||||
|
> $$
|
||||||
|
> \mathrm{Adv}_{\mathrm{MAC}}[\mathcal{A}, \Pi'] \leq \mathrm{Adv}_{\mathrm{MAC}}[\mathcal{B}_\mathrm{MAC}, \Pi] + \mathrm{Adv}_{\mathrm{CR}}[\mathcal{B}_\mathrm{CR}, H].
|
||||||
|
> $$
|
||||||
|
|
||||||
|
*Proof*. See Theorem 8.1.[^2]
|
||||||
|
|
||||||
|
Intuitively, suppose that the MAC scheme $\Pi'$ is insecure. During the MAC security game, $\mathcal{A}$ can either find or not find a collision for $H$. If $\mathcal{A}$ found a collision, $H$ is not collision resistant. If $\mathcal{A}$ didn't find a collision, then $\Pi$ must be broken. Thus we have a contradiction.
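As a rough sketch of the hash-then-MAC construction $S'(k, m) = S(k, H(m))$, the code below uses SHA-256 as $H$. The short-message MAC $S$ is an assumption made for illustration: it is played by HMAC-SHA256 restricted to 32-byte inputs, simply because Python's standard library has no fixed-length MAC.

```python
import hashlib
import hmac


def H(m: bytes) -> bytes:
    """Collision resistant hash compressing long messages to 32 bytes."""
    return hashlib.sha256(m).digest()


def S(k: bytes, x: bytes) -> bytes:
    """Stand-in MAC that only accepts 32-byte messages (assumption)."""
    assert len(x) == 32
    return hmac.new(k, x, hashlib.sha256).digest()


def V(k: bytes, x: bytes, t: bytes) -> bool:
    return hmac.compare_digest(S(k, x), t)


def S_prime(k: bytes, m: bytes) -> bytes:
    """S'(k, m) = S(k, H(m)): hash first, then sign the digest."""
    return S(k, H(m))


def V_prime(k: bytes, m: bytes, t: bytes) -> bool:
    """V'(k, m, t) = V(k, H(m), t)."""
    return V(k, H(m), t)
```

If two messages collide under `H`, they receive the same tag, which is why the theorem needs collision resistance.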
|
||||||
|
|
||||||
|
But in reality, this construction is not used very often. We need a *secure* MAC *and* a *collision resistant* hash function, so it is hard to implement.
|
||||||
|
|
||||||
|
## Attacks on Hash Functions
|
||||||
|
|
||||||
|
There are specific attacks that exploit the internal mechanism of some specific hash function, but we only cover generic attacks that work for any given hash function.
|
||||||
|
|
||||||
|
A very simple attack would be the brute force attack. If the hash is $n$ bits, then the attacker can hash $2^n+1$ distinct messages to get a collision, by the pigeonhole principle. But usually $n$ is large enough that performing this computation is infeasible.
|
||||||
|
|
||||||
|
### Birthday Attacks
|
||||||
|
|
||||||
|
Actually, the attacker doesn't have to hash that many messages. This is because of the birthday paradox.
|
||||||
|
|
||||||
|
Let $N$ be the size of the hash space. (If the hash is $n$ bits, then $N = 2^n$)
|
||||||
|
|
||||||
|
> 1. Sample $s$ uniform random messages $m_1, \dots, m_s \in \mathcal{M}$.
|
||||||
|
> 2. Compute $x_i \leftarrow H(m_i)$.
|
||||||
|
> 3. Find and output a collision if it exists.
|
||||||
|
|
||||||
|
> **Lemma.** The above algorithm will output a collision with probability at least $1/2$ when $s \geq 1.2\sqrt{N}$.
|
||||||
|
|
||||||
|
*Proof*. We show that the probability of no collisions is less than $1/2$. The probability that there is no collision is
|
||||||
|
|
||||||
|
$$
|
||||||
|
\prod_{i=1}^{s-1}\left( 1-\frac{i}{N} \right) \leq \prod_{i=1}^{s-1} \exp\left( -\frac{i}{N} \right) = \exp\left( -\frac{s(s-1)}{2N} \right).
|
||||||
|
$$
|
||||||
|
|
||||||
|
So solving $\exp\left( -s(s-1)/2N \right) < 1/2$ for $s$ gives approximately $s \geq \sqrt{(2\log2)N} \approx 1.17 \sqrt{N}$.
|
||||||
|
|
||||||
|
In the above proof, we assumed that the outputs of $H$ are uniformly distributed. In reality, $H$ might be biased, but it can be shown that the collision probability is minimized when $H$ is uniform, so the above bound still holds.
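The birthday attack can be tried out on a deliberately weakened hash. The sketch below assumes a toy hash obtained by truncating SHA-256 to 3 bytes, so $N = 2^{24}$, and hashes counter values instead of random messages. Unlike the algorithm above, it checks for a collision after every hash rather than after sampling all $s$ messages.

```python
import hashlib
import math


def truncated_hash(m: bytes, n_bytes: int = 3) -> bytes:
    """Toy hash: first n_bytes of SHA-256 (far too short for real use)."""
    return hashlib.sha256(m).digest()[:n_bytes]


def birthday_collision(n_bytes: int = 3):
    """Hash distinct messages until two of them share a digest."""
    seen = {}  # digest -> message that produced it
    i = 0
    while True:
        m = i.to_bytes(8, "big")
        x = truncated_hash(m, n_bytes)
        if x in seen:
            return seen[x], m, i + 1  # two messages and the number of hashes
        seen[x] = m
        i += 1


m0, m1, tries = birthday_collision()

# Threshold from the proof: s is about sqrt(2 ln 2 * N).
N = 2 ** 24
threshold = math.sqrt(2 * math.log(2) * N)
```

Comparing `tries` with `threshold` over several variants of the message encoding gives a feel for the $\sqrt{N}$ behavior.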
|
||||||
|
|
||||||
|
Note that birthday attacks can be done entirely *offline*. The adversary doesn't have to interact with any users of the system, so adversaries can invest huge computing resources to find a collision, without anybody noticing. Thus, offline attacks are considered more dangerous than *online* attacks that require many interactions.
|
||||||
|
|
||||||
|
## Merkle-Damgård Transform
|
||||||
|
|
||||||
|
Now we want to construct collision resistant hash functions that work for arbitrary input length. Thanks to the **Merkle-Damgård transform**, we can start from a collision resistant hash function that works for short messages.
|
||||||
|
|
||||||
|
The Merkle-Damgård transform gives us a way to extend the input domain of the hash function by iterating the function.
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
> **Definition.** Let $h : \left\lbrace 0, 1 \right\rbrace^n \times \left\lbrace 0, 1 \right\rbrace^l \rightarrow \left\lbrace 0, 1 \right\rbrace^n$ be a hash function. The **Merkle-Damgård function derived from $h$** is a function $H$ that works as follows.
|
||||||
|
>
|
||||||
|
> 1. Given an input $m \in \left\lbrace 0, 1 \right\rbrace^{\leq L}$, pad $m$ so that the length of $m$ is a multiple of $l$.
|
||||||
|
> - The padding block $\mathrm{PB}$ must contain an encoding of the input message length, i.e., it is of the form $100\dots00\parallel\left\lvert m \right\rvert$.
|
||||||
|
> 2. Then partition the input into $l$-bit blocks so that $m' = m_1 \parallel m_2 \parallel \cdots \parallel m_s$.
|
||||||
|
> 3. Set $t_0 \leftarrow \mathrm{IV} \in \left\lbrace 0, 1 \right\rbrace^n$.
|
||||||
|
> 4. For $i = 1, \dots, s$, calculate $t_i \leftarrow h(t_{i-1}, m_i)$.
|
||||||
|
> 5. Return $t_s$.
|
||||||
|
|
||||||
|
- The function $h$ is called the **compression function**.
|
||||||
|
- The $t_i$ values are called **chaining values**.
|
||||||
|
- Because the padding block is at most $l$ bits long, the encoded message length must be smaller than $2^l$. In practice the length field is usually $64$ bits, which allows messages of up to $2^{64}$ bits and is more than enough.
|
||||||
|
- $\mathrm{IV}$ is fixed to some value, and is usually set to some complicated string.
|
||||||
|
- We included the length of the message in the padding. This will be used in the security proof.
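The steps of the definition can be sketched in a few lines. Everything concrete here is an assumption chosen for illustration: the compression function `h` is SHA-256 applied to $t \parallel m_i$ and truncated to 16 bytes, blocks are 32 bytes, the $\mathrm{IV}$ is all zeros, and the padding works on bytes (`0x80`, zeros, then an 8-byte length) rather than bits.

```python
import hashlib

BLOCK = 32        # block length l, in bytes (toy choice)
OUT = 16          # chaining value length n, in bytes (toy choice)
IV = bytes(OUT)   # fixed initial value (all zeros here)


def h(t: bytes, block: bytes) -> bytes:
    """Toy compression function h(t, m_i)."""
    return hashlib.sha256(t + block).digest()[:OUT]


def pad(m: bytes) -> bytes:
    """Append 0x80, zeros, and the 8-byte message length."""
    zeros = (-(len(m) + 1 + 8)) % BLOCK
    return m + b"\x80" + bytes(zeros) + len(m).to_bytes(8, "big")


def merkle_damgard(m: bytes) -> bytes:
    """Iterate h over the padded message, starting from IV."""
    padded = pad(m)
    t = IV
    for i in range(0, len(padded), BLOCK):
        t = h(t, padded[i:i + BLOCK])  # t_i = h(t_{i-1}, m_i)
    return t
```

For a short message the padded input is a single block, so the result is just one call to `h`.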
|
||||||
|
|
||||||
|
The Merkle-Damgård construction is secure.
|
||||||
|
|
||||||
|
> **Theorem.** If $h$ is a collision resistant hash function, then so is $H$.
|
||||||
|
|
||||||
|
*Proof*. We argue by contradiction. Suppose that an adversary $\mathcal{A}$ of $H$ found a collision for $H$. Let $H(m) = H(m')$ for $m \neq m'$. Now we construct an adversary $\mathcal{B}$ of $h$. $\mathcal{B}$ will examine $m$ and $m'$ and work its way backwards.
|
||||||
|
|
||||||
|
Suppose that $m = m_1\cdots m_u$ and $m' = m_1'\cdots m_v'$. Let the chaining values be $t_i = h(t_{i-1},m_i)$ and $t_i' = h(t_{i-1}', m_i')$. Then since $H(m) = H(m')$, the very last iteration should give the same output.
|
||||||
|
|
||||||
|
$$
|
||||||
|
h(t_{u-1},m_u) = h(t_{v-1}', m_v').
|
||||||
|
$$
|
||||||
|
|
||||||
|
Suppose that $t_{u-1} \neq t_{v-1}'$ or $m_u \neq m_v'$. Then the two inputs to $h$ differ but give the same output, so this is a collision for $h$. $\mathcal{B}$ returns this collision, and we are done. So suppose otherwise. Then $t_{u-1} = t_{v-1}'$ and $m_u = m_v'$. But because the last block contains the padding, the padding values must be the same, which means that the length of these two messages must have been the same, so $u = v$.
|
||||||
|
|
||||||
|
Now we have $t_{u-1} = t_{u-1}'$, which implies $h(t_{u-2}, m_{u-1}) = h(t_{u-2}', m_{u-1}')$. We can now repeat the same process until the first block. If $\mathcal{B}$ did not find any collision then it means that $m_i = m_i'$ for all $i$, so $m = m'$. This is a contradiction, so $\mathcal{B}$ must have found a collision.
|
||||||
|
|
||||||
|
By the above argument, we see that $\mathrm{Adv} _ {\mathrm{CR}}[\mathcal{A}, H] = \mathrm{Adv} _ {\mathrm{CR}}[\mathcal{B}, h]$.
|
||||||
|
|
||||||
|
### Attacking Merkle-Damgård Hash Functions
|
||||||
|
|
||||||
|
See Joux's attack.[^2]
|
||||||
|
|
||||||
|
## Davies-Meyer Compression Functions
|
||||||
|
|
||||||
|
Now we only have to build a collision resistant compression function. We can build these functions from either a block cipher, or by using number theoretic primitives.
|
||||||
|
|
||||||
|
Number theoretic primitives will be shown after we learn some number theory.[^3] An example is shown in [collision resistance using DL problem (Modern Cryptography)](../2023-10-03-key-exchange#collision-resistance-based-on-dl-problem).
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
> **Definition.** Let $\mathcal{E} = (E, D)$ be a block cipher over $(\mathcal{K}, X, X)$ where $X = \left\lbrace 0, 1 \right\rbrace^n$. The **Davies-Meyer compression function derived from $E$** maps inputs in $X \times \mathcal{K}$ to outputs in $X$, defined as follows.
|
||||||
|
>
|
||||||
|
> $$
|
||||||
|
> h(x, y) = E(y, x) \oplus x.
|
||||||
|
> $$
|
||||||
|
|
||||||
|
> **Theorem.** Suppose $\mathcal{E}$ is an ideal cipher.[^4] Then finding a collision for $h$ requires $\Omega(2^{n/2})$ evaluations of $(E, D)$.
|
||||||
|
|
||||||
|
*Proof*. Check Theorem 8.4.[^2]
|
||||||
|
|
||||||
|
Due to the birthday attack, we see that this bound is the best possible.
|
||||||
|
|
||||||
|
There are other constructions of $h$ using the block cipher, but some of them are totally insecure. The following are two insecure examples.
|
||||||
|
|
||||||
|
$$
|
||||||
|
h_1(x, y) = E(y, x) \oplus y, \quad h_2(x, y) = E(x, x \oplus y) \oplus x.
|
||||||
|
$$
|
||||||
|
|
||||||
|
Also, just using $E(y, x)$ is insecure.
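To make the Davies-Meyer definition concrete, and to see why $h(x, y) = E(y, x)$ alone fails, here is a small sketch. Since Python's standard library has no block cipher, the cipher $(E, D)$ below is a hypothetical toy: a 4-round Feistel network on 16-byte blocks whose round function is built from SHA-256. It is not a vetted cipher and only serves to make the example runnable.

```python
import hashlib

HALF = 8     # bytes per Feistel half, so blocks are 16 bytes
ROUNDS = 4


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))


def F(key: bytes, i: int, half: bytes) -> bytes:
    """Toy round function derived from SHA-256."""
    return hashlib.sha256(key + bytes([i]) + half).digest()[:HALF]


def E(key: bytes, x: bytes) -> bytes:
    """Toy block cipher encryption E(key, x)."""
    L, R = x[:HALF], x[HALF:]
    for i in range(ROUNDS):
        L, R = R, xor(L, F(key, i, R))
    return L + R


def D(key: bytes, c: bytes) -> bytes:
    """Decryption: undo the Feistel rounds in reverse order."""
    L, R = c[:HALF], c[HALF:]
    for i in reversed(range(ROUNDS)):
        L, R = xor(R, F(key, i, L)), L
    return L + R


def davies_meyer(x: bytes, y: bytes) -> bytes:
    """h(x, y) = E(y, x) XOR x, with the message block y used as the key."""
    return xor(E(y, x), x)


def collide_plain(x: bytes, y: bytes, y2: bytes) -> bytes:
    """For h(x, y) = E(y, x): return x2 with E(y2, x2) = E(y, x)."""
    return D(y2, E(y, x))
```

The attack in `collide_plain` needs only one decryption: pick any second key $y'$ and decrypt the target output under it. The final XOR with $x$ in Davies-Meyer is what blocks this shortcut.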
|
||||||
|
|
||||||
|
## Secure Hash Algorithm (SHA)
|
||||||
|
|
||||||
|
This is a family of hash functions published by NIST.
|
||||||
|
|
||||||
|
- 1993: SHA0
|
||||||
|
- 1995: SHA1
|
||||||
|
- 2001: **SHA2-256** and **SHA2-512** (most widely used)
|
||||||
|
- 2015: SHA3-256 and SHA3-512
|
||||||
|
|
||||||
|
There are known attacks for SHA0 and SHA1, so use at least SHA2.
|
||||||
|
|
||||||
|
SHA1 and SHA2 use the Merkle-Damgård transform with a Davies-Meyer compression function. If the underlying block cipher were AES, the block size would be $128$ bits, meaning that birthday attacks take $\mathcal{O}(2^{64})$, which is a bit small. So SHA2-256 uses a different block cipher called SHACAL-2 that uses $256$ bit blocks.
|
||||||
|
|
||||||
|
## HMAC
|
||||||
|
|
||||||
|
We needed a complicated construction for MACs that work on long messages. We might be able to use the collision resistance of hash functions and build a MAC with it.
|
||||||
|
|
||||||
|
### Some Approaches
|
||||||
|
|
||||||
|
Here are a few approaches. Suppose that a compression function $h$ is given and $H$ is a Merkle-Damgård function derived from $h$.
|
||||||
|
|
||||||
|
Recall that [we can construct a MAC scheme from a PRF](../2023-09-21-macs#mac-constructions-from-prfs), so either we want a secure PRF or a secure MAC scheme.
|
||||||
|
|
||||||
|
#### Prepending the Key
|
||||||
|
|
||||||
|
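Define $S(k, m) = H(k \parallel m)$. This is insecure because of length extension attacks. Given $H(k \parallel m)$, one can compute $H(k \parallel m \parallel \mathrm{PB} \parallel m')$ for any $m'$ without knowing $k$, where $\mathrm{PB}$ is the padding block of $k \parallel m$, resulting in a forgery.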
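The length extension attack can be demonstrated on a toy Merkle-Damgård hash. As before, the compression function (truncated SHA-256), the 32-byte blocks, the zero $\mathrm{IV}$, and the byte-level padding are assumptions for illustration. The attacker in `length_extension` uses only the tag, the message, and the key length, never the key itself.

```python
import hashlib

BLOCK = 32
OUT = 16
IV = bytes(OUT)


def h(t: bytes, block: bytes) -> bytes:
    """Toy compression function."""
    return hashlib.sha256(t + block).digest()[:OUT]


def padding(total_len: int) -> bytes:
    """Padding block for a message of total_len bytes."""
    zeros = (-(total_len + 1 + 8)) % BLOCK
    return b"\x80" + bytes(zeros) + total_len.to_bytes(8, "big")


def md_hash(msg: bytes, state: bytes = IV, absorbed: int = 0) -> bytes:
    """Merkle-Damgard hash, optionally resumed from a chaining value.

    absorbed is the number of bytes already processed into state.
    """
    data = msg + padding(absorbed + len(msg))
    for i in range(0, len(data), BLOCK):
        state = h(state, data[i:i + BLOCK])
    return state


def S(k: bytes, m: bytes) -> bytes:
    """Insecure MAC: S(k, m) = H(k || m)."""
    return md_hash(k + m)


def length_extension(tag: bytes, key_len: int, m: bytes, extra: bytes):
    """Forge a tag for m || PB || extra without knowing the key."""
    glue = padding(key_len + len(m))       # the padding block PB
    forged_msg = m + glue + extra
    absorbed = key_len + len(m) + len(glue)
    forged_tag = md_hash(extra, state=tag, absorbed=absorbed)
    return forged_msg, forged_tag
```

The forged tag verifies because the published tag is exactly the chaining value after processing $k \parallel m \parallel \mathrm{PB}$.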
|
||||||
|
|
||||||
|
#### Appending the Key
|
||||||
|
|
||||||
|
Define $S(k, m) = H(m \parallel k)$. This is vulnerable to an offline attack on $h$. If there is a collision on $h$, then $h(\mathrm{IV}, m) = h(\mathrm{IV}, m')$ for some $m \neq m'$. Then $S(k, m) = S(k, m')$ which results in forgery.
|
||||||
|
|
||||||
|
#### Envelope Method
|
||||||
|
|
||||||
|
Define $S(k, m) = H(k \parallel m \parallel k)$. This can be proven to be a secure PRF under reasonable assumptions. See Exercise 8.17.[^2]
|
||||||
|
|
||||||
|
#### Two-Key Nest
|
||||||
|
|
||||||
|
Define $S((k_1,k_2), m) = H(k_2 \parallel H(k_1 \parallel m))$. This can also be proven to be a secure PRF under reasonable assumptions. See Section 8.7.1.[^2]
|
||||||
|
|
||||||
|
This can be thought of as blocking the length extension attack from prepending the key method.
|
||||||
|
|
||||||
|
### HMAC
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
This is a variant of the two-key nest, but the difference is that the keys $k_1, k_2$ are not independent. Choose a key $k \leftarrow \mathcal{K}$, and set
|
||||||
|
|
||||||
|
$$
|
||||||
|
k_1 = k \oplus \texttt{ipad}, \quad k_2 = k\oplus \texttt{opad}
|
||||||
|
$$
|
||||||
|
|
||||||
|
where $\texttt{ipad} = \texttt{0x363636}...$ and $\texttt{opad} = \texttt{0x5C5C5C}...$. Then
|
||||||
|
|
||||||
|
$$
|
||||||
|
\mathrm{HMAC}(k, m) = H(k_2 \parallel H(k_1 \parallel m)).
|
||||||
|
$$
|
||||||
|
|
||||||
|
The security proof given for the two-key nest does not apply here, since $k_1$ and $k_2$ are not independent. With stronger assumptions on $h$, we can still get an almost optimal security bound.
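The HMAC formula can be checked against Python's built-in `hmac` module. This sketch instantiates $H$ with SHA-256, whose block size is 64 bytes. Padding the key with zeros up to the block size (and hashing keys longer than a block) is how standard HMAC prepares $k$, a detail not spelled out in the formula above.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 block size in bytes


def hmac_sha256(key: bytes, msg: bytes) -> bytes:
    """HMAC(k, m) = H(k2 || H(k1 || m)) with k1 = k ^ ipad, k2 = k ^ opad."""
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()   # long keys are hashed first
    key = key.ljust(BLOCK_SIZE, b"\x00")     # pad the key to one full block
    k1 = bytes(b ^ 0x36 for b in key)        # k XOR ipad
    k2 = bytes(b ^ 0x5C for b in key)        # k XOR opad
    inner = hashlib.sha256(k1 + msg).digest()
    return hashlib.sha256(k2 + inner).digest()
```

Because each derived key fills exactly one block, the inner and outer hashes each start from a key-dependent chaining value, mirroring the two-key nest.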
|
||||||
|
|
||||||
|
## The Random Oracle Model
|
||||||
|
|
||||||
|
### Motivation
|
||||||
|
|
||||||
|
Some constructions using cryptographic hash functions cannot be proven secure only using the collision resistance assumption.
|
||||||
|
|
||||||
|
A conservative way to solve this problem would be to construct schemes that can be proven secure, using reasonable assumptions about the hash function. But it may be hard to find such schemes and they may be less efficient than existing approaches that haven't been formally proven. On the other hand, it is unacceptable to use a cryptosystem without a security proof, even if no attack on it has succeeded so far.
|
||||||
|
|
||||||
|
Introducing an *idealized model* offers a middle ground. The model is not real, and reality is far from ideal. But as long as the model is *reasonable*, proofs under the idealized model are better than nothing. A proof in an idealized model also lets us understand the scheme better.
|
||||||
|
|
||||||
|
### Random Oracle Model
|
||||||
|
|
||||||
|
The **random oracle model** is a model that treats a cryptographic hash function as a truly random function. In this model, there is a public, random function $H$, that can be evaluated *only* by querying the oracle.
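One common way to picture a random oracle is lazy sampling: the oracle picks a fresh random output the first time an input is queried and repeats that answer afterwards. The class below is a hypothetical sketch of this idea (the name `RandomOracle` and the 32-byte output length are assumptions), not something a real implementation could ship, since the table is private state that all parties would somehow have to share.

```python
import secrets


class RandomOracle:
    """Lazily sampled random function from byte strings to out_len bytes."""

    def __init__(self, out_len: int = 32):
        self.out_len = out_len
        self.table = {}  # remembers answers so repeated queries agree

    def query(self, x: bytes) -> bytes:
        if x not in self.table:
            # First time this input is seen: sample a uniform output.
            self.table[x] = secrets.token_bytes(self.out_len)
        return self.table[x]
```

The only way to learn `H(x)` is to call `query(x)`, which is exactly the restriction the model imposes on the adversary.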
|
||||||
|
|
||||||
|
The random oracle model also provides a formal method that can be used to design and validate cryptosystems using the following approach.
|
||||||
|
|
||||||
|
1. Design a scheme and prove that it is secure in the random oracle model.
|
||||||
|
2. During implementation, replace the random oracle with a cryptographic hash function.
|
||||||
|
|
||||||
|
We hope that the cryptographic hash function used in step 2 is good enough to mimic a random oracle. Then the proof of security in the random oracle model would be still valid in the real world.
|
||||||
|
|
||||||
|
But there are schemes that can be proven insecure when instantiated with any concrete hash function, even though they were proven secure in the random oracle model. Also, no concrete hash function can behave exactly like a random oracle. So a security proof in the random oracle model suggests that a scheme has no internal design flaws, but it is not enough to claim security of the scheme in the real world.
|
||||||
|
|
||||||
|
[^1]: There is a subtle detail here, refer to [this question](https://crypto.stackexchange.com/questions/17924/does-collision-resistance-imply-or-not-second-preimage-resistance) on cryptography SE.
|
||||||
|
[^2]: A Graduate Course in Applied Cryptography
|
||||||
|
[^3]: These are rarely used since they rely on arithmetic with large prime numbers, which is expensive. Also, block ciphers are blazingly fast compared to computing with large integers.
|
||||||
|
[^4]: We treat the block cipher as a family of random permutations, i.e., for each $k \in \mathcal{K}$, $E(k, \cdot)$ is a random permutation.
|
||||||
@@ -0,0 +1,244 @@
|
|||||||
|
---
|
||||||
|
share: true
|
||||||
|
toc: true
|
||||||
|
math: true
|
||||||
|
categories:
|
||||||
|
- Lecture Notes
|
||||||
|
- Modern Cryptography
|
||||||
|
tags:
|
||||||
|
- lecture-note
|
||||||
|
- cryptography
|
||||||
|
- security
|
||||||
|
title: 7. Key Exchange
|
||||||
|
date: 2023-10-03
|
||||||
|
github_title: 2023-10-03-key-exchange
|
||||||
|
image:
|
||||||
|
path: assets/img/posts/Lecture Notes/Modern Cryptography/mc-07-dhke.png
|
||||||
|
attachment:
|
||||||
|
folder: assets/img/posts/Lecture Notes/Modern Cryptography
|
||||||
|
---
|
||||||
|
|
||||||
|
In symmetric key encryption, we assumed that the two parties already share the same key. We will see how this can be done.
|
||||||
|
|
||||||
|
In symmetric key settings, a user has to agree on and store a key for every other user, so if there are $N$ users in the system, $\mathcal{O}(N^2)$ keys are to be stored in the system. But these keys are secret information, so they have to be handled with care. With so many keys, it is hard to store and manage them securely.
|
||||||
|
|
||||||
|
Distributing a key requires a lot of care. The two parties need a secure channel beforehand, or have to meet physically in person to safely exchange keys. But for open systems, physical meetings cannot be arranged and users are not aware of each other before communicating.
|
||||||
|
|
||||||
|
In summary, symmetric key cryptography has at least three problems.
|
||||||
|
|
||||||
|
1. It is hard to distribute keys securely.
|
||||||
|
2. It is hard to store and manage many secret keys securely.
|
||||||
|
3. Symmetric key cryptography cannot be applied to open systems.
|
||||||
|
|
||||||
|
Problems 1 and 2 can be solved partially using **trusted third parties** (TTP), but such TTPs become a single point of failure, and are usually used only within a single organization.
|
||||||
|
|
||||||
|
## Diffie-Hellman Key Exchange (DHKE)
|
||||||
|
|
||||||
|
We need a method to share a secret key. For now, assume that the adversary only eavesdrops, and does not tamper with the message.
|
||||||
|
|
||||||
|
### Generic Description and Requirements
|
||||||
|
|
||||||
|
The **Diffie-Hellman key exchange** protocol allows two parties to generate a shared secret key without meeting in person. Here is a generic description of the protocol.
|
||||||
|
|
||||||
|
> We have two functions $E(\cdot)$ and $F(\cdot, \cdot)$.
|
||||||
|
> 1. Alice chooses a random secret $\alpha$ and computes $E(\alpha)$.
|
||||||
|
> 2. Bob chooses a random secret $\beta$ and computes $E(\beta)$.
|
||||||
|
> 3. Alice and Bob exchange $E(\alpha), E(\beta)$ over an *insecure channel*.
|
||||||
|
> 4. Using the given information, Alice and Bob both compute a shared key $F(\alpha, \beta)$.
|
||||||
|
|
||||||
|
Alice only knows $\alpha, E(\beta)$, and Bob only knows $\beta, E(\alpha)$. With the given information for each party, they compute $F(\alpha, \beta)$ and use it as a shared key. Also, since Alice and Bob are currently exchanging keys, $E(\alpha)$ and $E(\beta)$ are sent over an insecure channel. Then the eavesdropper can see $E(\alpha), E(\beta)$.
|
||||||
|
|
||||||
|
Overall, for this protocol to be secure, $E$ and $F$ should at least satisfy the following.
|
||||||
|
|
||||||
|
- $E$ is easy to compute.
|
||||||
|
- Given $\alpha$ and $E(\beta)$, it is easy to compute $F(\alpha, \beta)$.
|
||||||
|
- Given $E(\alpha)$ and $\beta$, it is easy to compute $F(\alpha, \beta)$.
|
||||||
|
- Given $E(\alpha)$ and $E(\beta)$, it is **hard** to compute $F(\alpha, \beta)$.
|
||||||
|
- $E$ must be a one way function.
|
||||||
|
|
||||||
|
The first three conditions are for the communicating parties, and are a sort of correctness condition that Alice and Bob can agree on the same key efficiently. The last two conditions are security conditions. It should be hard for the eavesdropping adversary to compute the secret, and it must be hard to recover $x$ from the value of $E(x)$. Otherwise, the adversary will find $\alpha$ or $\beta$ and compute $F(\alpha, \beta)$.
|
||||||
|
|
||||||
|
To implement the above protocol, we need two functions $E$ and $F$ that satisfy the above properties. We rely on the hardness of number theoretic problems to implement this.
|
||||||
|
|
||||||
|
### DHKE Protocol in Detail
|
||||||
|
|
||||||
|
Let $p$ be a large prime, and let $q$ be another large prime dividing $p - 1$. We typically use very large random primes: $p$ is about $2048$ bits long, and $q$ is about $256$ bits long.
|
||||||
|
|
||||||
|
All arithmetic will be done in $\mathbb{Z}_p$. We also consider $\mathbb{Z} _ p^ *$ , the **unit group** of $\mathbb{Z} _ p$. Since $\mathbb{Z} _ p$ is a field, $\mathbb{Z} _ p^ * = \mathbb{Z} _ p \setminus \left\lbrace 0 \right\rbrace$, meaning that $\mathbb{Z} _ p^ *$ has order $p-1$.
|
||||||
|
|
||||||
|
Since $q$ is a prime dividing $p - 1$, $\mathbb{Z}_p^*$ has an element $g$ of order $q$.[^1] Let
|
||||||
|
|
||||||
|
$$
|
||||||
|
G = \left\langle g \right\rangle = \left\lbrace 1, g, g^2, \dots, g^{q-1} \right\rbrace \leq \mathbb{Z}_p^*.
|
||||||
|
$$
|
||||||
|
|
||||||
|
We assume that the descriptions of $p$, $q$ and $g$ are generated at setup and shared by all parties. Now the actual protocol goes like this.
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
> 1. Alice chooses $\alpha \leftarrow \mathbb{Z}_q$ and computes $g^\alpha$.
|
||||||
|
> 2. Bob chooses $\beta \leftarrow \mathbb{Z}_q$ and computes $g^\beta$.
|
||||||
|
> 3. Alice and Bob exchange $g^\alpha$ and $g^\beta$ over an insecure channel.
|
||||||
|
> 4. Using $\alpha$ and $g^\beta$, Alice computes $g^{\alpha\beta}$.
|
||||||
|
> 5. Using $\beta$ and $g^\alpha$, Bob computes $g^{\alpha\beta}$.
|
||||||
|
> 6. The secret key shared by Alice and Bob is $g^{\alpha\beta}$.
|
||||||
|
|
||||||
|
It works, since $(g^\beta)^\alpha = g^{\alpha\beta} = (g^\alpha)^\beta$ in $G$.
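The protocol steps can be simulated with toy parameters. The values $p = 2039$, $q = 1019$, $g = 4$ are assumptions chosen so the example runs instantly ($q$ is prime, $q$ divides $p - 1$, and $g$ has order $q$). They are far too small for real use, where $p$ would be about $2048$ bits as stated above.

```python
import secrets

# Toy public parameters (assumptions for illustration only).
p, q, g = 2039, 1019, 4
assert (p - 1) % q == 0 and g != 1 and pow(g, q, p) == 1  # g has order q


def keygen():
    """Choose a secret exponent in Z_q and compute the value to send."""
    secret = secrets.randbelow(q)
    return secret, pow(g, secret, p)


def shared_key(secret: int, received: int) -> int:
    """Raise the other party's value to our own secret exponent."""
    return pow(received, secret, p)


alpha, A = keygen()   # Alice sends A = g^alpha
beta, B = keygen()    # Bob sends B = g^beta

k_alice = shared_key(alpha, B)   # (g^beta)^alpha
k_bob = shared_key(beta, A)      # (g^alpha)^beta
```

An eavesdropper sees only `A` and `B`, while both parties end up with the same value.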
|
||||||
|
|
||||||
|
### Security of the DHKE Protocol
|
||||||
|
|
||||||
|
The protocol is secure if and only if the following holds.
|
||||||
|
|
||||||
|
> Let $\alpha, \beta \leftarrow \mathbb{Z}_q$. Given $g^\alpha, g^\beta \in G$, it is hard to compute $g^{\alpha\beta} \in G$.
|
||||||
|
|
||||||
|
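This is called the **computational Diffie-Hellman assumption**. As we will see below, this is a stronger assumption than the discrete logarithm assumption: an adversary that can compute discrete logarithms can also compute $g^{\alpha\beta}$, but the converse is not known in general. In the real world, however, the CDH assumption is considered reasonable for groups where the DL assumption holds.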
|
||||||
|
|
||||||
|
## Discrete Logarithm and Related Assumptions
|
||||||
|
|
||||||
|
We have used $E(x) = g^x$ in the above implementation. This function is called the **discrete exponentiation function**. This function is actually a *group isomorphism* from $\mathbb{Z}_q$ (under addition) to $G$, so it has an inverse function called the **discrete logarithm function**. The name comes from the fact that if $u = g^x$, then it can be written as '$x = \log_g u$'.
|
||||||
|
|
||||||
|
We required that $E$ must be a one-way function for the protocol to work. So it must be hard to compute the discrete logarithm function. There are some problems related to the discrete logarithm, which are used as assumptions in the security proof. They are formalized as a security game, as usual.
|
||||||
|
|
||||||
|
$G = \left\langle g \right\rangle \leq \mathbb{Z} _ p^{ * }$ will be a *cyclic group* of order $q$ and $g$ is given as a generator. Note that $g$ and $q$ are also given to the adversary.
|
||||||
|
|
||||||
|
### Discrete Logarithm Problem (DL)
|
||||||
|
|
||||||
|
> **Definition.** Let $\mathcal{A}$ be a given adversary.
|
||||||
|
>
|
||||||
|
> 1. The challenger chooses $\alpha \leftarrow \mathbb{Z}_q$ and sends $u = g^\alpha$ to the adversary.
|
||||||
|
> 2. The adversary calculates and outputs some $\alpha' \in \mathbb{Z}_q$.
|
||||||
|
>
|
||||||
|
> We define the **advantage in solving the discrete logarithm problem for $G$** as
|
||||||
|
>
|
||||||
|
> $$
|
||||||
|
> \mathrm{Adv}_{\mathrm{DL}}[\mathcal{A}, G] = \Pr[\alpha = \alpha'].
|
||||||
|
> $$
|
||||||
|
>
|
||||||
|
> We say that the **discrete logarithm (DL) assumption** holds for $G$ if for any efficient adversary $\mathcal{A}$, $\mathrm{Adv}_{\mathrm{DL}}[\mathcal{A}, G]$ is negligible.
|
||||||
|
|
||||||
|
So if we assume the DL assumption, it means that the DL problem is **hard**, i.e., no efficient adversary can effectively solve the DL problem for $G$.
|
||||||
|
|
||||||
|
### Computational Diffie-Hellman Problem (CDH)
|
||||||
|
|
||||||
|
> **Definition.** Let $\mathcal{A}$ be a given adversary.
|
||||||
|
>
|
||||||
|
> 1. The challenger chooses $\alpha, \beta \leftarrow \mathbb{Z}_q$ and sends $g^\alpha, g^\beta$ to the adversary.
|
||||||
|
> 2. The adversary calculates and outputs some $w \in G$.
|
||||||
|
>
|
||||||
|
> We define the **advantage in solving the computational Diffie-Hellman problem for $G$** as
|
||||||
|
>
|
||||||
|
> $$
|
||||||
|
> \mathrm{Adv}_{\mathrm{CDH}}[\mathcal{A}, G] = \Pr[w = g^{\alpha\beta}].
|
||||||
|
> $$
|
||||||
|
>
|
||||||
|
> We say that the **computational Diffie-Hellman (CDH) assumption** holds for $G$ if for any efficient adversary $\mathcal{A}$, $\mathrm{Adv}_{\mathrm{CDH}}[\mathcal{A}, G]$ is negligible.
|
||||||
|
|
||||||
|
An interesting property here is that given $(g^\alpha, g^\beta)$, it is hard to determine if $w$ is a solution to the problem. ($w \overset{?}{=} g^{\alpha\beta}$)
|
||||||
|
|
||||||
|
### Decisional Diffie-Hellman Problem (DDH)
|
||||||
|
|
||||||
|
Since recognizing a solution to the CDH problem is hard, we have another assumption that it is hard to distinguish a solution to the CDH problem and a random element from $G$.
|
||||||
|
|
||||||
|
> **Definition.** Let $\mathcal{A}$ be a given adversary. We define two experiments 0 and 1.
|
||||||
|
>
|
||||||
|
> **Experiment $b$**.
|
||||||
|
> 1. The challenger chooses $\alpha, \beta, \gamma \leftarrow \mathbb{Z}_q$ and computes the following.
|
||||||
|
>
|
||||||
|
> $$
|
||||||
|
> u = g^\alpha, \quad v = g^\beta, \quad w_0 = g^{\alpha\beta}, \quad w_1 = g^\gamma.
|
||||||
|
> $$
|
||||||
|
>
|
||||||
|
> 2. The challenger sends the triple $(u, v, w_b)$ to the adversary.
|
||||||
|
> 3. The adversary calculates and outputs a bit $b' \in \left\lbrace 0, 1 \right\rbrace$.
|
||||||
|
>
|
||||||
|
> Let $W_b$ be the event that $\mathcal{A}$ outputs $1$ in experiment $b$. We define the **advantage in solving the decisional Diffie-Hellman problem for $G$** as
|
||||||
|
>
|
||||||
|
> $$
|
||||||
|
> \mathrm{Adv}_{\mathrm{DDH}}[\mathcal{A}, G] = \left\lvert \Pr[W_0] - \Pr[W_1] \right\rvert.
|
||||||
|
> $$
|
||||||
|
>
|
||||||
|
> We say that the **decisional Diffie-Hellman (DDH) assumption** holds for $G$ if for any efficient adversary $\mathcal{A}$, $\mathrm{Adv}_{\mathrm{DDH}}[\mathcal{A}, G]$ is negligible.
|
||||||
|
|
||||||
|
For $\alpha, \beta, \gamma \in \mathbb{Z}_q$, the triple $(g^\alpha, g^\beta, g^\gamma)$ is called a **DH-triple** if $\gamma = \alpha\beta$. So the assumption is saying that no efficient adversary can distinguish DH-triples from non DH-triples.
|
||||||
|
|
||||||
|
### Relations Between Problems
|
||||||
|
|
||||||
|
It is easy to see that the following holds.
|
||||||
|
|
||||||
|
> In the order of hardness, DL problem $\gt$ CDH problem $\gt$ DDH problem.
|
||||||
|
|
||||||
|
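If an adversary can solve the DL problem, it can also solve CDH, and an adversary that can solve CDH can also solve DDH. So the DL problem is at least as hard as CDH, which is at least as hard as DDH. Whether these inequalities are strict is not known in general, although there are groups in which DDH is easy while CDH is believed to be hard.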
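The chain of reductions can be written out directly: a DL solver yields a CDH solver, which in turn yields a DDH solver. The sketch below assumes a toy group ($p = 2039$, $q = 1019$, $g = 4$) that is small enough for a brute-force discrete logarithm. The function names are hypothetical.

```python
# Toy group of prime order q inside Z_p^* (assumed parameters).
p, q, g = 2039, 1019, 4


def dlog(u: int) -> int:
    """Brute-force DL solver: find x in Z_q with g^x = u."""
    x, acc = 0, 1
    while acc != u:
        acc = acc * g % p
        x += 1
        if x >= q:
            raise ValueError("u is not in the group generated by g")
    return x


def cdh(u: int, v: int) -> int:
    """Solve CDH with the DL solver: g^(alpha*beta) = v^alpha."""
    return pow(v, dlog(u), p)


def ddh(u: int, v: int, w: int) -> bool:
    """Solve DDH with the CDH solver: is (u, v, w) a DH-triple?"""
    return w == cdh(u, v)
```

In a group of cryptographic size `dlog` is infeasible, so these reductions only say that breaking DL would break the other two.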
|
||||||
|
|
||||||
|
If we assume that an easier problem is hard, we are making a stronger assumption. That is, the assumption is more likely to be broken in the future, because we assumed more.
|
||||||
|
|
||||||
|
> DDH assumption $\implies$ CDH assumption $\implies$ DL assumption
|
||||||
|
|
||||||
|
Suppose a proof relies on the DDH assumption. If the DDH assumption turns out to be false, that proof is invalidated, but proofs that only use the CDH or DL assumption remain valid.
|
||||||
|
|
||||||
|
If we used the DL assumption and it turns out to be false, there will be an efficient algorithm solving the DL problem. Then CDH, DDH problems can also be solved, so proofs using the DDH or CDH assumption will be invalidated. Thus DL assumption is the weakest assumption, since breaking DL will break both CDH and DDH.
|
||||||
|
|
||||||
|
## Multi-Party Diffie-Hellman
|
||||||
|
|
||||||
|
Suppose we want something like a secret group chat, where there are $N$ ($\geq 3$) people and they need to generate a shared secret key. It is known that $N$-party Diffie-Hellman is possible in $N-1$ rounds. Here's how it goes. All indices are taken modulo $N$.
|
||||||
|
|
||||||
|
Each party $i$ chooses $\alpha _ i \leftarrow \mathbb{Z} _ q$ and computes $g^{\alpha _ i}$. The parties are arranged in a ring, and each party passes its computed value to the $(i+1)$-th party. In the next round, the $i$-th party receives $g^{\alpha _ {i-1}}$, computes $g^{\alpha _ {i-1}\alpha _ i}$, and passes it to the next party. After $N-1$ rounds, each party has received a value whose exponent is the product of all the other parties' secrets, raises it to its own $\alpha _ i$, and obtains the shared key $g^{\alpha _ 1\cdots\alpha _ N}$.
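The ring protocol can be simulated in a few lines. The following is a sketch only, assuming a toy group (the order-$11$ subgroup of $\mathbb{Z}_{23}^*$ generated by $2$) and a hypothetical helper `ring_dh` that runs all parties in a single process; real deployments would use a large group and actual network messages.

```python
import secrets

# Toy parameters (an assumption for illustration): 2 has order 11 in Z_23^*.
P, Q, G = 23, 11, 2

def ring_dh(exponents: list[int]) -> list[int]:
    """Simulate N-party ring Diffie-Hellman; returns each party's final key."""
    n = len(exponents)
    # Before any communication, party i computes g^{alpha_i} locally.
    values = [pow(G, a, P) for a in exponents]
    # N-1 rounds: party i receives the value held by party i-1
    # and raises it to its own secret exponent.
    for _ in range(n - 1):
        values = [pow(values[(i - 1) % n], exponents[i], P) for i in range(n)]
    return values

# Example run with 4 parties and random nonzero exponents.
alphas = [secrets.randbelow(Q - 1) + 1 for _ in range(4)]
keys = ring_dh(alphas)
```

After the loop, every entry of `keys` is the same group element, since each value has been raised to all $N$ exponents exactly once.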
|
||||||
|
|
||||||
|
Taking $\mathcal{O}(N)$ rounds is impractical in the real world, because of the many rounds of communication that the above algorithm requires. Researchers are looking for methods to generate a shared key in a single round. It has been solved for $N=3$ using bilinear pairings, but for $N \geq 4$ it is an open problem.
|
||||||
|
|
||||||
|
## Attacking Anonymous Diffie-Hellman Protocol
|
||||||
|
|
||||||
|
We assumed that the adversary only eavesdrops, but if the adversary carries out active attacks, then DHKE is not enough. The major problem is the lack of **authentication**. Alice and Bob are exchanging keys, but neither can be sure that they are in fact communicating with the other. An attacker can intercept messages and impersonate Alice or Bob. This attack is called a **man-in-the-middle attack**, and it works on any key exchange protocol that lacks authentication.
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
The adversary will impersonate Bob when communicating with Alice, and will do the same for Bob by pretending to be Alice. The values of $\alpha, \beta$ that Alice and Bob chose are not leaked, but the adversary can decrypt anything in the middle and obtain the plaintext.
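The attack can be illustrated with a short simulation. This is a sketch under assumptions: a toy group ($p = 23$, generator $2$), and a hypothetical function `mitm_demo` in which the adversary replaces both public values with its own $g^\mu$.

```python
# Toy parameters (an assumption for illustration): 2 has order 11 in Z_23^*.
P, G = 23, 2

def mitm_demo(alpha: int, beta: int, mu: int) -> tuple[int, int, int, int]:
    """Return (Alice's key, Bob's key, adversary's key with Alice, with Bob)."""
    a_pub = pow(G, alpha, P)   # sent by Alice, intercepted by the adversary
    b_pub = pow(G, beta, P)    # sent by Bob, intercepted by the adversary
    m_pub = pow(G, mu, P)      # adversary forwards this to both parties instead

    key_alice = pow(m_pub, alpha, P)      # Alice believes this is shared with Bob
    key_bob = pow(m_pub, beta, P)         # Bob believes this is shared with Alice
    key_adv_alice = pow(a_pub, mu, P)     # adversary's key for the Alice side
    key_adv_bob = pow(b_pub, mu, P)       # adversary's key for the Bob side
    return key_alice, key_bob, key_adv_alice, key_adv_bob
```

The adversary never learns $\alpha$ or $\beta$, yet it shares one key with each party, so it can decrypt, read, and re-encrypt every message passing between them.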
|
||||||
|
|
||||||
|
## Collision Resistance Based on DL Problem
|
||||||
|
|
||||||
|
Suppose that the DL problem is hard on the group $G = \left\langle g \right\rangle$, with prime order $q$. Choose an element $h \in G$, and define a hash function $H : \mathbb{Z}_q \times \mathbb{Z}_q \rightarrow G$ as
|
||||||
|
|
||||||
|
$$
|
||||||
|
H(\alpha, \beta) = g^\alpha h^\beta.
|
||||||
|
$$
|
||||||
|
|
||||||
|
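If an adversary were to find a collision $(\alpha, \beta) \neq (\alpha', \beta')$, then $H(\alpha, \beta) = H(\alpha', \beta')$, which implies $g^\alpha h^\beta = g^{\alpha'}h^{\beta'}$. Note that $\beta \neq \beta'$, since otherwise $g^\alpha = g^{\alpha'}$ would force $\alpha = \alpha'$. Since $q$ is prime, $\beta' - \beta$ is invertible in $\mathbb{Z}_q$, thus $h = g^{(\alpha - \alpha') / (\beta' - \beta)}$, and the adversary has calculated the discrete logarithm of $h$.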
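The extraction of the discrete logarithm from a collision can be checked numerically. The sketch below assumes a toy group (the order-$11$ subgroup of $\mathbb{Z}_{23}^*$ generated by $2$) and uses hypothetical names `dl_hash` and `dlog_from_collision`; the modular inverse via `pow(x, -1, q)` requires Python 3.8 or later.

```python
# Toy parameters (an assumption for illustration): 2 has order 11 in Z_23^*.
P, Q, G = 23, 11, 2

def dl_hash(h: int, alpha: int, beta: int) -> int:
    """H(alpha, beta) = g^alpha * h^beta in the group."""
    return (pow(G, alpha, P) * pow(h, beta, P)) % P

def dlog_from_collision(first: tuple[int, int], second: tuple[int, int]) -> int:
    """Given a collision (alpha, beta) != (alpha', beta'), return log_g(h)."""
    (alpha, beta), (alpha2, beta2) = first, second
    # beta' - beta is nonzero mod q for a genuine collision, hence invertible.
    inv = pow((beta2 - beta) % Q, -1, Q)
    return ((alpha - alpha2) * inv) % Q

# Build a collision using knowledge of x = log_g(h), only to test the formula.
x = 7
h = pow(G, x, P)
alpha, beta, beta2 = 3, 4, 9
alpha2 = (alpha + x * (beta - beta2)) % Q   # forces the same exponent of g
```

Here the collision is manufactured from a known $x$ purely to exercise the formula; an adversary who found such a pair without knowing $x$ would learn $x$ in the same way.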
|
||||||
|
|
||||||
|
Thus under the DL assumption, the hash function $H$ is collision resistant.
|
||||||
|
|
||||||
|
## Merkle Puzzles (1974)
|
||||||
|
|
||||||
|
Before Diffie-Hellman, Merkle proposed an idea for a secure key exchange protocol using symmetric key cryptography.
|
||||||
|
|
||||||
|
The idea was to use *puzzles*, which are problems that can be solved with some effort.
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
> Let $\mathcal{E} = (E, D)$ be a block cipher defined over $(\mathcal{K}, \mathcal{M})$.
|
||||||
|
> 1. Alice chooses random pairs $(k_i, s_i) \leftarrow \mathcal{K} \times \mathcal{M}$ for $i = 1, \dots, L$.
|
||||||
|
> 2. Alice constructs $L$ puzzles, defined as a triple $(E(k_i, s_i), E(k_i, i), E(k_i, 0))$.
|
||||||
|
> 3. Alice randomly shuffles these puzzles and sends them to Bob.
|
||||||
|
> 4. Bob picks a random puzzle $(c_1, c_2, c_3)$ and solves the puzzle by **brute force**, trying all $k \in \mathcal{K}$ until some $D(k, c_3) = 0$ is found.
|
||||||
|
> - If Bob finds two different keys, he tells Alice that the protocol failed and they start over.
|
||||||
|
> 5. Bob computes $l = D(k, c_2)$ and $s = D(k, c_1)$, and sends $l$ to Alice.
|
||||||
|
> 6. Alice will locate the $l$-th puzzle and set $s = s_l$.
|
||||||
|
|
||||||
|
If successful, Alice and Bob agree on a secret message $s \in \mathcal{M}$. Alice has to do $\mathcal{O}(L)$ work, and Bob has to do $\mathcal{O}(\left\lvert \mathcal{K} \right\rvert)$ work.
|
||||||
|
|
||||||
|
For block ciphers, we commonly set $\mathcal{K}$ large enough so that brute force attacks are infeasible. So for Merkle puzzles, we reduce the key space. For example, if we were to use AES-128 as $\mathcal{E}$, then we can set the first $96$ bits of the key as $0$. Then the search space would be reduced to $2^{32}$, which is feasible for Bob.
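The protocol can be sketched end to end with a reduced key space. Everything below is an illustrative assumption: the standard library has no block cipher, so a 4-round Feistel network built from SHA-256 stands in for $\mathcal{E}$, and the key space is cut to $2^{12}$ so that the brute-force search finishes instantly. The names `encrypt`, `decrypt`, `make_puzzles`, and `solve_puzzle` are hypothetical.

```python
import hashlib
import secrets

KEY_BITS = 12   # toy key space |K| = 2^12 (assumption, for a fast demo)
HALF = 8        # 16-byte blocks, split into two 8-byte halves

def _round(key: int, rnd: int, half: bytes) -> bytes:
    data = key.to_bytes(4, "big") + bytes([rnd]) + half
    return hashlib.sha256(data).digest()[:HALF]

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encrypt(key: int, block: bytes) -> bytes:
    """Toy Feistel cipher standing in for E(k, m); not a real cipher."""
    left, right = block[:HALF], block[HALF:]
    for rnd in range(4):
        left, right = right, _xor(left, _round(key, rnd, right))
    return left + right

def decrypt(key: int, block: bytes) -> bytes:
    """Inverse of encrypt, standing in for D(k, c)."""
    left, right = block[:HALF], block[HALF:]
    for rnd in reversed(range(4)):
        left, right = _xor(right, _round(key, rnd, left)), left
    return left + right

def int_block(n: int) -> bytes:
    return n.to_bytes(2 * HALF, "big")

def make_puzzles(count: int):
    """Alice: build and shuffle puzzles (E(k_i, s_i), E(k_i, i), E(k_i, 0))."""
    secret_list, puzzles = [], []
    for i in range(1, count + 1):
        k = secrets.randbelow(1 << KEY_BITS)
        s = secrets.token_bytes(2 * HALF)
        secret_list.append(s)
        puzzles.append((encrypt(k, s), encrypt(k, int_block(i)),
                        encrypt(k, int_block(0))))
    secrets.SystemRandom().shuffle(puzzles)
    return secret_list, puzzles

def solve_puzzle(puzzle):
    """Bob: brute-force the key, then recover the index l and the secret s."""
    c1, c2, c3 = puzzle
    keys = [k for k in range(1 << KEY_BITS) if decrypt(k, c3) == int_block(0)]
    if len(keys) != 1:
        return None   # protocol failure: Alice and Bob start over
    k = keys[0]
    return int.from_bytes(decrypt(k, c2), "big"), decrypt(k, c1)
```

Bob calls `solve_puzzle` on one randomly chosen puzzle and sends only the index to Alice, who looks up the matching secret in her own list.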
|
||||||
|
|
||||||
|
Now consider the adversary who obtains all puzzles $P_i$ and the value $l$. To obtain the secret message $s_l$, the adversary has to locate the puzzle $P_l$. But since the puzzles are in random order, the adversary has to solve puzzles until he finds $P_l$. Thus, the adversary must spend time $\mathcal{O}(L\left\lvert \mathcal{K} \right\rvert)$ to obtain $s$. So we have a quadratic gap here.
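As a rough worked example, assume $L = \left\lvert \mathcal{K} \right\rvert = 2^{32}$ (numbers chosen here only for illustration, matching the reduced AES key space above). Counting cipher operations in the worst case,

$$
\underbrace{3L = 3 \cdot 2^{32}}_{\text{Alice: encryptions}}, \qquad \underbrace{\left\lvert \mathcal{K} \right\rvert = 2^{32}}_{\text{Bob: trial decryptions}}, \qquad \underbrace{L \cdot \left\lvert \mathcal{K} \right\rvert = 2^{64}}_{\text{adversary: trial decryptions}}.
$$

The honest parties each do work on the order of $2^{32}$, while the adversary's work is on the order of the square of that.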
|
||||||
|
|
||||||
|
### Performance Issues
|
||||||
|
|
||||||
|
Suppose we set $L \approx \left\lvert \mathcal{K} \right\lvert$. Then first of all, Alice has to create that many puzzles and send all of them to Bob.
|
||||||
|
|
||||||
|
Next, the adversary must spend time $\mathcal{O}(L^2)$, but this does not satisfy our definitions of security: an adversary doing only constant work (guessing one puzzle and one key) still has advantage about $1/L^2$, which is non-negligible. Also, $L$ must be large in practice, which brings back the first problem.
|
||||||
|
|
||||||
|
### Impossibility Results
|
||||||
|
|
||||||
|
It is unknown whether we can get a better gap (than quadratic) using a general symmetric cipher. A partial result shows that a quadratic gap is the best possible if we only use block ciphers.[^2]
|
||||||
|
|
||||||
|
To get exponential gaps, we need number theory.
|
||||||
|
|
||||||
|
[^1]: By Cauchy's theorem, or use the fact that $\mathbb{Z}_p^*$ is commutative. Finite commutative groups have a subgroup of every order that divides the order of the group.
|
||||||
|
[^2]: R. Impagliazzo and S. Rudich. Limits on the provable consequences of one-way permutations. In Proceedings of the Symposium on Theory of Computing (STOC), pages 44–61, 1989.
|
||||||
Binary file not shown.
|
After Width: | Height: | Size: 6.0 KiB |
Binary file not shown.
|
After Width: | Height: | Size: 8.8 KiB |
Binary file not shown.
|
After Width: | Height: | Size: 8.1 KiB |
Binary file not shown.
|
After Width: | Height: | Size: 16 KiB |
Binary file not shown.
|
After Width: | Height: | Size: 9.3 KiB |
Binary file not shown.
|
After Width: | Height: | Size: 9.7 KiB |