Trust Issues #
With the given advancements in AI, quantum computers, and who knows what will be invented in the near future of 5 years, our way of protecting against breaches is simply outdated. With our current encryption, we store data on our database and when sending it to a client, we encrypt and the client decrypts. Encryption has come a long way with async encryption, but the logic behind how we structure this transaction depends on trusting the server to be faithful or secure. You trust your government to keep your personal data safe, but they can never be 100 percent safe.
With that in mind, I came across this encrypting method called Fully Homomorphic Encryption (specifically the CKKS scheme) while researching for my dissertation topic.
Disclaimer: This is purely from a few days of reading and skimming through papers; I do not have deep knowledge.
How It Works #
The 5-year-old explanation I constructed in my head is that it is using some sort of complex math behind the curtain to turn the data into a polynomial equation with a bit of noise (this is going to be important soon).
Let’s say we got two of those equations (2 encrypted data); we can run multiplication and addition on them. If we decrypt the product of that, it will be the same thing as the product of the previous two unencrypted data processed. Using that, on a database, we can run a query with encrypted data directly on them. This leads to one big change in our database server structure: we do not need to have unencrypted raw data to be stored, and the database can be breached without the risk of data reveal. This is huge for biometric data storage.
The Computation Tax #
Now, obviously, there are many known attacks on FHE such as physical electricity pattern leakage, client-side attacks, and such. However, the biggest problem is the complexity itself. The compute power required is a lot; the current production systems will face a bottleneck on speed and computation.
These equations are not represented in a simple few bytes like integers. One polynomial consists of a huge number of coefficients going from 16,384 up to 65,536. It could look like this:
a(x) = a0 + a1x^1 + a2x^2 + … + a(N-1)x^(N-1)
These large coefficients are obviously way too big for a single piece of data; that is where SIMD (Single Instruction, Multiple Data) comes in handy. An encrypted ciphertext has a fixed number of Slots—for a standard 16,384-degree polynomial, you get 8,192 slots.In a biometric database use case, a single face embedding is typically a vector of 512 floating-point numbers. Since we have 8,192 slots, we can ‘pack’ 16 different face embeddings into one single ciphertext (512 x 16 = 8,192) and process all of them simultaneously. This increases the throughput substantially. If we only assign one face embedding to a ciphertext, the rest of the slots are filled with zeros, but the processing time for 16 embeddings versus 1 embedding remains exactly the same. This makes high-throughput batching the only way to make the ‘computation tax’ of FHE worth it.
The next problem is the noise. Every time the data is processed, the product equation has even more noise. Too much noise will turn the equation into random data. Before that happens, bootstrapping takes place, where we recalculate everything to its minimal noise all at once—which is incredibly computation-heavy.
Data Expansion: Standard vs. CKKS #
| Feature | Standard Computing (Plaintext) | FHE (CKKS Ciphertext) |
|---|---|---|
| Basic Unit | 64-bit Integer / Float | Polynomials (e.g., Degree 16,384) |
| Storage Size | 8 Bytes | ~2 MB to 5 MB |
| Processing Goal | Logic & Branching (if/else) | Arithmetic (+ and x only) |
| Analogy | Carrying a single coin | Carrying a 50lb bag of sand to hide one coin |
The Future: No Liability #
An interesting note: FHE is a type of Lattice-based cryptography, which is specifically designed for the threat of quantum computers. While quantum machines will eventually be able to decrypt our AES or RSA protected data with ease, they don’t have a known “shortcut” for the math behind FHE.
So while this is a very promising future, it has a lot of tradeoffs. With the right hardware architecture designed for polynomial computation instead of 64-bit integers—or using a different physical way to calculate like using light or atoms—the calculation seems far more applicable. Instead of trusting someone to keep your data safe and also not use it for themselves, data can be given with no liability.