Searching Encrypted Data: Homomorphic Encryption and Query Privacy
Explore how homomorphic encryption enables searching encrypted data without decryption, and why this matters for zero-knowledge infrastructure and data sovereignty.
The Problem: Privacy vs. Utility
When you encrypt sensitive data and send it to a server for storage or processing, you face a fundamental tradeoff. Either the server has access to the plaintext (breaking privacy), or it can't perform useful operations on your data (breaking utility). This is where searchable encryption and homomorphic encryption step in—they promise something that sounds like magic: computation on encrypted data without decryption.
For developers building zero-knowledge infrastructure, privacy-preserving search is critical. Whether you're designing encrypted messaging, confidential document storage, or compliance-first healthcare records, the ability to query data server-side without exposing plaintext is a game-changer.
Homomorphic Encryption: Compute Without Decryption
Homomorphic encryption (HE) allows you to perform computations on encrypted data and get back encrypted results. When decrypted, those results match what you'd get from computing on plaintext.
There are three flavors:
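Partially Homomorphic Encryption (PHE) supports a single operation on ciphertexts: either addition or multiplication, but not both. Unpadded ("textbook") RSA is multiplicatively homomorphic, while Paillier is additively homomorphic. RSA as deployed with padding such as OAEP loses this property. For textbook RSA with modulus n:
E(m1) × E(m2) mod n = E(m1 × m2 mod n)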
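The multiplicative property above can be checked with a toy example. This is a minimal sketch that assumes unpadded RSA with deliberately tiny, insecure parameters (p = 61, q = 53); the helper names `modPow`, `encrypt`, and `decrypt` are illustrative and not taken from any library.

```javascript
// Toy textbook-RSA parameters (insecure, for illustration only).
const p = 61n, q = 53n;
const n = p * q; // 3233
const e = 17n;   // public exponent
const d = 2753n; // private exponent: (e * d) % 3120n === 1n

// Square-and-multiply modular exponentiation on BigInt values.
function modPow(base, exp, mod) {
  let result = 1n;
  base %= mod;
  while (exp > 0n) {
    if (exp & 1n) result = (result * base) % mod;
    base = (base * base) % mod;
    exp >>= 1n;
  }
  return result;
}

const encrypt = (m) => modPow(m, e, n);
const decrypt = (c) => modPow(c, d, n);

const m1 = 7n, m2 = 3n;
// Only ciphertexts are multiplied; the plaintexts are never combined directly.
const productCiphertext = (encrypt(m1) * encrypt(m2)) % n;
const recovered = decrypt(productCiphertext);

console.log(recovered === m1 * m2); // holds as long as m1 * m2 < n
```

The party doing the multiplication needs only the public modulus, which is what makes the property useful for delegating a computation. It is also why unpadded RSA is malleable and should not be used to protect real data.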
Somewhat Homomorphic Encryption (SHE) supports a limited number of operations before noise accumulates and decryption becomes unreliable. Useful for specific, bounded computations.
Fully Homomorphic Encryption (FHE) is the holy grail: it supports arbitrary combinations of additions and multiplications, with no preset limit on how many can be chained. However, current FHE schemes are computationally expensive. Encrypting and evaluating even simple circuits can take milliseconds to seconds per operation, which makes them impractical for high-throughput queries.
Searchable Encryption: A Pragmatic Alternative
When search is the primary use case, a more practical approach is searchable encryption (SE), also called encrypted search. Its symmetric-key variant, searchable symmetric encryption (SSE), is the focus here.
The idea: encrypt your data client-side, upload it to the server, and then send encrypted search tokens. The server uses these tokens to identify matching records without learning what it's searching for.
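A simple approach uses symmetric encryption for the data and a keyed MAC (such as HMAC) for the index. A MAC is deterministic for a fixed key and input, so the same keyword always yields the same token: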
// Pseudo-code: client-side indexing
const plaintext = "confidential document content";
const dataKey = deriveKey(masterKey, "data");   // masterKey never leaves the client
const indexKey = deriveKey(masterKey, "index"); // separate key for index tokens
const encrypted = encrypt(dataKey, plaintext);
// Create searchable index
const keywords = extractKeywords(plaintext);
const searchTokens = keywords.map(kw =>
  mac(indexKey, kw) // Deterministic token: same keyword, same token
);
// Send to server
server.store({ encrypted, searchTokens });
// Later: client searches
const queryToken = mac(indexKey, "search_term");
const matches = server.find(queryToken); // Server matches tokens without seeing the keyword
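For readers who want something they can run, here is one way the pseudo-code could be fleshed out with Node's built-in `crypto` module. It is a sketch under stated assumptions, not a production SSE scheme. The choices of HKDF-SHA256 for key derivation, AES-256-GCM for the data, and HMAC-SHA256 for tokens are assumptions made for this example. The `ToyServer` class is a hypothetical in-memory stand-in for a real backend, and keyword extraction is reduced to a naive regular expression.

```javascript
const crypto = require("node:crypto");

// Derive independent 32-byte subkeys from one master key with HKDF-SHA256.
function deriveKey(masterKey, purpose) {
  return Buffer.from(
    crypto.hkdfSync("sha256", masterKey, Buffer.alloc(0), purpose, 32)
  );
}

// Normalize a keyword, then MAC it so equal keywords map to equal tokens.
function searchToken(indexKey, keyword) {
  return crypto
    .createHmac("sha256", indexKey)
    .update(keyword.trim().toLowerCase(), "utf8")
    .digest("hex");
}

// AES-256-GCM with a fresh random nonce per document.
function encryptDocument(dataKey, plaintext) {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv("aes-256-gcm", dataKey, iv);
  const body = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { iv, body, tag: cipher.getAuthTag() };
}

function decryptDocument(dataKey, { iv, body, tag }) {
  const decipher = crypto.createDecipheriv("aes-256-gcm", dataKey, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(body), decipher.final()]).toString("utf8");
}

// Hypothetical in-memory server: it only ever sees ciphertext and tokens.
class ToyServer {
  constructor() {
    this.docs = new Map();  // docId -> encrypted document
    this.index = new Map(); // token -> Set of docIds
  }
  store(docId, encrypted, tokens) {
    this.docs.set(docId, encrypted);
    for (const token of tokens) {
      if (!this.index.has(token)) this.index.set(token, new Set());
      this.index.get(token).add(docId);
    }
  }
  find(token) {
    return [...(this.index.get(token) ?? [])].map((id) => this.docs.get(id));
  }
}

// Client side: keys are derived locally and never sent to the server.
const masterKey = crypto.randomBytes(32);
const dataKey = deriveKey(masterKey, "data");
const indexKey = deriveKey(masterKey, "index");

const plaintext = "confidential document content";
const keywords = [...new Set(plaintext.toLowerCase().match(/[a-z0-9]+/g))];
const server = new ToyServer();
server.store(
  "doc-1",
  encryptDocument(dataKey, plaintext),
  keywords.map((kw) => searchToken(indexKey, kw))
);

// Later: the client searches, then decrypts whatever comes back.
const matches = server.find(searchToken(indexKey, "Confidential"));
const results = matches.map((enc) => decryptDocument(dataKey, enc));
console.log(results);
```

Using separate subkeys for data and index means a leaked index key lets an attacker test keyword guesses but does not let them decrypt documents. Note that the server can still see which tokens co-occur on the same document.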
Why This Matters for BitAtlas and Friends
Zero-knowledge services like encrypted backups, private cloud storage, and confidential analytics all need searchable encryption:
- Compliance: GDPR and data sovereignty rules push operators to limit their own access to personal data. Client-side encryption with a searchable index supports the claim that the operator cannot read plaintext, although it is not by itself proof of compliance.
- Scalability: Unlike FHE, SSE is practical at production scale. Indexing and search are orders of magnitude faster.
- Trust reduction: With searchable encryption, a server breach doesn't expose plaintext. The attacker gets only ciphertext and search tokens, which reveal query patterns (and, with deterministic tokens, how often each keyword occurs) but not the query content itself.
Practical Limitations
Searchable encryption isn't a silver bullet:
- Query pattern leakage: Even encrypted search leaks some information. Because tokens are deterministic, the server can tell when the same query is repeated (the search pattern) and which encrypted records match each token (the access pattern). Over many queries, this can support frequency analysis. Defenses include dummy queries and oblivious RAM (ORAM), but these add latency.
- Index size: Creating searchable indices can bloat storage. A keyword-based index may be 2–3x larger than plaintext.
- Limited query expressiveness: Most practical SSE schemes support single-keyword or boolean queries. Complex range queries or joins require more sophisticated (and slower) approaches.
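To make the first limitation concrete, the sketch below tallies what an honest-but-curious server could learn from its own query log without breaking any cryptography. The log entries, token strings, and record ids are hypothetical values made up for this example, and `summarizeLeakage` is an illustrative helper rather than part of any library.

```javascript
// Tally what the server can observe: how often each opaque token is queried
// and how many encrypted records it matches.
function summarizeLeakage(queryLog) {
  const perToken = new Map();
  for (const { token, matchedIds } of queryLog) {
    const entry =
      perToken.get(token) ?? { timesQueried: 0, resultSize: matchedIds.length };
    entry.timesQueried += 1;
    perToken.set(token, entry);
  }
  return perToken;
}

// Hypothetical log: tokens and ids are opaque to the server.
const queryLog = [
  { token: "a3f1", matchedIds: ["r1", "r7", "r9"] },
  { token: "09bc", matchedIds: ["r2"] },
  { token: "a3f1", matchedIds: ["r1", "r7", "r9"] },
];

const summary = summarizeLeakage(queryLog);
for (const [token, stats] of summary) {
  console.log(token, stats);
}
```

Even without knowing what "a3f1" stands for, the server learns that it was queried twice and always matches the same three records. Combined with outside knowledge about keyword frequencies, that is the raw material for an inference attack.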
Implementation Considerations
If you're building a privacy-preserving system:
- Decide between FHE and SSE: FHE for arbitrary computation (accept latency), SSE for search-centric workloads (practical at scale).
- Client-side derivation: Users must derive index and data keys from a master key. This keeps key material off your servers.
- Obfuscate access patterns: Consider padding results or injecting dummy queries to hide search frequency and result set sizes.
- Audit cryptography: Use proven libraries. Don't roll your own homomorphic evaluation or MAC-based indices.
- Measure leakage: Use information-theoretic tools to quantify what an attacker learns from search patterns, audit results, and timing.
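The result-padding idea from the list above can be sketched as follows. This is one possible approach, not a vetted scheme. It assumes record ids are random 16-byte hex strings, so dummy ids look the same as real ones, and it rounds each keyword's result list up to the next power of two. `paddedLength` and `padPostingList` are hypothetical helper names.

```javascript
const crypto = require("node:crypto");

// Round a result-list length up to the next power of two (minimum 1).
function paddedLength(length) {
  let size = 1;
  while (size < length) size *= 2;
  return size;
}

// Pad a list of record ids with random dummy ids, then sort so the dummies
// are not all grouped at the end of the list.
function padPostingList(realIds) {
  const padded = [...realIds];
  const target = paddedLength(realIds.length);
  while (padded.length < target) {
    padded.push(crypto.randomBytes(16).toString("hex"));
  }
  return padded.sort();
}

// Assumed id format: random 16-byte hex strings.
const realIds = Array.from({ length: 5 }, () =>
  crypto.randomBytes(16).toString("hex")
);
const padded = padPostingList(realIds);
console.log(realIds.length, "->", padded.length); // 5 -> 8
```

With power-of-two buckets, the server learns only which size bucket a keyword falls into, at the cost of up to roughly twice the index entries. The sketch leaves out how the client tells dummies from real ids after retrieval. One option is to store dummy ciphertext blobs as well, so that fetches for dummies look like any other fetch.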
Emerging Tools and MCP Servers
The MCP (Model Context Protocol) ecosystem is starting to include agents that reason about privacy-preserving protocols. Some research projects expose homomorphic encryption libraries as MCP servers, allowing AI agents to prototype encrypted-search workflows.
For production systems, look at established libraries:
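- SEAL (Microsoft): Open-source C++ homomorphic encryption library implementing the BFV, BGV, and CKKS schemes.
- HElib: Open-source C++ library, originally developed at IBM, implementing BGV (with bootstrapping) and CKKS.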
- Encrypted-search-js: JavaScript implementations of SSE for client-side indexing.
Conclusion
Homomorphic encryption and searchable encryption bridge the gap between privacy and utility. FHE is powerful but costly; SSE is practical but leaks query patterns. For most zero-knowledge infrastructure, SSE is the engineering choice today. As FHE schemes improve and hardware accelerators mature, fully homomorphic operations may become routine—but we're not there yet.
The key insight: cryptographic tools exist to search encrypted data, but they trade off performance, leakage, and expressiveness. Understand those tradeoffs for your use case, and design accordingly.