6 min · BitAtlas

Privacy-Preserving Machine Learning: ZK Inference with Homomorphic Encryption

Exploring zero-knowledge proofs and homomorphic encryption for private ML inference—techniques to prove ML predictions without exposing the model or input data.

zero-knowledge · machine learning · inference · privacy · neural networks

Machine learning models are valuable assets, but deploying them raises a critical tension: how do you let users benefit from your model without exposing it to reverse engineering? Conversely, how do users trust that their sensitive data—medical records, financial information, personal preferences—won't leak during inference?

Traditional solutions fall short. Keeping the model server-side means users must hand over plaintext inputs, and motivated attackers can still probe the model through its API. Shipping the model to clients reveals intellectual property. Differential privacy, a common privacy-preserving technique, adds noise that can degrade accuracy. Zero-knowledge proofs and homomorphic encryption offer a different path: let the model owner prove correct inference without revealing the model or seeing the input.

The Core Problem

When you invoke a machine learning API, three things are at stake:

  1. Model confidentiality: Your model is your competitive advantage. Competitors want to extract it.
  2. Input privacy: The user's data is sensitive. You don't want it on your servers, and they don't trust you with it.
  3. Correctness: The user needs assurance that the inference was actually computed by the claimed model, not faked or served from a stale cache.

Homomorphic encryption and zero-knowledge proofs address these by allowing computation on encrypted data and proving results without revealing either the computation or the data.

Homomorphic Encryption for Inference

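Homomorphic encryption (HE) is encryption that preserves algebraic structure. You can add and multiply encrypted values without decrypting them. The result, when decrypted, equals what you'd get if you'd added or multiplied the plaintexts. Because ciphertexts are randomized, the equality holds after decryption rather than between ciphertexts, with ⊕ and ⊗ standing for the scheme's ciphertext operations:

Decrypt(Encrypt(a) ⊕ Encrypt(b)) = a + b
Decrypt(Encrypt(a) ⊗ Encrypt(b)) = a × b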
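
A fully homomorphic scheme is too involved to reproduce here, but the multiplication rule alone can be seen in textbook ElGamal, which is homomorphic for multiplication only. The sketch below is a toy under stated assumptions: the group parameters `P` and `G` are illustrative and far too small for real use, and ElGamal is swapped in purely for demonstration, not because the schemes discussed in this post use it.

```python
import secrets

# Toy group parameters (assumption: illustrative only, insecure at this size).
P = 2039          # safe prime, P = 2*1019 + 1
G = 4             # generates the subgroup of order 1019

def keygen():
    x = secrets.randbelow(P - 2) + 1          # secret key in [1, P-2]
    return x, pow(G, x, P)                    # (secret, public)

def encrypt(pub, m):
    r = secrets.randbelow(P - 2) + 1          # fresh randomness per ciphertext
    return pow(G, r, P), (m * pow(pub, r, P)) % P

def decrypt(sec, ct):
    c1, c2 = ct
    # c1^(P-1-sec) is the inverse of c1^sec by Fermat's little theorem.
    return (c2 * pow(c1, P - 1 - sec, P)) % P

def multiply(ct_a, ct_b):
    # Component-wise product of ciphertexts encrypts the product of plaintexts.
    return (ct_a[0] * ct_b[0]) % P, (ct_a[1] * ct_b[1]) % P

sec, pub = keygen()
product_ct = multiply(encrypt(pub, 6), encrypt(pub, 7))
print(decrypt(sec, product_ct))   # 42
```

The party calling `multiply` never needs `sec`, which is the property the inference flow relies on.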

For ML inference, this means:

  1. The user encrypts their input data with their own key.
  2. You run the model's matrix multiplications and activations directly on the encrypted vectors.
  3. You return the encrypted result.
  4. The user decrypts to get the prediction.
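
The four steps above can be sketched with a toy Paillier cryptosystem, which is additively homomorphic and therefore enough for a single linear layer with plaintext weights. This is a minimal illustration under loud assumptions: the primes are tiny, the features and weights are made-up integers, and the key material sits in one module for brevity, whereas in practice only the client would hold `LAM` and `MU`. A production system would use a vetted library and a scheme such as BFV or CKKS.

```python
import math
import secrets

# --- toy Paillier (assumption: tiny primes, illustration only) ---
P_, Q_ = 1009, 1013
N = P_ * Q_                   # public key
N2 = N * N
LAM = (P_ - 1) * (Q_ - 1) // math.gcd(P_ - 1, Q_ - 1)   # secret: lcm(p-1, q-1)
MU = pow(LAM, -1, N)          # secret: valid because the generator is N + 1

def encrypt(m):
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return ((1 + (m % N) * N) * pow(r, N, N2)) % N2

def decrypt(c):
    m = ((pow(c, LAM, N2) - 1) // N * MU) % N
    return m - N if m > N // 2 else m      # map back to signed integers

def add(c1, c2):
    # Multiplying ciphertexts mod N^2 encrypts the sum of the plaintexts.
    return (c1 * c2) % N2

def scale(c, k):
    # Raising a ciphertext to a plaintext constant k encrypts k times the plaintext.
    return pow(c, k % N, N2)

# Step 1 (client): encrypt integer features.
features = [3, -1, 4]
enc_x = [encrypt(v) for v in features]

# Step 2 (server): plaintext weights, encrypted inputs; computes Enc(w . x + b).
def linear_score(enc_inputs, weights, bias):
    acc = encrypt(bias)
    for c, w in zip(enc_inputs, weights):
        acc = add(acc, scale(c, w))
    return acc

# Step 3 (server): return the encrypted score.
enc_score = linear_score(enc_x, weights=[2, 5, -3], bias=10)

# Step 4 (client): decrypt the prediction.
print(decrypt(enc_score))   # 2*3 + 5*(-1) + (-3)*4 + 10 = -1
```

The server touches only ciphertexts and its own weights. Anything beyond a linear layer needs ciphertext-by-ciphertext multiplication, which Paillier does not provide.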

Advantages:

  • Your model parameters are never exposed.
  • User input never exists unencrypted on the server.
  • The evaluation is a fixed arithmetic circuit, which makes it amenable to verification (though HE alone does not prove correctness).

Challenges:

  • Encryption overhead is substantial. A single matrix multiplication can be several orders of magnitude slower than in plaintext, and homomorphic evaluation of a deep neural network can take minutes to hours.
  • Only additions and multiplications are efficient. Sigmoid and ReLU activations require polynomial approximation or a scheme built for non-linear functions, such as TFHE with programmable bootstrapping.
  • Encrypted arithmetic has precision constraints. You're working with fixed-point numbers in rings, not floating-point.
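
The last two challenges can be made concrete in plain Python, with no encryption involved. The sketch below evaluates a degree-3 Taylor approximation of sigmoid using only integer additions and multiplications on fixed-point values, which is the kind of arithmetic an HE scheme can carry out. The 16-bit scale and the choice of a Taylor polynomial are assumptions for illustration; real deployments typically fit a polynomial to the expected input range.

```python
import math

FRAC_BITS = 16
SCALE = 1 << FRAC_BITS            # fixed-point scale factor 2^16

def encode(x: float) -> int:
    return round(x * SCALE)

def decode(v: int) -> float:
    return v / SCALE

def fx_mul(a: int, b: int) -> int:
    # The product of two scaled values carries SCALE^2, so rescale once.
    return (a * b) // SCALE

QUARTER = encode(0.25)
ONE_48TH = encode(1 / 48)

def sigmoid_poly(x_fx: int) -> int:
    # Degree-3 Taylor expansion around 0: 1/2 + x/4 - x^3/48,
    # written with multiplications by constants instead of divisions.
    x3 = fx_mul(fx_mul(x_fx, x_fx), x_fx)
    return encode(0.5) + fx_mul(x_fx, QUARTER) - fx_mul(x3, ONE_48TH)

for x in (-1.0, 0.0, 0.5, 1.0, 3.0):
    approx = decode(sigmoid_poly(encode(x)))
    exact = 1.0 / (1.0 + math.exp(-x))
    print(f"x={x:+.1f}  poly={approx:.4f}  sigmoid={exact:.4f}")
```

The approximation tracks sigmoid closely near zero and drifts as inputs move away from it, which is why input ranges have to be controlled before encryption.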

Practical example: A healthcare startup wants to offer a diagnostic model. Hospitals encrypt their patient data with BFV (Brakerski–Fan–Vercauteren) encryption, send it to the model server, and get an encrypted prediction back. The hospital decrypts and sees the diagnosis, without the startup ever seeing the raw medical data and without the hospital learning the model's weights.

Zero-Knowledge Proofs for Model Verification

Homomorphic encryption solves privacy but not trust in correctness. How do you know the server actually computed your inference and didn't fabricate the result?

Zero-knowledge proofs provide a cryptographic proof that a computation was done correctly, without revealing the computation itself.

The ZK angle: The model owner generates a proof π that shows:

  • The output is the correct result of running the model on the input
  • Without revealing the model weights, the input, or intermediate values

With succinct proof systems such as Groth16, a verifier checks π in milliseconds, and the proof itself is tiny (a few hundred bytes).
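
Proving a full model inference is beyond a short example, but the underlying idea of convincing a verifier without revealing the secret can be shown with a much simpler statement. The sketch below is a non-interactive Schnorr proof of knowledge of a discrete logarithm, made non-interactive with the Fiat-Shamir heuristic. It is not a SNARK and proves nothing about a neural network. The group parameters are toy values chosen for illustration, and the function names are hypothetical.

```python
import hashlib
import secrets

# Toy group (assumption: parameters far too small for real use).
P = 2039      # safe prime, P = 2*Q + 1
Q = 1019      # prime order of the subgroup generated by G
G = 4

def challenge(y: int, t: int) -> int:
    # Fiat-Shamir: derive the challenge by hashing the public transcript.
    digest = hashlib.sha256(f"{P}|{G}|{y}|{t}".encode()).digest()
    return int.from_bytes(digest, "big") % Q

def prove(x: int):
    """Prove knowledge of x such that y = G^x mod P, without revealing x."""
    y = pow(G, x, P)
    r = secrets.randbelow(Q)              # one-time nonce
    t = pow(G, r, P)                      # commitment
    s = (r + challenge(y, t) * x) % Q     # response
    return y, (t, s)

def verify(y: int, proof) -> bool:
    t, s = proof
    return pow(G, s, P) == (t * pow(y, challenge(y, t), P)) % P

secret = 777
y, proof = prove(secret)
print(verify(y, proof))                              # True
print(verify(y, (proof[0], (proof[1] + 1) % Q)))     # False
```

The verifier sees only `y` and the pair `(t, s)`. ZK inference systems apply the same prove-and-verify pattern to a far larger statement: that a committed model maps a given input to a given output.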

Building blocks:

  • Arithmetic circuits: Express the ML model as a circuit of additions and multiplications over a finite field. Circuit toolchains (Circom, Leo, arkworks) work with constraint systems such as rank-1 constraint systems (R1CS), and converters like ezkl turn trained neural networks into circuits.
  • Proof systems: Use zk-SNARK variants (Groth16, PlonK) or zk-STARKs (Scalable Transparent ARguments of Knowledge), which need no trusted setup and are believed to be quantum-resistant but produce larger proofs.
  • Quantization: Neural networks for ZK must be quantized to integers that fit in the proof system's finite field. Float32 becomes int8 or similar. This degrades accuracy and requires careful calibration.
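
To make the arithmetic-circuit idea concrete, here is a minimal sketch of an R1CS satisfaction check for one neuron with a square activation, y = (w·x + b)². Every constraint has the form (A·s) × (B·s) = (C·s) over a prime field, where s is the witness vector. The field modulus, witness layout, and helper names are assumptions made for this example and do not reflect the output format of any particular toolchain.

```python
FIELD = 2**61 - 1     # a Mersenne prime used as a toy field modulus (assumption)

def dot(row, witness):
    return sum(a * w for a, w in zip(row, witness)) % FIELD

def r1cs_satisfied(A, B, C, witness):
    # Each constraint i requires (A_i . s) * (B_i . s) == (C_i . s) in the field.
    return all(
        dot(a, witness) * dot(b, witness) % FIELD == dot(c, witness)
        for a, b, c in zip(A, B, C)
    )

# Witness layout: [1, x, w, b, wx, y]
# Constraint 0: w * x = wx
# Constraint 1: (wx + b) * (wx + b) = y      (square activation)
A = [[0, 0, 1, 0, 0, 0], [0, 0, 0, 1, 1, 0]]
B = [[0, 1, 0, 0, 0, 0], [0, 0, 0, 1, 1, 0]]
C = [[0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1]]

def make_witness(x, w, b):
    wx = w * x % FIELD
    y = (wx + b) ** 2 % FIELD
    return [1, x, w, b, wx, y]

good = make_witness(x=3, w=5, b=2)     # y = (15 + 2)^2 = 289
bad = good[:-1] + [290]                # claims a wrong output
print(r1cs_satisfied(A, B, C, good))   # True
print(r1cs_satisfied(A, B, C, bad))    # False
```

A proof system lets the prover show that such a witness exists while keeping entries like `w` and `b` hidden. A real network produces many constraints of this shape rather than two.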

Trade-offs:

  • Proof generation time: A modest ResNet might take seconds to minutes to prove on consumer hardware.
  • Proof size: SNARKs are compact (<1 KB). STARKs are larger (~100 KB).
  • Verifier time: Typically milliseconds for SNARKs and somewhat longer, often tens of milliseconds, for STARKs.
  • Accuracy loss: Quantization to int8 can reduce top-1 accuracy by 1–5%, depending on the model and calibration.

Real-world sketch: An insurance underwriter wants to offer credit scoring. An applicant submits encrypted personal data. The model is evaluated on the encrypted input using HE, and the server issues a ZK-SNARK proving the score is correct. The underwriter verifies the proof without ever seeing the applicant's data or the model.

Combining HE + ZK

Neither technique alone is a full solution:

  • HE alone: Privacy for input and model, but the client must trust that the server evaluated the model correctly.
  • ZK alone: Proof of correctness, but the prover needs both the model and the input in the clear, so either the server sees the user's data or the client gets the model.

Together, they form a powerful primitive:

  1. Client encrypts input with HE.
  2. Server evaluates model on encrypted data.
  3. Server generates a ZK proof that the evaluation is correct.
  4. Client receives encrypted result + proof.
  5. Client verifies proof (verification time: milliseconds; doesn't require knowing model or plaintext).
  6. Client decrypts to get the trusted result.

The proof is compact, verification is fast, and neither the model nor the data is exposed.

Practical Challenges Today

Performance: Even quantized models are slow. Proving a 100-layer ResNet can take from tens of seconds to minutes, and HE evaluation of the same network can take minutes to hours depending on the scheme. Real-time inference is still out of reach for complex models.

Accuracy: Quantizing to small integers (int8) and approximating activations both cost accuracy. Medical-grade models may not tolerate a loss of even a few percentage points.
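
The source of the quantization error is easy to see on a single dot product. The sketch below applies symmetric per-tensor int8 quantization to a made-up weight and input vector, computes the dot product in integers only, and compares it with the floating-point result. The values and helper names are hypothetical, and real pipelines calibrate scales on representative data rather than on one vector.

```python
def quantize(values, num_bits=8):
    # Symmetric per-tensor quantization: map floats to signed integers.
    qmax = 2 ** (num_bits - 1) - 1                 # 127 for int8
    scale = max(abs(v) for v in values) / qmax
    return [round(v / scale) for v in values], scale

def int_dot(q_a, q_b):
    # Integer-only dot product, the part a circuit would actually prove.
    return sum(a * b for a, b in zip(q_a, q_b))

weights = [0.42, -1.30, 0.07, 0.88]    # made-up example values
inputs = [1.50, 0.25, -2.00, 0.60]

q_w, s_w = quantize(weights)
q_x, s_x = quantize(inputs)

exact = sum(w * x for w, x in zip(weights, inputs))
approx = int_dot(q_w, q_x) * s_w * s_x             # rescale outside the circuit
print(f"float: {exact:.4f}  int8: {approx:.4f}  abs error: {abs(exact - approx):.4f}")
```

The per-operation error is small, but it compounds across layers, which is why end-to-end accuracy has to be measured on the quantized model rather than inferred from the float one.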

Ecosystem maturity: Projects like zkLLM, ezkl, and the work from Modulus Labs are maturing, but ZKML tooling is still new. Documentation and community support lag behind conventional ML.

Hardware acceleration: GPU-accelerated implementations (for example, of BGV) exist but aren't standard. Most deployments still run on CPU.

What's on the Horizon

  • Faster proving: New proof systems and lookup arguments (HyperPlonk, Lasso) aim to reduce prover time.
  • Batching: Proving multiple inferences in one proof amortizes cost.
  • Specialized hardware: FPGA and ASIC designs for HE and ZK are emerging.
  • Approximate schemes: Relaxing some constraints (e.g., approximate HE for recommendation systems) trades exact results for speed.

When to Consider ZK + HE

Use zero-knowledge ML inference when:

  • The model is high-value (worth protecting).
  • User input is sensitive (medical, financial, personal).
  • Regulatory requirements demand strong privacy (GDPR, HIPAA).
  • Latency tolerance is > 10 seconds (non-realtime).
  • Accuracy loss of a few percentage points is acceptable.

For real-time, high-accuracy use cases (autonomous vehicles, fraud detection requiring <50 ms response), you'll need to compromise: maybe use HE for one sensitive field and plaintext for others, or accept model exposure.

Conclusion

Zero-knowledge proofs and homomorphic encryption are reshaping how we think about model security and user privacy. They're not a universal fit: performance and accuracy constraints are real. But for use cases where privacy is paramount and latency is flexible, they unlock a model-as-a-service paradigm that today's architectures cannot offer: servers that can't see the data, clients that can't reverse the model, and proofs that let both trust the result anyway.

The tooling is maturing. If you're building privacy-critical ML products, now is the time to experiment with these primitives.
