Shakti
· 7 min read

BYOK Inference and the Locked LLM Layer

Why your LLM provider key should never leave your cluster — and what Shakti's BYOK adapter actually does under the hood.

Every enterprise LLM-product buyer asks the same question: does my code leave my cluster? The honest answer at most vendors is “yes, through our proxy, but we promise we don’t log it.” That answer doesn’t survive contact with a compliance officer.

Shakti takes the opposite stance. Your provider key is registered directly against the Shakti server running inside your cluster. The key is encrypted at rest with AES-256-GCM (keyed off SHAKTI_VCS_KEK). Every adapter call — Anthropic, OpenAI, Google, Groq, DeepSeek, self-hosted Ollama, vLLM, LM Studio — fires from the Shakti process directly to the provider’s API. We never sit in the middle.
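The encrypt-at-rest step can be sketched with Go's standard-library AES-GCM. The `SHAKTI_VCS_KEK` name comes from the post; everything else here — the function names, deriving a 32-byte key by hashing the KEK, prepending the nonce to the ciphertext — is an illustrative assumption, not Shakti's actual code.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"crypto/sha256"
	"fmt"
)

// kekFromEnv derives a 32-byte AES-256 key from the SHAKTI_VCS_KEK value.
// (Hashing the env value down to 32 bytes is an assumption; the real KDF may differ.)
func kekFromEnv(kek string) []byte {
	sum := sha256.Sum256([]byte(kek))
	return sum[:]
}

// seal encrypts a provider API key with AES-256-GCM, prepending the nonce
// so the sealed blob is self-contained when written to the database row.
func seal(kek, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(kek)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// unseal reverses seal: split off the nonce, then decrypt and authenticate.
func unseal(kek, sealed []byte) ([]byte, error) {
	block, err := aes.NewCipher(kek)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

func main() {
	kek := kekFromEnv("dev-only-kek")
	sealed, _ := seal(kek, []byte("sk-ant-example"))
	plain, _ := unseal(kek, sealed)
	fmt.Println(string(plain)) // the key round-trips; only the sealed blob hits disk
}
```

GCM gives authenticated encryption, so a tampered row fails to decrypt rather than yielding a corrupted key.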

What BYOK actually means

“Bring your own key” has become marketing shorthand. In practice it usually means: you register a key with the vendor, the vendor’s proxy dials the LLM on your behalf, and you get told the proxy “only stores metadata.” That’s fine for observability but useless for compliance because the data still crossed the vendor’s boundary.

Shakti’s BYOK runs inside your cluster. The binary loads the encrypted key, decrypts it in memory, and dials the provider directly. The only bytes that leave your cluster are the ones you explicitly opt into by configuring a provider. There is no vendor proxy.

The locked LLM layer

The “locked LLM layer” is the architectural consequence. Every Shakti agent resolves its provider via the ProviderRegistry; rotating a key is a single POST that emits an audit entry and swaps the adapter atomically, and revoking a lost key is a single DELETE that removes the adapter from the registry and invalidates the encrypted row.
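The rotate/revoke behavior described above can be sketched as a registry with a lock around adapter swaps and an append-only audit trail. The type and method names are illustrative, not Shakti's actual API.

```go
package main

import (
	"fmt"
	"sync"
)

// Adapter is whatever dials a provider; modeled minimally here.
type Adapter struct{ Provider, Key string }

// Registry sketches the locked layer: rotation swaps the adapter under a
// lock and records an audit entry; revocation removes it from resolution.
type Registry struct {
	mu       sync.RWMutex
	adapters map[string]*Adapter
	audit    []string
}

func NewRegistry() *Registry {
	return &Registry{adapters: map[string]*Adapter{}}
}

// Rotate is the "single POST": swap the adapter atomically and audit it.
func (r *Registry) Rotate(provider, newKey string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.adapters[provider] = &Adapter{Provider: provider, Key: newKey}
	r.audit = append(r.audit, "rotated key for "+provider)
}

// Revoke is the "single DELETE": the adapter disappears from resolution.
func (r *Registry) Revoke(provider string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.adapters, provider)
	r.audit = append(r.audit, "revoked key for "+provider)
}

// Resolve is what every agent calls to find its provider adapter.
func (r *Registry) Resolve(provider string) (*Adapter, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	a, ok := r.adapters[provider]
	return a, ok
}

func main() {
	reg := NewRegistry()
	reg.Rotate("anthropic", "sk-new")
	if a, ok := reg.Resolve("anthropic"); ok {
		fmt.Println("resolved:", a.Key)
	}
	reg.Revoke("anthropic")
	_, ok := reg.Resolve("anthropic")
	fmt.Println("resolvable after revoke:", ok)
}
```

Because agents resolve through the registry on every call, a rotation takes effect on the next request with no restart.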

This matters for three reasons:

  1. Compliance. Your data residency answer is deterministic; the only external calls are the ones your runtime opted into.
  2. Cost accounting. Shakti tracks token counts and derives USD spend per provider + per agent without needing the provider’s bill, so your /dashboard/costs view is accurate even on the BYOK path.
  3. Rotation speed. Rotating a compromised key is seconds of downtime, not a cross-vendor coordination exercise.
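The cost-accounting point reduces to simple arithmetic over counted tokens. The per-million-token rates below are placeholders for illustration only — real provider pricing varies by model and changes over time.

```go
package main

import "fmt"

// Price holds per-million-token USD rates for one provider+model.
// The numbers used below are placeholders, not real pricing.
type Price struct{ InPerM, OutPerM float64 }

// cost derives USD spend from counted tokens — which is why a costs
// dashboard doesn't need the provider's bill: tokens in, dollars out.
func cost(p Price, inTok, outTok int) float64 {
	return float64(inTok)/1e6*p.InPerM + float64(outTok)/1e6*p.OutPerM
}

func main() {
	p := Price{InPerM: 3.0, OutPerM: 15.0} // placeholder rates
	// One agent run: 200k input tokens, 50k output tokens.
	fmt.Printf("$%.2f\n", cost(p, 200_000, 50_000))
}
```

Summing this per provider and per agent at request time is what keeps the view accurate without ever seeing the provider's invoice.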

Self-hosted Ollama, vLLM, LM Studio

For teams that won’t send even prompts to a third party, the BYOK layer treats a local inference endpoint (Ollama, vLLM, LM Studio) as a first-class provider. The adapter takes a base URL and a model name; the pipeline is otherwise identical. The only thing that changes is your choice of provider.
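The "base URL plus model name" claim can be sketched as a single endpoint shape shared by hosted and local providers. The struct and field names are illustrative; the Ollama port (11434) is its default, and the local model name is just an example.

```go
package main

import "fmt"

// Endpoint is the minimal shape the adapter needs for any provider,
// hosted or self-hosted: a base URL and a model name. (Field names
// are illustrative, not Shakti's actual config schema.)
type Endpoint struct {
	BaseURL string
	Model   string
}

func main() {
	// A hosted provider and a local Ollama instance use the same shape;
	// only the URL and model differ — the rest of the pipeline is identical.
	hosted := Endpoint{BaseURL: "https://api.openai.com/v1", Model: "gpt-4o"}
	local := Endpoint{BaseURL: "http://localhost:11434/v1", Model: "llama3"}
	for _, e := range []Endpoint{hosted, local} {
		fmt.Println(e.BaseURL, "→", e.Model)
	}
}
```

Because the local endpoint is just another row in the same registry, everything above — rotation, revocation, cost tracking — applies to it unchanged.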

If your compliance boundary is tight enough that you’d never have bought a tool with a proxy-SaaS model, Shakti was built for you.
