altitudes® Cloud · Platform · AI Amsterdam · Rotterdam
AI ENABLEMENT · MAY 28, 2026 · 8 min read

LLM production readiness for EU enterprise teams.

US-focused LLM production checklists cover latency, evals, fallbacks, and cost. For EU enterprise teams, three more dimensions apply: data residency under GDPR at inference time, classification of your specific use case under the AI Act, and cost governance for GPU workloads that behave nothing like the compute you have been managing. These extra dimensions are not bureaucratic overhead. They are the parts that break first.

What US checklists miss for EU teams

The standard production readiness checklist for LLM features covers p99 latency under load, an evaluation set with a regression gate, a fallback for model errors, a cost-per-query estimate, prompt injection defences, and rate limiting. All of these are correct and all are necessary.

For EU enterprise teams subject to GDPR or NIS2, or serving regulated sectors, three additional dimensions are non-negotiable. First: does inference happen within EU data residency boundaries, and does the model provider's data processing agreement cover the data types your prompts contain? Second: does your use case fall within the AI Act's high-risk classification, and if so, what does that require you to build? Third: how does GPU compute cost behave at your expected query volume, and who owns that cost when it spikes?

These are not compliance additions to a technical checklist. They are architecture decisions that determine which providers you can use, what logging you must retain, and whether your current FinOps model can absorb GPU cost behaviour at all.
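One way to keep the three extra dimensions from becoming a separate, skippable review is to put them in the same release gate as the standard items. The sketch below assumes a hypothetical in-house `ReadinessCheck` structure; the item keys paraphrase the lists above and are not any standard taxonomy.

```python
from dataclasses import dataclass, field

# Hypothetical item keys paraphrasing the standard checklist above.
STANDARD_ITEMS = (
    "p99_latency_under_load",
    "eval_set_with_regression_gate",
    "fallback_for_model_errors",
    "cost_per_query_estimate",
    "prompt_injection_defence",
    "rate_limiting",
)
# The three EU dimensions, treated as release-blocking items like the rest.
EU_ITEMS = (
    "eu_inference_residency_and_dpa_cover",
    "ai_act_scope_determination_documented",
    "gpu_cost_budget_and_owner",
)


@dataclass
class ReadinessCheck:
    feature: str
    done: set[str] = field(default_factory=set)  # item keys signed off so far

    def missing(self) -> list[str]:
        """Every required item not yet signed off, in checklist order."""
        return [item for item in STANDARD_ITEMS + EU_ITEMS if item not in self.done]

    def ready(self) -> bool:
        return not self.missing()


check = ReadinessCheck("contract-summariser", done=set(STANDARD_ITEMS))
print(check.ready())    # False: the three EU items are still open
print(check.missing())
```

A feature that passes every US-style item still fails the gate until the residency, AI Act, and cost-ownership items are signed off.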

Data residency at inference time

GDPR applies to personal data. Most LLM prompts in enterprise contexts contain personal data: employee names, customer identifiers, contract references, free-text fields from CRM systems. If the prompt leaves the EEA to reach a model provider's inference endpoint, you need a transfer mechanism under GDPR Chapter V, such as an adequacy decision, standard contractual clauses, or binding corporate rules.

The practical implication: your choice of model provider is constrained by the data you put in prompts. The major model providers and cloud platforms offer EU data residency options. The control is typically a regional API endpoint selection plus a data processing agreement addendum. Neither is difficult. Neither is the default. You have to ask for both explicitly.
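Because neither control is the default, it helps to make the routing decision fail closed in code. A minimal sketch, assuming hypothetical region labels and placeholder URLs (not any real provider's API), and assuming an upstream step has already classified whether the prompt contains personal data, which is the hard part in practice.

```python
from dataclasses import dataclass

# Hypothetical region labels and URLs; map them to what your provider actually documents.
EU_REGIONS = {"eu-west", "eu-central", "eu-north"}


@dataclass(frozen=True)
class Endpoint:
    url: str
    region: str
    dpa_covers_prompt_data: bool  # True only once the DPA addendum covers your prompt data types


ENDPOINTS = (
    Endpoint("https://us.inference.example.com", "us-east", False),
    Endpoint("https://eu-central.inference.example.com", "eu-central", False),
    Endpoint("https://eu-west.inference.example.com", "eu-west", True),
)


def select_endpoint(prompt_has_personal_data: bool,
                    endpoints: tuple[Endpoint, ...] = ENDPOINTS) -> Endpoint:
    """Fail closed: only EU endpoints, and personal data only where the DPA covers it."""
    for ep in endpoints:
        if ep.region not in EU_REGIONS:
            continue  # never route outside the EU in this sketch
        if prompt_has_personal_data and not ep.dpa_covers_prompt_data:
            continue  # EU region alone is not enough without DPA cover
        return ep
    raise RuntimeError("no EU endpoint with DPA cover configured; refusing to send the prompt")


print(select_endpoint(prompt_has_personal_data=True).url)
```

Raising instead of falling back to the first available endpoint is the point of the design: a misconfigured region should surface as an error, not as a silent transfer.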

For more sensitive data (health records, financial data, HR data), sending prompts to a third-party provider may not be acceptable at any tier. In those cases, the architecture shifts to models hosted within your own tenant, or to a preprocessing step that redacts or pseudonymises personal data before inference. Pseudonymised data is still personal data under GDPR, so this reduces exposure rather than removing the obligation.
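A redaction step can be as simple as swapping identifiers for placeholders before the call and restoring them in the response, with the mapping kept inside your own boundary. This is a minimal sketch: the `CUST-nnnnnn` customer-ID format is hypothetical, and regular expressions will not catch names or free-text identifiers, so treat it as a starting point rather than a complete redaction layer.

```python
import re

# Illustrative patterns only. The CUST-nnnnnn format is hypothetical, and regexes
# will not catch names or free-text identifiers.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "CUSTOMER": re.compile(r"\bCUST-\d{6}\b"),
}


def pseudonymise(prompt: str) -> tuple[str, dict[str, str]]:
    """Swap matches for placeholders; the mapping never leaves your boundary."""
    mapping: dict[str, str] = {}

    def swap(label: str, match: re.Match) -> str:
        token = f"<{label}_{len(mapping) + 1}>"
        mapping[token] = match.group(0)
        return token

    for label, pattern in PATTERNS.items():
        prompt = pattern.sub(lambda m, label=label: swap(label, m), prompt)
    return prompt, mapping


def restore(text: str, mapping: dict[str, str]) -> str:
    """Put the original values back into the model response, inside your own tenant."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text


safe, mapping = pseudonymise("Email jan@example.nl about CUST-123456 today")
print(safe)  # Email <EMAIL_1> about <CUSTOMER_2> today
```

Only `safe` is sent to the provider; because `mapping` stays local, the provider sees placeholders while your application can still re-identify the response.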

AI Act: high-risk classification and what it means operationally

The EU AI Act entered into force on 1 August 2024 and applies in stages; most obligations for the high-risk systems listed in Annex III are scheduled to apply from August 2026. Annex III covers, among other areas, employment and worker management, access to essential services (including credit scoring), education, and safety components of critical infrastructure. If your LLM feature falls into these categories, you are building a regulated AI system.

The platform engineering implications of high-risk classification are specific: conformity assessment before deployment, documentation of model selection rationale, a technically enforced human oversight mechanism, accuracy and robustness evaluation against a representative test set, and a post-market monitoring plan that logs model outputs and feeds drift signals back into your evaluation process.
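"Technically enforced" human oversight and output logging can be made concrete: model output is held as pending until a named reviewer acts on it, and every state change is logged with the model version. A minimal sketch under stated assumptions: the class and function names are hypothetical, the in-memory list stands in for a durable, retained log store, and this covers only two of the obligations listed above, not the AI Act's requirements in full.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import Optional

AUDIT_LOG: list[str] = []  # stand-in for an append-only, retained log store


@dataclass
class Decision:
    request_id: str
    model_version: str
    output: str
    status: str = "pending_review"  # nothing takes effect while pending
    reviewer: Optional[str] = None
    ts: float = 0.0


def _log(decision: Decision) -> None:
    # Every state change is logged with the model version, for post-market monitoring.
    AUDIT_LOG.append(json.dumps(asdict(decision)))


def record_output(request_id: str, model_version: str, output: str) -> Decision:
    decision = Decision(request_id, model_version, output, ts=time.time())
    _log(decision)
    return decision


def review(decision: Decision, reviewer: str, accept: bool) -> Decision:
    if not reviewer:
        raise ValueError("a named human reviewer is required")
    decision.status = "approved" if accept else "rejected"
    decision.reviewer = reviewer
    decision.ts = time.time()
    _log(decision)
    return decision


def apply_decision(decision: Decision) -> str:
    # Oversight enforced in code: unreviewed output cannot reach downstream systems.
    if decision.status != "approved":
        raise PermissionError(f"{decision.request_id}: not approved by a human reviewer")
    return decision.output
```

The logged records double as input for drift analysis: comparing approval and rejection rates per `model_version` over time is one possible drift signal to feed back into the evaluation process.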

For most enterprise LLM features (internal knowledge search, code assistance, document summarisation), high-risk classification does not apply. The checklist item is not 'are you compliant with the AI Act?' It is 'have you determined whether your use case is in scope, and documented that determination?' The determination typically takes an afternoon. Its absence is a compliance gap that surfaces at audit, not in production.
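Documenting the determination can be a small, versioned record stored next to the feature's code. The sketch below is a hypothetical schema, not a regulatory template; the area keys paraphrase the Annex III headings and should be checked against the current legal text. A match is flagged as "potentially" high-risk because the Act allows narrow exceptions for some Annex III uses, so a match means "assess further", not an automatic conclusion.

```python
import json
from dataclasses import asdict, dataclass, field

# Annex III area headings, paraphrased; check the current legal text before relying on this list.
ANNEX_III_AREAS = frozenset({
    "biometrics",
    "critical_infrastructure",
    "education_and_vocational_training",
    "employment_and_worker_management",
    "essential_private_and_public_services",
    "law_enforcement",
    "migration_asylum_border_control",
    "administration_of_justice_and_democratic_processes",
})


@dataclass
class ScopeDetermination:
    feature: str
    intended_purpose: str
    rationale: str
    decided_by: str
    decided_on: str  # ISO date
    matched_areas: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        unknown = set(self.matched_areas) - ANNEX_III_AREAS
        if unknown:
            raise ValueError(f"unknown Annex III areas: {sorted(unknown)}")

    @property
    def potentially_high_risk(self) -> bool:
        # Any match means "assess further", not automatically "high-risk".
        return bool(self.matched_areas)

    def to_json(self) -> str:
        record = {**asdict(self), "potentially_high_risk": self.potentially_high_risk}
        return json.dumps(record, indent=2)


record = ScopeDetermination(
    feature="internal-knowledge-search",
    intended_purpose="Answer employee questions from internal documentation",
    rationale="Not used for employment, credit, essential services or other Annex III areas",
    decided_by="platform-lead",
    decided_on="2026-05-28",
)
print(record.to_json())
```

Committing the JSON alongside the feature gives an auditor a dated, attributable answer to the scoping question.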

"Data residency at inference, AI Act classification, and GPU cost behaviour are not compliance additions. They are the dimensions that break first in EU enterprise deployments."

Sebastiaan van Parijs / Founder

Cost governance for GPU workloads

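LLM compute, whether bought per token from a provider or run on GPUs you provision, has three properties that distinguish it from CPU compute. First: per-token pricing ties cost to response length rather than query count. A query that returns a verbose response can cost 3 to 10 times as much as one that returns a terse response for the same query type, and output tokens are usually priced higher than input tokens, which widens the gap. Usage patterns that look smooth on query-count metrics look spiky on cost metrics.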
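The arithmetic behind the verbose-versus-terse gap is simple enough to put in a dashboard. A minimal sketch; the per-million-token prices and token counts are assumed for illustration, not any provider's actual rates.

```python
# Assumed per-million-token prices; substitute your provider's published rates.
PRICE_IN_PER_M = 3.00    # EUR per 1M input tokens (assumption)
PRICE_OUT_PER_M = 15.00  # EUR per 1M output tokens (assumption)


def cost_per_query(input_tokens: int, output_tokens: int) -> float:
    """Cost in EUR of one call under simple per-token pricing."""
    return (input_tokens * PRICE_IN_PER_M + output_tokens * PRICE_OUT_PER_M) / 1_000_000


# Same prompt size, different response length.
terse = cost_per_query(input_tokens=800, output_tokens=100)
verbose = cost_per_query(input_tokens=800, output_tokens=1_200)
print(f"terse {terse:.4f}  verbose {verbose:.4f}  ratio {verbose / terse:.1f}x")
# terse 0.0039  verbose 0.0204  ratio 5.2x
```

With identical prompts, the response length alone moves the cost by about five times under these assumptions, which is why query counts are a poor proxy for spend.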

Second: the quality, latency, and cost tradeoff is explicit and must be decided before launch. Smaller models are generally faster and cheaper; larger models are generally more accurate and more expensive. The optimal model is determined by your eval set and your cost threshold together. Teams that select models on quality alone and discover the cost afterwards have made the decision in the wrong order.
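Deciding "eval set and cost threshold together" can be written as a selection rule: the cheapest model that clears the quality bar within the cost and latency budgets. A minimal sketch; the model names and numbers are invented placeholders for your own eval-run measurements, not benchmark results.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class EvalResult:
    model: str              # hypothetical model names below
    quality: float          # score on your own eval set, 0..1
    cost_per_query: float   # EUR, measured on the same eval run
    p99_latency_ms: float


def select_model(results: Sequence[EvalResult], min_quality: float,
                 max_cost: float, max_p99_ms: float) -> Optional[EvalResult]:
    """Cheapest model that clears the quality bar within the cost and latency budgets."""
    eligible = [
        r for r in results
        if r.quality >= min_quality
        and r.cost_per_query <= max_cost
        and r.p99_latency_ms <= max_p99_ms
    ]
    return min(eligible, key=lambda r: r.cost_per_query, default=None)


# Placeholder numbers for illustration only.
results = [
    EvalResult("small", quality=0.81, cost_per_query=0.0008, p99_latency_ms=400),
    EvalResult("medium", quality=0.88, cost_per_query=0.0040, p99_latency_ms=900),
    EvalResult("large", quality=0.91, cost_per_query=0.0200, p99_latency_ms=2500),
]
print(select_model(results, min_quality=0.85, max_cost=0.01, max_p99_ms=1500).model)  # medium
```

Returning `None` when nothing qualifies is deliberate: it forces an explicit conversation about relaxing the quality bar or raising the budget, instead of defaulting to the most accurate model.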

Third: GPU workloads on self-hosted infrastructure do not idle cleanly. A GPU node at 30 percent utilisation still costs 100 percent of its allocation. The cost model for GPU is closer to committed reserved capacity than on-demand CPU. Treating it as on-demand until it shows up on the bill is the most common GPU cost surprise we encounter.
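The utilisation effect is worth computing explicitly: on a dedicated node billed whether busy or idle, cost per served query scales with the inverse of utilisation. A minimal sketch; the hourly rate and node capacity are assumed figures.

```python
def effective_cost_per_query(hourly_rate_eur: float, capacity_qph: float,
                             utilisation: float) -> float:
    """Cost per served query on a dedicated GPU node billed whether busy or idle.

    hourly_rate_eur: what the node costs per hour (assumed figure below).
    capacity_qph: queries per hour the node can serve at full utilisation (assumed).
    utilisation: fraction of that capacity actually used, in (0, 1].
    """
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    return hourly_rate_eur / (capacity_qph * utilisation)


full = effective_cost_per_query(4.00, capacity_qph=2_000, utilisation=1.0)
at_30 = effective_cost_per_query(4.00, capacity_qph=2_000, utilisation=0.3)
print(f"{full:.4f} EUR at 100%, {at_30:.4f} EUR at 30%")
# 0.0020 EUR at 100%, 0.0067 EUR at 30%
```

At 30 percent utilisation, each query costs roughly 3.3 times as much as it would on a fully used node, whatever the assumed rate.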

The fix: add per-query cost to your eval dashboard alongside quality metrics. Set a cost-per-query budget per feature before launch. Wire a cost anomaly alert to the same on-call path as your latency alerts. If GPU spend rises 30 percent without a matching rise in query volume, cost per query has drifted, and the model selection or prompt template is the first place to look.
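The budget and the anomaly rule can share one check per feature. This sketch interprets "spend up 30 percent without a matching volume increase" as cost per query rising more than 30 percent between two equal-length windows; the names are hypothetical, and routing the messages to your on-call tooling is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    spend_eur: float
    queries: int


def cost_alerts(feature: str, prev: Window, curr: Window,
                budget_per_query: float, max_drift: float = 0.30) -> list[str]:
    """Return alert messages for one feature, comparing two equal-length windows."""
    if prev.queries <= 0 or curr.queries <= 0 or prev.spend_eur <= 0:
        raise ValueError("both windows need traffic, and the baseline needs spend")
    prev_cpq = prev.spend_eur / prev.queries
    curr_cpq = curr.spend_eur / curr.queries
    alerts = []
    if curr_cpq > budget_per_query:
        alerts.append(f"{feature}: {curr_cpq:.4f} EUR/query exceeds budget {budget_per_query:.4f}")
    # Normalised by volume: spend that grows with traffic is fine, spend that outgrows it is not.
    drift = curr_cpq / prev_cpq - 1
    if drift > max_drift:
        alerts.append(f"{feature}: cost per query up {drift:.0%}; "
                      "check model selection and prompt template")
    return alerts


print(cost_alerts("summariser", Window(100.0, 50_000), Window(140.0, 50_000), 0.003))
```

Spend that rises in step with traffic produces no alert; the same rise at flat traffic does.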

Written by Sebastiaan van Parijs, Founder