LLM production readiness for EU enterprise teams

Sebastiaan van Parijs, Founder

MAY 28, 2026

What US checklists miss for EU teams

The standard production readiness checklist for LLM features covers p99 latency under load, an evaluation set with a regression gate, a fallback for model errors, a cost-per-query estimate, prompt injection defence, and rate limiting. These are all correct and all necessary.

For EU enterprise teams operating under GDPR, NIS2, or serving regulated sectors, three additional dimensions are non-negotiable. First: does inference happen within EU data residency boundaries, and does the model provider's data processing agreement cover the data types your prompts contain? Second: does your use case fall within the AI Act's high-risk classification, and if so, what does that require you to build? Third: how does GPU compute cost behave at your expected query volume, and who owns that cost when it spikes?

These are not compliance additions to a technical checklist. They are architecture decisions that determine which providers you can use, what logging you must retain, and whether your current FinOps model can absorb GPU cost behaviour at all.

Data residency at inference time

GDPR applies to personal data, and most LLM prompts in enterprise contexts contain it: employee names, customer identifiers, contract references, free-text fields from CRM systems. If the prompt leaves the EU to reach a model provider's inference endpoint, you need a transfer mechanism under GDPR Chapter V: an adequacy decision, standard contractual clauses with supplementary measures per Schrems II, or binding corporate rules. For US providers certified under it, the relevant adequacy decision is the EU-US Data Privacy Framework (DPF), in force since July 2023 but under active legal challenge from noyb, the European Centre for Digital Rights led by Max Schrems, and expected to reach the Court of Justice of the European Union (CJEU) as Schrems III. The DPF is the simplest path while it stands. Plan as if it will not.
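One way to make "plan as if it will not" concrete is a check over the vendor register. The sketch below is a minimal illustration, assuming a hypothetical `ProviderRecord` that stores whether inference stays in the EU and which Chapter V mechanisms the provider's contract offers. It flags non-EU inference paths with no mechanism, paths that rely on the DPF alone, and SCCs without documented supplementary measures. It is a prompt for review, not a legal assessment.

```python
from dataclasses import dataclass
from enum import Enum


class Mechanism(Enum):
    """Chapter V transfer mechanisms named in the text."""
    DPF = "adequacy_dpf"                  # EU-US Data Privacy Framework adequacy decision
    SCC = "standard_contractual_clauses"
    BCR = "binding_corporate_rules"


@dataclass(frozen=True)
class ProviderRecord:
    # Hypothetical vendor-register entry; field names are illustrative.
    name: str
    inference_in_eu: bool
    mechanisms: frozenset[Mechanism]
    supplementary_measures: bool = False  # documented per Schrems II when SCCs are used


def transfer_findings(p: ProviderRecord) -> list[str]:
    """Return findings for one provider; an empty list means no gap was detected."""
    if p.inference_in_eu:
        return []  # prompts do not leave the EU on this path
    findings = []
    if not p.mechanisms:
        findings.append(f"{p.name}: prompts leave the EU with no Chapter V mechanism")
    elif p.mechanisms == {Mechanism.DPF}:
        findings.append(f"{p.name}: relies on the DPF alone; no fallback if it is struck down")
    if Mechanism.SCC in p.mechanisms and not p.supplementary_measures:
        findings.append(f"{p.name}: SCCs without documented supplementary measures")
    return findings
```

A provider reached over a US endpoint with only DPF certification produces one finding; the same provider with SCCs and documented supplementary measures as a fallback produces none.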

The practical implication has two layers. First: model provider choice is constrained by what the prompts contain. Major providers all offer EU data residency options. The control is typically a regional API endpoint selection plus a data processing agreement addendum. Neither is difficult. Neither is the default. You have to ask for it explicitly. Second: an EU region does not put data beyond US jurisdiction. The US CLOUD Act (Clarifying Lawful Overseas Use of Data Act) can compel any US-headquartered provider, including its EU subsidiaries and its EU-region or 'sovereign cloud' offerings, to produce customer data on a valid US legal demand regardless of where the data sits. For ordinary personal data, an EU region plus a Chapter V mechanism is sufficient. For trade secrets, regulated workloads, or data your customers have explicitly placed outside US reach, region selection is the wrong layer at which to make that decision.
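Because the EU endpoint is not the default, it helps to fail closed at startup when the configured endpoint is not on an EU allowlist. A minimal sketch, assuming hypothetical hostnames; substitute the regional hosts your provider and DPA addendum actually name. This enforces only the first layer above and says nothing about CLOUD Act exposure.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: replace with the EU regional hosts named in your DPA addendum.
EU_INFERENCE_HOSTS = frozenset({
    "eu.api.example-llm.com",
    "inference.westeurope.example-cloud.net",
})


class ResidencyError(RuntimeError):
    """Raised when a configured inference endpoint is outside the EU allowlist."""


def check_inference_endpoint(url: str) -> str:
    """Return the hostname if the URL points at an allowed EU host over HTTPS, else raise."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        raise ResidencyError(f"endpoint must use https: {url}")
    host = (parsed.hostname or "").lower()
    if host not in EU_INFERENCE_HOSTS:
        raise ResidencyError(f"endpoint {host!r} is not on the EU inference allowlist")
    return host
```

Calling it once where the client is constructed, for example on a hypothetical `LLM_BASE_URL` setting, means a misconfigured deployment fails before the first prompt is sent rather than after an audit.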

For more sensitive data (health records, financial data, HR data, defence-related data), the risk profile of sending prompts to a US-headquartered provider may not be acceptable at any tier, even with an EU region. In those cases, the architecture shifts to one of three options: a provider not subject to US jurisdiction, open-weight models hosted within your own tenant, or a preprocessing step that strips personal data from prompts before inference. The trade-off is quality and cost against jurisdictional control. The choice is upstream of vendor selection, not inside it.
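The decision described above can be made explicit as a routing table keyed on the most sensitive data category a prompt contains. A minimal sketch: the category names, the four hosting targets and the mapping are illustrative assumptions, not a legal classification, and how prompts get tagged with categories is out of scope here.

```python
from enum import IntEnum


class DataCategory(IntEnum):
    # Ordered by sensitivity so max() returns the strictest category present.
    NONE = 0        # no personal data
    PERSONAL = 1    # ordinary personal data: names, customer IDs, contract references
    SENSITIVE = 2   # health, financial, HR or defence-related data


# Hypothetical hosting targets, from least to most jurisdictional control.
US_PROVIDER_ANY_REGION = "us_provider_any_region"
US_PROVIDER_EU_REGION = "us_provider_eu_region"          # plus a Chapter V mechanism
NON_US_PROVIDER_EU_REGION = "non_us_provider_eu_region"
SELF_HOSTED_OPEN_WEIGHTS = "self_hosted_open_weights"

ALLOWED_TARGETS: dict[DataCategory, frozenset[str]] = {
    DataCategory.NONE: frozenset({US_PROVIDER_ANY_REGION, US_PROVIDER_EU_REGION,
                                  NON_US_PROVIDER_EU_REGION, SELF_HOSTED_OPEN_WEIGHTS}),
    DataCategory.PERSONAL: frozenset({US_PROVIDER_EU_REGION, NON_US_PROVIDER_EU_REGION,
                                      SELF_HOSTED_OPEN_WEIGHTS}),
    DataCategory.SENSITIVE: frozenset({NON_US_PROVIDER_EU_REGION, SELF_HOSTED_OPEN_WEIGHTS}),
}


def allowed_targets(categories: set[DataCategory]) -> frozenset[str]:
    """Hosting targets permitted for a prompt, driven by its most sensitive data category."""
    strictest = max(categories, default=DataCategory.NONE)
    return ALLOWED_TARGETS[strictest]
```

The mapping encodes one possible policy, not the only defensible one; the point is that it is decided once, before vendor selection. A redaction step fits the same model: it lowers a prompt's category before routing rather than adding a new target.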

AI Act: high-risk classification and what it means operationally

The EU AI Act (Regulation (EU) 2024/1689) entered into force on 1 August 2024 and applies in phases. Prohibited practices since 2 February 2025. General-purpose AI model obligations since 2 August 2025. The high-risk regime under Annex III applies from 2 August 2026. Annex III covers AI systems used in employment decisions, credit scoring and access to essential services, biometric identification, critical infrastructure, education, law enforcement, migration and border control, and the administration of justice. If your LLM feature falls within these use cases, you are building a regulated AI system.
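For release planning, the phase-in can be encoded as data so a pipeline can ask which obligation sets already apply on a given date. A minimal sketch using the application dates in Article 113 of the Regulation as adopted; the labels are shorthand and the helper is illustrative, not an official tool. If the dates are amended, the table is the single place to update.

```python
from datetime import date

# Application dates from Article 113 of Regulation (EU) 2024/1689 as adopted.
AI_ACT_PHASES = [
    (date(2025, 2, 2), "prohibited practices"),
    (date(2025, 8, 2), "general-purpose AI model obligations"),
    (date(2026, 8, 2), "Annex III high-risk regime"),
]


def obligations_in_force(on: date) -> list[str]:
    """Return the obligation sets that already apply on the given date."""
    return [label for start, label in AI_ACT_PHASES if on >= start]
```

On the date of this article, `obligations_in_force(date(2026, 5, 28))` returns the first two labels; a feature shipping on or after 2 August 2026 must also account for the third.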

The platform engineering implications of high-risk classification are specific: conformity assessment before deployment, technical documentation that includes the model selection rationale, a technically enforced human oversight mechanism, automatic logging of events over the system's lifetime, accuracy and robustness evaluation against a representative test set, and a post-market monitoring plan that logs model outputs and feeds drift signals back into your evaluation process.
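One way to make human oversight technically enforced rather than procedural is to make it impossible for a model output to take effect without a recorded reviewer decision, and to log every step. A minimal sketch, assuming a hypothetical `Recommendation` type and an in-memory list standing in for an append-only audit store; it illustrates the pattern and is not a statement of what the Act requires in detail.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Recommendation:
    # Hypothetical model output that needs a human decision before it takes effect.
    case_id: str
    model_version: str
    output: str
    reviewer: Optional[str] = None
    approved: Optional[bool] = None


AUDIT_LOG: list[str] = []  # stand-in for an append-only audit store


def log_event(event: str, rec: Recommendation) -> None:
    """Append one JSON line per event so outputs and decisions can be replayed later."""
    AUDIT_LOG.append(json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "event": event,
        "case_id": rec.case_id,
        "model_version": rec.model_version,
        "output": rec.output,
        "reviewer": rec.reviewer,
        "approved": rec.approved,
    }))


def propose(case_id: str, model_version: str, output: str) -> Recommendation:
    """Record a model output as a pending recommendation."""
    rec = Recommendation(case_id, model_version, output)
    log_event("proposed", rec)
    return rec


def review(rec: Recommendation, reviewer: str, approved: bool) -> None:
    """Record the human reviewer's decision."""
    rec.reviewer, rec.approved = reviewer, approved
    log_event("reviewed", rec)


def apply_recommendation(rec: Recommendation) -> str:
    """Release the output only after an approving human review; otherwise refuse."""
    if rec.reviewer is None or rec.approved is not True:
        log_event("blocked", rec)
        raise PermissionError(f"case {rec.case_id}: no approving human review")
    log_event("applied", rec)
    return rec.output
```

The same log doubles as raw material for post-market monitoring: every proposed output, every override and every blocked attempt is already recorded with its model version.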

For most enterprise LLM features (internal knowledge search, code assistance, document summarisation), high-risk classification does not apply. The checklist item is not 'are you compliant with the AI Act.' It is 'have you determined whether your use case is in scope and documented that determination.' The determination takes an afternoon. The absence of it is a compliance gap that surfaces at audit, not in production.
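The determination itself can be a small, versioned record stored next to the feature's code, so the audit question has a file-level answer. A minimal sketch, assuming a hypothetical schema; the area names paraphrase the Annex III headings, and the record supports legal review rather than replacing it.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date

# Paraphrased Annex III area headings (Regulation (EU) 2024/1689).
ANNEX_III_AREAS = (
    "biometrics",
    "critical_infrastructure",
    "education",
    "employment",
    "essential_services_and_credit",
    "law_enforcement",
    "migration_and_border_control",
    "administration_of_justice",
)


@dataclass
class ScopeDetermination:
    # Hypothetical schema for a documented AI Act scope decision.
    feature: str
    intended_purpose: str
    matched_areas: list[str] = field(default_factory=list)
    rationale: str = ""
    decided_by: str = ""
    decided_on: str = field(default_factory=lambda: date.today().isoformat())

    def __post_init__(self) -> None:
        # Reject typos so the record only ever names real Annex III areas.
        unknown = set(self.matched_areas) - set(ANNEX_III_AREAS)
        if unknown:
            raise ValueError(f"unknown Annex III areas: {sorted(unknown)}")

    @property
    def potentially_high_risk(self) -> bool:
        return bool(self.matched_areas)

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

A matched area does not settle the question on its own: Article 6(3) allows some Annex III systems not to be treated as high-risk, provided the assessment is documented, which is why the property is named `potentially_high_risk`.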

"Data residency at inference, AI Act classification, and GPU cost behaviour are not compliance additions. They are the dimensions that break first in EU enterprise deployments."
Sebastiaan van Parijs / Founder

Cost governance for GPU workloads

GPU-backed inference has three properties that distinguish it from conventional CPU workloads. First: per-token pricing ties cost to response length, not to query count. A prompt that returns a verbose response costs 3 to 10 times as much as one that returns a terse response for the same query type. Usage patterns that look smooth on query-count metrics look spiky on cost metrics.
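The arithmetic behind the 3 to 10 times spread is simple once input and output tokens are priced separately. A minimal sketch with hypothetical prices per million tokens; many providers price output tokens higher than input tokens, which amplifies the effect. Substitute your provider's actual rates.

```python
def query_cost(input_tokens: int, output_tokens: int,
               price_in_per_m: float, price_out_per_m: float) -> float:
    """Cost of one query given per-million-token prices for input and output."""
    return (input_tokens * price_in_per_m + output_tokens * price_out_per_m) / 1_000_000


# Hypothetical prices (currency units per million tokens).
PRICE_IN, PRICE_OUT = 1.00, 4.00

# Same query type, same 500-token prompt; only the response length differs.
terse = query_cost(500, 100, PRICE_IN, PRICE_OUT)      # 0.0009
verbose = query_cost(500, 1_200, PRICE_IN, PRICE_OUT)  # 0.0053
ratio = verbose / terse                                 # about 5.9
```

Query count is identical in both cases; only the response length changed. That is why a cost dashboard needs token counts, not just request counts.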

Second: the trade-off between quality, latency and cost is explicit and must be decided before launch. Smaller models are faster and cheaper. Larger models are more accurate and more expensive. The optimal model is determined by your eval set and your cost threshold together. Teams that select models on quality alone and discover the cost afterwards have the trade-off backwards.
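Deciding on the eval set and the cost threshold together can be written as a constrained selection: discard models above the cost-per-query budget, below the quality floor or over the latency limit, then take the best score among the rest, preferring lower cost and latency on ties. A minimal sketch; the model names, scores, costs and thresholds are hypothetical placeholders for your own eval results.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EvalResult:
    model: str
    quality: float          # eval-set score, 0..1
    cost_per_query: float   # measured on the eval set
    p99_latency_ms: float


def select_model(results: list[EvalResult], max_cost: float,
                 min_quality: float, max_p99_ms: float) -> Optional[EvalResult]:
    """Best-quality model that meets the cost, quality and latency thresholds."""
    eligible = [r for r in results
                if r.cost_per_query <= max_cost
                and r.quality >= min_quality
                and r.p99_latency_ms <= max_p99_ms]
    if not eligible:
        return None  # nothing meets all three thresholds: revisit the budget or the use case
    # Highest quality wins; on equal quality, lower cost, then lower latency.
    return max(eligible, key=lambda r: (r.quality, -r.cost_per_query, -r.p99_latency_ms))


results = [
    EvalResult("small", 0.81, 0.0004, 450),
    EvalResult("medium", 0.88, 0.0019, 900),
    EvalResult("large", 0.91, 0.0120, 2400),
]
```

With a budget of 0.002 per query, a quality floor of 0.85 and a 1500 ms p99 limit, `select_model(results, 0.002, 0.85, 1500)` picks "medium": "small" misses the quality floor and "large" misses the budget. Returning `None` rather than the closest miss forces the trade-off to be made explicitly.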

Third: GPU workloads on self-hosted infrastructure do not idle cleanly. A GPU node at 30 percent utilisation still costs 100 percent of its allocation. The cost model for GPU is closer to reserved capacity than to on-demand CPU. Treating it as on-demand until it shows up on the bill is the most common GPU cost surprise we encounter.
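The reserved-capacity behaviour is easiest to see as effective cost per utilised GPU-hour: the node bills for every hour, so each hour of useful work gets more expensive as utilisation falls. A minimal sketch with a hypothetical hourly rate.

```python
def effective_cost_per_used_hour(hourly_rate: float, utilisation: float) -> float:
    """Billed rate divided by the fraction of time the GPU does useful work."""
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    return hourly_rate / utilisation


RATE = 4.00  # hypothetical cost per GPU-hour

for u in (1.0, 0.6, 0.3):
    print(f"{u:.0%} utilisation -> {effective_cost_per_used_hour(RATE, u):.2f} per used hour")
```

At the 30 percent utilisation in the text, each useful GPU-hour costs about 3.3 times the billed rate (13.33 against 4.00 here), which is the gap that surprises teams budgeting as if the node were on-demand.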

The fix: add per-query cost to your eval dashboard alongside quality metrics. Set a cost-per-query budget per feature before launch. Wire a cost anomaly alert to the same on-call path as your latency alerts. If GPU spend increases 30 percent without a corresponding query volume increase, something is wrong with the model selection or prompt template.
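The alert rule above compares spend growth with query-volume growth, which is equivalent to watching cost per query. A minimal sketch, assuming spend and query counts are already aggregated per feature per period; the 30 percent threshold comes from the text, while the `Period` type, function names and budget check are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Period:
    spend: float
    queries: int


def cost_per_query(p: Period) -> float:
    """Spend divided by query count; spend with zero queries is treated as unbounded."""
    if p.queries == 0:
        return 0.0 if p.spend == 0 else float("inf")
    return p.spend / p.queries


def cost_alerts(prev: Period, curr: Period, budget_per_query: float,
                max_drift: float = 0.30) -> list[str]:
    """Alerts for one feature: budget breach, or spend outgrowing query volume."""
    alerts = []
    cpq_prev, cpq_curr = cost_per_query(prev), cost_per_query(curr)
    if cpq_curr > budget_per_query:
        alerts.append(f"cost per query {cpq_curr:.4f} exceeds budget {budget_per_query:.4f}")
    # Spend growth not matched by query growth shows up as a rise in cost per query.
    if cpq_prev > 0 and cpq_curr / cpq_prev - 1 > max_drift:
        alerts.append(f"cost per query up {cpq_curr / cpq_prev - 1:.0%} vs previous period")
    return alerts
```

A 40 percent rise in spend on flat volume raises a drift alert, while the same rise driven by 40 percent more queries raises none. Routing the returned alerts to the same pager integration as latency alerts is environment-specific and not shown.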

