Self-Hosting AI Services with ASC: Deployment Topologies, Security Boundaries, and Runtime Guarantees
The decision to self-host an AI control plane is rarely taken lightly. It trades operational simplicity for sovereignty, adds infrastructure overhead in exchange for data locality, and demands that your platform team owns components that a SaaS vendor would otherwise manage. For organisations in regulated industries, government sectors, or with strict contractual data handling requirements, this trade is not optional — it is the only viable path.
AIARCO ASC is designed to be deployed in all three modes: SaaS, hybrid, and fully self-hosted. This article focuses on the self-hosted case: what topology you should choose, how security boundaries are enforced at the network and application layers, and what runtime guarantees the platform provides in your own infrastructure.
Why Self-Hosting
Before getting into the mechanics, it is worth being explicit about when self-hosting is the right decision.
Data residency requirements. Some organisations must guarantee that prompts, responses, and associated metadata never leave a specific jurisdictional boundary. A SaaS AI gateway, even one with regional hosting options, requires trusting the vendor's network and data handling. Self-hosting eliminates that dependency.
Air-gap constraints. Defence contractors, classified government systems, and certain financial market infrastructure operate in networks that are intentionally isolated from the public internet. No SaaS product can serve these environments.
Contractual obligations. Enterprise customer contracts in healthcare and finance increasingly include data processing agreements that prohibit passing customer data to subprocessors without explicit consent. Building a subprocessor relationship with an AI gateway vendor requires procurement and legal review cycles that teams often want to avoid.
Cost at scale. At very high throughput — tens of millions of AI requests per day — the per-request cost model of a managed gateway can exceed the cost of running the equivalent infrastructure. Self-hosting at scale often produces a meaningful margin improvement.
Custom compliance controls. Some compliance frameworks require that you hold the keys to every encryption layer in your infrastructure. With a managed gateway you are dependent on the vendor's KMS integration. Self-hosted ASC uses your own KMS from the first day.
Deployment Topology 1: Cloud-Native Single-Region
The simplest self-hosted topology deploys ASC in a single cloud region inside your own VPC. This is the appropriate choice for organisations with a primary cloud footprint in one region and no active-active requirements.
The gateway tier runs as a horizontally scalable deployment — typically Kubernetes pods behind an internal load balancer or ingress controller. The data plane — policy store, tenant configuration, audit log, credential store — runs as a managed service in the same VPC, or as a set of stateful Kubernetes workloads if you prefer full control over the persistence layer.
Network isolation is achieved through VPC security groups or network policies. The gateway pods accept inbound HTTPS from your application tier only. Outbound connections are limited to provider endpoints (OpenAI, Anthropic, etc.) and the data plane. No other outbound traffic is required or permitted.
Provider API keys are stored in your cloud KMS — AWS KMS, Google Cloud KMS, or Azure Key Vault. ASC retrieves and caches credentials using short-lived session tokens. The plaintext key material never appears in ASC's own data stores or logs.
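The caching behaviour described above can be pictured as a small in-memory cache with a time-to-live. The sketch below is illustrative only and is not ASC's implementation: the class name `CredentialCache`, the `fetch` callback (standing in for a call to your KMS or Vault), and the 300-second default are all assumptions.

```python
import time
from typing import Callable, Dict, Tuple


class CredentialCache:
    """Holds decrypted credentials in process memory for a short, fixed TTL."""

    def __init__(
        self,
        fetch: Callable[[str], str],          # hypothetical: asks your KMS for the secret
        ttl_seconds: float = 300.0,           # assumed default, tune to your policy
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        # provider name -> (plaintext secret, expiry timestamp)
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get(self, provider: str) -> str:
        """Return a cached secret, refetching once the TTL has elapsed."""
        now = self._clock()
        entry = self._entries.get(provider)
        if entry is not None and now < entry[1]:
            return entry[0]
        secret = self._fetch(provider)
        self._entries[provider] = (secret, now + self._ttl)
        return secret

    def invalidate(self, provider: str) -> None:
        """Drop the cached plaintext, for example after a rotation."""
        self._entries.pop(provider, None)
```

Keeping the plaintext only in this in-memory map, and never writing it to a store or a log line, is what keeps key material out of persistent state.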
When to use this topology: Single-region deployments where the primary concern is data residency within a cloud provider's region. Suitable for most regulated financial services and healthcare use cases that do not require geographic redundancy.
Deployment Topology 2: Multi-Region Active-Active
For organisations that need high availability across regions — either for redundancy or because their customer base spans multiple jurisdictions — ASC supports a multi-region active-active configuration.
In this topology, a gateway cluster runs independently in each target region. Each cluster has its own data plane, meaning audit logs and cached data do not cross regional boundaries. A global configuration layer — managed by AIARCO's management plane or self-hosted in a separate control region — synchronises tenant configuration, policy definitions, and credential metadata across regional clusters.
Traffic routing to the correct regional cluster is handled at the network layer: DNS-based geo-routing, a global load balancer like AWS Global Accelerator or Cloudflare, or application-level routing logic in your API gateway. ASC itself does not perform cross-region request routing; it processes the requests it receives and routes them to the appropriate model provider within the region.
Failover behaviour. If a regional cluster becomes unhealthy, traffic is rerouted to the nearest healthy cluster. Because each cluster has its own data plane, audit logs from the failed cluster's window are not available in real time in the failover cluster; they are reconciled when the original cluster recovers. This is an acceptable trade-off for most availability requirements but should be documented in your runbooks.
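If you implement routing at the application level, the selection logic can be very small. The following is a minimal sketch, assuming hypothetical region names and a failover order that never leaves the caller's jurisdiction; neither comes from ASC, and your own residency rules decide what the table should contain.

```python
from typing import Dict, List, Optional, Set

# Hypothetical regions and failover order, chosen so that EU traffic
# only ever fails over to another EU cluster. Not ASC configuration.
FAILOVER_ORDER: Dict[str, List[str]] = {
    "eu-west": ["eu-west", "eu-central"],
    "us-east": ["us-east", "us-west"],
}


def pick_cluster(home_region: str, healthy: Set[str]) -> Optional[str]:
    """Return the first healthy cluster in the home region's failover list.

    Returns None when no permitted cluster is healthy, so the caller can
    fail the request rather than silently cross a jurisdictional boundary.
    """
    for candidate in FAILOVER_ORDER.get(home_region, []):
        if candidate in healthy:
            return candidate
    return None
```

Returning `None` instead of falling back to any healthy region is a deliberate choice in this sketch: with hard residency requirements, an outage is usually preferable to an out-of-jurisdiction request.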
When to use this topology: Organisations with EU and US customers that require hard data residency per jurisdiction, or those with 99.99% availability requirements that a single-region deployment cannot meet.
Deployment Topology 3: Air-Gapped On-Premises
The most operationally demanding topology deploys ASC entirely within an on-premises network with no outbound internet connectivity. This configuration requires that all model providers are also on-premises — either open-source models served by vLLM or TGI, or licensed enterprise models from providers with on-premises deployment options.
In this topology, there is no dependency on AIARCO's management plane. All configuration, monitoring, and upgrades are handled through internal tooling. ASC provides a REST management API that your internal platform tooling can integrate with; the console UI is optional and can be deployed internally as a container.
Key management uses an on-premises HSM or an air-gapped Vault cluster. ASC's credential integration supports Vault's KV secrets engine, allowing credential rotation through your existing Vault workflows.
Observability requires an internal telemetry stack. ASC emits OpenTelemetry traces and Prometheus metrics; these feed into your internal Jaeger, Grafana, or equivalent deployment.
When to use this topology: Defence, intelligence, classified government, or critical national infrastructure. Any organisation with regulatory requirements that mandate processing within a defined physical perimeter.
Security Boundaries
Regardless of topology, ASC enforces security boundaries at multiple layers.
Network. Inbound connections to the gateway are HTTPS only. mTLS is supported for service-to-service communication within the platform tier. Network policies restrict lateral movement between gateway pods and data plane components.
Authentication and authorisation. External callers authenticate using API keys or JWT tokens. API keys are scoped to a tenant and can be further scoped to a subset of models, endpoints, or policies. JWT authentication integrates with your existing IdP — Okta, Azure AD, Auth0 — through the standard OIDC flow.
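To make the scoping model concrete, here is a minimal sketch of how a scoped key could be checked against a request. The `ApiKeyScope` type, its field names, and the convention that an empty set means "unrestricted" are assumptions for illustration, not ASC's data model.

```python
from dataclasses import dataclass, field
from typing import FrozenSet


@dataclass(frozen=True)
class ApiKeyScope:
    """Hypothetical scope attached to an API key."""
    tenant_id: str
    # In this sketch an empty set means "no restriction on this dimension".
    models: FrozenSet[str] = field(default_factory=frozenset)
    endpoints: FrozenSet[str] = field(default_factory=frozenset)


def is_allowed(scope: ApiKeyScope, tenant_id: str, model: str, endpoint: str) -> bool:
    """Check the tenant first, then each optional restriction in turn."""
    if scope.tenant_id != tenant_id:
        return False
    if scope.models and model not in scope.models:
        return False
    if scope.endpoints and endpoint not in scope.endpoints:
        return False
    return True
```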
Credential handling. Provider API keys are encrypted at rest using envelope encryption. The data encryption key is itself encrypted by a key encryption key held in your KMS. ASC does not persist the raw key material after the initial encryption step; it requests ephemeral decryption tokens from the KMS when it needs to authenticate outbound requests, and any plaintext held in gateway memory is short-lived.
Audit log integrity. The audit log uses append-only storage with content-addressed records. Each record includes a hash of the previous record, creating a hash chain that makes retroactive modification detectable. Compliance functions can verify the integrity of the audit log independently.
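The hash-chain idea is easy to demonstrate. The sketch below is a generic illustration, assuming SHA-256, canonical JSON serialisation, and an all-zero genesis hash; the record layout and function names are hypothetical and are not ASC's audit format.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # assumed starting value for the first record


def _digest(prev_hash: str, payload: Dict[str, Any]) -> str:
    # Canonical JSON so the same payload always hashes to the same value.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_record(log: List[Dict[str, Any]], payload: Dict[str, Any]) -> None:
    """Append a record whose hash covers the previous record's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({
        "prev_hash": prev_hash,
        "payload": payload,
        "hash": _digest(prev_hash, payload),
    })


def verify_chain(log: List[Dict[str, Any]]) -> bool:
    """Recompute every hash; any edited or reordered record breaks the chain."""
    prev_hash = GENESIS
    for record in log:
        if record["prev_hash"] != prev_hash:
            return False
        if record["hash"] != _digest(prev_hash, record["payload"]):
            return False
        prev_hash = record["hash"]
    return True
```

A verifier that runs `verify_chain` over an exported copy of the log needs no access to the gateway itself, which is what makes independent verification possible. Note that a chain alone does not detect truncation of the most recent records; that requires anchoring the latest hash somewhere outside the log.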
Secret rotation. Provider credentials can be rotated without service interruption. The rotation workflow updates the encrypted credential in the data plane, invalidates the cached plaintext in gateway pods, and verifies connectivity with the new credential before finalising the rotation. Rollback is supported if the new credential fails validation.
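The rotation steps above (update, invalidate, verify, roll back on failure) can be sketched as a single function. Everything here is hypothetical: the dictionary stands in for the encrypted credential store, and `invalidate_cache` and `check_connectivity` are callbacks you would supply.

```python
from typing import Callable, Dict


def rotate_credential(
    store: Dict[str, str],                           # stand-in for the credential store
    provider: str,
    new_secret: str,
    invalidate_cache: Callable[[str], None],         # hypothetical cache hook
    check_connectivity: Callable[[str, str], bool],  # hypothetical provider probe
) -> bool:
    """Rotate a provider credential; return True if the new one was kept."""
    previous = store.get(provider)

    # 1. Write the new credential and drop any cached plaintext.
    store[provider] = new_secret
    invalidate_cache(provider)

    # 2. Verify the new credential actually works before finalising.
    if check_connectivity(provider, new_secret):
        return True

    # 3. Roll back to the previous credential and invalidate again so
    #    no gateway keeps serving the rejected secret from its cache.
    if previous is None:
        store.pop(provider, None)
    else:
        store[provider] = previous
    invalidate_cache(provider)
    return False
```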
Runtime Guarantees
Self-hosted ASC provides the following runtime behaviours. Because you operate the infrastructure, they hold only if your team provisions and configures that infrastructure appropriately.
High availability. The gateway tier is stateless and can run with any desired replica count. Kubernetes HPA scales replicas based on CPU and custom request-rate metrics. For production, a minimum of three replicas across three availability zones is recommended.
Circuit breaking. ASC implements per-provider circuit breakers. When a provider's error rate exceeds a configurable threshold, ASC opens the circuit and routes requests to fallback providers without waiting for a timeout on each individual request. Circuit state is shared across gateway pods through the data plane.
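For readers unfamiliar with the pattern, here is a minimal, single-process sketch of an error-rate circuit breaker. It is not ASC's implementation: the thresholds, the sliding window, and the simple cooldown (with no half-open probing) are assumptions, and it keeps state in memory rather than sharing it through a data plane.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional, Tuple


class CircuitBreaker:
    """Opens when the failure rate in a sliding window exceeds a threshold."""

    def __init__(
        self,
        error_threshold: float = 0.5,    # assumed: open above 50% failures
        min_requests: int = 10,          # assumed: ignore tiny samples
        window_seconds: float = 30.0,
        cooldown_seconds: float = 15.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._threshold = error_threshold
        self._min_requests = min_requests
        self._window = window_seconds
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._events: Deque[Tuple[float, bool]] = deque()  # (timestamp, success)
        self._opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        """False while the circuit is open; callers should use a fallback."""
        if self._opened_at is None:
            return True
        if self._clock() - self._opened_at >= self._cooldown:
            # Cooldown elapsed: close the circuit and start a fresh window.
            self._opened_at = None
            self._events.clear()
            return True
        return False

    def record(self, success: bool) -> None:
        """Record an outcome and open the circuit if the error rate is too high."""
        now = self._clock()
        self._events.append((now, success))
        while self._events and now - self._events[0][0] > self._window:
            self._events.popleft()
        failures = sum(1 for _, ok in self._events if not ok)
        total = len(self._events)
        if total >= self._min_requests and failures / total > self._threshold:
            self._opened_at = now
```

A caller would keep one breaker per provider, check `allow_request()` before each call, and route to a fallback provider when it returns `False`.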
Request deduplication. Idempotent requests, meaning those that carry an explicit idempotency key in a request header, are deduplicated within a configurable window. If the same request arrives twice within the window while the first is still processing, the second request waits until the first completes and then receives the cached response.
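The waiting behaviour is the subtle part of deduplication, so a small sketch may help. This is a generic in-process illustration using `concurrent.futures.Future`; the class name, the 60-second default window, and the choice to also replay a failed result within the window are assumptions rather than ASC behaviour.

```python
import threading
import time
from concurrent.futures import Future
from typing import Any, Callable, Dict, Tuple


class Deduplicator:
    """Runs a handler at most once per idempotency key within a time window."""

    def __init__(
        self,
        window_seconds: float = 60.0,                 # assumed default
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._window = window_seconds
        self._clock = clock
        self._lock = threading.Lock()
        # idempotency key -> (future holding the result, expiry timestamp)
        # Expired entries are only replaced on reuse; a real system would purge them.
        self._entries: Dict[str, Tuple[Future, float]] = {}

    def execute(self, key: str, handler: Callable[[], Any]) -> Any:
        now = self._clock()
        with self._lock:
            entry = self._entries.get(key)
            if entry is not None and now < entry[1]:
                future, owner = entry[0], False   # duplicate: share the first result
            else:
                future, owner = Future(), True    # first arrival: run the handler
                self._entries[key] = (future, now + self._window)
        if owner:
            try:
                future.set_result(handler())
            except Exception as exc:
                future.set_exception(exc)
        # Duplicates block here until the owner finishes, then get its result
        # (or its exception, which result() re-raises).
        return future.result()
```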
Graceful degradation. When the policy engine or audit store is temporarily unavailable, ASC can be configured to operate in a degraded mode: requests are still routed to providers, but policy enforcement is skipped and audit records are queued for later ingestion. Because this mode fails open on policy, enable it only as a deliberate decision and only with monitoring and alerting that tell you when the gateway enters and leaves it.
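The "queue for later ingestion" part of degraded mode can be sketched as a writer that buffers records while the audit store is down and replays them in order once it returns. The class below is a hypothetical illustration: the `sink` callback, the use of `ConnectionError` to signal an outage, and the bounded backlog are all assumptions.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict


class BufferedAuditWriter:
    """Delivers audit records to a sink, queueing them while the sink is down."""

    def __init__(self, sink: Callable[[Dict[str, Any]], None], max_backlog: int = 10_000) -> None:
        self._sink = sink                      # hypothetical: writes to the audit store
        self._max_backlog = max_backlog        # assumed bound on buffered records
        self._backlog: Deque[Dict[str, Any]] = deque()

    def pending(self) -> int:
        """Number of records waiting for ingestion; worth exporting as a metric."""
        return len(self._backlog)

    def flush(self) -> int:
        """Replay queued records oldest first; raises if the sink is still down."""
        delivered = 0
        while self._backlog:
            self._sink(self._backlog[0])
            self._backlog.popleft()
            delivered += 1
        return delivered

    def write(self, record: Dict[str, Any]) -> bool:
        """Return True if delivered now, False if queued for later."""
        try:
            self.flush()            # keep ordering: drain the backlog first
            self._sink(record)
            return True
        except ConnectionError:
            if len(self._backlog) >= self._max_backlog:
                # Refuse to drop audit records silently.
                raise RuntimeError("audit backlog full")
            self._backlog.append(record)
            return False
```

An in-memory backlog is lost if the process restarts, so a production design would likely spill to durable local storage instead.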
Operational Considerations
Running ASC in production requires investment in a few operational areas.
Upgrade management. AIARCO releases gateway updates that include security patches, bug fixes, and new features. A self-hosted deployment requires your team to manage the upgrade lifecycle. AIARCO maintains semantic versioning and provides migration guides for breaking changes.
Capacity planning. Gateway pods are CPU-bound for policy evaluation and I/O-bound for proxying. A single pod can typically handle 500–1,000 requests per second depending on policy complexity. Size your cluster based on your peak request rate with a 50% headroom margin.
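As a worked example of the sizing rule, the helper below computes a replica count. It assumes that "50% headroom" means provisioning for 1.5 times the peak rate, and it applies the three-replica production minimum recommended earlier; the function name and parameters are illustrative.

```python
import math


def pods_needed(
    peak_rps: float,
    per_pod_rps: float,
    headroom: float = 0.5,     # 50% headroom, read as 1.5x peak
    min_replicas: int = 3,     # production minimum across three zones
) -> int:
    """Return the number of gateway pods to provision for a given peak rate."""
    required = peak_rps * (1.0 + headroom) / per_pod_rps
    return max(min_replicas, math.ceil(required))


# A peak of 10,000 requests per second at the conservative 500 rps per pod:
# 10,000 * 1.5 / 500 = 30 pods.
print(pods_needed(10_000, 500))    # 30
# The same peak at the optimistic 1,000 rps per pod needs 15 pods.
print(pods_needed(10_000, 1_000))  # 15
```

Sizing against the lower end of the per-pod range is the safer default until you have measured throughput with your own policies.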
Observability. A self-hosted deployment requires that you operate your own monitoring stack. At minimum you need: latency histograms per provider, error rate alerts per provider, audit log ingestion lag, and circuit breaker state changes.
Conclusion
Self-hosting AIARCO ASC gives your organisation full control over the AI request path — from the moment a prompt leaves your application to the moment the response arrives. The three deployment topologies cover the range from cloud-native single-region deployments through to fully air-gapped on-premises configurations. Each adds operational complexity in proportion to the level of control it provides.
For most regulated organisations, the single-region cloud-native topology is the right starting point. It provides data residency and full auditability without the operational overhead of a multi-region or air-gapped deployment. The infrastructure patterns described here are well-established Kubernetes operational patterns; if your platform team already operates a production Kubernetes cluster, adding ASC is a straightforward extension.
Want to evaluate self-hosted ASC for your organisation? AIARCO ASC provides Helm charts, Terraform modules, and deployment guides for all three self-hosted topologies. Get started or speak with our solutions team about your specific deployment requirements.