AIARCOAIARCOASC
Inference

Real-time inference.
Cold starts in under a second.

Deploy and scale low-latency inference for LLMs, audio and image generation. From a single function to thousands of GPUs — same API, same pricing.

python
1from asc import Function, gpu
2
3@Function(gpu=gpu.L4(), keep_warm=1)
4def generate(prompt: str) -> str:
5 from llm import Pipeline
6 pipe = Pipeline.from_pretrained("meta-llama/Llama-3-8B")
7 return pipe(prompt, max_new_tokens=256)
8
9# Deploy
10# $ asc deploy generate.py
<0ms
Cold start
0+
GPUs burstable
0%
Uptime SLA
Features

Designed for production.

Sub-second cold starts

Container snapshotting + lazy image loading get your first token in <800ms on L4 and A10G.

Per-second autoscale

Replicas scale in/out within seconds, billed per active GPU-second only.

Streaming first

Native SSE + WebSocket streaming with backpressure. Drop-in OpenAI-compatible router.

Pinned weights

Versioned model artifacts in our object store — instant rollbacks, zero re-downloads.

Global edge routing

Multi-region failover and locality-aware routing built in. No load balancer wiring.

Batched + concurrent

Dynamic batching and concurrent execution across replicas to maximise GPU utilisation.

Pricing

Metered. No markup.

Pay per active second / per GiB. Free tier covers small projects; $200/mo cap until you opt in. See the full calculator.

Line itemUnitRate (USD)
Functions — CPUper 1M requests$0.23
Functions — GPU L4per GPU-second$0.000095
GPU A10G podper hour$0.35
Egressper GiB$0.098

Ship your first deploy in minutes.

Free $30/month of compute. No card required.