Real-time inference.
Cold starts in under a second.
Deploy and scale low-latency inference for LLMs, audio and image generation. From a single function to thousands of GPUs — same API, same pricing.
```python
from asc import Function, gpu

@Function(gpu=gpu.L4(), keep_warm=1)
def generate(prompt: str) -> str:
    from llm import Pipeline
    pipe = Pipeline.from_pretrained("meta-llama/Llama-3-8B")
    return pipe(prompt, max_new_tokens=256)

# Deploy
# $ asc deploy generate.py
```
Designed for production.
Sub-second cold starts
Container snapshotting + lazy image loading get your first token in <800ms on L4 and A10G.
Per-second autoscale
Replicas scale in/out within seconds, billed per active GPU-second only.
Streaming first
Native SSE + WebSocket streaming with backpressure. Drop-in OpenAI-compatible router.
Pinned weights
Versioned model artifacts in our object store — instant rollbacks, zero re-downloads.
Global edge routing
Multi-region failover and locality-aware routing built in. No load balancer wiring.
Batched + concurrent
Dynamic batching and concurrent execution across replicas to maximise GPU utilisation.
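To make the dynamic batching idea concrete, here is a minimal sketch of how requests might be grouped. It is illustrative only and is not the platform's scheduler: the function name `plan_batches` and the parameters `max_batch` and `max_wait` are assumptions made for this example. A batch closes when it is full or when the next request arrives more than `max_wait` seconds after the batch opened.

```python
from typing import List


def plan_batches(arrivals: List[float], max_batch: int, max_wait: float) -> List[List[int]]:
    """Group request indices into batches (hypothetical helper, not a platform API).

    arrivals  -- request arrival times in seconds, sorted ascending
    max_batch -- largest number of requests allowed in one batch
    max_wait  -- longest time a batch stays open after its first request
    """
    batches: List[List[int]] = []
    current: List[int] = []
    window_start = 0.0

    for index, arrived_at in enumerate(arrivals):
        batch_full = len(current) >= max_batch
        window_expired = bool(current) and (arrived_at - window_start) > max_wait
        if batch_full or window_expired:
            # Close the open batch before admitting this request.
            batches.append(current)
            current = []
        if not current:
            # This request opens a new batch and starts its wait window.
            window_start = arrived_at
        current.append(index)

    if current:
        batches.append(current)
    return batches


# Five requests: three in a quick burst, then two more about 46 ms later.
example = plan_batches([0.000, 0.002, 0.004, 0.050, 0.051], max_batch=2, max_wait=0.010)
print(example)
```

A larger `max_wait` fills batches more fully at the cost of added latency for the first request in each batch, so the two parameters trade throughput against response time.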
Metered. No markup.
Pay per active second of compute and per GiB of egress. The free tier covers small projects, and spend is capped at $200/mo until you opt in. See the full calculator.
| Line item | Unit | Rate (USD) |
|---|---|---|
| Functions — CPU | per 1M requests | $0.23 |
| Functions — GPU L4 | per GPU-second | $0.000095 |
| GPU A10G pod | per hour | $0.35 |
| Egress | per GiB | $0.098 |
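As a rough guide to how these line items add up, the sketch below estimates a monthly bill from the rates in the table. The workload figures and the function name `estimate_monthly_cost` are assumptions made for illustration, and the estimate ignores the free tier and the spending cap.

```python
# Rates copied from the pricing table above (USD).
CPU_RATE_PER_MILLION_REQUESTS = 0.23
L4_RATE_PER_GPU_SECOND = 0.000095
A10G_POD_RATE_PER_HOUR = 0.35
EGRESS_RATE_PER_GIB = 0.098


def estimate_monthly_cost(
    cpu_requests: int = 0,
    l4_gpu_seconds: float = 0.0,
    a10g_pod_hours: float = 0.0,
    egress_gib: float = 0.0,
) -> float:
    """Return an estimated bill in USD (hypothetical helper, not an official calculator)."""
    total = (
        cpu_requests / 1_000_000 * CPU_RATE_PER_MILLION_REQUESTS
        + l4_gpu_seconds * L4_RATE_PER_GPU_SECOND
        + a10g_pod_hours * A10G_POD_RATE_PER_HOUR
        + egress_gib * EGRESS_RATE_PER_GIB
    )
    return round(total, 2)


# Assumed workload: 2M CPU requests, 100,000 active L4 GPU-seconds, 50 GiB egress.
print(estimate_monthly_cost(cpu_requests=2_000_000, l4_gpu_seconds=100_000, egress_gib=50))
```

At the listed rate, one hour of continuous L4 use is 3,600 GPU-seconds, or about $0.34, close to the hourly A10G pod rate.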
Ship your first deploy in minutes.
Free $30/month of compute. No card required.