Step-level checkpointing
Each step is an atomic transaction. When step 7 fails only step 7 retries — the tokens spent on steps 1-6 are never wasted.
AI fails unpredictably. APIs crash, context windows overflow, LLMs get rate limited. Wrap code in functions that checkpoint, wait, and offload without extra infrastructure.

Inngest completely transformed how we handle AI orchestration at Cohere. Its intuitive developer experience, built-in multi-tenant concurrency, and flow control allowed us to scale without the complexity of other tools or the need to build custom solutions. What would have taken us a month.
01
Stay in your codebase
Add automatic retries to code, whilecontrolling flow during surprise spikes.
02
Serverless-first
Run long running agents on any runtime,even serverless.
03
Agent-native observability
Treat all model calls like first-class events.See everything without extra tooling.
FirstFunctionTo FullProduction
FirstFunction
To FullProduction
Install the SDK. Serve one HTTP endpoint. Wrap any async function.
Quick StartDX humans love, structure agents grok.
Each step is an atomic transaction. When step 7 fails only step 7 retries — the tokens spent on steps 1-6 are never wasted.
Every model call is captured: prompt, response, token count, latency, and cost — automatically. Offloads the wait so serverless functions don't burn compute.
Rate-limit by any identifier. One user's burst can't saturate your OpenAI quota and degrade everyone else. Priority queuing for premium tiers, built in.
Pause mid-workflow for review or approval — hours or days. State maintained automatically. No polling, no cron, no database hacks.
LLM chains run for minutes. Batch pipelines run for hours. Inngest functions run to completion across any host — no 30-second limits.
One command. Live traces, step inspection, event replay, and every prompt/response pair — completely offline, before a single token is spent.
Inngest orchestrates AI workflows by invoking your functions via HTTP between steps. You write workflows as normal async functions and wrap logic in step.run(). Inngest handles retry logic, state, and scheduling between steps — no extra queues, workers, or stateful backends required.
Quick-start guideInngest handles LLM rate limits through built-in throttling and concurrency controls. You can cap simultaneous LLM calls, set per-user or per-tenant rate limits, and queue excess requests rather than dropping them. This prevents hitting provider rate limits at scale without custom infrastructure.
When an agentic workflow fails mid-execution, only the failed step retries — not the entire workflow. Inngest tracks completed steps and resumes from the point of failure. No work is duplicated and no state is lost.
Inngest works with serverless platforms by invoking functions via HTTP, so they run on any platform that serves HTTP requests. step.ai.infer offloads LLM inference to Inngest's infrastructure, pausing your function during the request so you don't pay for idle serverless execution time.
Yes, Inngest's Dev Server runs locally and provides full step-by-step execution traces, the ability to replay runs, and re-trigger functions — all before deploying to production.
Yes. Inngest supports workflows that run for hours or days. Functions can pause indefinitely — waiting for human input, external events, or slow inference — and resume exactly where they left off with no timeout constraints on workflow duration.