What is PromptForge?
PromptForge is a distributed LLM job processing platform. You upload a JSONL file of prompts, choose a provider and model, and PromptForge handles the rest (rate limiting, retries, checkpointing, and parallel dispatch), returning structured results when the job completes. It’s designed for workloads where you need to run hundreds or thousands of prompts reliably without babysitting API rate limits or worrying about partial failures mid-run.
How it works
Init a job
Call POST /v1/jobs/init with your provider and model. You get back a signed GCS upload URL and a job_id.
Upload your prompts
PUT your prompts.jsonl file directly to the signed URL. Each line is one prompt.
Processing starts automatically
PromptForge detects the upload, spins up a processing pod, and starts dispatching prompts to the LLM provider. The pod self-regulates rate limits using a slow-start algorithm.
Poll for completion
Call GET /v1/jobs/{job_id}/status to track progress: completed count, failed count, and current status.
Key properties
- Crash-safe: checkpoints to GCS every 30 seconds. If the pod dies mid-run, it resumes from the last checkpointed offset.
- Adaptive rate limiting: uses a slow-start algorithm to learn the real requests-per-minute (RPM) and tokens-per-minute (TPM) ceiling of your provider key, rather than relying on the documented limits.
- Multi-provider: OpenAI, Anthropic, Gemini, and Mistral via LiteLLM.
- Per-client isolation: one active processing pod per API key at a time. A second job queues automatically.
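To make the adaptive rate limiting property more concrete, here is a minimal sketch of what a slow-start controller could look like. PromptForge's actual algorithm is not described here, so everything in this block is an assumption: the class name `SlowStartLimiter`, the starting rate, and the TCP-style policy (double the rate after each clean window, halve it on a rate-limit error, then grow additively near the learned ceiling). It tracks RPM only; a TPM budget would need a second counter.

```python
from dataclasses import dataclass


@dataclass
class SlowStartLimiter:
    """Hypothetical TCP-style slow start over requests per minute (RPM)."""

    rpm: float = 10.0               # assumed starting rate, not a PromptForge default
    ceiling: float = float("inf")   # learned limit; unknown until the first rate-limit error
    min_rpm: float = 1.0

    def on_window_ok(self) -> None:
        """A full window passed without rate-limit errors: probe a higher rate."""
        if self.rpm * 2 < self.ceiling:
            self.rpm *= 2           # exponential phase: well below any known ceiling
        else:
            self.rpm += 1           # additive phase: creep up near the ceiling

    def on_rate_limited(self) -> None:
        """The provider rejected a request for rate reasons (e.g. HTTP 429)."""
        self.ceiling = self.rpm     # remember the rate that failed
        self.rpm = max(self.min_rpm, self.rpm / 2)

    def delay_seconds(self) -> float:
        """Gap to leave between requests at the current rate."""
        return 60.0 / self.rpm


limiter = SlowStartLimiter(rpm=10)
for _ in range(3):
    limiter.on_window_ok()          # 10 -> 20 -> 40 -> 80
limiter.on_rate_limited()           # ceiling = 80, rate drops to 40
limiter.on_window_ok()              # 41: additive growth near the ceiling
```

The point of the sketch is that the ceiling is discovered from the provider's responses rather than read from documentation, which is what the bullet above describes.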
Quickstart
Run your first batch job in under 5 minutes.
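The flow described under How it works (init, upload, poll) might look like the following from a client's side. This is a sketch under stated assumptions, not the official client: the host in `BASE_URL`, the bearer-token auth header, the request and response field names (`provider`, `model`, `job_id`, `upload_url`, `status`), the terminal status values, and the `{"prompt": ...}` shape of each JSONL line are all hypothetical. Only the two endpoint paths come from the description above.

```python
import json
import time
import urllib.request

BASE_URL = "https://promptforge.example.com"  # placeholder host, not a real endpoint


def send(method, url, data=None, headers=None):
    """Perform one HTTP request with the standard library and return the raw body."""
    req = urllib.request.Request(url, data=data, method=method, headers=headers or {})
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def run_job(prompts, provider, model, api_key, send=send, poll_seconds=5.0):
    """Init a job, upload prompts as JSONL, then poll until the job finishes."""
    auth = {"Authorization": f"Bearer {api_key}"}  # auth scheme is an assumption

    # 1. Init the job. The response field names are assumed.
    body = json.dumps({"provider": provider, "model": model}).encode()
    init = json.loads(send("POST", f"{BASE_URL}/v1/jobs/init", body,
                           {**auth, "Content-Type": "application/json"}))
    job_id, upload_url = init["job_id"], init["upload_url"]

    # 2. Upload one JSON object per line. The "prompt" key and the content type
    #    expected by the signed URL are assumptions.
    jsonl = "\n".join(json.dumps({"prompt": p}) for p in prompts).encode()
    send("PUT", upload_url, jsonl, {"Content-Type": "application/octet-stream"})

    # 3. Processing starts on its own after the upload.
    # 4. Poll the status endpoint until a terminal status (values assumed).
    while True:
        raw = send("GET", f"{BASE_URL}/v1/jobs/{job_id}/status", None, auth)
        status = json.loads(raw)
        if status["status"] in ("completed", "failed"):
            return status
        time.sleep(poll_seconds)
```

Passing `send` as a parameter keeps the HTTP transport swappable, so the flow can be exercised against a stub before pointing it at a real deployment.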

