Project · Full-Stack & AI Engineer · 2026 · shipped

PromptForge

An agentic prompt-engineering platform with a creator–critic–evaluator pipeline.

Stack

Node.js · Express.js · BullMQ · Convex · OpenRouter · React.js

Outcomes

Multi-model: OpenAI · Anthropic · Gemini · OSS

Iterative: Creator → Critic → Evaluator

Async: BullMQ-backed job pipeline

What it is

A platform for designing, testing, and optimising LLM prompts. PromptForge treats prompts like code: every revision is scored, regression-tested, and benchmarked across models, so teams can pick the right prompt and model based on real numbers, not vibes.

Key points

Creator–critic–evaluator pipeline — prompts are improved iteratively through automated feedback. Each pass scores clarity, completeness, constraint coverage, and token efficiency, producing a versioned trail of changes.
Cross-model routing & benchmarking — runs the same prompt against OpenAI, Anthropic, Gemini, and open models simultaneously, comparing quality, latency, and cost per output to inform model-selection decisions.
Fault-tolerant async backend — BullMQ workers handle long-running prompt jobs with idempotent retries; Convex persists every job so the system supports reliable prompt regression testing at scale.
Structured output for downstream use — every refined prompt is emitted with metadata (token budget, latency profile, expected output shape) ready to drop into another product.
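
The creator–critic–evaluator loop described above can be sketched roughly as follows. This is a minimal illustration, not PromptForge's actual code: the `Roles` interface, `refinePrompt`, the equal-weight `overallScore`, and the `maxPasses` and `target` parameters are all hypothetical names and assumptions. The four scored dimensions are the ones listed above; the model calls are left abstract so that any provider could sit behind them.

```typescript
// Hypothetical sketch of an iterative creator -> critic -> evaluator loop.
// None of these names come from PromptForge itself.

type Scores = {
  clarity: number;            // each score assumed to be in [0, 1]
  completeness: number;
  constraintCoverage: number;
  tokenEfficiency: number;
};

interface Revision {
  version: number;
  prompt: string;
  critique: string;
  scores: Scores;
  overall: number;
}

// The three roles are injected so they can be backed by any model provider.
interface Roles {
  create(prompt: string, critique: string | null): Promise<string>;
  critique(prompt: string): Promise<string>;
  evaluate(prompt: string): Promise<Scores>;
}

// Assumption: an unweighted mean of the four dimensions.
function overallScore(s: Scores): number {
  return (s.clarity + s.completeness + s.constraintCoverage + s.tokenEfficiency) / 4;
}

// Runs up to maxPasses passes and returns the versioned trail of revisions.
// Stops early once a revision's overall score reaches the target.
async function refinePrompt(
  seed: string,
  roles: Roles,
  maxPasses = 3,
  target = 0.9,
): Promise<Revision[]> {
  const trail: Revision[] = [];
  let prompt = seed;
  let lastCritique: string | null = null;

  for (let version = 1; version <= maxPasses; version++) {
    prompt = await roles.create(prompt, lastCritique);
    const critique = await roles.critique(prompt);
    const scores = await roles.evaluate(prompt);
    const overall = overallScore(scores);

    trail.push({ version, prompt, critique, scores, overall });
    lastCritique = critique;

    if (overall >= target) break;
  }
  return trail;
}
```

Because every pass is appended to `trail`, the result doubles as the versioned change history: the last entry is the current best prompt, and earlier entries show how the scores moved between revisions.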

Result

PromptForge lets developers and AI product teams deliver structured, tested prompts faster, with concrete metrics to back model and prompt choices in code review.
