LoopBench

Tasks: 4
Seeds: 5

Leaderboard

# Loop Submitter LES Success@k Cost Harness Spec

What LoopBench measures

Closed loops, not prompts

A loop is specified in LSS YAML, run in the LoopGym SimEnv, and scored on Success@k and observed LES.

Reproducible

Every task runs on five seeds, v0.1 needs no API keys, and specs are auditable.

Community scoreboard

External rows credited; human review on merge.

Run your first score

pip install "le-loop-stack>=0.1.0" loopbench loopgym
loopbench run --task LB-CR-1 --spec your-loop.yaml --seeds 0,1,2,3,4 -o results.json
loopbench validate results.json
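
To script the same two steps from Python, a minimal sketch might look like the following. It uses only the subcommands and flags shown above (`run`, `validate`, `--task`, `--spec`, `--seeds`, `-o`) and assumes the `loopbench` executable is on your PATH after the pip install. The helper names (`build_run_command`, `build_validate_command`, `score`) are hypothetical and are not part of LoopBench. By default it only prints the commands; pass `dry_run=False` to execute them.

```python
import shlex
import subprocess
from typing import List, Sequence


def build_run_command(task: str, spec: str, seeds: Sequence[int], out: str) -> List[str]:
    """Assemble the `loopbench run` argv using only the flags shown above."""
    seed_arg = ",".join(str(s) for s in seeds)  # e.g. "0,1,2,3,4"
    return ["loopbench", "run", "--task", task, "--spec", spec,
            "--seeds", seed_arg, "-o", out]


def build_validate_command(out: str) -> List[str]:
    """Assemble the `loopbench validate` argv for a results file."""
    return ["loopbench", "validate", out]


def score(task: str, spec: str, seeds: Sequence[int] = range(5),
          out: str = "results.json", dry_run: bool = True) -> List[List[str]]:
    """Print (and optionally execute) the run step, then the validate step.

    Hypothetical helper: with dry_run=True nothing is executed, so this is
    safe to call on a machine where loopbench is not installed.
    """
    commands = [build_run_command(task, spec, seeds, out),
                build_validate_command(out)]
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops before validation if the run step fails.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    score("LB-CR-1", "your-loop.yaml")
```

Keeping the seed list as a Python sequence makes it easy to confirm that all five seeds are passed before anything is executed.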

Beat LB-CR-1 guide →