
Model benchmark

List the models served by an OpenAI-compatible endpoint (e.g. GET …/v1/models), choose five of them and a task difficulty, then compare their runs. Only the chat model name changes between episodes; the prompts and environment settings are identical.
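As a rough sketch of the flow described above: OpenAI-compatible servers typically answer the model-list request with a JSON object whose "data" array holds entries with an "id" field. The helper names (fetch_models_payload, model_ids, build_episodes), the episode dictionary layout, and the sample payload are all hypothetical and only illustrate the idea that nothing but the model name varies between episodes. Authentication headers, which some endpoints require, are not handled here.

```python
import json
import urllib.request


def fetch_models_payload(models_url: str, timeout: float = 10.0) -> dict:
    """GET the model-list URL and decode the JSON body (defined for reference, not called below)."""
    with urllib.request.urlopen(models_url, timeout=timeout) as resp:
        return json.load(resp)


def model_ids(payload: dict) -> list[str]:
    """Extract model names from an OpenAI-style list response."""
    return [entry["id"] for entry in payload.get("data", [])]


def build_episodes(models: list[str], task: str, seed: int) -> list[dict]:
    """Build one episode config per model; only the "model" field differs."""
    if len(models) != 5:
        raise ValueError("expected exactly five models")
    return [{"model": name, "task": task, "seed": seed} for name in models]


# Hypothetical payload in the shape of an OpenAI-compatible model-list response.
sample = {
    "object": "list",
    "data": [{"id": f"model-{i}", "object": "model"} for i in range(7)],
}

# Pick the first five listed models and pair each with the same task and seed.
episodes = build_episodes(model_ids(sample)[:5], task="easy", seed=0)
```

Holding the task and seed fixed in every config is what makes the per-model results comparable.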

Configuration

The default API root matches Ollama’s OpenAI-compatible surface (ollama.com/v1/models). For a local daemon, use http://127.0.0.1:11434/v1.
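A minimal sketch of how the model-list URL could be derived from either API root. The constant and function names are hypothetical, and the https scheme on the hosted root is an assumption, since the text gives the host and path only.

```python
# Hosted root: host and path come from the text; the https scheme is an assumption.
DEFAULT_API_ROOT = "https://ollama.com/v1"
# Local daemon root, as given in the text.
LOCAL_API_ROOT = "http://127.0.0.1:11434/v1"


def models_url(api_root: str) -> str:
    """Append the model-list path to an API root, tolerating a trailing slash."""
    return api_root.rstrip("/") + "/models"
```

For example, models_url(LOCAL_API_ROOT) gives http://127.0.0.1:11434/v1/models.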

Select five models

Results

Model | Total reward | Steps | Error

Total reward by model

Steps to last transition

Cumulative reward over steps

Per-episode reward sequence (same task + seed per model).
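The result columns and the cumulative chart can all be derived from a per-episode reward sequence. The sketch below assumes that "Steps" is simply the length of that sequence and that the total is its sum; the summarize function and the sample rewards are hypothetical and do not come from an actual run.

```python
from itertools import accumulate


def summarize(rewards: list) -> dict:
    """Derive total reward, step count, and the running total from one episode."""
    cumulative = list(accumulate(rewards))  # running sum after each step
    return {
        "total_reward": cumulative[-1] if cumulative else 0,
        "steps": len(rewards),  # assumption: one reward per step
        "cumulative": cumulative,
    }


# Hypothetical reward sequence for a single episode.
summary = summarize([0, 1, 0, 2])
```

Running this per model, with the same task and seed, yields one table row and one cumulative-reward curve for each.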