Evaluations

Audience: Data team, tenant admin
Goal: Run systematic quality checks on natural-language query behavior.

Navigation note: Open /evaluations directly—the route is not in the Admin Portal sidebar (bookmark it or link from internal docs).

Workflow

Go to /evaluations.
Create or select an evaluation dataset (import CSV with labeled prompts and expected behaviors where supported).
Start a run against a datasource and planner configuration.
Review per-case results and comparison charts.
Cancel in-flight runs if needed.

Issue	What to try
Run stuck	Cancel and retry; check agent health in Operations
Import rejected	Validate CSV columns against template
Results differ from Explore	Confirm dataset uses same datasource and runtime flags