Evaluations
Audience: Data team, tenant admin
Goal: Run systematic quality checks on natural-language query behavior.
Navigation note: Open /evaluations directly. The route is not listed in the Admin Portal sidebar, so bookmark it or link to it from internal docs.
Workflow
- Go to /evaluations.
- Create or select an evaluation dataset (import CSV with labeled prompts and expected behaviors where supported).
- Start a run against a datasource and planner configuration.
- Review per-case results and comparison charts.
- Cancel in-flight runs if needed.
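
To illustrate the dataset step in the workflow above, here is a minimal Python sketch that writes a small labeled CSV. The column names `prompt` and `expected_behavior` and the helper `build_dataset_csv` are assumptions made for this example; the actual import template may use different columns, so check it before importing.

```python
import csv
import io

# Hypothetical column names; replace with the columns from your import template.
COLUMNS = ["prompt", "expected_behavior"]


def build_dataset_csv(cases):
    """Serialize (prompt, expected_behavior) pairs into CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(COLUMNS)
    for prompt, expected in cases:
        # csv.writer quotes fields that contain commas or quotes.
        writer.writerow([prompt, expected])
    return buffer.getvalue()


# Example cases modeled on typical business questions (illustrative only).
cases = [
    ("What was total revenue last quarter?", "aggregates revenue by quarter"),
    ("List the top 5 customers by order count", "ranks customers, limit 5"),
]
csv_text = build_dataset_csv(cases)
```

Writing the file through the `csv` module rather than by string concatenation keeps prompts that contain commas or quotes intact, which helps avoid rejected imports.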
When to use evaluations
- After semantic catalog or runtime changes
- Before promoting a BI app to a wider audience
- When regression testing across planner or model upgrades
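
For the regression-testing case, one way to compare two runs is to look for cases that passed before an upgrade and fail after it. The sketch below assumes per-case results are available as pass/fail values keyed by case ID; this page does not describe a results export format, so the data shape and the helper name `find_regressions` are hypothetical.

```python
def find_regressions(baseline, candidate):
    """Return case IDs that passed in the baseline run but fail,
    or are missing, in the candidate run."""
    return sorted(
        case_id
        for case_id, passed in baseline.items()
        if passed and not candidate.get(case_id, False)
    )


# Hypothetical per-case outcomes: case ID -> whether the case passed.
baseline = {"q1": True, "q2": True, "q3": False}
candidate = {"q1": True, "q2": False, "q3": True}

regressions = find_regressions(baseline, candidate)
```

Treating a case that is missing from the candidate run as a regression is a deliberate choice in this sketch: a case that silently disappears from the dataset should be noticed rather than ignored.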
Tips
- Keep datasets representative of real business questions.
- Version datasets when warehouse schema changes materially.
- Link failed cases to Explore replays via Operations deep links when available.
Troubleshooting
| Issue | What to try |
|---|---|
| Run stuck | Cancel and retry; check agent health in Operations |
| Import rejected | Validate CSV columns against template |
| Results differ from Explore | Confirm the evaluation run uses the same datasource and runtime flags as the Explore session |
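
Two of the checks in the table can be run locally before retrying. The Python sketch below compares a CSV header against a list of template columns and reports the settings that differ between two configurations. The column names, configuration keys, and function names are assumptions for illustration and are not taken from the product.

```python
import csv
import io


def missing_columns(csv_text, template_columns):
    """Return the template columns that are absent from the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file yields an empty header
    present = {name.strip() for name in header}
    return [col for col in template_columns if col not in present]


def config_differences(eval_config, explore_config):
    """Return {key: (evaluation value, Explore value)} for settings that differ."""
    keys = sorted(set(eval_config) | set(explore_config))
    return {
        key: (eval_config.get(key), explore_config.get(key))
        for key in keys
        if eval_config.get(key) != explore_config.get(key)
    }


# Hypothetical template columns and settings, for illustration only.
missing = missing_columns(
    "prompt,notes\nTotal revenue?,example\n",
    ["prompt", "expected_behavior"],
)
diffs = config_differences(
    {"datasource": "sales_dw", "some_runtime_flag": True},
    {"datasource": "sales_dw", "some_runtime_flag": False},
)
```

An empty result from both functions suggests the file header and the configuration are not the cause, so the next step would be the agent health check in Operations.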