{% extends "admin/base.html" %}
{% block title %}Benchmark Report: {{ run.name }}{% endblock %}
{% block content %}
{% if is_pdf %}
| Model Identifier | Avg. Accuracy | Max Score | Standard Deviation |
|---|---|---|---|
| {{ model }} | {{ (s.avg * 100) | round(1) }}% | {{ (s.max * 100) | round(1) }}% | ±{{ s.std | round(3) }} |
| Model & Score | Response Output |
|---|---|
| {{ model }} SCORE: {{ (run.results[model][i].score * 100) | round(1) }}% | {{ run.results[model][i].model_answer_html | safe }}{% if run.results[model][i].reasoning_html %} Critique: {{ run.results[model][i].reasoning_html | safe }}{% endif %} |
Evaluated using {{ run.evaluator_config.type }} logic on {{ run.created_at.strftime('%Y-%m-%d %H:%M') }}
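{#
The summary table above reads `s.avg`, `s.max`, and `s.std` for each model, but this template does not show where those values come from. The sketch below is one way the view code might build them from `run.results`. It is an assumption, not the project's actual implementation: the function name `summarize_scores`, the dict-based row shape, and the choice of population standard deviation are all hypothetical.

```python
from statistics import mean, pstdev


def summarize_scores(results):
    """Build per-model summary stats from raw scores.

    `results` is assumed to map a model name to a list of rows,
    each row being a dict with a "score" between 0.0 and 1.0.
    """
    summary = {}
    for model, rows in results.items():
        scores = [row["score"] for row in rows]
        if not scores:
            # Skip models with no results so mean() is never called on an empty list.
            continue
        summary[model] = {
            "avg": mean(scores),
            "max": max(scores),
            # Assumption: population standard deviation; the template does not say which.
            "std": pstdev(scores),
        }
    return summary
```

Jinja resolves `s.avg` on a plain dict by falling back to item lookup, so a dict like the one returned here can be passed to the template as-is. Note that the template scales `avg` and `max` to percentages but prints `std` on the raw 0 to 1 scale.
#}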