{% extends "admin/base.html" %}
{% block title %}Model Evaluations{% endblock %}
{% block header_title %}Capabilities Evaluation & Benchmarking{% endblock %}
{% block content %}

Scientific Performance Verification

Run specialized datasets against your cluster to test its reasoning ability, coding ability, and factual accuracy. Benchmarks let you compare hardware/model combinations side by side using quantitative metrics.

Benchmark Datasets

{% for ds in datasets %}

{{ ds.name }}

{{ ds.description or 'No description' }}

{{ ds.content|length }} Tasks
Generated: {{ ds.created_at.strftime('%Y-%m-%d') }}
{% else %}

No datasets found. Generate one to start benchmarking.

{% endfor %}

Benchmark Reports

{% for run in runs %}

{{ run.name }}

{% for model in run.models %} {{ model }} {% endfor %}
Timestamp
{{ run.created_at.strftime('%Y-%m-%d %H:%M') }}
{% else %}

No reports archived.

{% endfor %}
{% endblock %}