Evaluation Leaderboard
Interactive benchmark results viewer
Stage: All, baseline, stage_0, stage_1
Training Type: All, BASELINE, CPT, DPO, SFT
Base Model: All, Mistral, Qwen
Showing: 58
Total: 58
Benchmarks: 18
Checkpoint | Visual_LC_Avg | LC_Avg | Overall_Avg | mmlongbench_doc | mmlongbench_doc_corrected | mmlb_131k | mmlb_32k | spiqa_eval | slidevqa | helmet | longbench_v2 | dude | tablevqa | mmmu_pro | tinymmlu | mm_mt_bench | gpqa | tinygsm8k
Data Composition