Add CARDBiomedBench dataset #2071

bio-mlhui · 2025-05-02T13:01:30Z

Add evaluation support for the CARDBiomedBench benchmark (1 subset + LLM judge).

This PR contains 2 files:

datasets/CARDBiomedBench.py
configs/datasets/CARDBiomedBench/CARDBiomedBench_llmjudge_gen.py
The dataset provides three CSV files (train, test, and All); currently only All is used.

        data_files = {'test': 'data/CARDBiomedBench.csv'}
        dataset = load_dataset(path, data_files=data_files, split='test')
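For reference, the key in `data_files` is only a split label chosen by the caller, so the snippet above exposes the All CSV under a split named `test` rather than loading the dataset's own test file. A minimal sketch of that pattern follows; the helper names `build_data_files` and `load_cardbiomedbench` are hypothetical and not part of this PR, and the file names of the train and test CSVs are not shown here, so `csv_name` is left as a parameter.

```python
from typing import Dict


def build_data_files(csv_name: str = 'CARDBiomedBench.csv',
                     split: str = 'test') -> Dict[str, str]:
    """Map a caller-chosen split label to a CSV path under data/.

    Hypothetical helper for illustration; only the default path
    'data/CARDBiomedBench.csv' appears in the PR itself.
    """
    return {split: f'data/{csv_name}'}


def load_cardbiomedbench(path: str, csv_name: str = 'CARDBiomedBench.csv'):
    """Load one CSV of the dataset as the 'test' split (hypothetical wrapper)."""
    # Imported lazily so build_data_files can be used without `datasets` installed.
    from datasets import load_dataset

    data_files = build_data_files(csv_name, split='test')
    return load_dataset(path, data_files=data_files, split='test')
```

Switching to another CSV later would then only require passing a different `csv_name`, assuming the other files sit in the same `data/` directory.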

Tested on 200 sampled examples, with Qwen2.5-1.5B as the model under evaluation and Qwen2.5-72b as the LLM judge:

Before PR:

[✔] Pre-commit or other linting tools are used to fix the potential lint issues.
[✔] Bug fixes are fully covered by unit tests; the case that causes the bug should be added to the unit tests.
[✔] The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
[ ] The documentation has been modified accordingly, like docstring or example tutorials.

After PR:

CARDBiomedBench

9db1fea

mm-assistant bot assigned tonysy May 2, 2025