{
  "Dataset": [
    "multimedqa",
    "medmcqa",
    "medqa_4options",
    "mmlu_anatomy",
    "mmlu_clinical_knowledge",
    "mmlu_college_biology",
    "mmlu_college_medicine",
    "mmlu_medical_genetics",
    "mmlu_professional_medicine",
    "pubmedqa"
  ],
  "Mistral-7B-v0.1": [
    0.533002,
    0.481951,
    0.508248,
    0.555556,
    0.686792,
    0.680556,
    0.595376,
    0.71,
    0.683824,
    0.754
  ],
  "Gemma": [
    0.254791,
    0.217547,
    0.254517,
    0.244444,
    0.264151,
    0.263889,
    0.346821,
    0.24,
    0.220588,
    0.552
  ]
}
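For anyone who wants to compare the two columns side by side, here is a minimal sketch in plain Python. It assumes the scores are accuracies expressed as fractions of 1 (the file does not say so explicitly), and the helper name `per_dataset_rows` is made up for this example.

```python
import json

# Scores copied verbatim from the JSON file above.
RESULTS_JSON = """
{
  "Dataset": ["multimedqa", "medmcqa", "medqa_4options", "mmlu_anatomy",
              "mmlu_clinical_knowledge", "mmlu_college_biology",
              "mmlu_college_medicine", "mmlu_medical_genetics",
              "mmlu_professional_medicine", "pubmedqa"],
  "Mistral-7B-v0.1": [0.533002, 0.481951, 0.508248, 0.555556, 0.686792,
                      0.680556, 0.595376, 0.71, 0.683824, 0.754],
  "Gemma": [0.254791, 0.217547, 0.254517, 0.244444, 0.264151,
            0.263889, 0.346821, 0.24, 0.220588, 0.552]
}
"""


def per_dataset_rows(results):
    """Turn the column-oriented dict into one dict per dataset (hypothetical helper)."""
    datasets = results["Dataset"]
    models = [key for key in results if key != "Dataset"]
    rows = []
    for i, name in enumerate(datasets):
        row = {"Dataset": name}
        for model in models:
            row[model] = results[model][i]
        rows.append(row)
    return rows


results = json.loads(RESULTS_JSON)
rows = per_dataset_rows(results)

# Print each dataset with both scores and the Mistral-minus-Gemma gap.
for row in rows:
    gap = row["Mistral-7B-v0.1"] - row["Gemma"]
    print(f'{row["Dataset"]:<28} {row["Mistral-7B-v0.1"]:.3f} {row["Gemma"]:.3f} {gap:+.3f}')
```

The same dict can also be passed directly to `pandas.DataFrame(results)` if pandas is available, since every key maps to a list of equal length.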
Thank you very much for your response, @monk1337! For the google/gemma-2b-it
model, may I know the prompt format you used for the PubMedQA benchmark?
Can we use the following prompt format?
"""
"<start_of_turn>user\nWrite down the best option after \n\nThe correct answer is . \n\nQuestion: Dyschesia can be provoked by inappropriate defecation movements. The aim of this prospective study was to demonstrate dysfunction of the anal sphincter and/or the musculus (m.) puborectalis in patients with dyschesia using anorectal endosonography.\n Twenty consecutive patients with a medical history of dyschesia and a control group of 20 healthy subjects underwent linear anorectal endosonography (Toshiba models IUV 5060 and PVL-625 RT). In both groups, the dimensions of the anal sphincter and the m. puborectalis were measured at rest, and during voluntary squeezing and straining. Statistical analysis was performed within and between the two groups.\n The anal sphincter became paradoxically shorter and/or thicker during straining (versus the resting state) in 85% of patients but in only 35% of control subjects. Changes in sphincter length were statistically significantly different (p<0.01, chi(2) test) in patients compared with control subjects. The m. puborectalis became paradoxically shorter and/or thicker during straining in 80% of patients but in only 30% of controls. Both the changes in length and thickness of the m. puborectalis were significantly different (p<0.01, chi(2) test) in patients versus control subjects.\n Is anorectal endosonography valuable in dyschesia?\n (A) yes\n (B) no\n (C) maybe<end_of_turn>\n<start_of_turn>model\n"
"""
Please note that "Write down the best option after \n\nThe correct answer is ."
is the instruction.
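A prompt with this layout can be assembled programmatically. The sketch below only mirrors the string quoted above: it wraps the user text in Gemma's `<start_of_turn>` / `<end_of_turn>` markers and leaves the model turn open. The function name `build_gemma_prompt` and the `<abstract text>` placeholder are made up for illustration, and nothing in this thread confirms that this is the format used for the scores in the file.

```python
def build_gemma_prompt(instruction, context, question, options):
    """Wrap a multiple-choice question in Gemma's chat turn markers.

    Hypothetical helper: the layout copies the prompt quoted in the comment
    above, not a confirmed benchmark format.
    """
    letters = "ABCDEFGHIJ"
    option_lines = [f"({letters[i]}) {text}" for i, text in enumerate(options)]
    # Context, question, and options are separated by "\n " as in the quoted prompt.
    body = "\n ".join([context, question, *option_lines])
    user_turn = f"{instruction} \n\nQuestion: {body}"
    return (
        "<start_of_turn>user\n"
        f"{user_turn}<end_of_turn>\n"
        "<start_of_turn>model\n"  # left open so the model writes the answer
    )


prompt = build_gemma_prompt(
    # Instruction copied as-is from the comment above.
    instruction="Write down the best option after \n\nThe correct answer is .",
    context="<abstract text>",  # placeholder for the PubMedQA abstract
    question="Is anorectal endosonography valuable in dyschesia?",
    options=["yes", "no", "maybe"],
)
print(prompt)
```

With Hugging Face `transformers`, calling `tokenizer.apply_chat_template` on a single user message with `add_generation_prompt=True` should yield the same turn markers, which avoids hand-writing them.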
Thank you very much again!
@monk1337 May I know what the prompt format is for the PubMedQA benchmark
evaluation? Thank you very much in advance!
Hi, which Gemma model are you evaluating? These numbers are from the first model they released. After the release, they found many bugs, which they fixed with the help of the Unsloth team and others. I haven't evaluated the latest version; this evaluation was done just 10 minutes after the release of the first Gemma model.