CultriX-Github/MonaCeption-7B-SLERP-DPO-Nous.md Secret

Created April 14, 2024 10:34


| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---:|---:|---:|---:|---:|
| MonaCeption-7B-SLERP-DPO | 45.57 | 76.99 | 78.98 | 50.19 | 62.93 |

AGIEval

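| Task | Version | Metric | Value | | Stderr |
|---|---:|---|---:|---|---:|
| agieval_aqua_rat | 0 | acc | 27.17 | ± | 2.80 |
| | | acc_norm | 25.20 | ± | 2.73 |
| agieval_logiqa_en | 0 | acc | 38.86 | ± | 1.91 |
| | | acc_norm | 38.71 | ± | 1.91 |
| agieval_lsat_ar | 0 | acc | 24.78 | ± | 2.85 |
| | | acc_norm | 24.35 | ± | 2.84 |
| agieval_lsat_lr | 0 | acc | 53.14 | ± | 2.21 |
| | | acc_norm | 54.51 | ± | 2.21 |
| agieval_lsat_rc | 0 | acc | 68.03 | ± | 2.85 |
| | | acc_norm | 67.66 | ± | 2.86 |
| agieval_sat_en | 0 | acc | 77.18 | ± | 2.93 |
| | | acc_norm | 78.16 | ± | 2.89 |
| agieval_sat_en_without_passage | 0 | acc | 43.20 | ± | 3.46 |
| | | acc_norm | 44.17 | ± | 3.47 |
| agieval_sat_math | 0 | acc | 33.64 | ± | 3.19 |
| | | acc_norm | 31.82 | ± | 3.15 |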

Average: 45.57%

GPT4All

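| Task | Version | Metric | Value | | Stderr |
|---|---:|---|---:|---|---:|
| arc_challenge | 0 | acc | 66.55 | ± | 1.38 |
| | | acc_norm | 67.41 | ± | 1.37 |
| arc_easy | 0 | acc | 86.24 | ± | 0.71 |
| | | acc_norm | 80.05 | ± | 0.82 |
| boolq | 1 | acc | 86.88 | ± | 0.59 |
| hellaswag | 0 | acc | 69.17 | ± | 0.46 |
| | | acc_norm | 87.34 | ± | 0.33 |
| openbookqa | 0 | acc | 39.00 | ± | 2.18 |
| | | acc_norm | 49.80 | ± | 2.24 |
| piqa | 0 | acc | 83.24 | ± | 0.87 |
| | | acc_norm | 85.53 | ± | 0.82 |
| winogrande | 0 | acc | 81.93 | ± | 1.08 |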

Average: 76.99%
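
The 76.99% figure is consistent with taking `acc_norm` for each task that reports it, falling back to `acc` for the tasks that do not (boolq and winogrande), and averaging the results. The sketch below reproduces that arithmetic under this assumption; it is not the evaluation harness's own code, and the `suite_average` helper and dictionary layout are hypothetical names chosen for illustration.

```python
# Values copied from the GPT4All table above.
gpt4all = {
    "arc_challenge": {"acc": 66.55, "acc_norm": 67.41},
    "arc_easy": {"acc": 86.24, "acc_norm": 80.05},
    "boolq": {"acc": 86.88},
    "hellaswag": {"acc": 69.17, "acc_norm": 87.34},
    "openbookqa": {"acc": 39.00, "acc_norm": 49.80},
    "piqa": {"acc": 83.24, "acc_norm": 85.53},
    "winogrande": {"acc": 81.93},
}


def suite_average(results, preferred="acc_norm", fallback="acc"):
    """Hypothetical helper: mean of the preferred metric, else the fallback."""
    scores = [metrics.get(preferred, metrics[fallback]) for metrics in results.values()]
    return sum(scores) / len(scores)


print(f"{suite_average(gpt4all):.2f}")  # prints 76.99
```

The same rule applied to the AGIEval table (all tasks report `acc_norm`) gives 45.57, matching the average listed there.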

TruthfulQA

| Task | Version | Metric | Value | | Stderr |
|---|---:|---|---:|---|---:|
| truthfulqa_mc | 1 | mc1 | 62.55 | ± | 1.69 |
| | | mc2 | 78.98 | ± | 1.36 |

Average: 78.98%

Bigbench

| Task | Version | Metric | Value | | Stderr |
|---|---:|---|---:|---|---:|
| bigbench_causal_judgement | 0 | multiple_choice_grade | 59.47 | ± | 3.57 |
| bigbench_date_understanding | 0 | multiple_choice_grade | 62.60 | ± | 2.52 |
| bigbench_disambiguation_qa | 0 | multiple_choice_grade | 56.20 | ± | 3.09 |
| bigbench_geometric_shapes | 0 | multiple_choice_grade | 23.68 | ± | 2.25 |
| | | exact_str_match | 0.84 | ± | 0.48 |
| bigbench_logical_deduction_five_objects | 0 | multiple_choice_grade | 33.80 | ± | 2.12 |
| bigbench_logical_deduction_seven_objects | 0 | multiple_choice_grade | 22.57 | ± | 1.58 |
| bigbench_logical_deduction_three_objects | 0 | multiple_choice_grade | 61.00 | ± | 2.82 |
| bigbench_movie_recommendation | 0 | multiple_choice_grade | 54.80 | ± | 2.23 |
| bigbench_navigate | 0 | multiple_choice_grade | 56.80 | ± | 1.57 |
| bigbench_reasoning_about_colored_objects | 0 | multiple_choice_grade | 69.30 | ± | 1.03 |
| bigbench_ruin_names | 0 | multiple_choice_grade | 55.13 | ± | 2.35 |
| bigbench_salient_translation_error_detection | 0 | multiple_choice_grade | 41.18 | ± | 1.56 |
| bigbench_snarks | 0 | multiple_choice_grade | 71.82 | ± | 3.35 |
| bigbench_sports_understanding | 0 | multiple_choice_grade | 75.25 | ± | 1.37 |
| bigbench_temporal_sequences | 0 | multiple_choice_grade | 55.70 | ± | 1.57 |
| bigbench_tracking_shuffled_objects_five_objects | 0 | multiple_choice_grade | 23.52 | ± | 1.20 |
| bigbench_tracking_shuffled_objects_seven_objects | 0 | multiple_choice_grade | 19.60 | ± | 0.95 |
| bigbench_tracking_shuffled_objects_three_objects | 0 | multiple_choice_grade | 61.00 | ± | 2.82 |

Average: 50.19%

Average score: 62.93%
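
The overall score matches the unweighted mean of the four suite averages reported above (for TruthfulQA, the suite figure equals the `mc2` value). A minimal check, assuming that is how the number was derived; the variable names are illustrative only:

```python
# Per-suite averages as reported in this document.
suite_scores = {
    "AGIEval": 45.57,
    "GPT4All": 76.99,
    "TruthfulQA": 78.98,
    "Bigbench": 50.19,
}

# Unweighted mean: each suite counts equally, regardless of its task count.
overall = sum(suite_scores.values()) / len(suite_scores)

print(f"{overall:.2f}")  # prints 62.93
```

Because the mean is unweighted, the single TruthfulQA task carries the same weight as all 18 Bigbench tasks combined.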

Elapsed time: 03:30:28
