CultriX CultriX-Github

Model	AGIEval	GPT4All	TruthfulQA	Bigbench
Llama-3.2-3B	25.76	Error: File does not exist	39.22	34.61

Task	Version	Metric	Value		Stderr
agieval_aqua_rat	0	acc	20.87	±	2.55
		acc_norm	23.23	±	2.65
agieval_logiqa_en	0	acc	23.96	±	1.67

Model	AGIEval	GPT4All	TruthfulQA	Bigbench
Llama-3.2-3B-DPO	27.06	Error: File does not exist	58.93	34.96

Task	Version	Metric	Value		Stderr
agieval_aqua_rat	0	acc	18.90	±	2.46
		acc_norm	20.87	±	2.55
agieval_logiqa_en	0	acc	26.11	±	1.72

Model	AGIEval	GPT4All	TruthfulQA	Bigbench
Llama3-8B-function-calling-uncensored-dareties	39.15	Error: File does not exist	54.99	42.52

Task	Version	Metric	Value		Stderr
agieval_aqua_rat	0	acc	24.41	±	2.70
		acc_norm	23.23	±	2.65
agieval_logiqa_en	0	acc	34.56	±	1.87

Model	AGIEval	GPT4All	TruthfulQA	Bigbench
Llama3-8B-function-calling-dpo-slerp	39.52	Error: File does not exist	56.01	42.8

Task	Version	Metric	Value		Stderr
agieval_aqua_rat	0	acc	25.98	±	2.76
		acc_norm	23.62	±	2.67
agieval_logiqa_en	0	acc	38.25	±	1.91

Model	AGIEval	GPT4All	TruthfulQA	Bigbench
Hermes-3-Llama-3.1-8B	41.51	Error: File does not exist	58.61	43.08

Task	Version	Metric	Value		Stderr
agieval_aqua_rat	0	acc	26.38	±	2.77
		acc_norm	25.20	±	2.73
agieval_logiqa_en	0	acc	39.02	±	1.91

Model	AGIEval	GPT4All	TruthfulQA	Bigbench
Llama3-8B-DPO	41.87	Error: File does not exist	71.38	44.5

Task	Version	Metric	Value		Stderr
agieval_aqua_rat	0	acc	21.65	±	2.59
		acc_norm	20.47	±	2.54
agieval_logiqa_en	0	acc	40.71	±	1.93

Model	AGIEval	GPT4All	TruthfulQA	Bigbench	Average
Phi-3-mini-4k-instruct	44.44	71.88	57.77	41.9	54
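
The Average column only appears on the row where all four suites produced a score. The sketch below assumes Average is the unweighted mean of the four suite scores (the page does not state this) and that a row with a missing result, such as `Error: File does not exist`, gets no average. The function name `suite_average` is hypothetical.

```python
def suite_average(scores):
    """Return the unweighted mean of the suite scores, or None if any is missing.

    Assumption: 'Average' is a plain mean over the four suites.
    """
    values = list(scores.values())
    if any(not isinstance(v, (int, float)) for v in values):
        return None
    return sum(values) / len(values)


# Values copied from the tables on this page.
phi3 = {"AGIEval": 44.44, "GPT4All": 71.88, "TruthfulQA": 57.77, "Bigbench": 41.9}
llama32 = {
    "AGIEval": 25.76,
    "GPT4All": "Error: File does not exist",
    "TruthfulQA": 39.22,
    "Bigbench": 34.61,
}

phi3_avg = suite_average(phi3)
llama32_avg = suite_average(llama32)
print(phi3_avg, llama32_avg)
```

Under that assumption the Phi-3-mini-4k-instruct mean comes out just under 54, which is consistent with the reported value after rounding.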

Task	Version	Metric	Value		Stderr
agieval_aqua_rat	0	acc	29.13	±	2.86
		acc_norm	28.74	±	2.85
agieval_logiqa_en	0	acc	42.86	±	1.94
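
The Stderr column can be used to judge whether a gap between two models is larger than the sampling noise. The sketch below uses a normal approximation and treats the two scores as independent. Both are simplifying assumptions, since the models are scored on the same questions. The helper names `ci95` and `z_score` are hypothetical.

```python
import math


def ci95(value, stderr):
    """Approximate 95% interval for a score, assuming a normal distribution."""
    half_width = 1.96 * stderr
    return (value - half_width, value + half_width)


def z_score(a, se_a, b, se_b):
    """z statistic for the difference b - a, treating both scores as independent."""
    return (b - a) / math.sqrt(se_a ** 2 + se_b ** 2)


# agieval_aqua_rat acc: Llama-3.2-3B vs. Llama-3.2-3B-DPO
z_aqua = z_score(20.87, 2.55, 18.90, 2.46)
# agieval_logiqa_en acc: Llama-3.2-3B vs. Phi-3-mini-4k-instruct
z_logiqa = z_score(23.96, 1.67, 42.86, 1.94)

low, high = ci95(20.87, 2.55)
print(low, high)
print(z_aqua, z_logiqa)
```

With these inputs the aqua_rat difference has |z| below 1.96, so it sits within noise under this approximation, while the logiqa_en difference is well above that threshold.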

	#!/usr/bin/env python3
	"""
	Refactored Q&A Dataset Generation Script
	========================================

	Features:
	- Separate configuration for generator vs. judge (API keys, endpoints, and models).
	- Environment-variable and CLI-driven configuration.
	- Consistent use of pathlib for file paths.
	- Modular logging with debug mode.
	"""
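
The features listed in this docstring could be wired together roughly as follows. This is a minimal sketch, not the script's actual code: the class `EndpointConfig`, the flag names, the environment variable names (`GENERATOR_API_KEY`, `JUDGE_MODEL`, and so on), and the default output path are all hypothetical.

```python
import argparse
import logging
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass
class EndpointConfig:
    """Settings for one role (generator or judge). Field names are hypothetical."""
    api_key: str
    endpoint: str
    model: str


def build_parser():
    parser = argparse.ArgumentParser(description="Q&A dataset generation (sketch)")
    for role in ("generator", "judge"):
        prefix = role.upper()
        # Each CLI flag falls back to an environment variable, e.g. GENERATOR_API_KEY.
        parser.add_argument(f"--{role}-api-key", default=os.environ.get(f"{prefix}_API_KEY", ""))
        parser.add_argument(f"--{role}-endpoint", default=os.environ.get(f"{prefix}_ENDPOINT", ""))
        parser.add_argument(f"--{role}-model", default=os.environ.get(f"{prefix}_MODEL", ""))
    # pathlib.Path is used for every file path.
    parser.add_argument("--output", type=Path, default=Path("dataset.jsonl"))
    parser.add_argument("--debug", action="store_true")
    return parser


def load_config(argv=None):
    """Parse arguments, configure logging, and return (generator, judge, output path)."""
    args = build_parser().parse_args(argv)
    logging.basicConfig(level=logging.DEBUG if args.debug else logging.INFO)
    generator = EndpointConfig(args.generator_api_key, args.generator_endpoint, args.generator_model)
    judge = EndpointConfig(args.judge_api_key, args.judge_endpoint, args.judge_model)
    return generator, judge, args.output
```

Calling `load_config(["--debug", "--judge-model", "some-model"])` would return separate generator and judge settings, with any flag left off the command line taken from the matching environment variable.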

	import os
	import requests
	import random
	import logging
	import re
	import time
	import json
	import matplotlib
	matplotlib.use('Agg') # Set the backend to 'Agg' before importing pyplot
	import matplotlib.pyplot as plt

	#!/bin/bash

	# Functions

	install_basic_packages() {
	echo "Installing basic packages..."
	apt update -y && apt install -y screen nano git git-lfs speedometer htop libaio-dev || {
	echo "Failed to install basic packages" >&2
	exit 1
	}
	}