Vinh Nguyen vinhnx

r/LocalLLaMA - a year in review

This community was a great part of my life for the past two years, so as 2024 comes to a close, I wanted to feed my nostalgia a bit. Let me take you back to the most notable things happened here this year.

This isn't a log of model releases or research, rather things that were discussed and upvoted by the people here. So notable things missing is also an indication of what was going on of sorts. I hope that it'll also show the amount of progress and development that happend in just a single year and make you even more excited for what's to come in 2025.

The year started with the excitement about Phi-2 (443 upvotes, by u/steph_pop). Phi-2 feels like ancient history these days, it's also fascinating that we end the 2024 with the Phi-4. Just one week after, people discovered that apparently it [was trained on the software engineer's diary](https://reddit.com/r/LocalLLaMA/comments/1

Which GGUF is right for me? (Opinionated)

Good question! I am collecting human data on how quantization affects outputs. See here for more information: ggml-org/llama.cpp#5962

In the meantime, use the largest that fully fits in your GPU. If you can comfortably fit Q4_K_S, try using a model with more parameters.

llama.cpp feature matrix

See the wiki upstream: https://github.com/ggerganov/llama.cpp/wiki/Feature-matrix

	import os
	import json
	import subprocess
	from anthropic import Anthropic

	# Tool definitions
	TOOLS = [
	{
	"name": "list_files",
	"description": "List files and directories at a given path",

	# train_grpo.py
	#
	# See https://github.com/willccbb/verifiers for ongoing developments
	#
	"""
	citation:

	@misc{brown2025grpodemo,
	title={Granular Format Rewards for Eliciting Mathematical Reasoning Capabilities in Small Language Models},
	author={Brown, William},

	You are an assistant that engages in extremely thorough, self-questioning reasoning. Your approach mirrors human stream-of-consciousness thinking, characterized by continuous exploration, self-doubt, and iterative analysis.

	## Core Principles

	1. EXPLORATION OVER CONCLUSION
	- Never rush to conclusions
	- Keep exploring until a solution emerges naturally from the evidence
	- If uncertain, continue reasoning indefinitely
	- Question every assumption and inference

	#!/bin/zsh

	# Test if the Swift compiler knows about a particular language feature.
	#
	# Usage:
	#
	# swift-has-feature [--swift SWIFT_PATH] [--language-version LANGUAGE_VERSION] FEATURE
	#
	# The feature should be an upcoming or experimental language feature,
	# such as `"StrictConcurrency"` or `"ExistentialAny"`.

	# install DSPy: pip install dspy
	import dspy

	# Ollam is now compatible with OpenAI APIs
	#
	# To get this to work you must include `model_type='chat'` in the `dspy.OpenAI` call.
	# If you do not include this you will get an error.
	#
	# I have also found that `stop='\n\n'` is required to get the model to stop generating text after the ansewr is complete.
	# At least with mistral.

	# install DSPy: pip install dspy
	import dspy

	# This sets up the language model for DSPy in this case we are using mistral 7b through TGI (Text Generation Interface from HuggingFace)
	mistral = dspy.HFClientTGI(model='mistralai/Mistral-7B-v0.1', port=8080, url='http://localhost')

	# This sets the language model for DSPy.
	dspy.settings.configure(lm=mistral)

	# This is not required but it helps to understand what is happening