# STEP 1: Load
# Load documents using LangChain's DocumentLoaders
# This is from https://langchain.readthedocs.io/en/latest/modules/document_loaders/examples/csv.html
from langchain.document_loaders.csv_loader import CSVLoader
loader = CSVLoader(file_path='./example_data/mlb_teams_2012.csv')
data = loader.load()
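Each CSV row is loaded as a separate Document. A quick, purely illustrative way to inspect the result (assuming the load above succeeded):

# Each row becomes one Document: the row text lives in page_content,
# and the source file / row number live in metadata.
print(len(data))
print(data[0].page_content)
print(data[0].metadata)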
@ninehills
ninehills / chatglm-openai-api.ipynb
Last active April 16, 2024 01:15
chatglm-openai-api.ipynb
@veekaybee
veekaybee / normcore-llm.md
Last active October 31, 2025 04:51
Normcore LLM Reads

Anti-hype LLM reading list

Goals: Add links that are reasonable and good explanations of how stuff works. No hype and no vendor content if possible. Practical first-hand accounts of models in prod eagerly sought.

Foundational Concepts


Pre-Transformer Models

@padeoe
padeoe / README_hfd.md
Last active October 24, 2025 13:39
CLI tool for downloading Huggingface models and datasets with aria2/wget: hfd

🤗Huggingface Model Downloader

Note

(2025-01-08) Add feature for 🏷️Tag(Revision) Selection, contributed by @Bamboo-D.
(2024-12-17) Add feature for ⚡Quick Startup and ⏭️Fast Resume, enabling skipping of downloaded files, while removing the git clone dependency to accelerate file list retrieval.

Considering the lack of multi-threaded download support in the official huggingface-cli and the inadequate error handling in hf_transfer, this command-line tool leverages curl and aria2c for fast and robust downloading of models and datasets.
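For comparison, a minimal sketch of the official huggingface_hub route the README measures itself against (the repo id and target directory below are placeholder values): each file is fetched over a single connection, which is exactly the limitation hfd works around with aria2c/curl.

# Official Python API baseline: whole-repo download, one connection per file.
from huggingface_hub import snapshot_download

local_path = snapshot_download(repo_id="gpt2", repo_type="model", local_dir="./gpt2")
print(local_path)  # directory containing the downloaded files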

Features

  • ⏯️ Resume from breakpoint: you can Ctrl+C and re-run it at any time; the download picks up where it left off.
@breezedeus
breezedeus / run_openai_assistants.py
Last active November 16, 2023 10:49
A simple script to run the newest OpenAI Assistants Functions.
# coding: utf-8
# !pip install -U openai
import time
from copy import deepcopy
import openai
from openai._types import NOT_GIVEN
openai.api_key = '<YOUR_API_KEY>'
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont
from reportlab.lib.pagesizes import A4
from reportlab.lib.units import inch
from reportlab.pdfgen import canvas
def split_text_into_chunks(text, num_chunks):
    """Split the text into the specified number of chunks (by length)."""
    lines_per_chunk = len(text) // num_chunks
    # The first num_chunks - 1 chunks are equally sized; the last chunk absorbs any remainder.
    return [text[i * lines_per_chunk:(i + 1) * lines_per_chunk] for i in range(num_chunks - 1)] + \
        [text[(num_chunks - 1) * lines_per_chunk:]]
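A quick usage sketch of the chunking helper above (the sample string is made up for illustration):

# 10 characters split into 3 chunks: 10 // 3 = 3 per chunk, remainder goes to the last one.
sample = "abcdefghij"
print(split_text_into_chunks(sample, 3))  # ['abc', 'def', 'ghij']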
@virattt
virattt / rag-reranking-gpt-colbert.ipynb
Last active October 30, 2025 02:24
rag-reranking-gpt-colbert.ipynb
nodes:
  - id: webcam
    custom:
      source: https://huggingface.co/datasets/dora-rs/dora-idefics2/raw/main/operators/opencv_stream.py
      outputs:
        - image
  - id: idefics2
    operator:
      python: https://huggingface.co/datasets/dora-rs/dora-idefics2/raw/main/operators/idefics2_op.py
      inputs:
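For orientation only, a rough sketch of what a Python operator referenced by such a dataflow (e.g. idefics2_op.py) typically looks like; the class layout, on_event signature, event fields, and output name below are assumptions based on dora-rs operator examples, not the contents of the actual files.

from dora import DoraStatus

class Operator:
    """Skeleton of a dora Python operator (interface assumed from dora-rs examples)."""

    def on_event(self, dora_event, send_output) -> DoraStatus:
        if dora_event["type"] == "INPUT":
            # `value` carries the upstream node's output (here: an image frame from `webcam`).
            image = dora_event["value"]
            # ...run the model on `image`, then forward a result downstream...
            send_output("output", image, dora_event["metadata"])
        return DoraStatus.CONTINUE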
@philschmid
philschmid / GEMINI.md
Created July 8, 2025 16:09
Explain mode

Gemini CLI: Explain Mode

You are Gemini CLI, operating in a specialized Explain Mode. Your function is to serve as a virtual Senior Engineer and System Architect. Your mission is to act as an interactive guide, helping users understand complex codebases through a conversational process of discovery.

Your primary goal is to act as an intelligence and discovery tool. You deconstruct the "how" and "why" of the codebase to help engineers get up to speed quickly. You must operate in a strict, read-only intelligence-gathering capacity. Instead of prescribing what to do, you illuminate how things work and why they are designed that way.

Your core loop is to scope, investigate, explain, and then offer the next logical step, allowing the user to navigate the codebase's complexity with you as their guide.

Core Principles of Explain Mode