For several months, I have been discussing the potential implications of a DeepSeek moment in lithography. This perspective has been met with skepticism and ridicule from certain members of the Twitter lithography community. While I am saddened to see my concerns validated, I feel it is necessary to share my insights now that the truth is becoming increasingly apparent.
import assert from 'node:assert';

// Minimal binary-heap-backed priority queue.
class HeapQueue {
  constructor(cmp) {
    // The comparator defaults to numeric ascending order, i.e. a min-heap.
    this.cmp = cmp || function (a, b) { return a - b; };
    this.length = 0;
    this.data = [];
  }

  // Number of items currently stored in the heap.
  size() {
    return this.length;
  }
}
This document serves to clarify various aspects of the Australian voting system, addressing common misconceptions and providing accurate information. In an era rife with misinformation, it is essential to equip ourselves with the correct knowledge so that we may share it with others.
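To make the later discussion concrete, here is a toy simulation of the preferential (instant-runoff) count used for Australia's House of Representatives: voters rank candidates, the candidate with the fewest first preferences is eliminated each round, and those ballots flow to the next surviving preference until someone holds an absolute majority. This is an illustrative simplification with made-up ballots, not an implementation of the official counting rules.

```python
from collections import Counter

def instant_runoff(ballots):
    """Each ballot lists every candidate in preference order."""
    candidates = {c for ballot in ballots for c in ballot}
    while True:
        # Count each ballot toward its highest-ranked surviving candidate.
        tally = Counter(next(c for c in ballot if c in candidates) for ballot in ballots)
        leader, votes = tally.most_common(1)[0]
        if votes * 2 > len(ballots):          # absolute majority reached
            return leader
        # Otherwise eliminate the candidate with the fewest votes and redistribute.
        candidates.remove(min(tally, key=tally.get))

# Illustrative (made-up) ballots: 4 rank A first, 3 rank B first, 2 rank C first.
ballots = [["A", "B", "C"]] * 4 + [["B", "C", "A"]] * 3 + [["C", "B", "A"]] * 2
print(instant_runoff(ballots))   # "B" wins after "C" is eliminated
```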
Automated Capability Discovery (ACD) is an innovative tool designed to automatically identify surprising new capabilities and failure modes in foundation models. This is achieved through a process known as "self-exploration," wherein models explore their own abilities.
ACD is spearheaded by @cong_ml and @shengranhu.
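As a rough mental model of self-exploration (a conceptual sketch only, not the actual ACD implementation), one can picture a loop in which the model proposes tasks for itself, attempts them, and then judges its own attempts. Here `query_model` is a hypothetical stand-in for any chat-completion call.

```python
# Conceptual sketch of a self-exploration loop (not the actual ACD code).
# `query_model` is a hypothetical helper that sends a prompt to a foundation
# model and returns its text response.

def self_exploration(query_model, n_rounds=5):
    discovered = []
    for _ in range(n_rounds):
        # 1. The model proposes a task intended to probe its own abilities.
        task = query_model("Propose a novel task that tests an unusual capability.")
        # 2. The model attempts the task it just proposed.
        attempt = query_model(f"Attempt the following task:\n{task}")
        # 3. The model judges its own attempt, surfacing surprising
        #    capabilities (successes) and failure modes (failures).
        verdict = query_model(
            f"Task:\n{task}\n\nAttempt:\n{attempt}\n\n"
            "Did the attempt succeed? Answer yes or no, then explain."
        )
        discovered.append({"task": task, "attempt": attempt, "verdict": verdict})
    return discovered
```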
https://www.all-hands.dev/blog/vibe-coding-higher-quality-code
The integration of coding agents into software development has become increasingly prevalent, which raises a frequently asked question: how can we vibe code effectively with coding agents while still ensuring high standards of code quality?
DSPy represents a significant advancement in the field of AI software development. However, its complexity can make it challenging to fully comprehend. This document aims to clarify the foundational principles of DSPy and outline its core tenets.
The central thesis of DSPy is that while large language models (LLMs) and their methodologies will continue to evolve, this progress will not be uniform across all dimensions. Therefore, it is essential to identify:
- The minimal set of fundamental abstractions that enable the development of downstream AI software that is "future-proof" and capable of adapting to advancements.
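As a concrete illustration of that abstraction layer, the sketch below uses DSPy's documented signature-and-module interface; the model identifier is a placeholder, and minor API details may differ across DSPy versions.

```python
import dspy

# Placeholder model name; point this at whichever LM you actually use.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# A signature declares *what* the program should do ("question -> answer"),
# leaving *how* the LM is prompted, and which LM is used, to the framework.
qa = dspy.ChainOfThought("question -> answer")

prediction = qa(question="What does a DSPy signature declare?")
print(prediction.answer)
```

The point of the example is that the program is written against the signature, so swapping the underlying model, the prompting strategy, or the optimizer does not require rewriting the program itself.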
We are pleased to announce the release of our new research paper titled "Continuous Thought Machines" (CTMs). This work explores the significant role of timing and synchronization in neuronal computation, aspects that have been largely overlooked in contemporary neural networks. Our hypothesis is that neural timing is essential for the flexibility and adaptability observed in biological intelligence.
We introduce a novel architecture, the Continuous Thought Machines (CTMs), which is designed from the ground up to incorporate neural dynamics as a fundamental representation of intelligence. By prioritizing neural dynamics as a core component, CTMs are capable of performing adaptive computation naturally.
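To give a flavour of what treating neural dynamics as a representation can mean, here is a deliberately simplified toy construction of my own, not the paper's architecture: each neuron applies its own weights to a short history of its pre-activations across internal "ticks", and the pairwise synchronization of the resulting activation traces is read out as the representation.

```python
import numpy as np

# A highly simplified sketch, not the CTM architecture itself: D neurons each
# keep a rolling history of pre-activations, a per-neuron function maps that
# history to the next activation, and pairwise synchronization across the
# unrolled internal "ticks" is read out as the representation.

rng = np.random.default_rng(0)
D, H, T = 16, 4, 32                            # neurons, history length, internal ticks
W_in = rng.normal(size=(D, D)) / np.sqrt(D)    # mixes activations into pre-activations
w_hist = rng.normal(size=(D, H)) / np.sqrt(H)  # per-neuron weights over its own history

history = np.zeros((D, H))                     # rolling pre-activation history per neuron
trace = []                                     # activation time series across ticks

z = rng.normal(size=D)
for _ in range(T):
    pre = W_in @ z                                        # new pre-activations
    history = np.concatenate([history[:, 1:], pre[:, None]], axis=1)
    z = np.tanh((w_hist * history).sum(axis=1))           # neuron-level function of its history
    trace.append(z)

A = np.stack(trace, axis=1)    # (D, T) activation traces over internal ticks
sync = A @ A.T / T             # pairwise synchronization matrix, used as the representation
print(sync.shape)              # (16, 16)
```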
In the realm of attention mechanisms, we are familiar with traditional Attention and its linear-time variants, such as linear attention and State Space Models. However, what exists in the intermediate space between these two paradigms? This document introduces Log-Linear Attention, a novel approach that offers significant advantages in both computational and memory efficiency.
Log-Linear Attention is characterized by the following key features:
- Log-linear time training
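The intuition, as I understand it (a hedged sketch, not the paper's exact construction), is hierarchical: rather than a single fixed-size state (linear attention) or a cache that grows linearly with context length (softmax attention), each position summarizes its prefix into O(log t) power-of-two buckets, keeping one fixed-size state per bucket, which is what yields the log-linear training cost and logarithmic state growth.

```python
# A hedged sketch of the core idea: decompose each position's prefix into
# O(log t) power-of-two buckets (Fenwick-tree style). Each bucket carries one
# fixed-size linear-attention-like state, so memory grows as O(log t).
# The actual partitioning in the paper may differ in detail.

def prefix_buckets(t):
    """Decompose the prefix [0, t) into power-of-two buckets, largest first."""
    buckets, end = [], t
    while end > 0:
        size = end & (-end)          # largest power of two dividing `end`
        buckets.append((end - size, end))
        end -= size
    return buckets[::-1]

for t in (7, 8, 13):
    print(t, prefix_buckets(t))
# 7  -> [(0, 4), (4, 6), (6, 7)]       : 3 buckets, O(log t)
# 8  -> [(0, 8)]                       : 1 bucket
# 13 -> [(0, 8), (8, 12), (12, 13)]    : 3 buckets
```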
In this document, I present an argument that challenges the widely accepted belief in the validity of the 3x+1 Conjecture. This conjecture, also known as the Collatz Conjecture, posits that starting with any positive integer, repeated application of a particular function will eventually lead to the number 1.
In my first paper, co-authored with Y. Sinai in 2002, I demonstrated that the paths generated by the 3x+1 function can be modeled as a geometric Brownian motion in a precise asymptotic sense, with a drift of log(3/4) < 0. This finding suggests that typical trajectories exhibit a decay pattern, supporting the previously established fact that almost every initial seed eventually reaches a value below itself. However, this process cannot be iterated indefinitely, as the paths may diverge into very sparse trajectories.
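The drift value has a simple heuristic reading: an accelerated step takes an odd n to (3n + 1)/2^k, where 2^k is the largest power of two dividing 3n + 1, and for typical odd n the exponent k averages 2, so each step multiplies n by roughly 3/4 on a logarithmic scale (normalizations of the drift differ between papers). The snippet below is only a sanity check of that heuristic on random odd seeds, not a reproduction of the paper's analysis.

```python
import math
import random

def accelerated_step(n):
    """One accelerated 3x+1 step: odd n -> (3n + 1) / 2**k, with all factors of 2 removed."""
    n = 3 * n + 1
    while n % 2 == 0:
        n //= 2
    return n

random.seed(0)
drifts = []
for _ in range(2000):
    n = random.randrange(10**9, 10**10) | 1     # a large random odd seed
    drifts.append(math.log(accelerated_step(n)) - math.log(n))

print(sum(drifts) / len(drifts))   # close to log(3/4)
print(math.log(3 / 4))             # ≈ -0.2877
```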
This document outlines the findings of recent research focused on understanding the amount of information available in open-weights models, specifically the DeepSeek R1 weights, which amount to 1.2 TB. The central question of this research is: What can we learn from all those bits?
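For scale, 1.2 TB is about 9.6 × 10^12 bits; if one assumes the commonly cited figure of roughly 671 billion parameters for DeepSeek R1 (my assumption, not a figure from the research itself), that works out to on the order of 14 bits per parameter.

```python
# Back-of-the-envelope arithmetic only.
# The parameter count is an assumption (roughly 671B is the commonly cited
# size for DeepSeek R1), not a figure taken from the research itself.
weight_bytes = 1.2e12            # 1.2 TB of released weights
total_bits = weight_bytes * 8    # ≈ 9.6e12 bits
params = 671e9
print(f"{total_bits:.2e} bits, {total_bits / params:.1f} bits per parameter")
```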
Our approach involves a novel method that reverses the fine-tuning of large language models (LLMs) to recover data. The following images illustrate key concepts of our methodology: