For several months, I have been discussing the potential implications of a DeepSeek moment in lithography. This perspective has been met with skepticism and ridicule from certain members of the Twitter lithography community. While I am saddened to see my concerns validated, I feel it is necessary to share my insights now that the truth is becoming increasingly apparent.
import assert from 'node:assert';

// Minimal binary-heap-backed priority queue.
class HeapQueue {
  constructor(cmp) {
    // The comparator defaults to numeric ascending order, i.e. a min-heap.
    this.cmp = cmp || function (a, b) { return a - b; };
    this.length = 0;
    this.data = [];
  }

  // Number of items currently stored in the heap.
  size() {
    return this.length;
  }
}
This document serves to clarify various aspects of the Australian voting system, addressing common misconceptions and providing accurate information. In an era rife with misinformation, it is essential to equip ourselves with the correct knowledge so that we may share it with others.
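To make the later discussion concrete, here is a toy simulation of the preferential (instant-runoff) count used for Australia's House of Representatives: voters rank candidates, the candidate with the fewest first preferences is eliminated each round, and those ballots flow to the next surviving preference until someone holds an absolute majority. This is an illustrative simplification with made-up ballots, not an implementation of the official counting rules.

```python
from collections import Counter

def instant_runoff(ballots):
    """Each ballot lists every candidate in preference order."""
    candidates = {c for ballot in ballots for c in ballot}
    while True:
        # Count each ballot toward its highest-ranked surviving candidate.
        tally = Counter(next(c for c in ballot if c in candidates) for ballot in ballots)
        leader, votes = tally.most_common(1)[0]
        if votes * 2 > len(ballots):          # absolute majority reached
            return leader
        # Otherwise eliminate the candidate with the fewest votes and redistribute.
        candidates.remove(min(tally, key=tally.get))

# Illustrative (made-up) ballots: 4 rank A first, 3 rank B first, 2 rank C first.
ballots = [["A", "B", "C"]] * 4 + [["B", "C", "A"]] * 3 + [["C", "B", "A"]] * 2
print(instant_runoff(ballots))   # "B" wins after "C" is eliminated
```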
Automated Capability Discovery (ACD) is an innovative tool designed to automatically identify surprising new capabilities and failure modes in foundation models. This is achieved through a process known as "self-exploration," wherein models explore their own abilities.
ACD is spearheaded by @cong_ml and @shengranhu.
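As a rough mental model of self-exploration (a conceptual sketch only, not the actual ACD implementation), one can picture a loop in which the model proposes tasks for itself, attempts them, and then judges its own attempts. Here `query_model` is a hypothetical stand-in for any chat-completion call.

```python
# Conceptual sketch of a self-exploration loop (not the actual ACD code).
# `query_model` is a hypothetical helper that sends a prompt to a foundation
# model and returns its text response.

def self_exploration(query_model, n_rounds=5):
    discovered = []
    for _ in range(n_rounds):
        # 1. The model proposes a task intended to probe its own abilities.
        task = query_model("Propose a novel task that tests an unusual capability.")
        # 2. The model attempts the task it just proposed.
        attempt = query_model(f"Attempt the following task:\n{task}")
        # 3. The model judges its own attempt, surfacing surprising
        #    capabilities (successes) and failure modes (failures).
        verdict = query_model(
            f"Task:\n{task}\n\nAttempt:\n{attempt}\n\n"
            "Did the attempt succeed? Answer yes or no, then explain."
        )
        discovered.append({"task": task, "attempt": attempt, "verdict": verdict})
    return discovered
```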
https://www.all-hands.dev/blog/vibe-coding-higher-quality-code
The integration of coding agents into software development has become increasingly prevalent, which raises a frequently asked question: how can we vibe code effectively with coding agents while still ensuring high standards of code quality?
DSPy represents a significant advancement in the field of AI software development. However, its complexity can make it challenging to fully comprehend. This document aims to clarify the foundational principles of DSPy and outline its core tenets.
The central thesis of DSPy is that while large language models (LLMs) and their methodologies will continue to evolve, this progress will not be uniform across all dimensions. Therefore, it is essential to identify:
- The minimal set of fundamental abstractions that enable the development of downstream AI software that is "future-proof" and capable of adapting to advancements.
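As a concrete illustration of that abstraction layer, the sketch below uses DSPy's documented signature-and-module interface; the model identifier is a placeholder, and minor API details may differ across DSPy versions.

```python
import dspy

# Placeholder model name; point this at whichever LM you actually use.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# A signature declares *what* the program should do ("question -> answer"),
# leaving *how* the LM is prompted, and which LM is used, to the framework.
qa = dspy.ChainOfThought("question -> answer")

prediction = qa(question="What does a DSPy signature declare?")
print(prediction.answer)
```

The point of the example is that the program is written against the signature, so swapping the underlying model, the prompting strategy, or the optimizer does not require rewriting the program itself.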
We are pleased to announce the release of our new research paper titled "Continuous Thought Machines" (CTMs). This work explores the significant role of timing and synchronization in neuronal computation, aspects that have been largely overlooked in contemporary neural networks. Our hypothesis is that neural timing is essential for the flexibility and adaptability observed in biological intelligence.
We introduce a novel architecture, the Continuous Thought Machines (CTMs), which is designed from the ground up to incorporate neural dynamics as a fundamental representation of intelligence. By prioritizing neural dynamics as a core component, CTMs are capable of performing adaptive computation naturally.
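To give a flavour of what treating neural dynamics as a representation can mean, here is a deliberately simplified toy construction of my own, not the paper's architecture: each neuron applies its own weights to a short history of its pre-activations across internal "ticks", and the pairwise synchronization of the resulting activation traces is read out as the representation.

```python
import numpy as np

# A highly simplified sketch, not the CTM architecture itself: D neurons each
# keep a rolling history of pre-activations, a per-neuron function maps that
# history to the next activation, and pairwise synchronization across the
# unrolled internal "ticks" is read out as the representation.

rng = np.random.default_rng(0)
D, H, T = 16, 4, 32                            # neurons, history length, internal ticks
W_in = rng.normal(size=(D, D)) / np.sqrt(D)    # mixes activations into pre-activations
w_hist = rng.normal(size=(D, H)) / np.sqrt(H)  # per-neuron weights over its own history

history = np.zeros((D, H))                     # rolling pre-activation history per neuron
trace = []                                     # activation time series across ticks

z = rng.normal(size=D)
for _ in range(T):
    pre = W_in @ z                                        # new pre-activations
    history = np.concatenate([history[:, 1:], pre[:, None]], axis=1)
    z = np.tanh((w_hist * history).sum(axis=1))           # neuron-level function of its history
    trace.append(z)

A = np.stack(trace, axis=1)    # (D, T) activation traces over internal ticks
sync = A @ A.T / T             # pairwise synchronization matrix, used as the representation
print(sync.shape)              # (16, 16)
```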
In the realm of attention mechanisms, we are familiar with traditional Attention and its linear-time variants, such as linear attention and State Space Models. However, what exists in the intermediate space between these two paradigms? This document introduces Log-Linear Attention, a novel approach that offers significant advantages in both computational and memory efficiency.
Log-Linear Attention is characterized by the following key features:
- Log-linear time training
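The intuition, as I understand it (a hedged sketch, not the paper's exact construction), is hierarchical: rather than a single fixed-size state (linear attention) or a cache that grows linearly with context length (softmax attention), each position summarizes its prefix into O(log t) power-of-two buckets, keeping one fixed-size state per bucket, which is what yields the log-linear training cost and logarithmic state growth.

```python
# A hedged sketch of the core idea: decompose each position's prefix into
# O(log t) power-of-two buckets (Fenwick-tree style). Each bucket carries one
# fixed-size linear-attention-like state, so memory grows as O(log t).
# The actual partitioning in the paper may differ in detail.

def prefix_buckets(t):
    """Decompose the prefix [0, t) into power-of-two buckets, largest first."""
    buckets, end = [], t
    while end > 0:
        size = end & (-end)          # largest power of two dividing `end`
        buckets.append((end - size, end))
        end -= size
    return buckets[::-1]

for t in (7, 8, 13):
    print(t, prefix_buckets(t))
# 7  -> [(0, 4), (4, 6), (6, 7)]       : 3 buckets, O(log t)
# 8  -> [(0, 8)]                       : 1 bucket
# 13 -> [(0, 8), (8, 12), (12, 13)]    : 3 buckets
```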
In this document, I present an argument that challenges the widely accepted belief in the validity of the 3x+1 Conjecture. This conjecture, also known as the Collatz Conjecture, posits that starting with any positive integer, repeated application of a particular function will eventually lead to the number 1.
In my first paper, co-authored with Y. Sinai in 2002, I demonstrated that the paths generated by the 3x+1 function can be modeled as a geometric Brownian motion in a precise asymptotic sense, with a drift of log(3/4) < 0. This finding suggests that typical trajectories exhibit a decay pattern, supporting the previously established fact that almost every initial seed eventually reaches a value below itself. However, this process cannot be iterated indefinitely, as the paths may diverge into very sparse trajectories.
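The drift value has a simple heuristic reading: an accelerated step takes an odd n to (3n + 1)/2^k, where 2^k is the largest power of two dividing 3n + 1, and for typical odd n the exponent k averages 2, so each step multiplies n by roughly 3/4 on a logarithmic scale (normalizations of the drift differ between papers). The snippet below is only a sanity check of that heuristic on random odd seeds, not a reproduction of the paper's analysis.

```python
import math
import random

def accelerated_step(n):
    """One accelerated 3x+1 step: odd n -> (3n + 1) / 2**k, with all factors of 2 removed."""
    n = 3 * n + 1
    while n % 2 == 0:
        n //= 2
    return n

random.seed(0)
drifts = []
for _ in range(2000):
    n = random.randrange(10**9, 10**10) | 1     # a large random odd seed
    drifts.append(math.log(accelerated_step(n)) - math.log(n))

print(sum(drifts) / len(drifts))   # close to log(3/4)
print(math.log(3 / 4))             # ≈ -0.2877
```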
This document outlines the findings of recent research focused on understanding the amount of information available in open-weights models, specifically the DeepSeek R1 weights, which amount to 1.2 TB. The central question of this research is: What can we learn from all those bits?
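For scale, 1.2 TB is about 9.6 × 10^12 bits; if one assumes the commonly cited figure of roughly 671 billion parameters for DeepSeek R1 (my assumption, not a figure from the research itself), that works out to on the order of 14 bits per parameter.

```python
# Back-of-the-envelope arithmetic only.
# The parameter count is an assumption (roughly 671B is the commonly cited
# size for DeepSeek R1), not a figure taken from the research itself.
weight_bytes = 1.2e12            # 1.2 TB of released weights
total_bits = weight_bytes * 8    # ≈ 9.6e12 bits
params = 671e9
print(f"{total_bits:.2e} bits, {total_bits / params:.1f} bits per parameter")
```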
Our approach involves a novel method that reverses the fine-tuning of large language models (LLMs) to recover data. The following images illustrate key concepts of our methodology: