Programming is about information not data, or: you might not need dependent types
When I took "Fundamentals of Computer Science" in college, my professor was
very adamant about the distinction between data and information and about how data doesn't have any inherent meaning.
At the time, it seemed a bit silly to me how much emphasis he put on such a seemingly insignificant difference.
In retrospect, I think he was exactly right about this and I wish more programmers took it to heart.
Data is something you can store in a computer, such as, let's say, the byte 0b01000001.
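To make his point concrete, here's a small illustration of my own (not from the course): in Python, that same byte yields different information depending on how we choose to interpret it.

```python
# One byte, three readings: the data is identical, the information is not.
b = 0b01000001

print(b)        # 65   -> interpreted as an unsigned integer
print(chr(b))   # 'A'  -> interpreted as an ASCII/Unicode code point
print(b & 0b1)  # 1    -> interpreted as a bit field (lowest bit set)
```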
Accelerating Discrete Program Search with SUP Nodes
Fast Discrete Program Search 2
I am investigating how to use Bend (a parallel language) to accelerate Symbolic AI, in particular Discrete Program Search. Basically, think of it as an alternative to LLMs, GPTs, and NNs that is also capable of generating code, but by entirely different means. This kind of approach was never scaled with mass compute before (it wasn't possible!), but Bend changes this. So, my idea was to do it and see where it goes.
Now, while I was implementing some candidate algorithms in Bend, I realized that, rather than mass parallelism, I could use an entirely different mechanism to speed things up: SUP Nodes. It is a feature that Bend inherited from its underlying model ("Interaction Combinators") and that allows us to combine multiple functions into a single superposed one and apply them all to an argument "at the same time". In short, it lets us call N functions at a fraction of the expected cost. Or, in simple terms: why parallelize when we can share?
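Bend's sharing can't be reproduced in ordinary code, but as a rough sketch of the surface semantics only (my own toy, not Bend's mechanism, and deliberately without its cost model): "superposing" N functions means building one function whose single application yields all N results.

```python
# Toy sketch of the semantics only: a real SUP node *shares* the work across
# the N calls, while this naive version simply performs all N of them.
def superpose(*fns):
    """Combine N functions into one 'superposed' function."""
    return lambda x: tuple(f(x) for f in fns)

inc = lambda n: n + 1
dbl = lambda n: n * 2

sup = superpose(inc, dbl)
print(sup(21))  # (22, 42): one application, both results
```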
Stable Diffusion's VAE is a neural network that encodes images into a compressed "latent" format and decodes them back. The encoder performs 48x lossy compression, and the decoder generates new detail to fill in the gaps.
(Calling this model a "VAE" is sort of a misnomer - it's an encoder with some very slight KL regularization, and a conditional GAN decoder)
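As a concrete sketch of the encode/decode round trip (assuming the diffusers library and the public stabilityai/sd-vae-ft-mse checkpoint; shapes shown are for a 512x512 input):

```python
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")

img = torch.randn(1, 3, 512, 512)  # stand-in for a real image scaled to [-1, 1]
with torch.no_grad():
    latents = vae.encode(img).latent_dist.sample()  # (1, 4, 64, 64) latents
    recon = vae.decode(latents).sample              # (1, 3, 512, 512) image

# 512*512*3 / (64*64*4) = 48, hence the "48x" lossy compression figure above.
```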
This document is a big pile of various links with more info.
# WARNING! The script is meant to show how and what can be disabled. Don't use it as-is; adapt it to your needs.
# Credit: original idea and disable.sh script by pwnsdx: https://gist.github.com/pwnsdx/d87b034c4c0210b988040ad2f85a68d3
# Disabling unwanted services on macOS Big Sur (11), macOS Monterey (12), macOS Ventura (13) and macOS Sonoma (14)
# Disabling SIP is required ("csrutil disable" from Terminal in Recovery)
# Modifications are written to /private/var/db/com.apple.xpc.launchd/disabled.plist and disabled.501.plist
# To revert, delete disabled.plist and disabled.501.plist and reboot: sudo rm -r /private/var/db/com.apple.xpc.launchd/*
So you want to generate images with neural networks. You're in luck! VAEs are here to save the day. They're simple to implement, they generate images in one inference step (unlike those awful slow autoregressive models) and (most importantly) VAEs are 🚀🎉🎂🥳 theoretically grounded 🚀🎉🎂🥳 (unlike those scary GANs - don't look at the GANs)!
The idea
The idea of a VAE is so simple that even an AI chatbot could explain it:
Your goal is to train a "decoder" neural network that consumes blobs of random noise from a fixed distribution (like torch.randn(1024)), interprets that noise as decisions about what to generate, and produces corresponding real-looking images. You want to train this network with nice simple image-space MSE loss against your dataset of real images.
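In code, the naive setup described above might look like this toy sketch (the decoder architecture and dataset batch are made-up stand-ins; this illustrates the objective, not a recipe that produces good images):

```python
import torch
import torch.nn as nn

# Stand-in "decoder": 1024-dim noise vector in, flattened 64x64 RGB image out.
decoder = nn.Sequential(
    nn.Linear(1024, 4096), nn.ReLU(),
    nn.Linear(4096, 3 * 64 * 64), nn.Tanh(),
)
opt = torch.optim.Adam(decoder.parameters(), lr=1e-4)

real_images = torch.rand(32, 3 * 64 * 64) * 2 - 1  # stand-in for a real batch

z = torch.randn(32, 1024)                  # noise from a fixed distribution
fake = decoder(z)                          # noise interpreted as "decisions"
loss = ((fake - real_images) ** 2).mean()  # simple image-space MSE loss
opt.zero_grad()
loss.backward()
opt.step()
```

As written, each noise vector gets paired with an arbitrary real image, which is exactly the mismatch the rest of the VAE machinery exists to resolve.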
With the release of the ChatGPT model and follow-up large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback".
I was puzzled for a while as to why RL (reinforcement learning) is better than learning from demonstrations (a.k.a. supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language-model terminology, "instruction fine-tuning": learning to imitate human-written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argument which not only supports the case for RL training but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much retells that argument.