
GFlowNets

Base paper: https://arxiv.org/abs/2106.04399

Extension: https://arxiv.org/pdf/2111.09266.pdf

Tutorial: https://milayb.notion.site/The-GFlowNet-Tutorial-95434ef0e2d94c24aab90e69b30be9b3

Lecture: https://www.youtube.com/watch?v=ZncOp1Dvw9o

Interview: https://www.youtube.com/watch?v=M49TMqK5uCE

Grid Toy example: https://github.com/GFNOrg/gflownet/blob/master/grid/toy_grid_dag.py

Pseudo-Value functions: https://arxiv.org/pdf/1910.06862.pdf

Training

Just sample trajectories on the graph and ensure that every visited state preserves the property that the sum of incoming flows equals the sum of outgoing flows (a kind of Kirchhoff's current law).

The incoming and outgoing flows of each node are computed by a neural network, which can be updated with backpropagation using a loss function based on the above property.
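As a rough illustration, here is a minimal PyTorch sketch of this flow-matching idea. The `env` object (with `parents`, `children`, and `reward` methods) and the network architecture are assumptions made for the example; this is not the code from the GFNOrg repository and it glosses over details such as the special treatment of the initial state.

```python
import torch
import torch.nn as nn

class FlowNet(nn.Module):
    """Predicts log edge flows log F(s, a), one value per action."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def flow_matching_loss(model, trajectory, env, eps: float = 1e-8):
    """Penalize, for every visited state, the mismatch between total inflow
    and total outflow + terminal reward (the Kirchhoff-like property)."""
    losses = []
    for state in trajectory:
        # Inflow: edge flows from every (parent, action) pair leading here.
        inflow = torch.zeros(())
        for parent, action in env.parents(state):
            inflow = inflow + model(parent)[action].exp()
        # Outflow: edge flows leaving this state, plus the reward if terminal.
        outflow = torch.tensor(float(env.reward(state)))
        for action in env.children(state):
            outflow = outflow + model(state)[action].exp()
        losses.append((torch.log(inflow + eps) - torch.log(outflow + eps)) ** 2)
    return torch.stack(losses).mean()
```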

Comparison to MCMC

MCMC traverses the search space randomly, while GFlowNets use a neural network to find structure in the data. In one example, Bengio shows a function whose high-reward modes in 2-D space are all aligned on a grid. A smart agent should look for other modes at the remaining grid points, as in the sketch below.
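A made-up reward with this kind of grid structure, just to make the point concrete (the constants are arbitrary):

```python
import numpy as np

def reward(x: np.ndarray) -> float:
    """High reward only near integer grid points in 2-D (illustrative only)."""
    nearest = np.round(x)
    return float(np.exp(-20.0 * np.sum((x - nearest) ** 2)))

# A local MCMC chain has to stumble onto each mode separately, whereas a
# learned sampler can pick up the "modes sit on the grid" pattern and
# propose the remaining grid points directly.
print(reward(np.array([2.0, 3.0])))   # ~1.0, on a grid point
print(reward(np.array([2.5, 3.5])))   # ~0.0, between grid points
```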

Comparison to RL

Deep RL maximizes the probability of choosing the best action, e.g. by optimizing the likelihood of a categorical distribution over actions.

GFlowNets instead try to properly estimate the total flow through each state (the sum of rewards of the terminal states reachable from it), so that terminal states end up being sampled in proportion to their reward rather than only the best one being chosen.
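A toy sketch of the difference, with made-up flow values for a single state:

```python
import torch

# Hypothetical edge flows F(s, a) for three actions from some state.
edge_flows = torch.tensor([10.0, 5.0, 1.0])

# Deep RL with a greedy/argmax policy: always commit to the best action.
rl_action = edge_flows.argmax()              # action 0, every time

# GFlowNet forward policy: P(a | s) = F(s, a) / sum_a' F(s, a'),
# so lower-reward branches are still visited in proportion to their flow.
probs = edge_flows / edge_flows.sum()        # tensor([0.6250, 0.3125, 0.0625])
gfn_action = torch.multinomial(probs, 1)
```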

Comparison to Monte Carlo Tree Search

MCTS uses a pseudo-value function to estimate the expected reward in each state. It prefers modes with longer action sequences, because their final states can usually be constructed in more ways.

E.g. the set {1, 2, 3, 4} can be obtained by sequentially adding its numbers in any of $4! = 24$ possible orders. In a tree, the reward for this set would therefore be counted 24 times. For a smaller set, e.g. {1, 2}, it would be counted only $2! = 2$ times. The probability of the former is thus overestimated 12x, so the smaller set would need to be 12x more likely (have a 12x larger reward) to be sampled with the same probability.
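The counting above can be checked directly:

```python
from math import factorial

# How many insertion orders build each set (the overcounting in a search tree).
orderings_large = factorial(4)   # {1, 2, 3, 4}: 4! = 24 paths reach it
orderings_small = factorial(2)   # {1, 2}:       2! = 2 paths reach it
print(orderings_large, orderings_small, orderings_large // orderings_small)
# 24 2 12
```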

Hypergrid Experiments

A good example of artificial experiment design. It shows that GFlowNets can be trained off-policy, i.e. without relying on trajectories generated by the network itself.
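A sketch of what off-policy training could look like here, continuing the hypothetical `env` and `flow_matching_loss` names from the training sketch above (`initial_state` and `step` are assumed helpers, not the paper's exact setup):

```python
import random

def sample_random_trajectory(env, max_len: int = 32):
    """Behavior policy: uniform random actions, independent of the model."""
    state, trajectory = env.initial_state(), []
    for _ in range(max_len):
        trajectory.append(state)
        actions = env.children(state)
        if not actions:                      # terminal state reached
            break
        state = env.step(state, random.choice(actions))
    return trajectory

# Because the flow-matching loss only checks local flow consistency, these
# uniformly sampled trajectories are still valid training data:
#   loss = flow_matching_loss(model, sample_random_trajectory(env), env)
#   loss.backward(); optimizer.step()
```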
