Base paper: https://arxiv.org/abs/2106.04399
Extension: https://arxiv.org/pdf/2111.09266.pdf
Tutorial: https://milayb.notion.site/The-GFlowNet-Tutorial-95434ef0e2d94c24aab90e69b30be9b3
Lecture: https://www.youtube.com/watch?v=ZncOp1Dvw9o
Interview: https://www.youtube.com/watch?v=M49TMqK5uCE
Grid Toy example: https://github.com/GFNOrg/gflownet/blob/master/grid/toy_grid_dag.py
Pseudo-Value functions: https://arxiv.org/pdf/1910.06862.pdf
Core idea: sample trajectories on a DAG and ensure that, at every intermediate state, the sum of incoming flows equals the sum of outgoing flows (a kind of Kirchhoff's current law).
The in- and out-flows of each node are predicted by a neural network, which is updated by backpropagating a loss built from this flow-matching property.
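A minimal PyTorch sketch of such a flow-matching loss (simplified from the paper's log-space objective; the FlowNet architecture, the state encoding, and the calling convention are assumptions, not the repo's actual code):

```python
import torch
import torch.nn as nn

class FlowNet(nn.Module):
    """Predicts log edge flows log F(s -> s') for each of n_actions outgoing edges."""
    def __init__(self, state_dim, n_actions, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state):
        return self.mlp(state)  # shape (..., n_actions): log edge flows

def flow_matching_loss(model, state, parent_states, parent_actions, reward):
    """Squared mismatch, in log space, between inflow and outflow at one state.

    state:          tensor (state_dim,)            -- the state s' being checked
    parent_states:  tensor (n_parents, state_dim)  -- states s with an edge s -> s'
    parent_actions: long tensor (n_parents,)       -- which action at s leads to s'
    reward:         scalar tensor, R(s') (zero for non-terminal states)
    """
    # Inflow: sum over all edges entering s' of F(s -> s').
    log_in = model(parent_states).gather(1, parent_actions.view(-1, 1)).squeeze(1)
    log_inflow = torch.logsumexp(log_in, dim=0)
    # Outflow: R(s') plus the sum over outgoing edges of F(s' -> s'').
    log_outflow = torch.logaddexp(torch.log(reward + 1e-8),
                                  torch.logsumexp(model(state), dim=-1))
    return (log_inflow - log_outflow) ** 2
```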
MCMC traverses the search space randomly, whereas GFlowNets use neural networks to exploit structure in the data. In one example, Bengio shows a reward function whose high-reward modes in 2D space are aligned on a regular grid; a smart agent should then look for further modes at the remaining grid points.
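A hypothetical reward of that shape, just for illustration (the spacing k and the width are made up):

```python
import numpy as np

# High-reward modes at every lattice point (multiples of k) of the 2D plane;
# reward decays quickly away from the lattice, so the modes are isolated.
def grid_reward(x, y, k=3.0, width=0.1):
    dx, dy = x % k, y % k
    dist2 = min(dx, k - dx) ** 2 + min(dy, k - dy) ** 2
    return float(np.exp(-dist2 / (2 * width ** 2)))

# An agent that has found modes at (0, 0), (0, 3), (3, 0), ... can guess that
# (3, 3), (6, 3), ... are also worth visiting.
```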
Deep RL maximizes the probability of choosing the best action, e.g. by optimizing the likelihood of a categorical distribution over actions.
GFlowNets instead try to estimate, at each state, the flow: roughly the sum of the rewards of the terminal states reachable from it, with actions picked in proportion to the flow along each outgoing edge.
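In symbols (a sketch of the flow picture, with F the flow, R the reward, and Child(s) the children of s; assuming rewards only at terminal states):

```latex
% Terminal flow equals reward; interior flow splits across outgoing edges;
% actions are sampled in proportion to edge flows, so terminal states end up
% sampled with probability proportional to their reward.
\begin{aligned}
F(x) &= R(x) && \text{terminal } x,\\
F(s) &= \sum_{s' \in \mathrm{Child}(s)} F(s \to s') && \text{interior } s,\\
P(s' \mid s) &= \frac{F(s \to s')}{F(s)}
\quad\Longrightarrow\quad P(x) \propto R(x).
\end{aligned}
```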
MCTS uses a pseudo-value function to estimate expected rewards at each state. It is biased towards modes with longer action sequences, because their final states can usually be constructed in more ways.
E.g. the set {1, 2, 3, 4} can be obtained by sequentially adding its elements in any of the possible orders.
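A quick count of that over-representation (plain Python, illustration only):

```python
from itertools import permutations

# Each ordering of the elements is a distinct action sequence ending in the
# same set, so a tree-style sampler effectively counts this terminal state 24 times.
target = {1, 2, 3, 4}
orderings = list(permutations(target))  # sequences of "add element" actions
print(len(orderings))  # 24 == 4!
```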
A good example of artificial experiment design: it shows that GFlowNets can be trained off-policy, i.e. on trajectories whose actions were not generated by the network itself.
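A sketch of what off-policy data collection could look like here (the environment hooks step, is_terminal, and n_actions are hypothetical; the visited states can then be fed to a flow-matching loss such as the one sketched above):

```python
import random

def random_trajectory(initial_state, step, is_terminal, n_actions):
    """Collect one trajectory with a uniform random behaviour policy.

    The actions never consult the flow model, yet the visited states are still
    valid training data, because the flow-matching loss only needs states,
    their parents, and their rewards.
    """
    state, trajectory = initial_state, [initial_state]
    while not is_terminal(state):
        state = step(state, random.randrange(n_actions))
        trajectory.append(state)
    return trajectory
```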