
GFlowNets

Base paper: https://arxiv.org/abs/2106.04399

Extension: https://arxiv.org/pdf/2111.09266.pdf

Tutorial: https://milayb.notion.site/The-GFlowNet-Tutorial-95434ef0e2d94c24aab90e69b30be9b3

Lecture: https://www.youtube.com/watch?v=ZncOp1Dvw9o

Interview: https://www.youtube.com/watch?v=M49TMqK5uCE

Grid Toy example: https://github.com/GFNOrg/gflownet/blob/master/grid/toy_grid_dag.py

Pseudo-Value functions: https://arxiv.org/pdf/1910.06862.pdf

Training

Just sample trajectories on the graph and ensure that every visited state preserves the property that the sum of incoming flows equals the sum of outgoing flows (a kind of Kirchhoff's current law).

The incoming and outgoing flows of each node are computed by a neural network, which can be updated with backpropagation using a loss function based on the above property.
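As a rough illustration, here is a minimal PyTorch sketch of this flow-matching idea. The `env` object (with `parents`, `children`, and `reward` methods) and the network architecture are assumptions made for the example; this is not the code from the GFNOrg repository and it glosses over details such as the special treatment of the initial state.

```python
import torch
import torch.nn as nn

class FlowNet(nn.Module):
    """Predicts log edge flows log F(s, a), one value per action."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def flow_matching_loss(model, trajectory, env, eps: float = 1e-8):
    """Penalize, for every visited state, the mismatch between total inflow
    and total outflow + terminal reward (the Kirchhoff-like property)."""
    losses = []
    for state in trajectory:
        # Inflow: edge flows from every (parent, action) pair leading here.
        inflow = torch.zeros(())
        for parent, action in env.parents(state):
            inflow = inflow + model(parent)[action].exp()
        # Outflow: edge flows leaving this state, plus the reward if terminal.
        outflow = torch.tensor(float(env.reward(state)))
        for action in env.children(state):
            outflow = outflow + model(state)[action].exp()
        losses.append((torch.log(inflow + eps) - torch.log(outflow + eps)) ** 2)
    return torch.stack(losses).mean()
```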

Comparison to MCMC

MCMC traverses the search space randomly, while GFlowNets use a neural network to find structure in the data. In one example, Bengio shows a function whose high-reward modes in 2-D space are all aligned on a grid. A smart agent should look for other modes at the remaining grid points, as in the sketch below.
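A made-up reward with this kind of grid structure, just to make the point concrete (the constants are arbitrary):

```python
import numpy as np

def reward(x: np.ndarray) -> float:
    """High reward only near integer grid points in 2-D (illustrative only)."""
    nearest = np.round(x)
    return float(np.exp(-20.0 * np.sum((x - nearest) ** 2)))

# A local MCMC chain has to stumble onto each mode separately, whereas a
# learned sampler can pick up the "modes sit on the grid" pattern and
# propose the remaining grid points directly.
print(reward(np.array([2.0, 3.0])))   # ~1.0, on a grid point
print(reward(np.array([2.5, 3.5])))   # ~0.0, between grid points
```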

Comparison to RL

Deep RL maximizes the probability of choosing the best action, e.g. by optimizing the likelihood of a categorical distribution over actions.

GFlowNets instead try to properly estimate the total flow through each state (the sum of rewards of the terminal states reachable from it), so that terminal states end up being sampled in proportion to their reward rather than only the best one being chosen.
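A toy sketch of the difference, with made-up flow values for a single state:

```python
import torch

# Hypothetical edge flows F(s, a) for three actions from some state.
edge_flows = torch.tensor([10.0, 5.0, 1.0])

# Deep RL with a greedy/argmax policy: always commit to the best action.
rl_action = edge_flows.argmax()              # action 0, every time

# GFlowNet forward policy: P(a | s) = F(s, a) / sum_a' F(s, a'),
# so lower-reward branches are still visited in proportion to their flow.
probs = edge_flows / edge_flows.sum()        # tensor([0.6250, 0.3125, 0.0625])
gfn_action = torch.multinomial(probs, 1)
```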

Comparison to Monte Carlo Tree Search

MCTS uses a pseudo-value function to estimate the expected reward in each state. It prefers modes with longer action sequences, because their final states can usually be constructed in more ways.

E.g. the set {1, 2, 3, 4} can be obtained by sequentially adding its numbers in any of $4! = 24$ possible orders. In a tree, the reward for this set would therefore be counted 24 times. For a smaller set, e.g. {1, 2}, it would be counted only $2! = 2$ times. The probability of the former is thus overestimated 12x, so the smaller set would need to be 12x more likely (have a 12x larger reward) to be sampled with the same probability.
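The counting above can be checked directly:

```python
from math import factorial

# How many insertion orders build each set (the overcounting in a search tree).
orderings_large = factorial(4)   # {1, 2, 3, 4}: 4! = 24 paths reach it
orderings_small = factorial(2)   # {1, 2}:       2! = 2 paths reach it
print(orderings_large, orderings_small, orderings_large // orderings_small)
# 24 2 12
```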

Hypergrid Experiments

A good example of artificial experiment design. It shows that GFlowNets can be trained off-policy, i.e. without relying on trajectories generated by the network itself.
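A sketch of what off-policy training could look like here, continuing the hypothetical `env` and `flow_matching_loss` names from the training sketch above (`initial_state` and `step` are assumed helpers, not the paper's exact setup):

```python
import random

def sample_random_trajectory(env, max_len: int = 32):
    """Behavior policy: uniform random actions, independent of the model."""
    state, trajectory = env.initial_state(), []
    for _ in range(max_len):
        trajectory.append(state)
        actions = env.children(state)
        if not actions:                      # terminal state reached
            break
        state = env.step(state, random.choice(actions))
    return trajectory

# Because the flow-matching loss only checks local flow consistency, these
# uniformly sampled trajectories are still valid training data:
#   loss = flow_matching_loss(model, sample_random_trajectory(env), env)
#   loss.backward(); optimizer.step()
```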
