This article is mainly about understanding Generative Flow Networks (GFlowNets), a class of generative models proposed by Bengio et al. It reflects my ongoing exploration of GFlowNets, shared with a wider audience as I learn.

Any feedback or suggestions on this article are always welcome. Happy to connect and discuss in detail.

Before delving directly into Generative Flow Networks, let's first understand what kinds of problems they can solve, which approaches researchers used earlier, and how GFNs replace them.

From a computer science perspective, Generative Flow Networks (GFNs) are well-suited to combinatorial optimization and sampling problems, where the search space is large. From a mathematical perspective, GFNs can be framed in terms of Markovian flows over a state space, where probability mass is conserved and distributed in proportion to a reward function. In layman's terms, GFNs are useful for problems where you build a solution step by step and there may be many valid solutions of varying quality, since they can learn to generate the most promising ones without exhaustively testing every possibility.

Let's understand this with a simple problem statement that involves a graph (because I love graphs :).

Besides, understanding by drawing some function's distribution and trying to sample from it is not that intuitive, at least for me.

A simple problem: “Subset-sum near a target”

We have N positive weights w1, …, wN. We need to build a subset S ⊆ {1, …, N} sequentially (left to right). Good solutions are those whose total weight is close to a target T.

A concrete tiny instance

Use N = 10, w = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], target T = 25, and reward width σ = 3.
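The article doesn't pin down the reward function explicitly, so as an assumption consistent with the target T and width σ above, here is one natural Gaussian-shaped choice that peaks when the subset sum hits the target:

```python
import math

# Assumed reward: a Gaussian bump around the target T with width sigma.
# This exact form is my own choice, not prescribed by the article.
w = list(range(1, 11))   # w = [1, ..., 10]
T, sigma = 25, 3.0

def reward(subset):
    """subset: iterable of 0-based indices into w."""
    s = sum(w[i] for i in subset)
    return math.exp(-((s - T) ** 2) / (2 * sigma ** 2))
```

For example, the subset {3, 4, 9, 10} sums to 26, one away from the target, so its reward is exp(-1/18) ≈ 0.946, while a subset summing far from 25 gets a reward near zero.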

Why this is a good toy example

The instance is small enough (2^10 = 1024 subsets) to enumerate and sanity-check, yet the reward is multi-modal: many different subsets sum close to the target.

Solutions

Two ways to solve it

  1. MCMC (e.g., Metropolis–Hastings):

    • Start from a random subset.
    • Propose flipping one element (include → exclude or exclude → include).
    • Accept or reject the move based on the ratio of rewards.
    • After burn-in, the chain samples subsets with probability proportional to R(S).
    • Limitation: moves are local, so mixing can be slow when the reward is sharply peaked.
  2. Generative Flow Networks (GFNs):

    • Learn a generative policy that decides include/skip at each step.
    • Probability of generating a subset is directly proportional to R(S).
    • Advantage: samples many diverse high-reward subsets in one shot, without getting stuck in local modes.

Solving the toy problem with MCMC

Now that we’ve defined the “subset-sum near a target” problem, how do we actually sample subsets? A classic approach is Markov Chain Monte Carlo (MCMC), specifically the Metropolis–Hastings (MH) algorithm. The idea is simple: instead of trying to list all 2^N possible subsets, we build a random walk through the space of subsets, where the walk eventually spends most of its time on the “good” ones (the ones with higher reward).

In MCMC, you repeatedly propose moves and accept or reject them; more background is available here: link1, link2, link3.
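The propose/accept loop above can be sketched in a few lines. This is a minimal Metropolis–Hastings sampler for the toy instance; the Gaussian reward is my own assumption (the article doesn't fix the exact form), and the flip proposal is symmetric, so the acceptance ratio reduces to a ratio of rewards:

```python
import math
import random

w = list(range(1, 11))
T, sigma = 25, 3.0

def reward(mask):
    # mask: list of booleans, one per element of w. Assumed Gaussian reward.
    s = sum(wi for wi, m in zip(w, mask) if m)
    return math.exp(-((s - T) ** 2) / (2 * sigma ** 2))

def mh_sample(n_steps=20000, burn_in=5000, seed=0):
    rng = random.Random(seed)
    mask = [rng.random() < 0.5 for _ in w]        # start from a random subset
    samples = []
    for step in range(n_steps):
        i = rng.randrange(len(w))                 # propose flipping one element
        proposal = list(mask)
        proposal[i] = not proposal[i]
        # Symmetric proposal => accept with min(1, R(proposal) / R(current)).
        if rng.random() < min(1.0, reward(proposal) / max(reward(mask), 1e-300)):
            mask = proposal
        if step >= burn_in:                       # discard the burn-in prefix
            samples.append(tuple(mask))
    return samples
```

After burn-in, the frequency of a subset in `samples` approximates R(S)/Z, so the empirical average subset sum should hover near the target 25. Note how local the moves are: each step changes the sum by at most 10, which is exactly the slow-mixing limitation mentioned above.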


Solving the toy problem with GFlowNets

While MCMC explores subsets by taking local random steps, GFlowNets take a very different perspective: instead of wandering through the state space, they learn a policy that generates entire solutions in one shot.

The key idea

Think of the subset-sum problem as a DAG of decisions.

  • Each node is a partial subset (e.g. “I’ve considered the first 3 numbers, my current sum is 7”).

  • Each edge is an action (“include” or “skip” the next number).

  • Every path from the root to a leaf corresponds to a complete subset.
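The decision DAG described above is easy to write down in code. A minimal sketch, with names of my own choosing (not from any GFlowNet library): a state records how many numbers we've decided on and the running sum, and each non-terminal state has exactly two outgoing actions:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    k: int      # how many numbers have been considered so far
    total: int  # sum of the numbers included so far

def children(state, w):
    """Return the successor states: skip or include w[state.k]."""
    if state.k == len(w):
        return []                                              # leaf: subset complete
    return [State(state.k + 1, state.total),                   # action: skip
            State(state.k + 1, state.total + w[state.k])]      # action: include
```

Note that keying states by (k, total) merges partial subsets with the same running sum, which is what makes this a DAG rather than a plain tree.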

GFlowNets define a flow of probability mass through this DAG.

  • The flow starts at the root.
  • At each state, the forward policy splits the flow between the outgoing actions.
  • By the time the flow reaches the leaves, the total flow arriving at a leaf is proportional to its reward R(S).

In other words, GFlowNets are trained so that:

P_GFN(S) ∝ R(S),

exactly the distribution we want.
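Because our instance has only 2^10 = 1024 subsets, we can compute this target distribution exactly by brute force and use it as a ground truth to check a trained GFlowNet against. The Gaussian reward form is my own assumption, as before:

```python
import itertools
import math

w = list(range(1, 11))
T, sigma = 25, 3.0

def reward(bits):
    # bits: tuple of 0/1 flags, one per element of w. Assumed Gaussian reward.
    s = sum(wi for wi, b in zip(w, bits) if b)
    return math.exp(-((s - T) ** 2) / (2 * sigma ** 2))

# Enumerate every subset, then normalize rewards into the target distribution
# P(S) = R(S) / Z that a trained GFlowNet should match.
subsets = list(itertools.product([0, 1], repeat=len(w)))
Z = sum(reward(b) for b in subsets)
target = {b: reward(b) / Z for b in subsets}

best = max(subsets, key=reward)   # some subset summing exactly to 25
```

Several distinct subsets (e.g. {6, 9, 10}) hit the target exactly and share the top probability mass, which is precisely the multi-modality this toy problem was chosen to exhibit.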

Training signal

To enforce this, we use objectives like Trajectory Balance. For a trajectory τ from the root to subset 𝑆:

(log Z + ∑_t log P_F(a_t | s_t) − ∑_t log P_B(s_t | s_{t+1}) − log R(S))^2

  • P_F = forward policy (learned)
  • P_B = backward policy (simple heuristic or learned)
  • Z = normalization constant (also learned)

Minimizing this loss makes the flow consistent: the probability of generating a subset S matches its reward.
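Here is a minimal tabular sketch of Trajectory Balance training for this toy problem, under assumptions of my own: states are keyed by the full include/skip prefix (so each subset has exactly one generating trajectory and log P_B = 0 drops out of the loss), the reward is the assumed Gaussian form, and gradients are derived by hand instead of using autograd:

```python
import math
import random
from collections import defaultdict

w = list(range(1, 11))
T, sigma = 25, 3.0

def log_reward(bits):
    s = sum(wi for wi, b in zip(w, bits) if b)
    return -((s - T) ** 2) / (2 * sigma ** 2)

# Tabular forward policy: one pair of logits per decision prefix.
logits = defaultdict(lambda: [0.0, 0.0])   # state -> [logit_skip, logit_include]
log_Z = [0.0]                              # learned log-partition estimate
LR = 0.05

def train_step(rng):
    """Sample one trajectory from P_F, take one SGD step on the TB loss."""
    bits, log_pf, visited = [], 0.0, []
    for _ in w:
        s = tuple(bits)
        z0, z1 = logits[s]
        m = max(z0, z1)
        lse = m + math.log(math.exp(z0 - m) + math.exp(z1 - m))
        p1 = math.exp(z1 - lse)            # P_F(include | s)
        a = 1 if rng.random() < p1 else 0
        log_pf += (z1 if a else z0) - lse
        visited.append((s, a, p1))
        bits.append(a)
    # TB residual: log Z + sum log P_F - log R  (log P_B = 0 here).
    delta = log_Z[0] + log_pf - log_reward(bits)
    log_Z[0] -= LR * 2 * delta
    for s, a, p1 in visited:
        g1 = (1.0 if a == 1 else 0.0) - p1          # d log P_F / d logit_include
        logits[s][1] -= LR * 2 * delta * g1
        logits[s][0] -= LR * 2 * delta * (-g1)      # the two logits mirror each other
    return delta * delta                            # squared TB loss for this trajectory
```

Running `train_step` repeatedly drives the loss down, and `log_Z[0]` converges toward the true log-partition of the reward; sampling from the trained policy then produces subsets with probability approximately R(S)/Z. Real implementations use a neural network for P_F and autograd instead of these hand-derived gradients.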

Why it works well here

This toy problem is multi-modal: there are many subsets whose sums are close to the target.

So instead of running a long Markov chain and hoping it explores enough, a GFN learns to jump directly to the promising regions of the space.