RL guided

  1. search problem
  2. supervised learning
  3. Markov decision process solvable via RL

We interpret the state and the current partial program as the environment, and choosing the next line of code as the action.

RISC-V assembly language (interesting part -- why not HPL?) -- no control flow, no memory reads/writes.
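
A minimal sketch of the MDP framing above, in Python. Everything here is an assumption for illustration, not the actual setup: the names `execute` and `SynthesisEnv`, the three-opcode subset (`add`, `sub`, `addi`), the 4-register file, the spec given as input/output examples on register 0, and the sparse 0/1 reward. It also ignores RISC-V details such as 32-bit wraparound and the hardwired zero register.

```python
from typing import List, Tuple

# Hypothetical instruction encoding: (opcode, rd, rs1, rs2_or_imm).
Instr = Tuple[str, int, int, int]


def execute(program: List[Instr], regs: List[int]) -> List[int]:
    """Run a straight-line program (no branches, no memory) on a copy of regs."""
    regs = list(regs)
    for op, rd, a, b in program:
        if op == "add":
            regs[rd] = regs[a] + regs[b]
        elif op == "sub":
            regs[rd] = regs[a] - regs[b]
        elif op == "addi":
            regs[rd] = regs[a] + b  # b is an immediate here
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return regs


class SynthesisEnv:
    """State = current partial program; action = next line of code to append."""

    def __init__(self, examples, max_len: int = 8):
        # Assumed spec format: list of (input registers, expected value in register 0).
        self.examples = examples
        self.max_len = max_len
        self.program: List[Instr] = []

    def reset(self):
        self.program = []
        return tuple(self.program)

    def step(self, action: Instr):
        self.program.append(action)
        solved = all(
            execute(self.program, inp)[0] == out for inp, out in self.examples
        )
        done = solved or len(self.program) >= self.max_len
        reward = 1.0 if solved else 0.0  # assumed sparse reward
        return tuple(self.program), reward, done


if __name__ == "__main__":
    # Target behaviour (assumed): r0 = r1 + r2 + 1
    env = SynthesisEnv([([0, 1, 2, 0], 4), ([0, 5, 5, 0], 11)])
    env.reset()
    print(env.step(("add", 0, 1, 2)))
    print(env.step(("addi", 0, 0, 1)))
```

Because there is no control flow and no memory access, every action sequence is a valid program that terminates, so checking a candidate is a single pass over its instructions.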

Problem: what kinds of programs are used for the benchmark? What will the spec look like? Random programs with specific attributes.