RL-guided
- search problem
- supervised learning
- Markov decision process (MDP), solvable via RL
We interpret the state together with the current partial program as the environment, and choosing the next line of code as the action.
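A minimal sketch of this MDP view, assuming a fixed list of candidate lines as the action set and a caller-supplied correctness check as the reward signal. The names `SynthesisEnv`, `is_correct`, and `max_len` are hypothetical and not taken from these notes.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Program = Tuple[str, ...]  # a partial program is an ordered tuple of code lines


@dataclass
class SynthesisEnv:
    """Toy environment: the state is the partial program, an action appends a line."""
    actions: List[str]                     # candidate lines of code (assumed action set)
    is_correct: Callable[[Program], bool]  # spec check supplied by the caller (assumption)
    max_len: int = 4                       # episode ends once the program is this long
    program: Program = ()

    def reset(self) -> Program:
        # Start every episode from the empty program.
        self.program = ()
        return self.program

    def step(self, action_index: int):
        # Append the chosen line, then score the extended program.
        self.program = self.program + (self.actions[action_index],)
        solved = self.is_correct(self.program)
        done = solved or len(self.program) >= self.max_len
        reward = 1.0 if solved else 0.0    # sparse reward: an assumption of this sketch
        return self.program, reward, done
```

A policy would call `reset()` once and then `step(i)` until `done`; the sparse 0/1 reward is only one possible choice.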
Target language: RISC-V assembly (interesting part -- why not HPL?), restricted to no control flow and no memory reads/writes.
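With no control flow and no memory access, a program is a straight-line sequence whose effect is a pure function of the register file. A rough sketch of an interpreter for such a subset follows, assuming RV32 (32-bit registers) and a hand-picked opcode list; the function name `run` and the exact opcode set are assumptions, not something these notes specify.

```python
MASK = 0xFFFFFFFF  # RV32 registers are 32 bits wide (assumed base ISA)


def run(program, regs=None):
    """Execute straight-line, register-only instructions and return the register file."""
    regs = list(regs) if regs is not None else [0] * 32
    for line in program:
        op, rest = line.split(None, 1)
        rd, a, b = [tok.strip() for tok in rest.split(",")]
        dest = int(rd[1:])                 # "x5" -> 5
        x = regs[int(a[1:])]
        # addi takes an immediate as its last operand; the others take a register.
        y = int(b) if op == "addi" else regs[int(b[1:])]
        if op in ("add", "addi"):
            val = x + y
        elif op == "sub":
            val = x - y
        elif op == "and":
            val = x & y
        elif op == "or":
            val = x | y
        elif op == "xor":
            val = x ^ y
        else:
            raise ValueError(f"unsupported opcode in this sketch: {op}")
        if dest != 0:                      # x0 is hard-wired to zero
            regs[dest] = val & MASK        # wrap to 32 bits
    return regs
```

For example, `run(["addi x1, x0, 5", "add x2, x1, x1"])` leaves 5 in `x1` and 10 in `x2`.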
Problem: what kinds of programs are used for the benchmark? What will the spec look like? Random programs with specific attributes.
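One possible reading of "random programs with specific attributes", sketched under assumptions: the attributes are program length, number of registers used, and the allowed opcode set. The function `random_program` and its parameters are hypothetical; the notes leave the actual benchmark and spec format open.

```python
import random


def random_program(length, n_regs=4,
                   ops=("add", "sub", "and", "or", "xor", "addi"), seed=None):
    """Sample a straight-line program with the requested attributes (illustrative only)."""
    rng = random.Random(seed)              # a seed makes the benchmark reproducible
    lines = []
    for _ in range(length):
        op = rng.choice(ops)
        rd = rng.randrange(1, n_regs)      # never target x0, which is always zero
        rs1 = rng.randrange(n_regs)
        if op == "addi":
            imm = rng.randint(-2048, 2047) # 12-bit signed immediate range
            lines.append(f"{op} x{rd}, x{rs1}, {imm}")
        else:
            rs2 = rng.randrange(n_regs)
            lines.append(f"{op} x{rd}, x{rs1}, x{rs2}")
    return lines
```

A spec could then be derived from such a program, for instance as input/output register examples, though that choice is an assumption here as well.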