Grammar + RL

Program aliasing
Syntax correctness

ref: Parisotto

Sequential LSTM-based language model, conditioned on an embedding of the IO pair and of the last predicted token.
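A minimal pure-Python sketch of one decoding step of such a model. It assumes the conditioning is done by concatenating a fixed IO-pair embedding with the embedding of the last predicted token and feeding that as the LSTM input. The sizes, the token table, and the random untrained weights are hypothetical placeholders. The IO encoder and the output layer over the vocabulary are left out.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
GATES = ("input", "forget", "output", "cell")


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w: List[Vector], v: Vector) -> Vector:
    # Dense matrix-vector product in pure Python.
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


class LSTMCell:
    """Single LSTM cell with random placeholder weights (not trained)."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        n_in = input_size + hidden_size
        # One weight matrix and one bias vector per gate, acting on [x; h].
        self.w = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(hidden_size)]
            for gate in GATES
        }
        self.b = {gate: [0.0] * hidden_size for gate in GATES}

    def step(self, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        xh = x + h  # concatenate the input and the previous hidden state
        pre = {
            gate: [a + b for a, b in zip(matvec(self.w[gate], xh), self.b[gate])]
            for gate in GATES
        }
        i = [sigmoid(v) for v in pre["input"]]
        f = [sigmoid(v) for v in pre["forget"]]
        o = [sigmoid(v) for v in pre["output"]]
        g_cand = [math.tanh(v) for v in pre["cell"]]
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g_cand)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new


def decoder_step(cell: LSTMCell, io_embedding: Vector, last_token_embedding: Vector,
                 h: Vector, c: Vector) -> Tuple[Vector, Vector]:
    # Condition on both the IO-pair embedding and the last predicted token
    # by concatenating them into the LSTM input at every step.
    return cell.step(io_embedding + last_token_embedding, h, c)


# Hypothetical sizes: 4-dim IO embedding, 3-dim token embedding, 8 hidden units.
IO_DIM, TOK_DIM, HIDDEN = 4, 3, 8
token_table: Dict[str, Vector] = {"<start>": [0.0, 0.0, 1.0], "move": [1.0, 0.0, 0.0]}
cell = LSTMCell(IO_DIM + TOK_DIM, HIDDEN)
io_emb = [0.5, -0.2, 0.1, 0.3]  # placeholder for the IO encoder's output
h, c = [0.0] * HIDDEN, [0.0] * HIDDEN
for tok in ["<start>", "move"]:
    h, c = decoder_step(cell, io_emb, token_table[tok], h, c)
```

The same IO embedding is fed at every step, so the program being generated can always "see" the specification, while the last-token embedding carries the autoregressive history alongside the hidden state.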

Beam search: candidate token sequences are scored by their likelihood. Greedy search and beam search are the two common ways to decode candidate sequences.
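To make the greedy vs. beam difference concrete, here is a minimal sketch over a hypothetical toy next-token model (the model, token names, and probabilities are made up for illustration). Sequences are scored by summed log-probabilities.

```python
import math
from typing import Callable, Dict, List, Tuple

Seq = Tuple[str, ...]
# A "model" maps a prefix to a probability distribution over the next token.
NextTokenModel = Callable[[Seq], Dict[str, float]]
EOS = "<eos>"


def greedy_decode(model: NextTokenModel, max_len: int) -> Tuple[Seq, float]:
    """Take the single most likely next token at every step."""
    seq: Seq = ()
    logp = 0.0
    for _ in range(max_len):
        tok, p = max(model(seq).items(), key=lambda kv: kv[1])
        seq += (tok,)
        logp += math.log(p)
        if tok == EOS:
            break
    return seq, logp


def beam_decode(model: NextTokenModel, beam_width: int, max_len: int) -> List[Tuple[Seq, float]]:
    """Keep the beam_width highest-scoring prefixes at every step.

    This is still a heuristic: the returned sequences are not guaranteed
    to be the globally most likely ones.
    """
    beams: List[Tuple[Seq, float]] = [((), 0.0)]
    finished: List[Tuple[Seq, float]] = []
    for _ in range(max_len):
        candidates = [
            (seq + (tok,), logp + math.log(p))
            for seq, logp in beams
            for tok, p in model(seq).items()
            if p > 0.0
        ]
        candidates.sort(key=lambda cand: cand[1], reverse=True)
        beams = []
        for seq, logp in candidates[:beam_width]:
            # Hypotheses that emitted EOS leave the beam (simplification:
            # the beam is not refilled, so it can shrink).
            if seq[-1] == EOS:
                finished.append((seq, logp))
            else:
                beams.append((seq, logp))
        if not beams:
            break
    finished.extend(beams)  # keep unfinished hypotheses if max_len was hit
    finished.sort(key=lambda cand: cand[1], reverse=True)
    return finished[:beam_width]


def toy_model(prefix: Seq) -> Dict[str, float]:
    # Hypothetical distribution where the locally best first token ("A")
    # does not lead to the most likely full sequence.
    if len(prefix) == 0:
        return {"A": 0.6, "B": 0.4}
    if len(prefix) == 1:
        return {"x": 0.5, "y": 0.5} if prefix[0] == "A" else {"x": 0.95, "y": 0.05}
    return {EOS: 1.0}


greedy_seq, greedy_logp = greedy_decode(toy_model, max_len=5)
beam_results = beam_decode(toy_model, beam_width=2, max_len=5)
```

Greedy search commits to "A" (0.6) and ends with probability 0.6 × 0.5 = 0.3, while a beam of width 2 keeps "B" alive and finds "B x" with probability 0.4 × 0.95 = 0.38. With beam_width=1 the beam version reduces to greedy search.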

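Another popular heuristic is beam search, which extends greedy search and returns a list of the most likely output sequences. The key idea: at each step beam search keeps the K highest-scoring partial sequences, while greedy search keeps only one (greedy search is beam search with K = 1). The K results are not guaranteed to be the K globally most likely sequences.

Under maximum-likelihood training, programs that are semantically identical to the reference but written differently are incorrectly penalized (the program aliasing problem above). Exposure bias is a separate issue: the model is trained on ground-truth prefixes but decodes from its own predictions at test time.

Conditioning on syntactic correctness: what does it mean?

The evaluation language supports loops and conditionals, but not variable assignment.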
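One possible reading of "conditioning on syntactic correctness" (an assumption here, not something these notes confirm) is that at each decoding step the tokens a grammar rules out are masked and the remaining probabilities are renormalized, so the decoder can only emit syntactically valid programs. The toy grammar, token names, and scores below are hypothetical and are not the paper's DSL.

```python
import math
from typing import Dict, Set, Tuple

EOS = "<eos>"
Seq = Tuple[str, ...]


def allowed_next(prefix: Seq) -> Set[str]:
    """Tokens a hypothetical toy grammar allows after `prefix`.

    Toy grammar (assumption, not the paper's language):
      prog := stmt* EOS
      stmt := "move" | "turn" | "REPEAT" "{" stmt* "}"
    """
    depth = prefix.count("{") - prefix.count("}")
    if prefix and prefix[-1] == "REPEAT":
        return {"{"}  # a REPEAT must open a block
    allowed = {"move", "turn", "REPEAT"}
    if depth > 0:
        allowed.add("}")  # only close a block that is open
    else:
        allowed.add(EOS)  # only stop when every block is closed
    return allowed


def masked_softmax(logits: Dict[str, float], allowed: Set[str]) -> Dict[str, float]:
    """Softmax over the allowed tokens only; disallowed tokens get zero mass."""
    kept = {tok: s for tok, s in logits.items() if tok in allowed}
    if not kept:
        raise ValueError("grammar allows no token that the model scores")
    m = max(kept.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - m) for tok, s in kept.items()}
    z = sum(exps.values())
    return {tok: e / z for tok, e in exps.items()}


# Hypothetical decoder scores: the model prefers "}" even where it is illegal.
logits = {"move": 1.0, "turn": 0.5, "REPEAT": 0.2, "{": 2.0, "}": 3.0, EOS: 0.0}

after_repeat = masked_softmax(logits, allowed_next(("REPEAT",)))
at_start = masked_softmax(logits, allowed_next(()))
inside_block = masked_softmax(logits, allowed_next(("REPEAT", "{")))
```

At the start of a program "}" is masked even though it has the highest raw score, so "move" becomes the most likely token; inside an open block "}" is legal again and wins. Right after "REPEAT" the only legal token is "{".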

Dataset: randomly sampled programs.

I have a question: what is the difference between baseline and non-baseline methods? What does the metric "generalization" mean?

RL training does not reach high performance.

This paper doesn't look very convincing to me; kind of a weak pitch.