Paper Reading: Text Generation from Knowledge Graphs with Graph Transformers

High-level

Summary:

Addresses the problem of generating coherent multi-sentence texts from the output of an information extraction system, and in particular a knowledge graph.

The paper introduces a novel graph transforming encoder which can leverage the relational structure of such knowledge graphs without imposing linearization or hierarchical constraints.

I feel like the model first simplifies the graph and then applies a standard graph embedding based on the GAT approach; GAT is a neighbor-aware attention mechanism.

Evaluation:

Takeaways:

Abstract GENeration DAtaset (AGENDA), a dataset of knowledge graphs paired with scientific abstracts.

Graph preparation — transforming the labeled graph into an unlabeled one composed of both the original entity nodes and new relation (label) nodes. A global context node is added to promote information flow between disconnected parts of the graph (see the sketch below).
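A minimal sketch of this preparation step, assuming a simple edge-list representation; the entity and relation names are hypothetical examples, not from the paper's dataset.

```python
def prepare_graph(entities, labeled_edges):
    """Convert a labeled graph into an unlabeled one: each labeled edge
    (head, relation, tail) becomes two edges passing through a new relation
    node, and a global context node is connected to every entity so that
    disconnected components can still exchange information."""
    nodes = list(entities)
    edges = []
    for i, (head, rel, tail) in enumerate(labeled_edges):
        rel_node = f"{rel}#{i}"          # one relation node per labeled edge
        nodes.append(rel_node)
        edges.append((head, rel_node))   # head -> relation node
        edges.append((rel_node, tail))   # relation node -> tail
    nodes.append("<GLOBAL>")             # context node for global information flow
    edges.extend(("<GLOBAL>", e) for e in entities)
    return nodes, edges

# Example: two triples that share no entities become reachable via <GLOBAL>.
nodes, edges = prepare_graph(
    ["CNN", "image classification", "LSTM", "language modeling"],
    [("CNN", "USED-FOR", "image classification"),
     ("LSTM", "USED-FOR", "language modeling")],
)
print(nodes)
print(edges)
```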

GAT (Graph Attention Network): start with self-attention over the local neighborhood of each vertex (a sketch follows).
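A minimal numpy sketch of one GAT-style layer in which each vertex attends only over its local neighborhood. The weight shapes and leaky-ReLU slope are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def gat_layer(H, adj, W, a, alpha=0.2):
    """H: (n, d_in) node features; adj: (n, n) 0/1 adjacency with self-loops;
    W: (d_in, d_out) projection; a: (2 * d_out,) attention vector."""
    Z = H @ W                                               # project node features
    n = Z.shape[0]
    # Pairwise attention logits e_ij = LeakyReLU(a^T [z_i ; z_j])
    logits = np.array([[np.concatenate([Z[i], Z[j]]) @ a for j in range(n)]
                       for i in range(n)])
    logits = np.where(logits > 0, logits, alpha * logits)   # leaky ReLU
    logits = np.where(adj > 0, logits, -1e9)                # mask non-neighbors
    attn = np.exp(logits - logits.max(axis=1, keepdims=True))
    attn = attn / attn.sum(axis=1, keepdims=True)           # softmax over neighbors
    return attn @ Z                                         # neighborhood-weighted update

rng = np.random.default_rng(0)
H = rng.normal(size=(4, 8))
adj = np.eye(4) + np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
out = gat_layer(H, adj, rng.normal(size=(8, 8)), rng.normal(size=(16,)))
print(out.shape)  # (4, 8)
```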

“Attention Is All You Need” makes it possible to do seq2seq modeling without RNNs.

Practical Value

What can you learn from this to make your research better?

This could be useful in neural program synthesis.

Details and Problems

From the presenters’ point of view, what questions might the audience ask?

What is the core challenge?

What is a graph transformer?

The transformer views the encoded representation of the input as a set of key-value pairs, (K, V), both of dimension n (the input sequence length); in the context of NMT, both the keys and values are the encoder hidden states. In the decoder, the previous output is compressed into a query (Q, of dimension m) and the next output is produced by matching this query against the keys and combining the corresponding values.
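A small numpy sketch of this query-key-value attention: the decoder query Q is matched against the encoder keys K, and the resulting weights mix the values V. Dimensions are illustrative.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q: (m, d_k) queries; K: (n, d_k) keys; V: (n, d_v) values."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                           # (m, n) query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V                                        # (m, d_v) weighted values

rng = np.random.default_rng(0)
K = V = rng.normal(size=(5, 16))   # in NMT, keys and values are encoder states
Q = rng.normal(size=(1, 16))       # one decoder-side query
print(scaled_dot_product_attention(Q, K, V).shape)  # (1, 16)
```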

How is the attention layer used here?

What is self-attention (SA)?

The SA mechanism enables the model to learn the correlation between the current word and the preceding part of the sentence (a sketch follows).
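A small sketch of masked (causal) self-attention, the decoder-side variant in which each position's query attends only over that position and earlier ones, so the current word's representation is informed by the preceding part of the sentence. Shapes and projections are illustrative.

```python
import numpy as np

def causal_self_attention(X, Wq, Wk, Wv):
    """X: (t, d) token representations; Wq/Wk/Wv: (d, d) learned projections."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])                  # (t, t)
    mask = np.triu(np.ones_like(scores), k=1).astype(bool)
    scores = np.where(mask, -1e9, scores)                    # hide future positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 16))                                 # six tokens
out = causal_self_attention(X, *(rng.normal(size=(16, 16)) for _ in range(3)))
print(out.shape)  # (6, 16)
```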

Global vs. local attention and the target word’s context.

What is multi-head?

Multi-head attention is essentially an ensemble of several attention matrices: each head computes attention with its own learned projections, and the heads’ outputs are concatenated.
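A minimal numpy sketch of this: several attention heads, each with its own projections, run in parallel and their outputs are concatenated. The head count and sizes are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, heads):
    """X: (t, d); heads: list of (Wq, Wk, Wv) projection triples, one per head."""
    outputs = []
    for Wq, Wk, Wv in heads:
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        A = softmax(Q @ K.T / np.sqrt(K.shape[-1]))   # one attention matrix per head
        outputs.append(A @ V)
    return np.concatenate(outputs, axis=-1)           # concatenate head outputs

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 32))
heads = [tuple(rng.normal(size=(32, 8)) for _ in range(3)) for _ in range(4)]
print(multi_head_attention(X, heads).shape)  # (6, 32): 4 heads x 8 dims each
```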