Text generation has seen an explosion of work using neural networks, more specifically LSTMs. These methods generate sentences sequentially, word by word: at each step, the next word is picked according to a probability distribution over the vocabulary. Generating sentences from scratch has proven to be a really difficult task. Maintaining grammar is a struggle, and sentences tend to digress or keep repeating common phrases. If you think about it, it seems really difficult, if not impossible, to completely represent the meaning of an entire sentence in the small initial cell state of the LSTM that has to generate it. Stealing words from Professor Raymond Mooney’s talk [1]:
“You can’t cram the meaning of a whole sentence into a single vector!”
- Professor Mooney (slightly paraphrased)
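To make the word-by-word generation described above concrete, here is a minimal Python sketch. It assumes a hypothetical toy model (`TOY_MODEL`) that only looks at the previous word; a real LSTM would instead condition on a hidden state summarising everything generated so far. The point is the loop itself: pick a word from a distribution, append it, and repeat until an end token appears.

```python
import random

# Hypothetical toy "language model": for each previous word, a probability
# distribution over the next word. A real LSTM would compute this from a
# hidden state that summarises the whole prefix, not just the last word.
TOY_MODEL = {
    "<s>": {"the": 0.7, "a": 0.3},
    "the": {"sandwich": 0.6, "game": 0.4},
    "a": {"sandwich": 0.5, "game": 0.5},
    "sandwich": {"was": 1.0},
    "game": {"was": 1.0},
    "was": {"nice": 0.5, "great": 0.5},
    "nice": {"</s>": 1.0},
    "great": {"</s>": 1.0},
}


def next_word_distribution(prefix):
    """Return P(next word | prefix); this toy model only uses the last word."""
    return TOY_MODEL[prefix[-1]]


def generate(max_len=20, seed=None):
    """Generate a sentence word by word, sampling each word from the model."""
    rng = random.Random(seed)
    prefix = ["<s>"]
    while len(prefix) <= max_len:
        dist = next_word_distribution(prefix)
        words = list(dist)
        weights = [dist[w] for w in words]
        # Sample one word according to the model's probabilities.
        word = rng.choices(words, weights=weights, k=1)[0]
        if word == "</s>":
            break
        prefix.append(word)
    return " ".join(prefix[1:])


print(generate(seed=0))
```

Each run with a different seed can produce a different sentence, since every word is sampled rather than fixed in advance.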
It is refreshing to come across recent works that do not attempt to start from scratch, but take help from some reference text while generating. This blog post gives a peek at two such cool works.
The specific problem addressed in this work [2] is the generation of summaries for news articles. The main attraction of the paper is a model that can both copy words from the source article and come up with new words that are not in the article while generating the summary. The soul of the technique is an attention distribution over the source article’s words, which changes each time a new word of the summary is generated. So, for generating each word, the network flips a biased coin and, depending on the result, either copies a word from the article into the summary, guided by the attention (the word with the highest attention being the likeliest pick), or generates a new word on its own. Hence the name pointer-generator network. A major contribution of the work is that the architecture learns the bias of the coin (which changes before generating each word) on its own, so that it knows when to be confident and generate new words and when to just copy. This is just a simplified explanation; see the paper for more details.
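The "biased coin" can be written out as a mixture of two distributions. As I understand the paper, the final probability of a word w is P(w) = p_gen · P_vocab(w) + (1 − p_gen) · (sum of the attention weights on the source positions holding w), where p_gen is the learned coin bias, computed with a sigmoid from the context vector, the decoder state and the decoder input. The sketch below covers only this mixing step in plain Python, assuming the attention weights and the vocabulary distribution are already given as toy inputs; the function name `final_distribution` and the example numbers are my own.

```python
from collections import defaultdict


def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix generating and copying, pointer-generator style.

    p_gen         -- probability of generating from the vocabulary (the coin's bias)
    vocab_dist    -- dict word -> P_vocab(word), the decoder's generation distribution
    attention     -- attention weights, one per source position (summing to 1)
    source_tokens -- source words, aligned position by position with `attention`
    """
    assert len(attention) == len(source_tokens)
    final = defaultdict(float)
    # Generation part: scale the vocabulary distribution by p_gen.
    for word, p in vocab_dist.items():
        final[word] += p_gen * p
    # Copy part: each source position hands its attention weight to the word there.
    # A word occurring several times accumulates all of its weights, and a source
    # word missing from the vocabulary still receives probability mass.
    for weight, word in zip(attention, source_tokens):
        final[word] += (1.0 - p_gen) * weight
    return dict(final)


dist = final_distribution(
    p_gen=0.4,
    vocab_dist={"beat": 0.5, "won": 0.3, "the": 0.2},
    attention=[0.1, 0.6, 0.1, 0.2],
    source_tokens=["germany", "won", "against", "argentina"],
)
print(max(dist, key=dist.get))  # the most probable next word
```

Here "won" gets probability from both paths (0.4 · 0.3 + 0.6 · 0.6 = 0.48), while "argentina", which is not in the toy vocabulary, can still be produced purely by copying.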
Experiments have shown that this model can learn to generate new words like “beat” in the summary by paying attention to related words like “victorious” and “win” in the source article [3]. Besides generating new words, it also learns to make new sentences by cutting parts from different ones and joining them together, all with no explicit supervision other than the ground-truth summaries of the articles! Of course, it has some issues: sometimes it copies a lot and generates little, and sometimes it joins clauses that together do not convey the correct meaning. But overall it performs well, and it is a fantastic direction of work that shows how starting from a reference point improves the quality of generation.
This work [4] uses one of my favourite neural architectures: variational autoencoders. People have tried autoencoding sentences before, mapping an input sentence to a latent vector and training a decoder to bring back the original sentence using just that vector. But again, the performance was not good (read the statement by Prof. Mooney again). This work builds a model that learns to edit sentences in a controlled way, changing them into new, meaningful ones. This is done using a conditional version of autoencoders: you have to autoencode a target sentence (e.g. “The sandwich was nice.”) given a lexically similar sentence (e.g. “The sandwich was great.”) and an edit vector, which is a hint about what should change to go from the similar sentence to the target (remove “great” and add “nice”). They model the edit vector as the concatenation of two sums: the sum of the vector representations of the words that should be added, and the sum of those of the words that should be removed. Although this looks like a big hint, the model still has to learn where to add words while maintaining grammaticality, and summing the word vectors makes the hint fuzzy; both make the task far from trivial. After training (another elaborate saga; see the paper), the decoder can be used as a standalone neural editor that generates new sentences from old ones with a given desired change.
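Here is a minimal sketch of how such an edit vector could be assembled from a sentence pair. It assumes hypothetical two-dimensional toy embeddings (`EMBEDDINGS`) and a naive regex tokenizer; added words are those in the target but not in the similar sentence (called the prototype in the paper), and removed words are the reverse. It deliberately omits any further processing the paper applies to this vector during training, so treat it as an illustration of the hint, not the paper's implementation.

```python
import re

# Hypothetical toy word embeddings; a real model would use learned vectors.
EMBEDDINGS = {
    "the": [0.5, 0.5],
    "sandwich": [0.2, 0.9],
    "was": [0.3, 0.1],
    "nice": [0.8, 0.1],
    "great": [0.7, 0.3],
}


def tokenize(sentence):
    """Lowercase and split into word tokens, dropping punctuation."""
    return re.findall(r"\w+", sentence.lower())


def sum_embeddings(words, embeddings, dim):
    """Element-wise sum of the embeddings of `words` (zero vector if empty)."""
    total = [0.0] * dim
    for word in sorted(words):  # sorted only for a deterministic summation order
        for i, x in enumerate(embeddings[word]):
            total[i] += x
    return total


def edit_vector(prototype, target, embeddings):
    """Concatenate [sum of added-word vectors, sum of removed-word vectors]."""
    proto_words, target_words = set(tokenize(prototype)), set(tokenize(target))
    added = target_words - proto_words    # words to insert
    removed = proto_words - target_words  # words to delete
    dim = len(next(iter(embeddings.values())))
    return (sum_embeddings(added, embeddings, dim)
            + sum_embeddings(removed, embeddings, dim))


print(edit_vector("The sandwich was great.", "The sandwich was nice.", EMBEDDINGS))
```

For the sandwich example, the result is the embedding of “nice” followed by the embedding of “great”. With several added words, their vectors are summed into one, which is exactly what makes the hint fuzzy.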
One obvious benefit is that the generated text is much more grammatical, because it starts from a grammatically correct reference point. But another thing, much, much dearer to me, is the ability to see how the text is being formed and to control its generation. It is much easier to get an intuitive grasp of how attention distributions and edit vectors lead to the generated text than to ponder what might be sitting inside a single vector tasked with generating an entire sentence. With edit vectors, you can even steer what gets generated. That is why I love this area. I hope to see more work in it, and to be a part of it soon!
Note: I stumbled upon NLP Highlights, a podcast on upcoming techniques in NLP, which is surprisingly up to date on the literature and keeps posting great content. You can find a discussion of the pointer-generator network here, and one of the Neural Editor here, by Kelvin Guu himself.
Written by Kundan Krishna on 16th December 2017