Neural Machine Translation by Jointly Learning to Align and Translate

Authors: Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio

Date: 1 Sep 2014

Citation: https://doi.org/10.48550/arXiv.1409.0473

Introduction

  • The paper “Neural Machine Translation by Jointly Learning to Align and Translate” introduces attention, a way of enhancing encoder-decoder architectures. It argues that the performance of traditional encoder-decoder architectures is bottlenecked by compressing the entire source sentence into a single fixed-length vector.

  • They propose improving this by allowing the model to automatically (soft-)search for the parts of a source sentence that are relevant to predicting a target word, without having to form these parts as explicit hard segments. This is applied to Neural Machine Translation (NMT) as an example.

The New Approach

  • In the new approach, the input sequence is first encoded into a sequence of vectors (one per input element), and the attention mechanism learns to combine/choose from these vectors to produce the output sequence.

  • In practice this means that each output element gets its own fixed-width representation (a specific context vector) computed over the input elements, rather than a single fixed-length vector for the whole sentence.

  • These element-specific context vectors are jointly learned within the sequence-to-sequence (seq2seq) task and are built from two components (see the sketch after this list):

    1. Information about the elements surrounding input element *i* (called the annotation of element *i*)

    2. Information about how strongly each annotation should influence the current output token (i.e. the attention weights)
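As a rough illustration of how these two components combine, below is a minimal NumPy sketch of an additive alignment model in the spirit of the paper: it scores each annotation against the previous decoder state, normalizes the scores into attention weights, and forms the context vector as the weighted sum of the annotations. The function name, parameter names (W_a, U_a, v_a), and toy shapes are illustrative assumptions rather than the authors' code, and the encoder/decoder RNNs themselves are omitted.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_context(annotations, decoder_state, W_a, U_a, v_a):
    """Combine annotations and attention weights into one context vector.

    annotations   : (T, d_enc) array, one annotation h_j per source position
    decoder_state : (d_dec,) previous decoder hidden state s_{i-1}
    W_a, U_a, v_a : parameters of the (hypothetical) alignment model
    """
    # Unnormalized alignment scores: e_ij = v_a . tanh(W_a s_{i-1} + U_a h_j)
    scores = np.tanh(decoder_state @ W_a.T + annotations @ U_a.T) @ v_a
    # Attention weights: softmax over the source positions j
    weights = softmax(scores)
    # Context vector for output step i: weighted sum of the annotations
    context = weights @ annotations
    return context, weights

# Toy, randomly initialized example (shapes are illustrative only)
T, enc_dim, dec_dim, att_dim = 5, 8, 6, 7
rng = np.random.default_rng(0)
h = rng.normal(size=(T, enc_dim))        # annotations from the encoder
s_prev = rng.normal(size=dec_dim)        # previous decoder state
W_a = rng.normal(size=(att_dim, dec_dim))
U_a = rng.normal(size=(att_dim, enc_dim))
v_a = rng.normal(size=att_dim)

c, alpha = attention_context(h, s_prev, W_a, U_a, v_a)
print(alpha.round(3))   # weights sum to 1 over the 5 source positions
print(c.shape)          # (8,) -- a fixed-width context for this output step
```

In the full model, these weights (and hence the context vector) are recomputed at every decoding step, which is what lets the decoder attend to different source positions for different target words.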

Summary:

  • The proposed method outperforms the conventional RNN-based encoder-decoder.

  • The proposed method achieves translation performance comparable to a conventional phrase-based system.

  • The performance of the conventional RNN-based encoder-decoder drops significantly on longer sentences, while the proposed method remains robust.
