What Role Does BERT Play in the Neural Machine Translation Encoder?

Anonymous

16 Jan 2022 (modified: 05 May 2023) · ACL ARR 2022 January Blind Submission
Abstract: Pre-trained language models have been widely applied to a variety of natural language processing tasks, but neural machine translation is a notable exception. The mismatch between the embedding spaces produced by BERT and by the NMT encoder may be one of the main reasons why integrating pre-trained LMs into NMT models is difficult. Previous studies show that the most effective form of integration is to feed the output of BERT into the encoder through additional modules. Nevertheless, it remains unclear whether these additional modules alter the embedding spaces built by the NMT encoder, and what kind of information the NMT encoder actually exploits from the output of BERT. In this paper, we begin by comparing how the embedding spaces change after BERT is introduced into NMT encoders trained on different machine translation tasks. Although the trends vary across tasks, introducing BERT does not significantly alter the embedding space of the encoder's last layer. Subsequent evaluation on several semantic and syntactic probing tasks shows that the NMT encoder benefits from the rich syntactic information contained in the output of BERT, which boosts translation quality.
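
The abstract does not state which similarity measure is used to compare the BERT and NMT encoder embedding spaces; the sketch below illustrates one common choice, linear centered kernel alignment (CKA), applied to hypothetical placeholder hidden states (`bert_states`, `nmt_states`). It is meant only as an example of how two layers' representations can be compared, not as the paper's actual method.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two representation matrices of shape (n_tokens, dim).

    The hidden dimensions of x and y may differ; only the number of tokens
    must match.
    """
    # Center each feature dimension.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    numerator = np.linalg.norm(y.T @ x, ord="fro") ** 2
    denominator = (np.linalg.norm(x.T @ x, ord="fro")
                   * np.linalg.norm(y.T @ y, ord="fro"))
    return float(numerator / denominator)


# Placeholder representations standing in for real hidden states.
rng = np.random.default_rng(0)
bert_states = rng.normal(size=(128, 768))   # e.g. BERT final-layer states
nmt_states = rng.normal(size=(128, 512))    # e.g. NMT encoder last-layer states
print(linear_cka(bert_states, nmt_states))  # similarity score in [0, 1]
```

In practice, such a score would be computed per encoder layer before and after BERT is integrated, so that layers whose spaces shift (or stay stable, as reported for the last layer) can be identified.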
Paper Type: long