Machine translation (MT) is the task of automatically translating text from one language into another. MT facilitates communication between people who speak different languages, for example when sharing knowledge or traveling. It is one of the most difficult problems in NLP, since it requires strong command of both languages and a thorough understanding of their semantics and syntax. It is no surprise that MT is often considered AI-complete and has driven many advances in sequence-to-sequence models.
Multilingual Translation Models
MT quality varies widely across languages because the amount of available bilingual (parallel) data differs greatly between them. For low-resource language pairs, the most effective ways to improve performance are transferring knowledge from high-resource languages and exploiting monolingual data. Combining these strategies, the ultimate goal is a multilingual MT system that can translate between any pair of languages. The key challenges are deploying such a system at scale and learning an effective interlingua (a shared, language-independent representation) for thousands of languages.
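To make the any-to-any idea concrete, here is a minimal sketch of one common way to train a single shared model on many language pairs: prepend a token naming the target language to every source sentence (a convention popularized by Google's multilingual NMT system). The `<2xx>` tag format, the `TrainingExample` class, and the helper names are illustrative assumptions; tokenization and the model itself are omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class TrainingExample:
    source: str
    target: str


def tag_source(source: str, tgt_lang: str) -> str:
    # Hypothetical tag format "<2xx>": tells the shared model which language to produce.
    return f"<2{tgt_lang}> {source}"


def build_multilingual_corpus(
    corpora: Dict[Tuple[str, str], List[Tuple[str, str]]],
) -> List[TrainingExample]:
    """Merge parallel corpora for many language pairs into one tagged training set.

    `corpora` maps (src_lang, tgt_lang) -> list of (source, target) sentence pairs.
    """
    mixed: List[TrainingExample] = []
    for (_src_lang, tgt_lang), pairs in corpora.items():
        for source, target in pairs:
            mixed.append(TrainingExample(tag_source(source, tgt_lang), target))
    return mixed


if __name__ == "__main__":
    corpora = {
        ("en", "fr"): [("Good morning.", "Bonjour.")],  # high-resource pair (toy data)
        ("en", "sw"): [("Thank you.", "Asante.")],      # low-resource pair (toy data)
    }
    for ex in build_multilingual_corpus(corpora):
        print(ex.source, "->", ex.target)
    # Zero-shot request: a French -> Swahili direction never seen in training.
    print(tag_source("Merci.", "sw"))
```

Because all directions share one set of parameters, a low-resource pair can benefit from what the model learns on high-resource pairs. The same tag also lets one request directions that never appeared together in training (zero-shot translation), although quality there is typically lower than for supervised directions.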
MT often misses subtle nuances or translates specific terms inconsistently, whereas human translators can easily correct such mistakes. Computer-aided translation (CAT) studies how humans and MT systems can interact efficiently, ensuring high quality and removing the artificial tone of raw MT output. Currently, CAT is the most trustworthy solution for localizing commercial or confidential content.
A sentence may have multiple translations depending on its context: surrounding sentences, the domain of the text, properties of the project, the accompanying visual scene. Any information bearing on the meaning can be integrated into the translation model to disambiguate context-dependent lexical choices. However, we should keep in mind that context affects only a small fraction of these choices, while incorporating it makes the model more complex.
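A simple way to feed surrounding sentences and domain information into an otherwise standard sentence-level model is to concatenate them into the input. The sketch below assumes this concatenation approach; the `<sep>` separator, the `<domain>` tag format, and the `with_context` helper are hypothetical, and a real system would use its own tokenizer's special tokens.

```python
from typing import List, Optional

SEP = "<sep>"  # hypothetical separator token between context and current sentence


def with_context(
    doc: List[str],
    i: int,
    k: int = 2,
    domain: Optional[str] = None,
    sep: str = SEP,
) -> str:
    """Build the model input for sentence i of a document.

    Up to k preceding sentences are concatenated before the current one, and an
    optional domain tag is prepended. A larger k gives more context but a longer
    (more expensive) input.
    """
    start = max(0, i - k)
    pieces = doc[start:i] + [doc[i]]
    text = f" {sep} ".join(pieces)
    if domain is not None:
        text = f"<{domain}> {text}"  # hypothetical domain-tag format
    return text


if __name__ == "__main__":
    doc = ["We played baseball yesterday.", "He swung the bat hard."]
    # The previous sentence signals that "bat" is sports equipment, not the animal.
    print(with_context(doc, 1, domain="sports"))
```

The window size `k` is where the trade-off described above shows up: since context decides only a few lexical choices, most of the extra tokens carry little signal, yet each one lengthens the input and increases computation.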