Swiss German / German Machine Translation

We all know: Germans write parken, the Swiss use parkieren. Could we extract the differences in lexis, morphosyntax and syntax systematically and automatically?

In October 2017, two research groups [1, 2] announced end-to-end machine translation (MT) models trained on monolingual data only. This changes how models can be used to extract bilingual lexica and to perform MT. The proposed architecture is particularly appealing for closely related languages and low-resource language variants, such as Swiss High German (CH_DE, the standard language of written communication in Switzerland) and German High German (DE_DE).

This project aims to use the architecture proposed there (Figure 1 in [1]) to detect and interpret differences between Swiss High German and German High German.

The project will build on the data and results of [3], where keywords for both written German variants were extracted by means of document classification. The system to be investigated and evaluated in this project takes monolingual input in both language variants. A joint encoder, shared by both variants, learns to reconstruct source sentences from noisy versions of them, and two decoders (one per variant) then produce the translations.
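The noisy input mentioned above can be illustrated with a small sketch. It assumes the noise consists of random swaps of adjacent words, with roughly half as many swaps as there are tokens, modelled on the description in [1]. The function names `add_noise` and `make_denoising_example` are hypothetical and are not taken from either paper's code.

```python
import random


def add_noise(tokens, rng=None):
    """Return a noisy copy of `tokens` made by swapping random adjacent pairs.

    Assumption: len(tokens) // 2 swaps are applied, loosely following the
    noise described in [1]. The input list is left unchanged.
    """
    if rng is None:
        rng = random.Random()
    noisy = list(tokens)
    if len(noisy) < 2:
        return noisy  # nothing to swap
    for _ in range(len(noisy) // 2):
        i = rng.randrange(len(noisy) - 1)  # index of the left word of the pair
        noisy[i], noisy[i + 1] = noisy[i + 1], noisy[i]
    return noisy


def make_denoising_example(tokens, variant, rng=None):
    """Build one (hypothetical) training example for the denoising objective.

    The shared encoder reads the noisy sentence; the decoder of the same
    variant ("CH_DE" or "DE_DE") is asked to reproduce the clean sentence.
    """
    return {
        "source": add_noise(tokens, rng),
        "target": list(tokens),
        "decoder": variant,
    }
```

For instance, `make_denoising_example("wir parkieren das auto hier".split(), "CH_DE")` pairs a locally reordered sentence with its clean original as the target for the CH_DE decoder.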

To evaluate the system, we rely on (i) the BLEU score for translating the features from [3] from one variant into the other and (ii) the averaged BLEU score proposed in [2]. As a byproduct, we obtain a CH_DE-DE_DE bilingual mapping in a shared latent space.
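As a rough illustration of the two scores, the sketch below computes a simplified corpus-level BLEU (single reference, up to 4-grams, no smoothing) and averages it over the two translation directions. It assumes tokenised input, and it assumes that (ii) means the mean of the two directional scores. The helper names `corpus_bleu` and `averaged_bleu` are hypothetical, and for reported results an established BLEU implementation would be preferable.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU on a 0-100 scale.

    Assumptions: one reference per hypothesis, uniform n-gram weights,
    no smoothing (any n-gram order without a match gives a score of 0).
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngram_counts(hyp, n)
            ref_counts = ngram_counts(ref, n)
            # Counter intersection clips each n-gram count to the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # The brevity penalty punishes output shorter than the reference.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1.0 - ref_len / hyp_len)
    return 100.0 * brevity * math.exp(log_precision)


def averaged_bleu(ch_to_de, de_to_ch):
    """Mean BLEU over both directions; each argument is (hypotheses, references)."""
    return 0.5 * (corpus_bleu(*ch_to_de) + corpus_bleu(*de_to_ch))
```

Averaging over both directions keeps a system from looking good merely because one direction (for example CH_DE to DE_DE) happens to be easier than the other.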

References

  [1] Artetxe, M., Labaka, G., Agirre, E., & Cho, K. (2017). Unsupervised Neural Machine Translation, last revised 26 Feb 2018, [arxiv].
  [2] Lample, G., Denoyer, L., & Ranzato, M. A. (2017). Unsupervised Machine Translation Using Monolingual Corpora Only, last revised 13 Apr 2018, [arxiv].
  [3] Schneider, G. (2018). Differences between Swiss High German and German German via data-driven methods. In: SwissText 2018: 3rd Swiss Text Analytics Conference, Winterthur, 12 June 2018 - 13 June 2018, Epub ahead of print, [zora].