WHERE GOOGLE FAILS

Vol. 1, Issue 2, 2015, pp. 115-132

Author: Maria Stambolieva


Affiliation: New Bulgarian University, Sofia, Bulgaria

Abstract
The paper presents ongoing research in contrastive corpus linguistics with envisaged applications in machine translation (MT). It focuses on the performance of Google Translate (GT) in English-Bulgarian translation. Structural patterns, forms and expressions on which automatic translation fails are identified and analysed with a view to creating a GT-editing tool that improves the target-language output. The paper describes the corpus and the corpus analysis method applied, including the identification of unacceptable string types and their structural analysis and categorization. For each failure type, pre- or post-GT editing transformations are proposed. The paper also gives a first outline of a GT-editing tool with three components: a pre-GT editor performing string identification, substitution or deletion operations; a post-GT editor with a set of more complex string transformation rules; and an additional module transferring structural information.
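The tool outlined in the abstract amounts to a three-stage pipeline wrapped around an MT system treated as a black box. The Python sketch below illustrates that architecture under several assumptions. Edits are modelled as regular-expression rules, and `Rule` and `run_pipeline` are hypothetical names. The "structural information" module is reduced to recording which pre-editing rules fired, so that a post-editing rule can be made conditional on them. `translate` is any callable standing in for GT, and no real Google API is called. The sample rules are toy illustrations of the mechanics only. They are not the failure types or transformations identified in the paper.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    """A hypothetical regex-based edit: string identification plus substitution or deletion."""
    name: str
    pattern: str
    replacement: str = ""            # "" turns the rule into a deletion
    requires: Optional[str] = None   # post-edit only: name of a pre-edit rule that must have fired

    def apply(self, text: str) -> tuple[str, int]:
        # re.subn returns the edited string and the number of matches replaced
        return re.subn(self.pattern, self.replacement, text)


def run_pipeline(source: str,
                 translate: Callable[[str], str],
                 pre_rules: list[Rule],
                 post_rules: list[Rule]) -> str:
    # Pre-editor: identify, substitute or delete strings in the source text,
    # remembering which rules fired (a crude stand-in for structural information).
    fired: set[str] = set()
    for rule in pre_rules:
        source, count = rule.apply(source)
        if count:
            fired.add(rule.name)

    # The MT system is an injected black box (assumption: any str -> str callable).
    target = translate(source)

    # Post-editor: a conditional rule runs only if its triggering pre-edit rule fired,
    # so information about the source structure is carried across the MT step.
    for rule in post_rules:
        if rule.requires is None or rule.requires in fired:
            target, _ = rule.apply(target)
    return target


# Toy rules and a toy "translator" (str.upper), for illustration only.
pre = [Rule("expand_negation", r"\bdon't\b", "do not"),
       Rule("drop_filler", r"\s*\bbasically\b,?", "")]
post = [Rule("squeeze_spaces", r" {2,}", " "),
        Rule("tag_negation", r"\bDO NOT\b", "DO-NOT", requires="expand_negation")]

print(run_pipeline("I basically don't know.", str.upper, pre, post))  # prints: I DO-NOT KNOW.
```

Because the translator is passed in as a parameter, the editing rules can be developed and tested offline. Making post-editing rules conditional on pre-editing events is one simple way of passing source-side information to the target side. The input "I do not know." would therefore pass through unchanged apart from the toy upper-casing, since `expand_negation` never fires on it.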

Key words: machine translation, pre-editing, post-editing, Google Translate, bitext, computational linguistics, corpus linguistics

Article history:
Received: 8 November 2015;
Reviewed: 20 November 2015;
Revised: 29 November 2015;
Accepted: 30 November 2015;
Published: 31 December 2015

Citation (APA6):
Stambolieva, M. (2015). Where Google fails. English Studies at NBU, 1(2), 115-132. Retrieved from http://esnbu.org/data/files/2015/2015-2-8-stambolieva-pp115-132.pdf

Copyright © 2015 Maria Stambolieva


This is an open access article distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0), which permits non-commercial use, distribution, and reproduction in any medium, provided the original author and source are credited. If you want to use the work commercially, you must first get the author's permission.
