WHERE GOOGLE FAILS
Vol. 1, Issue 2, 2015, pp. 115-132
DOI: https://doi.org/10.33919/esnbu.15.2.8
Web of Science: 000449158800008
Author
Maria Stambolieva https://orcid.org/0000-0002-8596-273X
Affiliation:
New Bulgarian University, Sofia, Bulgaria
Abstract
The paper presents ongoing research in contrastive corpus linguistics with envisaged applications in machine translation (MT), focusing on Google Translate (GT) performance in English-Bulgarian translation. Structural patterns, forms and expressions where automatic translation fails are identified and analysed with a view to creating a GT-editing tool that provides improved target-language output. The paper presents the corpus and the corpus analysis method applied, including the identification of unacceptable string types, their structural analysis and categorization. For each failure type, pre- or post-GT editing transformations are proposed. A first outline is given of a GT-editing tool consisting of a pre-GT editor performing string identification, substitution or deletion operations, a post-GT editor with a set of more complex string transformation rules, and an additional module transferring structural information.
Key words: machine translation, pre-editing, post-editing, Google Translate, bitext, computational linguistics, corpus linguistics
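The pre-GT and post-GT editors outlined in the abstract can be pictured as two rule sets wrapped around a translation step. The following is a minimal sketch under stated assumptions, not the tool described in the paper: the names `Rule`, `apply_rules` and `edit_pipeline`, and all the example patterns, are hypothetical illustrations, and the translation step is a caller-supplied function rather than a real Google Translate call. The module transferring structural information is not modelled.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    """One string-level editing rule (hypothetical structure)."""
    pattern: str      # regular expression identifying a problematic string
    replacement: str  # substitution text; an empty string deletes the match


def apply_rules(text: str, rules: List[Rule]) -> str:
    """Apply each rule in order; later rules see the output of earlier ones."""
    for rule in rules:
        text = re.sub(rule.pattern, rule.replacement, text)
    return text


def edit_pipeline(
    source: str,
    pre_rules: List[Rule],
    translate: Callable[[str], str],
    post_rules: List[Rule],
) -> str:
    """Pre-edit the source, translate it, then post-edit the output."""
    pre_edited = apply_rules(source, pre_rules)
    raw_output = translate(pre_edited)
    return apply_rules(raw_output, post_rules)


# Placeholder rules, invented for illustration only (not from the paper).
PRE_RULES = [
    Rule(r"\bkind of\s+", ""),       # deletion of a filler expression
    Rule(r"\bisn't\b", "is not"),    # substitution of a contracted form
]
POST_RULES = [
    Rule(r"\s+([,.;:!?])", r"\1"),   # remove stray space before punctuation
]

if __name__ == "__main__":
    # An identity function stands in for the MT system in this sketch.
    print(edit_pipeline("It isn't kind of simple .", PRE_RULES, lambda s: s, POST_RULES))
```

Keeping the translation step as a parameter lets the same rule sets be exercised against any MT back end, or against a stub when the rules themselves are being checked.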
Article history:
Submitted: 8 November 2015;
Reviewed: 20 November 2015;
Revised: 29 November 2015;
Accepted: 30 November 2015;
Published: 31 December 2015
Citation (APA):
Stambolieva, Maria. (2015). Where Google fails. English Studies at NBU, 1(2), 115-132. https://doi.org/10.33919/esnbu.15.2.8
Copyright © 2015 Maria Stambolieva
This open access article is published and distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0), which permits non-commercial use, distribution, and reproduction in any medium, provided the original author and source are credited. If you want to use the work commercially, you must first get the author's permission.
References
Arnold, D. J., Balkan, L., Meijer, S., Lee Humphreys, R., & Sadler, L. (1994). Machine Translation: An Introductory Guide. NCC/Blackwell.
Danchev, A., & Alexieva, B. (1974). Izborat mezhdu minalo svarsheno i minalo nesvarsheno vreme pri prevoda na the Past Simple Tense ot anglijski na balgarski ezik [The choice between the Aorist and the Imperfect in the translation of the Past Simple Tense from English to Bulgarian]. Faculty of Classical and Modern Languages Yearbook, LXVII(1), 249-329.
Desclés, J.-P. (1990). State, Event, Process and Topology. General Linguistics, 29(3), 159-200.
Nakov, P. (2012). Savremenen statisticheski mashinen prevod [Modern statistical machine translation]. In M. Stambolieva (Ed.), Kompyutarna lingvistika 1, Problemi i perspektivi [Computer Linguistics 1, Problems and Perspectives] (pp. 110-155). ANABELA.
Quigley, R. (2010). How does Google Translate work. http://www.themarysue.com/how-does-google-translate-work
Ridjanovic, M. (1969). A Synchronic Study of Verbal Aspect in English and Serbo-Croatian [Unpublished doctoral dissertation]. University of Michigan.
Sag, I., Baldwin, T., Bond, F., Copestake, A., & Flickinger, D. (2002). Multiword Expressions: A Pain in the Neck for NLP. Lecture Notes in Computer Science, 2276, 1-15. https://doi.org/10.1007/3-540-45715-1_1
Searle, J. (1995). John R. Searle. In S. Guttenplan (Ed.), A Companion to the Philosophy of Mind (pp. 544-550). Basil Blackwell Ltd. https://doi.org/10.1111/b.9780631199960.1995.00018.x
Slocum, J. (1985). A Survey of Machine Translation: Its History, Current Status and Future Prospects. Computational Linguistics, 11(1), 1-17.
Stambolieva, M. (2008). Building Up Aspect: A study of aspect and related categories in Bulgarian, with parallels in English and French. Peter Lang.
Stambolieva, M. (2012). Parallel Corpora in Aspectual Studies of Non-Aspect Languages. Proceedings of the Second Workshop on Annotation and Exploitation of Parallel Corpora, RANLP 2011, 39-42. Incoma Ltd. http://www.aclweb.org/anthology/W11-4306
Stambolieva, M. (2016). Angliyski ezik – Samouchitel I [English language - Self study I]. GRAMMA Publishers.
Teubert, W. (1997). Translation and the Corpus. In R. Marcinkeviciene & N. Volz (Eds.), Proceedings of the Second TELRI Seminar on Language Applications for a Multilingual Europe, 147-164. Lithuania.
Verkuyl, H. (1972). On the Compositional Nature of the Aspects. Reidel. https://doi.org/10.1007/978-94-017-2478-4
Verkuyl, H. (1993). A Theory of Aspectuality: The interaction between temporal and atemporal structure. Cambridge Studies in Linguistics. Cambridge University Press. https://doi.org/10.1017/CBO9780511597848