Vol. 1, Issue 2, 2015, pp. 5-26

Author: Elena Tarasheva

Affiliation: New Bulgarian University, Sofia, Bulgaria

The article reports research on the concept of key words as statistically significant items in a text or corpus. It reviews the approaches to eliciting key words implemented in various language-analysis software products, together with the rationale for adopting them. Based on empirical data, a new method is proposed and tested on an exploratory corpus. The motivation and arguments for the proposed procedure are set out through comparisons between different languages. The adequacy of the results yielded by the different methods is tested via a mechanism developed in the course of this research.

Key words: corpora, key words, chi-square, log likelihood, lemmas, lemmatization
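For readers unfamiliar with the keyness statistics named in the key words, the following is a minimal sketch of how a conventional keyness score can be computed for a single word form. It assumes raw frequency counts and corpus sizes measured in tokens, and builds a 2×2 contingency table (this word vs. all other tokens, in the study corpus and in a reference corpus). The function name `keyness` is hypothetical; the sketch illustrates the standard Pearson chi-square and log-likelihood (G²) measures, not the alternative method proposed in the article.

```python
import math


def keyness(freq_study, size_study, freq_ref, size_ref):
    """Return (log_likelihood, chi_square) for one word form (illustrative sketch).

    freq_study / freq_ref: occurrences of the word in the study and reference corpus.
    size_study / size_ref: total tokens in each corpus.
    """
    word_total = freq_study + freq_ref
    total = size_study + size_ref
    # Both statistics are undefined if an expected count would be zero:
    # an empty corpus, a word absent from both corpora, or a word that is every token.
    if size_study <= 0 or size_ref <= 0 or not 0 < word_total < total:
        raise ValueError("degenerate contingency table")

    # 2x2 table: rows = corpora, columns = (this word, all other tokens)
    observed = [
        [freq_study, size_study - freq_study],
        [freq_ref, size_ref - freq_ref],
    ]
    row_totals = [size_study, size_ref]
    col_totals = [word_total, total - word_total]

    g2 = 0.0
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            o = observed[i][j]
            if o > 0:  # 0 * log(0) is taken as 0
                g2 += o * math.log(o / expected)
            chi2 += (o - expected) ** 2 / expected  # Pearson, no Yates correction
    return 2 * g2, chi2
```

For example, a word occurring 10 times in a 1,000-token text and 10 times in a 10,000-token reference corpus scores roughly 40.57 on chi-square and 22.21 on log-likelihood, while a word with the same relative frequency in both corpora scores 0 on both. In practice such a score is computed for every word form in the study corpus and the list is ranked; the cut-off for treating an item as "key" differs between tools.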

Article history:
Received: 22 November 2015;
Reviewed: 14 December 2015;
Accepted: 21 December 2015;
Published: 31 December 2015

Citation (APA6):
Tarasheva, E. (2015). An alternative proposal for eliciting key words. English Studies at NBU, 1(2), 5-26.

Copyright © 2015 Elena Tarasheva

This is an open access article distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0), which permits non-commercial use, distribution, and reproduction in any medium, provided the original author and source are credited. If you want to use the work commercially, you must first get the author's permission.
