

Vol. 26 No. 1 (2023)

Aspects of EFL University Learners’ Lexical and Phraseological Proficiency as Predictors of Writing Quality

January 18, 2022


This study examines the relationship between learners’ productive knowledge, as measured by selected lexical and phraseological indices, and the quality of English as a Foreign Language (EFL) learners’ writing. A sample of 120 expository essays, written by semester 1 and semester 5 university students in a less proficient EFL context, was rated by human evaluators and automatically analyzed for the target indices. The results show that, unlike the index of lexical diversity, the indices of content word frequency and range discriminated significantly between proficiency levels. Among the phraseological indices, the proportions of both rare and frequent bigrams yielded between-group differences, with higher-proficiency students performing significantly better in both categories. A regression analysis shows that the use of rare and contextually restricted content words and the production of larger proportions of rare and frequent bigrams can be considered indicators of better writing proficiency. The study suggests implications for the teaching of EFL.

