Examining Rater Performance on the CELBAN Speaking: A Many-Facets Rasch Measurement Analysis

Peiyu Wang; Karen Coetzee; Andrea Strachan; Sandra Monteiro; Liying Cheng

doi:10.37213/cjal.2020.30436

Vol. 23 No. 2 (2020): Special Issue: The Canadian National Frameworks for English and French Language Proficiency: Application, Implication, and Impact

Submitted: December 16, 2019
Published: October 16, 2020

Abstract

The English language proficiency of internationally educated nurses (IENs) is critical to professional licensure, as communication is a key competency for safe practice. The Canadian English Language Benchmark Assessment for Nurses (CELBAN) is Canada’s only Canadian Language Benchmarks (CLB)-referenced examination used in the context of healthcare regulation. This high-stakes assessment serves as proof of proficiency for IENs seeking licensure in Canada and as a measure of public safety for nursing regulators. Because speaking assessment involves rater judgement, it requires strong reliability evidence (Koizumi et al., 2017), and understanding the quality of rater performance is crucial to maintaining speaking test quality when results are used for high-stakes decisions. This study examined rater performance on the CELBAN Speaking component using Many-Facets Rasch Measurement (MFRM). Specifically, it investigated CELBAN rater reliability in terms of consistency and severity, rating bias, and use of the rating scale. The study was based on a sample of 115 raters across eight test sites in Canada and on results from 2,698 examinations across four parallel test versions. Findings demonstrated relatively high inter-rater and intra-rater reliability, and showed that the CLB-based speaking descriptors (CLB 6–9) provided sufficient information for raters to discriminate among examinees’ levels of oral proficiency. Neither test site nor test version influenced the results, which offers validity evidence to support test use for high-stakes purposes. Of the eight speaking criteria, Grammar was the most difficult on the scale and the one showing the most rater bias. This study highlights the value of MFRM analysis in rater performance research, with implications for rater training, and is one of the first studies to apply MFRM to a CLB-referenced high-stakes assessment in the Canadian context.
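For readers unfamiliar with MFRM, the following is a minimal Python sketch of the general idea, assuming the common rating-scale form of the many-facet Rasch model with three facets: examinee ability (B_n), criterion difficulty (D_i) and rater severity (C_j), plus thresholds (F_k) shared across the scale. It is an illustration only, not the authors' analysis or software; the function names and all parameter values are hypothetical. It shows how the same examinee can receive a lower expected score from a more severe rater, which is the kind of effect MFRM separates from examinee ability.

```python
import math


def mfrm_category_probs(ability, criterion_difficulty, rater_severity, thresholds):
    """Category probabilities under a rating-scale many-facet Rasch model.

    Assumed model (all values in logits):
        log(P_k / P_{k-1}) = B_n - D_i - C_j - F_k
    where B_n is examinee ability, D_i criterion difficulty,
    C_j rater severity and F_k the threshold between categories k-1 and k.
    Returns [P(score = 0), ..., P(score = len(thresholds))].
    """
    base = ability - criterion_difficulty - rater_severity
    # Unnormalised log-numerators: category 0 is fixed at 0 by convention,
    # category k accumulates (base - F_1) + ... + (base - F_k).
    log_numerators = [0.0]
    running = 0.0
    for f in thresholds:
        running += base - f
        log_numerators.append(running)
    # Normalise with a max-shift for numerical stability.
    top = max(log_numerators)
    weights = [math.exp(x - top) for x in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(probs):
    """Model-expected rating: sum of k * P(score = k)."""
    return sum(k * p for k, p in enumerate(probs))


if __name__ == "__main__":
    # Hypothetical values: one examinee, one criterion, a 0-3 rating scale.
    thresholds = [-1.5, 0.0, 1.5]
    for label, severity in [("lenient rater", -0.5), ("severe rater", 0.5)]:
        probs = mfrm_category_probs(1.0, 0.5, severity, thresholds)
        print(label, round(expected_score(probs), 2))
```

Running the script prints a higher expected score for the lenient rater than for the severe one. In this model, further facets such as test site or test version would enter as additional subtracted terms in `base`.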

How to Cite

Wang, P., Coetzee, K., Strachan, A., Monteiro, S., & Cheng, L. (2020). Examining Rater Performance on the CELBAN Speaking: A Many-Facets Rasch Measurement Analysis. Canadian Journal of Applied Linguistics, 23(2), 73–95. https://doi.org/10.37213/cjal.2020.30436

This work is licensed under a Creative Commons Attribution 4.0 International License.