Internationally educated nurses’ (IENs) English language proficiency is critical to professional licensure, as communication is a key competency for safe practice. The Canadian English Language Benchmark Assessment for Nurses (CELBAN) is Canada’s only Canadian Language Benchmarks (CLB) referenced examination used in the context of healthcare regulation. This high-stakes assessment serves as evidence of proficiency for IENs seeking licensure in Canada and as a measure of public safety for nursing regulators. Because speaking assessment involves human judgement, understanding the quality of rater performance is crucial to maintaining test quality when examination results inform high-stakes decisions, and such judgements require strong reliability evidence (Koizumi et al., 2017). This study examined rater performance on the CELBAN Speaking component using Many-Facet Rasch Measurement (MFRM). Specifically, it examined CELBAN rater reliability in terms of consistency and severity, rating bias, and use of the rating scale. The study was based on a sample of 115 raters across eight test sites in Canada and results from 2,698 examinations across four parallel versions. Findings demonstrated relatively high inter-rater and intra-rater reliability, and showed that the CLB-based speaking descriptors (CLB 6-9) provided sufficient information for raters to discriminate examinees’ oral proficiency. Neither test site nor test version influenced ratings, offering validity evidence to support test use for high-stakes purposes. Among the eight speaking criteria, Grammar was identified as the most difficult criterion on the scale and the one demonstrating the most rater bias. This study highlights the value of MFRM analysis in rater performance research, with implications for rater training. It is one of the first studies to apply MFRM to a CLB-referenced high-stakes assessment in the Canadian context.
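
For readers unfamiliar with the method, a minimal sketch of the model underlying an MFRM analysis of this kind (the standard many-facet rating scale formulation; the facet labels below are illustrative, not taken from the study) is:

$$
\log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k
$$

where $P_{nijk}$ is the probability that examinee $n$ receives category $k$ (rather than $k-1$) from rater $j$ on criterion $i$; $B_n$ is examinee proficiency, $D_i$ is criterion difficulty (e.g., Grammar), $C_j$ is rater severity, and $F_k$ is the threshold between adjacent rating categories. Additional facets such as test site or test version can be added as further subtracted terms, which is how their influence on ratings can be tested.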