Research Article

An Analytical Review of Inter-Rater Reliability and Agreement

Cha, Jongseok · Kim, Yeongbae

Published: January 1994 · Vol. 23, Special Issue · pp. 75-102

Abstract

This study examines reliability, an important indicator in social science methodology. Reliability is classified into inter-observer reliability, measurement instrument reliability, and generalizability reliability. The study focuses on inter-rater reliability and inter-rater agreement, both of which fall under inter-observer reliability. Inter-rater reliability serves as an important indicator for verifying the objectivity of data in climate research, leadership research, job analysis research, and personnel appraisal and performance evaluation research. Existing studies have used a variety of inter-rater reliability indices, and inter-rater reliability and inter-rater agreement need to be distinguished according to the purpose of the research. Inter-rater reliability represents consistency: the degree to which raters' evaluations are correlated or proportionally related. Inter-rater agreement represents consensus: the extent to which raters assign identical evaluations to a given target. Among the indices of inter-rater reliability, the ICC has been evaluated as the best indicator, while r_wg has been assessed as the most reasonable measure of inter-rater agreement. To use these indices accurately, one must consider the number of evaluation targets, the unit of analysis, whether the measure is single-item or multi-item, and whether a one-way or two-way analysis of variance applies. A review of existing studies shows that, in climate research, an ICC(1) of 0.20 or above and an ICC(2) of 0.60 or above are generally considered satisfactory, as is an r_wg of 0.80 or above. In other research domains, however, benchmark values are difficult to propose because few studies have employed these reliability indices. Based on these findings, the study presents theoretical and practical implications and discusses important factors that influence evaluation as well as directions for future research.
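The abstract names ICC(1), ICC(2), and r_wg but does not give their formulas. The sketch below shows how these indices are commonly computed, assuming the one-way ANOVA forms of the ICC with equal numbers of raters per target and the single-item r_wg with a uniform null distribution. The function names `icc_one_way` and `rwg_single_item` and the sample ratings are hypothetical, and the article's own treatment may differ, for example for two-way designs or multi-item measures.

```python
from statistics import mean, variance


def icc_one_way(groups):
    """Return (ICC(1), ICC(2)) from a one-way ANOVA.

    `groups` is a list of rating lists, one list per evaluation target.
    Assumption: every target is rated by the same number of raters (k).
    """
    k = len(groups[0])
    if k < 2 or any(len(g) != k for g in groups):
        raise ValueError("each target needs the same number (>= 2) of ratings")
    n = len(groups)
    if n < 2:
        raise ValueError("at least two targets are required")

    grand_mean = mean(x for g in groups for x in g)
    # Between-target and within-target mean squares.
    msb = k * sum((mean(g) - grand_mean) ** 2 for g in groups) / (n - 1)
    msw = sum((x - mean(g)) ** 2 for g in groups for x in g) / (n * (k - 1))

    icc1 = (msb - msw) / (msb + (k - 1) * msw)  # reliability of a single rating
    icc2 = (msb - msw) / msb                    # reliability of the target mean
    return icc1, icc2


def rwg_single_item(ratings, n_options):
    """Return single-item r_wg for one target.

    Assumption: the "no agreement" null is a uniform distribution over
    `n_options` response categories, whose variance is (A^2 - 1) / 12.
    """
    expected_variance = (n_options ** 2 - 1) / 12
    return 1 - variance(ratings) / expected_variance


# Hypothetical data: three targets, each rated by three raters on a 5-point scale.
ratings_by_target = [[4, 5, 4], [2, 2, 3], [5, 5, 4]]
icc1, icc2 = icc_one_way(ratings_by_target)
rwg_first = rwg_single_item(ratings_by_target[0], n_options=5)
print(f"ICC(1)={icc1:.3f}  ICC(2)={icc2:.3f}  r_wg(target 1)={rwg_first:.3f}")
```

With these made-up ratings all three values exceed the climate-research benchmarks quoted in the abstract (0.20, 0.60, and 0.80 respectively). The ICC values summarize consistency across all targets, whereas r_wg is computed separately for each target, which mirrors the reliability versus agreement distinction drawn above.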