Abstract:
The study explores the evaluation criteria of individual raters and groups of raters and
computes their inter-rater and intra-rater reliability on a national-level, high-stakes
examination in the Punjab province of Pakistan. The study thus examines both the process
(the evaluation criteria) and the product (the actual scores). Of the nine
examination Boards in the Punjab, three are in South Punjab (SP). A total of 94 markers,
34 from the Multan Board and 30 each from the DG Khan and Bahawalpur Boards,
evaluated 20 essays twice, with a gap of at least six weeks between sessions. The raters were not
provided with a rating scale for evaluating the essays, since the Boards they work for
do not provide one. They were asked to evaluate the essays as they would in the actual
examination centers and to give an overall score to each essay. Only seasoned raters
with a minimum of ten years' experience were recruited for this study, to control for the
‘experience’ variable. There were five essays on each of four topics (A Picnic Party,
Patriotism, Co-education and Science). The essay topics were selected from previous
examination papers.
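The abstract does not name the reliability coefficients used. As a minimal sketch, assuming Pearson correlation as the index: intra-rater reliability can be taken as the correlation between one rater's Time 1 and Time 2 scores on the same essays, and inter-rater reliability as the mean pairwise correlation among raters on a single occasion. The helper names and the sample scores below are hypothetical and are not drawn from the study.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length score lists with at least 2 items")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A rater who gives every essay the same score has no variance to correlate.
        raise ValueError("correlation undefined for constant scores")
    return sxy / sqrt(sxx * syy)


def intra_rater(time1, time2):
    """Hypothetical helper: one rater's Time 1 vs Time 2 scores on the same essays."""
    return pearson(time1, time2)


def inter_rater(scores_by_rater):
    """Hypothetical helper: mean pairwise correlation among raters on one occasion."""
    pairs = list(combinations(scores_by_rater, 2))
    if not pairs:
        raise ValueError("need scores from at least two raters")
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Made-up scores for 3 raters on 5 essays; NOT data from the study.
    time1 = [[12, 9, 15, 7, 10], [11, 8, 14, 9, 10], [13, 10, 12, 6, 11]]
    time2 = [[11, 9, 14, 8, 10], [12, 7, 15, 8, 9], [12, 11, 13, 7, 10]]
    for i, (t1, t2) in enumerate(zip(time1, time2), start=1):
        print(f"rater {i} intra-rater r = {intra_rater(t1, t2):.2f}")
    print(f"inter-rater mean r (Time 1) = {inter_rater(time1):.2f}")
```

Averaging raw correlations is a simplification; applying a Fisher z-transform before averaging, or computing an intraclass correlation over the full rater-by-essay score matrix, are common alternatives.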
After all the markers (n=94) had evaluated the same essay set (n=20) for the second time
(Time 2), they were asked to write a short commentary explaining their
evaluation criteria. The length of the responses varied across individuals and groups.
Of the 473 paragraphs written, 265 were by male and 208 by female respondents.
Following this task, 20 raters were interviewed in a semi-structured format to cross-check
and better understand the findings from the qualitative and quantitative
data.