Self-Report Grades

The highest-ranked influence in the original Visible Learning was self-report grades, with d = 1.44 (Hattie's Rank = 1).

Hattie interprets this as advancing student achievement by 3+ years!

This is an AMAZING claim if true: that merely predicting your grade somehow magically 'influences' your achievement to that extent.

With that sort of power, why aren't most schools doing this?


I hope my beloved 'under-achieving' Australian football team, the St Kilda Saints, are listening (the studies also covered athletic performance):
'Boys, you can make the finals next year just by predicting you will - you don't need to do all that hard training!'

Hattie averaged the following 5 meta-analyses to get d = 1.44. The Kuncel study is 1 meta-analysis, but Hattie split it into 2 entries, which is unusual.

The studies are correlational studies, not true experiments, as discussed in Calculating Effect Sizes.


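| Authors (self-report grades) | Year | # studies | Students | Mean (d) | CLE | Variable |
|---|---|---|---|---|---|---|
| Mabe & West | 1982 | 35 | 13565 | 0.93 | 65% | self evaluation |
| Falchikov & Boud | 1989 | 57 | 5332 | 0.47 | 33% | College |
| Ross | 1998 | 11 | | 1.63 | 115% | second language |
| Falchikov & Goldfinch | 2000 | 48 | 4271 | 1.91 | 135% | College |
| Kuncel, Crede & Thomas | 2005 | 29 | 56265 | 3.1 | 219% | GPA |
| Kuncel, Crede & Thomas | 2005 | 29 | | 0.6 | 42% | self evaluation |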
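
As a quick check on where 1.44 comes from, the sketch below takes the six mean d values listed above and averages them. It assumes a simple unweighted mean (the text only says Hattie took "an average", so the method is my assumption), and the dictionary labels are just for illustration.

```python
from statistics import mean

# Mean effect sizes (d) as listed in the table above.
# The Kuncel, Crede & Thomas meta-analysis appears twice because it was split into 2 entries.
effect_sizes = {
    "Mabe & West (1982)": 0.93,
    "Falchikov & Boud (1989)": 0.47,
    "Ross (1998)": 1.63,
    "Falchikov & Goldfinch (2000)": 1.91,
    "Kuncel, Crede & Thomas (2005), GPA": 3.1,
    "Kuncel, Crede & Thomas (2005), self evaluation": 0.6,
}

# Assumption: a plain unweighted mean, so every row counts equally
# regardless of how many studies or students sit behind it.
average_d = mean(effect_sizes.values())

print(round(average_d, 2))  # 1.44
```

If that assumption holds, the split Kuncel entries count as two of the six values being averaged.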

Dr. Kristen Dicerbo looked at each of these studies. A summary of what she found:
'Mabe & West (1982) This is a review of 55 studies. It is framed around the idea of understanding the validity of self-evaluation by correlating self-given grades to other grades. Are the self-given grades accurate? Are they related to more objective measures of achievement? They did NOT investigate whether changing the self-evaluation influences achievement. The average correlation between self-evaluation and achievement was .29, although across studies it ranged from -.26 to .80. The authors identify a number of ways to make the self-evaluations more accurate. 
Falchikov & Boud (1989) This review examined 57 studies. They employed the commonly-used effect size measure. However, again, there weren’t control and experimental groups or studies about the effect of changing someone’s self-grade/self-expectation. Rather, the self-grade was coded as the experimental group and teacher grade as the control. The mean effect size is .47, with a range from -.62 to 1.42 across studies. They also report the mean correlation between self-graded and other-graded was .39.
Ross (1998) This study examines self-assessment in the context of whether a self-assessment can be used for placement in language classes as opposed to giving placement tests. They report the correlation between self-report and objective scores across 60 studies reviewed as .63. They then report an effect size, but it is an effect size for the correlation coefficient, not the traditional meta-analysis effect size that compares a control and experimental group. This effect size (g) is 1.63. Again, they don’t compare doing it versus not or any effect on achievement. 
Falchikov & Goldfinch (2000) This study is actually about the relationship between peer grades and teacher grades. The overall correlation was .69, with a range from .14 to .99. Regardless, this study does not seem to fall into the category of self-assessment. 
Kuncel, Crede, & Thomas (2005) This paper again looked at the reliability and validity of self-assessment BUT just by looking at whether the GPA and SAT scores that students were reporting were their real scores. In other words, this isn’t even really a judgment of their own expectations, but whether they remember and accurately report known scores. They compare reported to actual results from 37 different samples. So, sure the effect size for reported versus actual GPA was 1.38, but that just means college students can pretty accurately report their already known GPAs. Interestingly, they were quite poor at reporting SAT scores, with effect sizes of .33 for Verbal and .12 for Math.'
I've read each of the meta-analyses and confirm Dr. Dicerbo's analysis.

Kuncel et al (2005) measured students' memory of their GPA from a year or so earlier, which is a measure of memory or honesty, not of students predicting their future scores.

Kuncel et al (2005, p64) state their aim:
'Since it is often difficult to get results transcripts of student PREVIOUS GPA’s from High School or College, the aim of this study is to see whether self-reported grades can be used as a substitute. This obviously has time-saving administration advantages.'
Falchikov & Goldfinch (2000, p288),
'We conceive of the present study as an investigation of the validity of peer marking.'
Mabe and West (1982, p281),
'The intent of this review is to develop general conclusions about the validity of self-evaluation of ability.'
Note: they measured over 20 different categories of achievement, ranging from scholastic and athletic to managerial and practical skills.

Falchikov and Boud (1989, p396) compared staff marking with student self-marking, treating the student self-marks as the experimental group. They conclude (p420),
'most studies found positive effect sizes, indicating overrating on the part of student markers.'
Dicerbo concludes,
'It is clear that these studies only show that there is a correlation between students’ expected grades and their actual grades. That’s it (and sometimes not even that). They just say kids are pretty good judges of their current levels. They do not say anything about how to improve achievement. These studies are not intervention studies. In fact, if I were looking at all the studies about “influences” on achievement, I would not include this line of research. It is not about influencing achievement ...
In later work, Hattie is calling this finding self-expectation. The self-grading research seems to have gotten turned into the idea that these studies imply we should help kids prove those expectations wrong or that raising kids’ expectations will raise their achievement ...
That is not what the studies that produced the 1.44 effect size studied. They looked at the correlation of self-report to actual grades, often in the context of whether self-report could be substituted for other kinds of assessment. None of them studied the effect of changing those self-reports. As we all know, correlation does not imply causation. This research does not imply that self-expectations cause grades.'
Ironically, Kuncel et al (2005, p70) warn about misinterpretation of their work:
'Precise coding and combination of data are critical for the production of a meta-analysis. If data examining fundamentally different samples or variables are unintentionally combined, it may jeopardise the findings. The result would be a mixing of potentially different studies that could yield an uninterpretable blend. Stated simply, this is the old debate about comparing oranges versus apples.'
I contacted Professor Kuncel to make sure I had interpreted his study correctly. He replied that the conclusion of the study was:
'Generally, people exaggerate their accomplishments.'
Another major issue is that the Ross (1998) study uses English as a Second Language students, a very small or abnormal subset of the total student population. The page on Effect Size goes into detail about how abnormal populations give rise to larger effect sizes. So inferences should NOT be made about the general student population.


Professor Pierre-Jérôme Bergeron (2017) insightfully identifies the overriding problem here,
'in addition to mixing multiple and incompatible dimensions, Hattie confounds two distinct populations: 
1) factors that influence academic success and 
2) studies conducted on these factors.'
The studies about self-report are clearly NOT about influencing academic success.

Bergeron (2017) continues,
'It is also with correlations that he obtains the so-called effect of self-reported grades, the strongest effect in the original version of Visible Learning. However, this turns out to be a set of correlations between reported grades and actual grades, a set which does not measure whatsoever the increase of academic success between groups who use self-reported grades and groups who do not conduct this type of self-examination.'
Professor Bergeron also warns of the conversion of correlation to an effect size - see Effect Size.
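
To see what such a conversion does to the numbers, here is a minimal sketch using one standard formula for turning a correlation r into Cohen's d, d = 2r / sqrt(1 - r^2). Whether this is the exact formula applied to these studies is my assumption; the function name `r_to_d` is mine, and the r values are the mean correlations quoted in Dicerbo's summary above.

```python
import math


def r_to_d(r: float) -> float:
    """Convert a correlation r to Cohen's d via d = 2r / sqrt(1 - r^2)."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 2.0 * r / math.sqrt(1.0 - r * r)


# Mean correlations quoted in Dicerbo's summary (not effect sizes from experiments).
correlations = {
    "Mabe & West (1982)": 0.29,
    "Ross (1998)": 0.63,
    "Falchikov & Goldfinch (2000)": 0.69,
}

for study, r in correlations.items():
    print(f"{study}: r = {r:.2f} -> d = {r_to_d(r):.2f}")
# Mabe & West (1982): r = 0.29 -> d = 0.61
# Ross (1998): r = 0.63 -> d = 1.62
# Falchikov & Goldfinch (2000): r = 0.69 -> d = 1.91
```

Under this formula, r = .69 converts to about 1.91, the value listed for Falchikov & Goldfinch, and r = .63 converts to about 1.62, close to the 1.63 listed for Ross. The formula also grows without bound as r approaches 1, so a strong correlation between reported and actual grades produces a very large d even though nothing was taught or changed.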


Schulmeister & Loviscach (2014), in Critical comments on the study "Making learning visible" (Visible Learning), confirm Bergeron's concerns over the use of correlations and their conversion to an effect size.

They also note that the value of d = 0.60, cited by Hattie from the Kuncel, Crede & Thomas study, can neither be found in the study nor reconstructed from it (p6).

Professor Ivo Arnold writes,
'The paper [Kuncel (2005)] should not have been included in the analysis. This example does raise questions regarding the remaining average effect sizes.' (p220).


