Hattie's Claims

1. Move from "what works" to "what works best".

Hattie emphasised that VL was NOT a "what works" recipe, as almost everything works,
"When teachers claim that they are having a positive effect on achievement or when a policy improves achievement, this is almost always a trivial claim: Virtually everything works. One only needs a pulse and we can improve achievement." (VL, p. 16).
Hattie argues that because "everything works" we then need to focus on "what works best",
"The major message is that we need a barometer of what works best..." (VL, preface)
"One aim of this book is to develop an explanatory story about the key influences on student learning - it is certainly not to build another 'what works' recipe." (VL, p. 6). 
"Instead of asking 'What works?' we should be asking 'What works best?'" (VL, p. 18)
Nielsen & Klitmøller (2017) report that Hattie's claim "Virtually everything works" is easily challenged by Hattie's own prime Feedback study, Kluger & DeNisi (1996), who report that 32% of feedback interventions do NOT work!

Marzano (2009) also notes this and goes further,
"I’ve observed the same phenomenon with virtually every strategy and every innovation I’ve examined. I’ve come to the conclusion that you can expect anywhere from 20% to 40% of the studies in any given area to report negative results." (p. 34)
Kraft (2023), in "Education Interventions Often Fail", also contradicts Hattie's claim that "everything works" (full tweet here).

Evans & Yuan (2022), using studies from low- and middle-income countries, report similar results to Kraft,
"We identify a median effect size of 0.10 standard deviations on learning and 0.07 standard deviations on access among randomized controlled trials."
Lortie-Forgues & Inglis (2019) also found a low average effect size of 0.06, with a large confidence interval width of 0.30, after analyzing 141 large-scale RCTs.
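A quick back-of-the-envelope calculation shows why such a wide confidence interval matters (a sketch, assuming the 0.30 figure is the full width of the interval):

```python
# Sketch: what a mean effect size of 0.06 with a 0.30-wide
# confidence interval implies (assumes 0.30 is the full width).
mean_es = 0.06
ci_width = 0.30

lower = mean_es - ci_width / 2   # -0.09
upper = mean_es + ci_width / 2   #  0.21

print(f"CI: [{lower:.2f}, {upper:.2f}]")
# The interval spans zero, so many of these large-scale RCTs
# cannot distinguish their measured effect from no effect at all.
```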

Hattie has recently retreated a little from his original claim,
"Most things that a teacher could do in a classroom 'sorta' work..." (Hattie & Hamilton (2020), p. 3)

Then, in 2021, he retreated further,

"The same strategy that works today, does not work tomorrow." (@37.40min here)

2. Focus on Student Achievement.

Hattie stated,
"Of course, there are many outcomes of schooling, such as attitudes, physical outcomes, belongingness, respect, citizenship, and the love of learning. This book focuses on student achievement, and that is a limitation of this review" (VL, p. 6).
The Peer Review shows that Hattie included many studies that used measures other than achievement, e.g., IQ, hyperactivity, engagement, and behavior.

3. Remove the Complexity of Teaching by Grouping Studies Into Individual Categories Called "Influences", e.g., Feedback.

Nielsen & Klitmøller (2017) discuss in detail that studies may be measuring a combination of many influences. Take class size, for example: how do you remove the effects of other influences from the study, such as time on task, motivation, behaviour, teacher subject knowledge, feedback, home life, welfare, etc.?

However, Hattie claimed,
"It is unlikely that many of the effects reported in this book are additive" (VL, p. 256).
4. Use Meta-analyses That Contain Multiple Studies For a Particular Influence.

Hattie stated,
"This book is based on a synthesis (a method referred to by some as meta-meta-analysis) of more than 800 meta-analyses about influences on learning that have now been completed, including many recent ones. It will develop a method such that the various innovations in these meta-analyses can be ranked from very positive to very negative effects on student achievement." (VL, p. 3)
5. Find the Effect Size (ES) Calculated From Student Achievement Tests.

Again Hattie stated, 
"Would it not be wonderful if we could create a single continuum of achievement effects, and locate all possible influences of achievement on this continuum?
...Influences on the left of this continuum are those that decrease achievement, and those on the right increase achievement. Those near the zero point have no influence on achievement outcomes. 
The next task was to adopt an appropriate scale so that as many outcomes as possible from thousands of studies are converted to this single scale. This was accomplished using effect sizes." (VL, p. 7)
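For readers unfamiliar with the statistic, here is a minimal sketch of the standard calculation (Cohen's d, the usual form of standardized mean difference; all numbers below are hypothetical):

```python
import math

def cohens_d(mean_treatment, mean_control, sd_treatment, sd_control,
             n_treatment, n_control):
    """Standardized mean difference (Cohen's d)."""
    # Pooled standard deviation across the two groups.
    pooled_sd = math.sqrt(
        ((n_treatment - 1) * sd_treatment**2 +
         (n_control - 1) * sd_control**2) /
        (n_treatment + n_control - 2)
    )
    return (mean_treatment - mean_control) / pooled_sd

# Hypothetical example: treatment class averages 65, control class 60,
# both with SD 10 and 30 students each.
print(cohens_d(65, 60, 10, 10, 30, 30))  # d = 0.5
```

Note that the effect size depends entirely on the spread (SD) of the test used, which is what makes comparisons across different kinds of tests so fraught, as the next point shows.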
Wecker et al. (2017, p. 28) report that Hattie mistakenly included studies that do not measure academic performance.

Also, the Peer Review shows a teacher-designed test can have an effect size over 4 times LARGER than a standardized test, so comparing effect sizes, as Hattie does, without taking this into account is totally misleading.

This huge difference in standardized versus teacher-designed test effect sizes explains why systemic aspects, e.g., class size, uniforms, summer school, etc., which are usually measured with standardized tests, show lower effect sizes than aspects like feedback, which are usually measured with teacher-designed tests.
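A toy calculation shows how the same intervention can look four times as effective on a narrower, more sensitive test (all numbers hypothetical):

```python
# The same raw improvement of 5 points, measured two ways.
raw_gain = 5.0

# A broad standardized test has a wide spread of scores...
sd_standardized = 20.0
# ...while a narrow teacher-designed test, aligned to what was
# just taught, has a much smaller spread.
sd_teacher = 5.0

print(raw_gain / sd_standardized)  # d = 0.25
print(raw_gain / sd_teacher)       # d = 1.00 -- 4x larger
```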

6. Use Studies On Real Students In Real Classrooms.

Hattie claimed,
"Nearly all studies in the book are based on real students in front of real teachers in real schools." (VL, preface p. ix)
Hattie also emphasised this in many of his presentations, e.g., at ResearchEd Melbourne in 2017,
"nearly all of it is based on what happens in regular classrooms by regular teachers... 99.+% is based on classrooms run by ordinary teachers, not like in Psychology where they use under-graduate students, they bring in outsiders and this kinda stuff." (@9mins)
This is clearly not the case, as a detailed look at each study shows that Hattie used many studies on university students and on adults.

Hattie's recent 2020 paper "Feedback Revisited" is a quiet admission of this error: he details the removal of most of the original 23 studies on Feedback because they were NOT about real kids in real classrooms.

7. Average ALL the ES Reported in Each Meta-analysis for a Particular Influence.

Hattie stated,
"As an example of synthesizing meta-analyses, take an examination of five meta-analyses on homework: Cooper (1989; 1994); Cooper, Robinson, & Patall (2006); DeBaz (1994); Paschal, Weinstein, & Walberg (1984). Over these five meta-analyses there were 161 studies involving more than 100,000 students, which investigated the effects of homework on students' achievement. The average of all these effect sizes was d = 0.29, which can be used as the best typical effect size of the influence of homework on achievement. Thus, compared to classes without homework, the use of homework was associated with advancing children's achievement by approximately one year." (VL, p. 8)
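Hattie's method here is just an unweighted mean of the meta-analysis means. A minimal sketch (the individual values below are hypothetical, chosen only to average to 0.29):

```python
# Hypothetical per-meta-analysis effect sizes for homework.
# Hattie's synthesis simply averages them, ignoring how many
# studies or students sit behind each one.
homework_meta_es = [0.21, 0.28, 0.09, 0.48, 0.39]

average_es = sum(homework_meta_es) / len(homework_meta_es)
print(round(average_es, 2))  # 0.29
```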
Terry Wrigley (2018) warned about Hattie's averaging,
"What now stands proxy for a breadth of evidence is statistical averaging. This mathematical abstraction neglects the contribution of the practitioner’s accumulated experience, a sense of the students’ needs and wishes, and an understanding of social and cultural context... 
When ‘evidence’ is reduced to a mean effect size, the individual person or event is shut out, complexity is lost and values are erased" (p. 360).


The picture represents the problem well and confirms what teachers have been saying for a long time, e.g., Goldacre (2008) on meta-analysis in education,
"I think you’ll find it’s a bit more complicated than that."
8. This ONE Average ES Then Represents That Influence and Is Used to Derive a Ranking From the #1 "What Works Best" Strategy Down to #138.

There is often wild variation in the results of the different studies. Averaging all of these into one number, as Hattie does, loses the meaning of the original studies (see the sketch after the quotes below).

Wrigley (2018) quoting Gene Glass warns us about this,
"Indeed, Gene Glass, who originated the idea of meta-analysis, issued this sharp warning about heterogeneity: 'Our biggest challenge is to tame the wild variation in our findings not by decreeing this or that set of standard protocols but by describing and accounting for the variability in our findings. The result of a meta-analysis should never be an average; it should be a graph.'(Robinson, 2004: 29)" (p. 367).
Dr. Jim Thornton, Professor of Obstetrics and Gynaecology, writes,
"To a medical researcher, it seems bonkers that Hattie combines all studies of the same intervention into a single effect size."
9. The #1 Was "Student Self Report Grades" But Has Changed to "Collective Teacher Efficacy".

10. The Average ES for All of These Meta-analyses is 0.40 and This Hinge Point Represents One Year's Growth for a Student.

Hattie makes several claims about an ES = 0.40,
"We can set benchmarks of what progress looks like (preferably d=0.40 for every student, at least d=0.30, and certainly not less than d=0.20) per implementation or year." (VL, p. 240) 
"...the use of the "h-point" (d = 0.40) to demarcate the expected value of any innovations in schools is critical. Rather than using the zero point, which is hardly worthwhile, the standards for minimal success in schools should be more like d = 0.40." (VL, p.249) 
"The d = 0.40 is what I referred to in Visible Learning as the hinge-point (or h-point) for identifying what is and what is not effective." (Hattie, 2012, p.3) 
"d = 0.4 is what we can expect as growth per year on average." (Hattie, 2012, p. 14) & (Hattie presentation Melbourne Graduate School, 2011, @21minutes).
11. Use a Barometer to Represent the Range of ES, Showing Influences Above 0.40 Are "Good" and Those Below 0.40 "Don't Matter Much".


Hattie explains,
'We need a barometer that addresses whether the various teaching methods, school reforms, and so on are worthwhile relative to possible alternatives. We need clear goalposts of excellence for all in our schools to aspire towards, and most importantly, for them to know when they get there. We need a barometer of success that helps teachers to understand which attributes of schooling assist students in attaining these goalposts.
For each of the many attributes investigated in the chapters in this book, the average of each influence is indexed by an arrow through one of the zones on the barometer. All influences above the h-point (d = 0.40) are labeled in the "Zone of desired effects" as these are the influences that have the greatest impact on student achievement outcomes.' (VL, p. 19)
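A minimal sketch of how the hinge-point logic sorts influences into zones (the influence names and values here are hypothetical, for illustration only):

```python
# Hattie's hinge-point: d = 0.40 marks the "Zone of desired effects".
HINGE_POINT = 0.40

# Hypothetical influences and their average effect sizes.
influences = {
    "influence A": 0.73,
    "influence B": 0.29,
    "influence C": -0.05,
}

for name, d in influences.items():
    zone = ("zone of desired effects" if d >= HINGE_POINT
            else "below the hinge-point")
    print(f"{name}: d = {d:+.2f} -> {zone}")
```

Note that this sorting inherits every problem described above: the single average per influence, the mixing of test types, and the averaged-away heterogeneity all feed directly into which side of the 0.40 line an influence lands on.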
