Monday, 18 January 2016

Peer Reviews of Hattie's Visible Learning (VL).

"Our discipline needs to be saturated with critique of ideas; and it should be welcomed. Every paradigm or set of conjectures should be tested to destruction and its authors, adherents, and users of the ideas should face public accountability." (Hattie, 2017, p. 428).
The peer reviews are saturated with detailed critiques of Hattie's work but most educators do not seem to be aware of them.

My Aim

is to raise awareness of these critiques and investigate Hattie's claims in the spirit of Tom Bennett, the founder of researchED,
"There exists a good deal of poor, misleading or simply deceptive research in the ecosystem of school debate...
Where research contradicts the prevailing experiential wisdom of the practitioner, that needs to be accounted for, to the detriment of neither but for the ultimate benefit of the student or educator." (Bennett, 2016, p. 9).
The pages (menu on the right) reference over 50 peer reviews that detail a litany of major errors in VL.

The Peer Review

Has documented significant issues with Hattie's work, ranging from flawed methodology, calculation errors, misrepresentation and questionable interpretation to conflicts of interest, e.g.,

Snook, Clark, Harker, O’Neill & O’Neill (2010) - "Potentially misleading."
Terhart (2011) - is suspicious of Hattie's economic interests.
Topphol (2011) - "...the mistake is pervasive, systematic and so clear that it should be easy to reveal in the publishing process. It has not been... this suggests failure in quality assurance. Is this symptomatic of what is coming from this author..? I don't hope so, but I can't be sure."
Berk (2011) - "Statistical malpractice disguised as statistical razzle-dazzle."
Higgins & Simpson (2011) - "the process by which this number (effect size) has been derived has rendered it effectively meaningless." & Hattie has mixed up the X and Y axes on his funnel plot.
O'Neill (2012) - Hattie is a 'policy entrepreneur' who positions himself politically to champion, shape and benefit from school reform discourses.

Schulmeister & Loviscach (2014) - "Hattie pulls the wool over his audience’s eyes." & "Hattie’s method to compute the standard error of the averaged effect size as the mean of the individual standard errors ‒ if these are known at all ‒ is statistical nonsense."
Poulsen (2014) - "Do I believe in Hattie's results? No!"
Wrigley (2015) - "Bullying by Numbers."
O'Neill, Duffy & Fernando (2016) - Detail the huge, undisclosed third-party payments to Hattie.
Wecker et al. (2016) - "A large proportion of the findings are subject to reasonable doubt."

Bergeron & Rivard (2017) - "Pseudo-Science... House of Cards."
Nilholm (2017) - "Hattie's analyzes need to be redone from the ground up."
Nielsen & Klitmøller (2017) - "Neither consistent nor systematic."
Shanahan (2017) - "potentially misleading."
See (2017) - "Lives may be damaged and opportunities lost."
Biesta (2017) - "more akin to pig farming than science."

Eacott (2018) - "A cult... a tragedy for Australian School Leadership."
Slavin (2018) - "Hattie is wrong."
McKnight & Whitburn (2018) - "The Visible Learning cult is not about teachers and students, but the Visible Learning brand."
Ashman (2018b) - "If true randomised controlled trials can generate misleading effect sizes like this, then what monsters wait under the bed of the meta-meta-analysis conducted by Hattie and the EEF?"
Janson (2018) - "little value can be attached to his findings."

Larsen (2019) - "Blindness."
Wiliam (2019) - "Has absolutely no role in educational policy making." 
Wiliam (2019b) - "Meta-meta-analyses, the kinds of things that Hattie & Marzano have done, I think have ZERO educational value!"
Simpson (2011, 2017, 2018, 2019) - "using these ranked meta-meta-analyses to drive educational policy is misguided."
Bakker et al. (2019) - "his lists of effect sizes ignore these points and are therefore misleading."
Zhao, Yong (2019) - "Hattie is the king of the misuse of effect sizes."

Slavin, Robert (2020) - "the value of a category of educational programs cannot be determined by its average effects on achievement. Rather, the value of the category should depend on the effectiveness of its best, replicated, and replicable examples."

Gorard et al. (2020) - "School decision‐makers around the world have been increasingly influenced by hyper‐analyses of prior evidence which synthesise the results of many meta‐analyses —such as those by Hattie (2008), described on its cover as revealing 'teaching’s Holy Grail', and similar attempts around the world. These are even more problematic because again they are combining very different kinds of studies, taking no account of their quality, or of the quality of the studies making up each meta‐analysis. Commentators are now realising and warning of their dangers..."

Kraft (2020) - "Effect sizes that are equal in magnitude are rarely equal in importance."
Wiliam (2020) - "There is no reason to trust any of the numbers in Visible Learning."
Wolf et al. (2020) - Evaluations commissioned by a program's developers report effect sizes 80% larger than those by independent evaluators (0.31 vs 0.14), with ~66% of the difference attributable to publication bias.

Slavin, Robert (2020b) - "the overall mean impacts reported by meta-analyses in education depend on how stringent the inclusion standards were, not how effective the interventions truly were."

Simpson (2021) - "...despite Cohen’s nomenclature, 'effect size' does not measure the size of an effect as needed for policy... Choice of sample, comparison treatment and measure can impact ES; at the extreme, educationally trivial interventions can have infinite ES..."
Wiliam (2021) - "we can discuss why those numbers in John Hattie’s Visible learning are just nonsense".
Nielsen & Klitmøller (2021) -  "by analyzing parts of the primary research and the meta-analysis upon which Hattie grounds his conclusions, we find both serious methodological challenges and validity problems."
Ashman (2021) - The Education Endowment Foundation's Toolkit is a complete mess.
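Schulmeister & Loviscach's (2014) objection in the list above, that taking the mean of the individual standard errors is "statistical nonsense", can be illustrated with a minimal sketch. It assumes the studies are independent and combined by a simple unweighted average; under that assumption the standard error of the average is the square root of the sum of the squared standard errors divided by k, not the mean of the standard errors. The standard errors used are hypothetical.

```python
import math

def mean_of_ses(ses):
    """The shortcut criticised by Schulmeister & Loviscach: average the SEs."""
    return sum(ses) / len(ses)

def se_of_unweighted_mean(ses):
    """SE of a simple average of k independent estimates: sqrt(sum(se_i^2)) / k."""
    k = len(ses)
    return math.sqrt(sum(se ** 2 for se in ses)) / k

# Hypothetical standard errors from four independent studies.
ses = [0.10, 0.10, 0.10, 0.10]
print(round(mean_of_ses(ses), 3))            # 0.1
print(round(se_of_unweighted_mean(ses), 3))  # 0.05
```

With four equal standard errors the shortcut doubles the reported uncertainty (a factor of sqrt(k)). A weighted average, or correlated studies, would need a different formula again, which is why the averaging step cannot be waved through.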

Thomas Aastrup Rømer (2018) received the prestigious Nordic Educational Research Association's Ahlström Award (2019) for "Criticism of John Hattie's theory of Visible Learning". The Association states,
"...the paper makes a precise and subtle critique of Hattie’s work, hence revealing several weaknesses in the methods and theoretical frameworks used by Hattie. Rømer and his critical contribution inform us that we should never take educational theories for granted; rather, educational theories should always be made subject to further research and debate."
Hattie's Contention in VL

Hattie claimed that complex educational influences could be isolated, measured & summarised with a statistic, the effect size (d), and then ranked, to determine "What Works Best" & "Know Thy Impact".
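For readers unfamiliar with the statistic, here is a minimal sketch of one common standardised mean difference (Cohen's d with a pooled standard deviation) and of ranking influences by it. The meta-analyses behind VL did not all use this exact formula, and the influence names and scores below are hypothetical.

```python
import statistics

def cohens_d(treatment, control):
    """Standardised mean difference: (mean_t - mean_c) / pooled SD."""
    n_t, n_c = len(treatment), len(control)
    var_t = statistics.variance(treatment)  # sample variance (n - 1 denominator)
    var_c = statistics.variance(control)
    pooled_sd = (((n_t - 1) * var_t + (n_c - 1) * var_c) / (n_t + n_c - 2)) ** 0.5
    return (statistics.mean(treatment) - statistics.mean(control)) / pooled_sd

# Hypothetical test scores (treatment, control) for two made-up influences.
studies = {
    "influence A": ([72, 75, 78, 81, 84], [70, 73, 76, 79, 82]),
    "influence B": ([60, 66, 72, 78, 84], [55, 61, 67, 73, 79]),
}

# Rank influences by d, largest first.
ranking = sorted(((cohens_d(t, c), name) for name, (t, c) in studies.items()), reverse=True)
for d, name in ranking:
    print(f"{name}: d = {d:.2f}")
# influence B: d = 0.53
# influence A: d = 0.42
```

The final sort is the step the Peer Review questions: each influence is reduced to one number and ordered, with no record of how each number was measured.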

Also, he claimed that his effect sizes measured student achievement; however, the Peer Review shows that Hattie included many unrelated categories, e.g., IQ, hyperactivity and behavior.

The Peer Review also shows that a teacher-designed test can produce an effect size over 4 times LARGER than a standardised test, so comparing effect sizes, as Hattie does, without taking this into account is totally misleading.

This huge difference between standardised and teacher-designed test effect sizes helps account for why systemic aspects, e.g., class size, uniforms and summer school, which use standardised tests, have lower effect sizes than aspects like feedback, which use teacher-designed tests.
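One mechanism behind this gap, in line with Simpson's (2021) point that the choice of measure can change the effect size, is that d divides the raw gain by the spread of scores on the particular test used. A narrow test closely aligned to what was taught may have a much smaller spread than a broad standardised test. The sketch below uses hypothetical numbers, not data from any study cited here.

```python
def effect_size(raw_gain, sd_of_measure):
    """d = raw gain in score points / standard deviation of the measure used."""
    return raw_gain / sd_of_measure

raw_gain = 5.0        # hypothetical: the same 5-point raw gain in both cases
narrow_test_sd = 4.0  # hypothetical narrow, teacher-designed test: scores tightly clustered
broad_test_sd = 20.0  # hypothetical broad standardised test: scores widely spread

d_teacher = effect_size(raw_gain, narrow_test_sd)
d_standardised = effect_size(raw_gain, broad_test_sd)
print(d_teacher, d_standardised, d_teacher / d_standardised)  # 1.25 0.25 5.0
```

As the spread shrinks towards zero, d grows without bound, which is Simpson's (2021) observation that educationally trivial interventions can have infinite effect sizes.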

Hattie emphasised that VL was NOT a "what works" recipe, as almost everything works - so all teachers need is a pulse!

But this is easily challenged, e.g., the prime study Hattie used for 'feedback' is Kluger & DeNisi (1996), who report that 38% of the feedback effects they analysed were negative, i.e., the feedback made performance worse!
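A positive average can sit on top of many negative results, and a short sketch makes the arithmetic visible. The effect sizes below are hypothetical, chosen only so that the mean is positive while a third of the studies are negative; they are not Kluger & DeNisi's data.

```python
import statistics

# Hypothetical study-level effect sizes for one influence.
effects = [1.2, 0.9, 0.8, 0.7, 0.6, 0.5, -0.1, -0.3, -0.4]

mean_d = statistics.mean(effects)
share_negative = sum(1 for d in effects if d < 0) / len(effects)
print(f"mean d = {mean_d:.2f}, negative = {share_negative:.0%}")
# mean d = 0.43, negative = 33%
```

A ranking built on the mean alone would place this influence comfortably high, even though one study in three found a negative effect.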

Marzano (2009) also notes this and goes further,
"I’ve observed the same phenomenon with virtually every strategy and every innovation I’ve examined. I’ve come to the conclusion that you can expect anywhere from 20% to 40% of the studies in any given area to report negative results." (p. 34)
Hattie goes on to argue that because "everything works" we then need to focus on "what works best", and the Effect Size determines this,
"The major message is that we need a barometer of what works best..." (VL, preface)
"One aim of this book is to develop an explanatory story about the key influences on student learning - it is certainly not to build another 'what works' recipe." (VL, p. 6). 
"When teachers claim that they are having a positive effect on achievement or when a policy improves achievement, this is almost always a trivial claim: Virtually everything works. One only needs a pulse and we can improve achievement." (VL p. 16)
 "Instead of asking 'What works?' we should be asking 'What works best?'" (VL, p. 18)
There are also serious contradictory statements. On the one hand, it's a book about 'what works best', and while some influences are systemic, e.g., uniforms, most are in the classroom, e.g., feedback and self report. Yet, in his preface, he warns,
"It is not a book about classroom life, and does not speak to the nuances and details of what happens within classrooms."
Perhaps that accounts for Hattie's uncertain conclusion, 
"The model I will present... may well be speculative, but it aims to provide high levels of explanation for the many influences on student achievement as well as offer a platform to compare these influences in a meaningful way... I must emphasise that these ideas are clearly speculative" (VL, p. 4).
Hattie & Hamilton (2020) now give an even more ambiguous picture,
"Most things that a teacher could do in a classroom 'sorta' work..." (p. 3) 
Hattie's Defenses

Hattie consistently claims there is no critique of his work and no mistakes have been found, e.g., 
"What I find fascinating is that since I first published this back in the 1990s, no one has come up with a better explanation for the data... 
I am updating the meta-analysis all the time; I am up to 1400 now. I do that because I want to be the first to discover the error, the mistake." (Knudsen, 2017, p. 7).
Wrigley (2015) called this defense by Hattie,
"Bullying by numbers"
Also, I find Hattie's claim hard to reconcile, since many of the Peer Reviews published significant issues and errors before 2017, e.g., Snook et al. (2009), Topphol (2011) and Higgins & Simpson (2011).

Hattie has also often agreed with the critique, saying, "Yes you must be sensitive to that", but then not addressed the issue at all. Prof Scott Eacott (2018) voices a common response from the Peer Reviewers,
"Disappointingly, Hattie's response was in my opinion, inadequate" (p. 4).
Then, in an interview with Ollie Lovell (June 2018), Hattie did a complete back-flip with his mantra "the story, the story, the story", saying what the Peer Review has been saying since 2009: that the numbers and rankings are too simplistic!
"What's the story, not what's the numbers..."
"that’s why this will keep me in business to keep telling the story..." (Audio here).
Hattie then admits his rankings are misleading and does not rank anymore! (Audio here).
"it worked then it got misleading so I stopped it"
My observation is that Hattie tends to jump between his rankings and 'The Story, Story, Story' narrative, depending on who he is talking to.

For example, a few months after the Lovell interview, Hattie continued to rank and mislead in his series of webinars in the USA.

Does Hattie faithfully represent the research?

Most people assume he does, but a brief look at Hattie's representation of the Class Size research should raise some questions! (More details are in the page links in the menu on the right.)

In 2005, Hattie got the attention of educational administrators by labelling 'reducing class size' a disaster, and later as going backwards (2005 ACER Lecture & VL, p. 250). He continued with Pearson (2015), naming 'reducing class size' as one of the major distractions! Then again, in the TV series Revolution School, he claimed that reducing class size does not make a difference to the quality of education!

The major class size study that Hattie used was by Glass & Smith (1979), who summarise their data in a graph and a table.

The trend and the difference between good and poor quality research are clearly displayed. Glass & Smith conclude (p. 15),
"The curve for the well-controlled studies then, is probably the best representation of the class-size and achievement relationship...
A clear and strong relationship between class size and achievement has emerged... There is little doubt, that other things being equal, more is learned in smaller classes."
Hattie never mentions this "Story" and he consistently reports just the ONE average, e.g., Hattie stated,
"Glass and Smith (1979) reported an average effect of 0.09 based on 77 studies..." Blatchford (2016, p. 106)
I also contacted Prof Glass to ensure I had interpreted his study correctly, and he kindly replied,
"Averaging class size reduction effects over a range of reductions makes no sense to me. 
It's the curve that counts. 
Reductions from 40 to 30 bring about negligible achievement effects. From 20 to 10 is a different story.  
But Teacher Workload and its relationship to class size is what counts in my book."
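Glass's point that "it's the curve that counts" can be made concrete with a sketch. The effects per class-size reduction below are hypothetical, shaped only to follow his description (negligible from 40 to 30, much larger from 20 to 10); they are not the Glass & Smith (1979) estimates.

```python
# Hypothetical achievement effects for different class-size reductions,
# shaped to follow Glass's description, NOT the Glass & Smith (1979) data.
effects_by_reduction = {
    (40, 30): 0.02,
    (30, 20): 0.08,
    (20, 10): 0.30,
}

# The single figure Hattie-style reporting would give.
average = sum(effects_by_reduction.values()) / len(effects_by_reduction)
print(f"one average: {average:.2f}")  # one average: 0.13

# The curve Glass says actually matters.
for (from_size, to_size), d in effects_by_reduction.items():
    print(f"{from_size} -> {to_size}: {d:.2f}")
```

The single average describes none of the three reductions, and it would shift if the underlying studies happened to include a different mix of reductions.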
Bergeron & Rivard (2017) reiterate,
"Hattie computes averages that do not make any sense."
Dylan Wiliam (2016b) concludes that, 
"…right now meta‐analysis is simply not a suitable technique for summarizing the relative effectiveness of different approaches to improving student learning..."
Wrigley (2015) in Bullying by Numbers, critiquing the EEF in particular but also Hattie,
"Teachers need to speak back to power, and one useful tool is to point to flaws in the use of data" (p. 3).
"Bullying by numbers has a restrictive effect on education, leads to superficial learning, and is seriously damaging teachers’ lives" (p. 6).
This Blog:

is broken up into different pages (menu on the right) designed so you can easily go to what interests you most.

A critique of Hattie's methodology - Effect Size, Student Achievement, CLE and other errors and A Year's Progress???

Then an analysis of particular influences. I would recommend starting with what was his highest-ranked influence, Self Report Grades, and then looking at the controversial Class Size.

What does ONE average mean? (no pun intended).

The easiest critique of Hattie's work to understand is his averaging of a whole range of disparate meta-analyses into ONE effect size.

This ONE average is contradictory to his latest mantra "The Story, the Story, the Story."

Class Size, above, is one example and another illustrative example is Feedback.
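One alternative to reporting ONE average is to show the spread of the effect sizes being averaged. A minimal sketch, assuming a hypothetical set of effect sizes from nine meta-analyses of a single influence (not VL's feedback or class size figures), that prints the mean beside a simple text histogram:

```python
import math
import statistics
from collections import Counter

def text_histogram(values, bin_width=0.5):
    """Return one line per bin, '[low, high) ###', with one # per value in the bin."""
    counts = Counter(math.floor(v / bin_width) for v in values)
    lines = []
    for b in range(min(counts), max(counts) + 1):
        low, high = b * bin_width, (b + 1) * bin_width
        lines.append(f"[{low:+.1f}, {high:+.1f}) {'#' * counts[b]}")
    return lines

# Hypothetical effect sizes from nine meta-analyses of one influence.
effects = [-0.4, -0.1, 0.1, 0.2, 0.3, 0.6, 0.7, 1.1, 1.3]

print(f"average d = {statistics.mean(effects):.2f}")  # average d = 0.42
for line in text_histogram(effects):
    print(line)
```

The histogram shows what the single average hides: effect sizes running from negative values to above 1.0.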

Dr. Jim Thornton, Professor of Obstetrics and Gynaecology,
"To a medical researcher, it seems bonkers that Hattie combines all studies of the same intervention into a single effect size."
Terry Wrigley (2018) had already warned,
"What now stands proxy for a breadth of evidence is statistical averaging. This mathematical abstraction neglects the contribution of the practitioner’s accumulated experience, a sense of the students’ needs and wishes, and an understanding of social and cultural context... 
When ‘evidence’ is reduced to a mean effect size, the individual person or event is shut out, complexity is lost and values are erased" (p. 360).

The picture represents the problem well and confirms what teachers have been saying for a long time, e.g., Goldacre (2008) on meta-analysis in education,
"I think you’ll find it’s a bit more complicated than that."

Where To From Here?

The conflicts of interest of major players such as Hattie and Marzano are now too big to ignore - see here.

Teachers need to understand the basics of these research methods, and Teacher Unions, which have the resources to independently assess evidence, should provide critical awareness, summaries and training for teachers, with a focus on QUALITY evidence.

Wrigley (2018), quoting Gene Glass, suggests a start,

"Indeed, Gene Glass, who originated the idea of meta-analysis, issued this sharp warning about heterogeneity: 'Our biggest challenge is to tame the wild variation in our findings not by decreeing this or that set of standard protocols but by describing and accounting for the variability in our findings. The result of a meta-analysis should never be an average; it should be a graph.'(Robinson, 2004: 29)" (p. 367).