Monday, 18 January 2016

Peer Reviews of Hattie's Visible Learning (VL).

"Our discipline needs to be saturated with critique of ideas; and it should be welcomed. Every paradigm or set of conjectures should be tested to destruction and its authors, adherents, and users of the ideas should face public accountability." (Hattie, 2017, p. 428).
The peer reviews are saturated with detailed critiques of Hattie's work but most educators do not seem to be aware of them.

My Aim

is to raise awareness of these critiques and investigate Hattie's claims in the spirit of Tom Bennett, the founder of researchEd,
"There exists a good deal of poor, misleading or simply deceptive research in the ecosystem of school debate...
Where research contradicts the prevailing experiential wisdom of the practitioner, that needs to be accounted for, to the detriment of neither but for the ultimate benefit of the student or educator." (Bennett, 2016, p. 9).
The pages (right) reference over 50 peer reviews which detail a litany of major errors in VL.

The Peer Review

Has documented significant issues with Hattie's work, ranging from flawed methodology, calculation errors, misrepresentation and questionable interpretation to conflicts of interest, e.g.,

Snook, Clark, Harker, O’Neill & O’Neill (2009) - "Hattie says that he is not concerned with the quality of the research in the 800 studies but, of course, quality is everything. Any meta-analysis that does not exclude poor or inadequate studies is misleading, and potentially damaging if it leads to ill-advised policy developments."
Terhart (2011) - is suspicious of Hattie's economic interests.
Topphol (2011) - "...the mistake is pervasive, systematic and so clear that it should be easy to reveal in the publishing process. It has not been... this suggests failure in quality assurance. Is this symptomatic of what is coming from this author..? I don't hope so, but I can't be sure."
Berk (2011) - "Statistical malpractice disguised as statistical razzle-dazzle."
Higgins & Simpson (2011) - "the process by which this number (effect size) has been derived has rendered it effectively meaningless." They also note that Hattie has swapped the X and Y axes on his funnel plot.

O'Neill (2012) - Hattie is a policy entrepreneur: he positions himself politically to champion, shape and benefit from school-reform discourses.
Lind (2013) - "Hattie's synthesis is shortsighted and its conclusions problematic."

Schulmeister & Loviscach (2014) - "Hattie pulls the wool over his audience’s eyes." & "Hattie’s method to compute the standard error of the averaged effect size as the mean of the individual standard errors ‒ if these are known at all ‒ is statistical nonsense."
Poulsen (2014) - "Do I believe in Hattie's results? No!"
Wrigley (2015) - "Bullying by Numbers."
O'Neill, Duffy & Fernando (2016) - Detail the huge undisclosed 3rd party payments to Hattie.
Wecker et al. (2016) - "A large proportion of the findings are subject to reasonable doubt."

Bergeron & Rivard (2017) - "Pseudo-Science... House of Cards."
Nilholm (2017) - "Hattie's analyses need to be redone from the ground up."
Nielsen & Klitmøller (2017) - "Neither consistent nor systematic."
Shanahan (2017) - "potentially misleading."
See (2017) - "Lives may be damaged and opportunities lost."
Biesta (2017) - "more akin to pig farming than science."
Proulx (2017) - Hattie's collection of feedback studies are not consistent with Hattie's definition of feedback.
Proulx (2017) - Hattie claims that teachers seeing learning through the eyes of the student is at the heart of the Visible Learning concept. But that statement, found at the beginning of the book and one that few could oppose, has no support in his research data. In a word, no meta-analysis focuses on this dimension.

Eacott (2018) - "A cult... a tragedy for Australian School Leadership."
Slavin (2018) - "Hattie is wrong."
McKnight & Whitburn (2018) - "The Visible Learning cult is not about teachers and students, but the Visible Learning brand."
Ashman (2018b) - "If true randomised controlled trials can generate misleading effect sizes like this, then what monsters wait under the bed of the meta-meta-analysis conducted by Hattie and the EEF?"
Janson (2018) - "little value can be attached to his findings."

Larsen (2019) - "Blindness."
Wiliam (2019) - "Has absolutely no role in educational policy making." 
Wiliam (2019b) - "Meta-meta-analyses, the kinds of things that Hattie & Marzano have done, I think have ZERO educational value!"
Simpson (2011, 2017, 2018, 2019) - "using these ranked meta-meta-analyses to drive educational policy is misguided."
Bakker et al. (2019) - "his lists of effect sizes ignore these points and are therefore misleading."
Zhao (2019) - "Hattie is the king of the misuse of effect sizes."

Slavin (2020) - "the value of a category of educational programs cannot be determined by its average effects on achievement. Rather, the value of the category should depend on the effectiveness of its best, replicated, and replicable examples."

Gorard et al. (2020) - "School decision‐makers around the world have been increasingly influenced by hyper‐analyses of prior evidence which synthesise the results of many meta‐analyses-such as those by Hattie (2008), described on its cover as revealing 'teaching’s Holy Grail', and similar attempts around the world. These are even more problematic because again they are combining very different kinds of studies, taking no account of their quality, or of the quality of the studies making up each meta‐analysis. Commentators are now realising and warning of their dangers"

Kraft (2020) - "Effect sizes that are equal in magnitude are rarely equal in importance."
Larsen & Hattie (2020) - "what I think is really misleading, and in the worst case wrong, science, if you reduce a complex phenomenon to a simplistic explanation and a colorful and seductive image."
Wiliam (2020) - "There is no reason to trust any of the numbers in Visible Learning."
Wolf et al. (2020) - Effect sizes from evaluations conducted by a program's developers average 80% larger than those from independent evaluators (0.31 vs 0.14), with roughly two-thirds of the difference attributable to publication bias.

Slavin (2020b) - "the overall mean impacts reported by meta-analyses in education depend on how stringent the inclusion standards were, not how effective the interventions truly were."

Simpson (2021) - "despite Cohen’s nomenclature, 'effect size' does not measure the size of an effect as needed for policy... Choice of sample, comparison treatment and measure can impact ES; at the extreme, educationally trivial interventions can have infinite ES..."
Wiliam (2021) - "we can discuss why those numbers in John Hattie’s Visible learning are just nonsense".
Nielsen & Klitmøller (2021) - "by analyzing parts of the primary research and the meta-analysis upon which Hattie grounds his conclusions, we find both serious methodological challenges and validity problems."
Ashman (2021) - The Education Endowment Foundation's Toolkit is a complete mess.
Sundar & Agarwal (2021) - "there are several statistical concerns with his calculation methods. We urge teachers to recognize that Hattie’s scores can not be equated to what a majority of the research community calculates and interprets as effect sizes."
Kraft (2021) - "It is much easier to produce large improvements in teachers' self-efficacy than in the achievement of their students. In my view, this renders universal effect size benchmarks impractical."

Ashman (2022). "I no longer accept the validity of Hattie’s methods."

Thomas Aastrup Rømer (2018) received the Nordic Educational Research Association's prestigious Ahlström Award (2019) for "Criticism of John Hattie's Theory of Visible Learning". The Association states,
"...the paper makes a precise and subtle critique of Hattie‘s work, hence revealing several weaknesses in the methods and theoretical frameworks used by Hattie. Rømer and his critical contribution inform us that we should never take educational theories for granted; rather, educational theories should always be made subject to further research and debate."
Hattie's major claim in VL - What Works Best

Hattie emphasised that VL was NOT a "what works" recipe, as almost everything works,
"When teachers claim that they are having a positive effect on achievement or when a policy improves achievement, this is almost always a trivial claim: Virtually everything works. One only needs a pulse and we can improve achievement." (VL p. 16).
Hattie argues that because "everything works" we then need to focus on "what works best",
"The major message is that we need a barometer of what works best..." (VL, preface)
"One aim of this book is to develop an explanatory story about the key influences on student learning - it is certainly not to build another 'what works' recipe." (VL, p. 6). 
"Instead of asking 'What works?' we should be asking 'What works best?'" (VL, p. 18)
Yet, Hattie's claim that "virtually everything works" is easily challenged by many of the studies Hattie cites, e.g., Hattie's prime feedback study, Kluger & DeNisi (1996), reports that OVER 38% of feedback interventions do NOT work! (Note: Hattie mistakenly reports 32% (VL, p. 175)).

Marzano (2009) also notes this and goes further,
"I’ve observed the same phenomenon with virtually every strategy and every innovation I’ve examined. I’ve come to the conclusion that you can expect anywhere from 20% to 40% of the studies in any given area to report negative results." (p. 34)
Hattie has recently retreated from this claim,
"Most things that a teacher could do in a classroom 'sorta' work..." (Hattie & Hamilton (2020), p. 3)
The Effect Size Determines What Works Best

In VL, Hattie then claimed that complex educational influences could be isolated and measured in terms of student achievement via a simple statistic, the effect size (d), and then compared and ranked to determine "What Works Best".
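To make the statistic concrete: d is Cohen's standardised mean difference, and Hattie's "barometer" averages many such values into a single number. A minimal sketch, using invented numbers (not Hattie's data), of what that averaging does:

```python
# Cohen's d: the raw difference between two group means, scaled by
# the pooled standard deviation. (All numbers here are invented.)
def cohens_d(mean_treat, mean_ctrl, sd_pooled):
    return (mean_treat - mean_ctrl) / sd_pooled

# A barometer of this kind then averages many d values into ONE
# number, regardless of how different the underlying studies are:
d_values = [0.9, 0.1, -0.2, 0.5]         # four hypothetical studies
overall = sum(d_values) / len(d_values)  # = 0.325
print(round(overall, 3))                 # prints 0.325
```

The single number 0.325 describes none of the four hypothetical studies, one of which is negative, which is exactly the heterogeneity objection the reviewers raise.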

Hattie also claimed that his effect sizes measured student achievement; however, the Peer Review shows that Hattie included many unrelated categories, e.g., IQ, hyperactivity, engagement & behavior.

The Peer Review also shows a teacher-designed test can have an effect size over 4 times LARGER than a standardised test, so comparing effect sizes, as Hattie does, without taking this into account is totally misleading.

This huge difference between standardised and teacher-designed test effect sizes matters because systemic influences, e.g., class size, uniforms and summer school, are usually measured with standardised tests and so show lower effect sizes than influences like feedback, which are usually measured with teacher-designed tests.


There are also significant contradictory statements. On the one hand, it's a book about "what works best"; yet in his preface he warns,
"It is not a book about classroom life, and does not speak to the nuances and details of what happens within classrooms."
Perhaps that accounts for Hattie's uncertainty, 
"The model I will present... may well be speculative, but it aims to provide high levels of explanation for the many influences on student achievement as well as offer a platform to compare these influences in a meaningful way... I must emphasise that these ideas are clearly speculative" (VL, p. 4).
Hattie's Defenses

Hattie consistently claims that no mistakes have been found in his work, e.g., 
"What I find fascinating is that since I first published this back in the 1990s, no one has come up with a better explanation for the data... 
I am updating the meta-analysis all the time; I am up to 1400 now. I do that because I want to be the first to discover the error, the mistake." (Knudsen, 2017, p. 7).
Wrigley (2015) described this defense by Hattie as,
"Bullying by numbers"
Also, I find Hattie's claim hard to reconcile, since many of the Peer Reviews published significant issues & errors pre-2017, e.g., Snook et al. (2009), Topphol (2011) & Higgins & Simpson (2011).

Hattie has also often agreed with a critique, saying, "Yes you must be sensitive to that", but then not addressed the issue at all. This is a common complaint from the peer reviewers, e.g., Prof Scott Eacott (2018),
 "Disappointingly, Hattie's response was in my opinion, inadequate" (p. 4).
Then, in an interview with Ollie Lovell (June 2018), Hattie did a complete back-flip with his mantra "the story, the story, the story", saying (as the peer reviews have said since 2009) that the numbers and rankings are too simplistic!
"What's the story, not what's the numbers..."
"that’s why this will keep me in business to keep telling the story..." (Audio here).
Hattie then admits his rankings are misleading and does not rank anymore! (Audio here).
"it worked then it got misleading so I stopped it"
My observation is that Hattie tends to jump between his rankings and 'The Story, Story, Story' narrative, depending on who he is talking to.

An example: a few months after the Lovell interview, Hattie continued to rank and mislead in his series of webinars in the USA.

Does Hattie faithfully represent the research?

Most people assume he does, but a brief look at Hattie's representation of the Class Size research should raise some questions! (more details on page links on the right menu).

In 2005, Hattie got the attention of educational administrators by labelling 'reducing class size' a disaster, then later as 'going backwards' (2005 ACER Lecture & VL, p. 250). He continued with Pearson (2015), naming 'reducing class size' as one of the major distractions! Then again, in the TV series Revolution School, he claimed that reducing class size does not make a difference to the quality of education!

The major class size study that Hattie used was Glass & Smith (1979), who summarise their data in a graph and table:

The trend and the difference between good and poor quality research are clearly displayed. Glass & Smith conclude (p. 15),
"The curve for the well-controlled studies then, is probably the best representation of the class-size and achievement relationship...
A clear and strong relationship between class size and achievement has emerged... There is little doubt, that other things being equal, more is learned in smaller classes."
Hattie never mentions this "Story" and he consistently reports just the ONE average, e.g., Hattie stated,
"Glass and Smith (1979) reported an average effect of 0.09 based on 77 studies..." (quoted in Blatchford, 2016, p. 106)
I also contacted Prof Glass to ensure I had interpreted his study correctly; he kindly replied,
"Averaging class size reduction effects over a range of reductions makes no sense to me. 
It's the curve that counts. 
Reductions from 40 to 30 bring about negligible achievement effects. From 20 to 10 is a different story.  
But Teacher Workload and its relationship to class size is what counts in my book."
Bergeron & Rivard (2017) reiterate,
"Hattie computes averages that do not make any sense."
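Glass's objection can be seen with three invented numbers echoing his reply above: the effect of a class-size reduction depends on where on the curve it happens, so one average describes none of the actual reductions.

```python
# Invented class-size effects, echoing Glass's point that the
# class-size/achievement relationship is a curve, not a constant.
effects = {
    "40 -> 30 pupils": 0.02,  # negligible, as Glass notes
    "30 -> 20 pupils": 0.10,
    "20 -> 10 pupils": 0.45,  # "a different story"
}

# Averaging across the curve yields one number that matches none
# of the individual reductions:
average = sum(effects.values()) / len(effects)
print(round(average, 2))  # prints 0.19
```

Reporting only 0.19 hides both the negligible top of the curve and the large effect at the bottom, which is Glass's "it's the curve that counts".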
Dylan Wiliam (2016b) concludes that, 
"…right now meta‐analysis is simply not a suitable technique for summarizing the relative effectiveness of different approaches to improving student learning..."
Wrigley (2015), in Bullying by Numbers, critiquing the EEF in particular but also Hattie,
"Teachers need to speak back to power, and one useful tool is to point to flaws in the use of data" (p. 3).
"Bullying by numbers has a restrictive effect on education, leads to superficial learning, and is seriously damaging teachers’ lives" (p. 6).
This Blog:

is broken up into different pages (menu on the right) designed so you can easily go to what interests you most.

A critique of Hattie's methodology: Effect Size, Student Achievement, CLE and other errors, and A Year's Progress???

Then an analysis of particular influences. I would recommend starting with what was his highest ranked influence Self Report Grades and then look at the controversial Class Size.

What does ONE average mean? (no pun intended).

The easiest critique of Hattie's work to understand is his averaging of a whole range of disparate meta-analyses into ONE effect size.

This ONE average is contradictory to his latest mantra "The Story, the Story, the Story."

Class Size, above, is one example and another illustrative example is Feedback.

Dr Jim Thornton, Professor of Obstetrics and Gynaecology:
"To a medical researcher, it seems bonkers that Hattie combines all studies of the same intervention into a single effect size."
Terry Wrigley (2018) had already warned,
"What now stands proxy for a breadth of evidence is statistical averaging. This mathematical abstraction neglects the contribution of the practitioner’s accumulated experience, a sense of the students’ needs and wishes, and an understanding of social and cultural context... 
When ‘evidence’ is reduced to a mean effect size, the individual person or event is shut out, complexity is lost and values are erased" (p. 360).

The picture represents the problem well and confirms what teachers have been saying for a long time, e.g., Goldacre (2008) on meta-analysis in education,
"I think you’ll find it’s a bit more complicated than that."

Where To From Here?

The conflicts of interest by the major players such as Hattie and Marzano, are now too big to ignore - see here.

Teachers need to understand the basics of these research methods, and Teacher Unions, which have the resources to independently assess evidence, should provide critical awareness, summaries and training for teachers, with a focus on QUALITY evidence.

Wrigley (2018) quoting Gene Glass, suggests a start,

"Indeed, Gene Glass, who originated the idea of meta-analysis, issued this sharp warning about heterogeneity: 'Our biggest challenge is to tame the wild variation in our findings not by decreeing this or that set of standard protocols but by describing and accounting for the variability in our findings. The result of a meta-analysis should never be an average; it should be a graph.'(Robinson, 2004: 29)" (p. 367).