Monday, 18 January 2016

An investigation of the evidence John Hattie presents in Visible Learning

At the 2005 ACER conference (p5) Hattie said,
'We must contest the evidence – as that is the basis of a common understanding of progression.'
Then in Visible Learning [VL] he quotes Karl Popper (p4)
'Those amongst us unwilling to expose their ideas to the hazard of refutation do not take part in the scientific game.'
Tom Bennett, the founder of researchED, wrote an influential paper, The School Research Lead, where he states (p9),
'There exists a good deal of poor, misleading or simply deceptive research in the ecosystem of school debate...
Where research contradicts the prevailing experiential wisdom of the practitioner, that needs to be accounted for, to the detriment of neither but for the ultimate benefit of the student or educator.'
Prof John O'Neill wrote a detailed letter to the New Zealand Education minister - see here,
'At the very least, the problems below should give you and your officials pause for thought rather than unquestioningly accepting Professor Hattie's research at face-value, as appears to have been the case.'
Prof Adrian Simpson's detailed analysis of the calculation of effect sizes, The misdirection of public policy: comparing and combining standardised effect sizes states (p451), 
'The numerical summaries used to develop the toolkit (or the alternative ‘barometer of influences’: Hattie 2009) are not a measure of educational impact because larger numbers produced from this process are not indicative of larger educational impact. Instead, areas which rank highly in Marzano (1998), Hattie (2009) and Higgins et al. (2013) are those in which researchers can design more sensitive experiments. 
As such, using these ranked meta-meta-analyses to drive educational policy is misguided.'
Schulmeister & Loviscach (2014), in Errors in John Hattie’s “Visible Learning”, write:
'To think that didactics can be presented as a clear ranking order of effect sizes is a dangerous illusion. To an extreme degree, the effect of a specific intervention depends on the circumstances. By focusing on mean effect sizes, ignoring their considerable variation and condensing the data into a seemingly exact ranking order, Hattie pulls the wool over his audience’s eyes.'
Prof Rømer (2016) in Criticism of Hattie's theory about Visible learning,
'On the whole, Visible Learning is not a theory of learning in its own right, nor is it an educational theory. Visible learning, on the other hand, is what happens when pedagogy and learning are exposed to a relatively unexplained evaluation theory' (p1, translated from Danish).
Prof Dylan Wiliam in Leadership for teacher learning, concludes that, 
'…right now meta‐analysis is simply not a suitable technique for summarizing the relative effectiveness of different approaches to improving student learning…'
Again, in Getting educational research right, Prof Wiliam writes,
'Teachers, leaders and policymakers all need to be critical consumers of research.'
Greg Ashman, in his excellent blog post 'The article that England’s Chartered College will not print', analyses the Education Endowment Foundation’s (EEF) online toolkit and says one strand stands out from the rest: ‘meta-cognition and self-regulation’. But what is it?

Ashman uses an explanation from Kevan Collins, Chief Executive of the EEF, ‘Meta-cognition is getting beyond – above the actual thing – to have a better sense of it.'

After analysing their research and finding many issues, Ashman adds a clever twist: the EEF should look at its own research, get ‘beyond – above the actual thing – to have a better sense of it’, and then break it apart.

The Aim of this Blog:

is to be a critical consumer of research and contest the evidence that Hattie presents in his 2009 book Visible Learning [VL], by using independent peer reviews and by analysing the studies to get ‘beyond – above the actual thing – to have a better sense of it’, and then break it apart.

The blog is broken up into different pages (menu on the right), designed so you can easily go to what interests you most.

Firstly, a critique of Hattie's methodology - Effect Size, Student Achievement, CLE and other errors, A Year's Progress and Validity/Reliability.

Then an analysis of particular influences. I would recommend starting with his highest-ranked influence, Self Report Grades, and then looking at the controversial Class Size.

In his interview with Hanne Knudsen (2017), John Hattie: I’m a statistician, I’m not a theoretician, Hattie states,
'What I find fascinating is that since I first published this back in the 1990s, no one has come up with a better explanation for the data... 
I am updating the meta-analysis all the time; I am up to 1400 now. I do that because I want to be the first to discover the error, the mistake' (p7). 
I find these comments hard to reconcile since, as you will see, many scholars have published peer reviews identifying significant problems in Hattie's work and calling his entire model into question.

I also recommend teachers look at the section A Year's Progress? It analyses what I think is Hattie's most dangerous idea: that an effect size of 0.4 = 1 year's student progress.

Contributions are welcome. Many of the controversial influences only have 1-3 meta-analyses to read. I can provide you copies of most of the research used.

Hattie's Aim:

'The model I will present... may well be speculative, but it aims to provide high levels of explanation for the many influences on student achievement as well as offer a platform to compare these influences in a meaningful way... I must emphasise that these ideas are clearly speculative' (VL, p4).
Hattie uses the Effect Size (d) statistic to interpret, compare and rank educational influences.

The effect size is supposed to measure the change in student achievement; a controversial topic in and of itself (there are many totally different concepts of what achievement is - see here).
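For readers unfamiliar with the statistic, the standard definition of d (Cohen's d) is the difference between two group means divided by a pooled standard deviation. A minimal sketch, using invented test scores purely for illustration:

```python
# Cohen's d: standardised mean difference between two groups.
# The scores below are made up solely to illustrate the calculation.
import statistics


def cohens_d(treatment, control):
    """Difference of group means divided by the pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = statistics.stdev(treatment), statistics.stdev(control)
    # Pooled standard deviation (weighted by degrees of freedom)
    pooled_sd = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    return (statistics.mean(treatment) - statistics.mean(control)) / pooled_sd


treatment = [78, 82, 85, 88, 90, 75, 84, 91]  # hypothetical post-test scores
control = [70, 74, 80, 77, 72, 79, 76, 81]
print(f"d = {cohens_d(treatment, control):.2f}")
```

A d of 0.4 - Hattie's "hinge point" - would mean the treatment group's mean sits 0.4 pooled standard deviations above the control group's mean.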


The peer reviews have documented significant issues with Hattie's work, ranging from flawed methodology, calculation errors and misrepresentation to questionable inference and interpretation.

Simpson (2017) and Bergeron (2017) detail methodological differences showing that the effect size for the SAME experiment can differ enormously (from 0 to infinity!) depending on how it is calculated. So comparing effect sizes across different studies is meaningless!

Glass (1977) and Slavin (2016) also show this with Prof Slavin concluding,
'These differences say nothing about the impact on children, but are completely due to differences in study design.'
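Slavin's point can be sketched numerically. The same raw difference between groups produces very different values of d depending purely on which standard deviation the researcher divides by - a full-range sample, a restricted (pre-selected) sample, or gain scores. All numbers below are invented for illustration:

```python
# Same mean difference, three different denominators, three different "effect sizes".
mean_gain = 5.0  # identical raw difference between groups in every case

full_range_sd = 15.0  # SD in a broad, heterogeneous sample
restricted_sd = 5.0   # SD in a narrow, pre-selected sample
gain_score_sd = 2.5   # SD of gain scores, often much smaller than test-score SD

for label, sd in [("full-range sample", full_range_sd),
                  ("restricted sample", restricted_sd),
                  ("gain scores", gain_score_sd)]:
    print(f"{label}: d = {mean_gain / sd:.2f}")
```

The identical intervention appears three times as "effective" in the restricted sample and six times as "effective" when gain scores are used, even though nothing about the impact on students has changed.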
Misrepresentation, calculation errors, questionable inference and interpretation occur in a variety of ways. The most serious is Hattie's use of studies that do not measure what he claims they do. This occurs in three ways:

Firstly, many studies do not measure achievement but something else, e.g., IQ, hyperactivity, behavior, and engagement. See Student Achievement for more details.

Secondly, most studies do not compare groups of students that control for the particular influence that Hattie claims. There is a litany of examples, e.g., self-report grades, reducing disruptive behavior, welfare, diet, Teacher Training, Mentoring, etc.

Bergeron (2017) insightfully identifies this problem,
'in addition to mixing multiple and incompatible dimensions, Hattie confounds two distinct populations: 
1) factors that influence academic success and 
2) studies conducted on these factors.' 
Lervåg & Melby-Lervåg (2014) also discuss this issue,
'... one [Hattie] has not investigated how a concrete measure tested in school affects the students' skills, but the connection between different relationships.'
Thirdly, Hattie used ONE average to represent each meta-analysis, yet each meta-analysis represented anywhere from 4 up to 4,000 studies (Marzano).

But, apart from giving equal weight to each average, the big question is, 

what does ONE average mean? (no pun intended)

The clear example is Class Size:

Gene Glass and Mary Lee Smith (1979), whose study is 1 of the 3 meta-analyses that Hattie uses for class size, summarise their data in a graph and table:

The trend and the difference between good and poor quality research are clearly displayed. Gene Glass and Mary Lee Smith conclude (p15),
'The curve for the well-controlled studies then, is probably the best representation of the class-size and achievement relationship...
A clear and strong relationship between class size and achievement has emerged... There is little doubt, that other things being equal, more is learned in smaller classes.'

Hattie calculated one average from the above table (3rd column) of d = 0.09 and used this to represent the whole meta-analysis (though the average of that column is actually d = 0.25).

In his collaboration with Pearson (2015), What Doesn't Work in Education: The Politics of Distraction, he names class size as one of the major distractions.

In previous presentations, he consistently labelled class size a disaster or as going backwards (2005 ACER presentation).

Yet it is ironic that Professor Gene Glass - the author of the class size study, and the inventor of meta-analysis itself - wrote a book with 20 other distinguished academics contradicting Hattie: 50 Myths and Lies That Threaten America's Public Schools: The Real Crisis in Education.

In Myth #17: Class size does not matter; reducing class sizes will not result in more learning, the 21 academics collaboratively say,
'Fiscal conservatives contend, in the face of overwhelming evidence to the contrary, that students learn as well in large classes as in small.'
I contacted Prof Glass to ensure I had interpreted his study correctly; he kindly replied:
'Averaging class size reduction effects over a range of reductions makes no sense to me. 
It's the curve that counts. 
Reductions from 40 to 30 bring about negligible achievement effects. From 20 to 10 is a different story.  
But Teacher Workload and its relationship to class size is what counts in my book.'
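Glass's "it's the curve that counts" can be shown in miniature. The effect sizes below are hypothetical, chosen only to mimic the shape he describes (negligible effects for large-class reductions, large effects for small-class reductions); the point is what a single average does to them:

```python
# Hypothetical effect sizes for different class-size reductions,
# shaped to mimic the curve Glass describes (not his actual data).
reductions = {
    "40 -> 30": 0.05,  # negligible, per Glass
    "30 -> 20": 0.15,
    "20 -> 15": 0.35,
    "20 -> 10": 0.60,  # "a different story"
}

# One average flattens the curve into a single, uninformative number.
average = sum(reductions.values()) / len(reductions)
print(f"single average: d = {average:.2f}")

for reduction, d in reductions.items():
    print(f"{reduction}: d = {d:.2f}")
```

The single average sits near 0.29, describing none of the actual reductions: it overstates the effect of trimming large classes and badly understates the effect of halving small ones.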
Bergeron (2017) reiterates,
'Hattie computes averages that do not make any sense.'
The next major problem is moderating variables:

Prof Dylan Wiliam casts significant doubt on Hattie's entire model by arguing that the age of the students and the time over which each study runs are important components contributing to the effect size.

He summarises,
'the effect sizes proposed by Hattie are, at least in the context of schooling, just plain wrong. Anyone who thinks they can generate an effect size on student learning in secondary schools above 0.5 is talking nonsense.'
The massive dataset collected to construct the United States Department of Education's effect size benchmarks supports Prof Wiliam's contention.

These benchmarks show a huge variation in effect sizes from younger to older students, which demonstrates that age is a HUGE moderating variable: to compare effect sizes, studies need to control for the age of the students and the time over which the study ran. Otherwise, differences in effect size can be due simply to the age of the students measured!

Blatchford et al (2016, p96) state that Hattie's comparing of effect sizes, 
'is not really a fair test.'
Wecker et al (2016, p35) conclude,
'the methodological claims arising from Hattie's approach, and the overall appropriateness of this approach suggest a fairly clear conclusion: a large proportion of the findings are subject to reasonable doubt.'
Prof Pierre-Jérôme Bergeron writes,
'When taking the necessary in-depth look at Visible Learning with the eye of an expert, we find not a mighty castle but a fragile house of cards that quickly falls apart...

To believe Hattie is to have a blind spot in one’s critical thinking when assessing scientific rigour. To promote his work is to unfortunately fall into the promotion of pseudoscience. Finally, to persist in defending Hattie after becoming aware of the serious critique of his methodology constitutes willful blindness.'
Dr. Neil Hooley, in his review of Hattie, talks about the complexity of classrooms and the difficulty of controlling variables,
'Under these circumstances, the measure of effect size is highly dubious' (p44).
Dr. Mandy Lupton writes on Problem-Based Learning,
'The studies have different effect sizes for different contexts and different levels of schooling, thus averaging these into one metric is meaningless.'
Poulsen (2014), in John Hattie: A Revolutionary Educational Researcher?, writes,
'Do I believe in Hattie's results? No! I do not dare to' (p6).
Schulmeister & Loviscach (2014), again in Errors in John Hattie’s “Visible Learning”, note:
'If one corrects the errors mentioned above, list positions take big leaps up or down. Even more concerning is the absurd precision this ranking conveys. It only shows the averages of effect sizes, but not their considerable variation within every group formed by Hattie, and even more so within every individual meta-analysis.'
Dr. Jim Thornton, Professor of Obstetrics and Gynaecology at Nottingham University, said,
'To a medical researcher, it seems bonkers that Hattie combines all studies of the same intervention into a single effect size. Why should “sitting in rows”, for example, have the same effect on primary children as on university students, on maths as on art teaching, on behaviour outcomes as on knowledge outcomes? In medicine it would be like combining trials of steroids to treat rheumatoid arthritis, effective, with trials of steroids to treat pneumonia, harmful, and concluding that steroids have no effect! I keep expecting someone to tell me I’ve misread Hattie.' 

Why has Hattie become so popular?

In his excellent analysis School Leadership and the cult of the guru: the neo-Taylorism of Hattie, Professor Scott Eacott says,
'Hattie’s work has provided school leaders with data that appeal to their administrative pursuits' (p3). 
'The uncritical acceptance of his work as the definitive word on what works in schooling, particularly by large professional associations such as ACEL, is highly problematic' (p11).
Professor Gunn Imsen (2011) is also concerned about this,
'The Hattie fever is sustained by equally keen politicians, municipal bureaucrats and leaders who strive for quantitative results in their target-management systems - part of a paper mill that is stifling Norwegian schools. The best medicine against the fever is for Norwegian teachers to regain faith in themselves, their own judgment and their own skills in the work of good teaching for students, and for the school authorities to support them in this.'
Professor Thomas Rømer (2016) concurs in the Danish context (p4).

The Rise of the Policy Entrepreneur:

Science begins with skepticism. However, in the hierarchical leadership structures of educational institutions, skeptical teachers are not valued - although, ironically, the skeptical skills of questioning and analysis are valued in students. This paves the way for the many 'snake oil' remedies and the rise of policy entrepreneurs who 'shape and benefit from school reform discourses'.

Professor John O'Neill in analysing Hattie's influence on New Zealand Education Policy describes the process well:
'public policy discourse becomes problematic when the terms used are ambiguous, unclear or vague' (p1). 
[The] 'discourse seeks to portray the public sector as ‘ineffective, unresponsive, sloppy, risk-averse and innovation-resistant’ yet at the same time it promotes celebration of public sector ‘heroes’ of reform and new kinds of public sector ‘excellence’.
Relatedly, Mintrom (2000) has written persuasively in the American context, of the way in which ‘policy entrepreneurs’ position themselves politically to champion, shape and benefit from school reform discourses' (p2).
Hattie's recent public appearance in the TV documentary Revolution School confirms Professor O'Neill's analysis. Dan Haesler reports that Hattie's remedy cost the school around $60,000.

Professor Ewald Terhart (2011, p434) notes,
'A part of the criticism on Hattie condemns his close links to the New Zealand Government and is suspicious of his own economic interests in the spread of his assessment and training programme (asTTle).'

We need to move from evidence to QUALITY of evidence:

There must now be at least some hesitation in accepting Hattie's work as the definitive statement on Teaching.

Beng Huat See, in her paper Evaluating the evidence in evidence-based policy and practice: Examples from systematic reviews of literature, suggests the direction educational research must now take,
'This paper evaluates the quality of evidence behind some well-known education programmes... It shows that much of the evidence is weak, and fundamental flaws in research are not uncommon. This is a serious problem if teaching practices and important policy decisions are made based on such flawed evidence.

Lives may be damaged and opportunities missed.

...funders of research and research bodies need to insist on quality research and fund only those that meet the minimum quality criteria.'
The debate must now shift from Evidence to Quality of Evidence.

The US Dept of Education has done this and has developed clearly defined quality criteria in their What Works Clearinghouse.

Most of the meta-analyses that Hattie used would NOT satisfy these quality criteria, see here.

News Flash:

Medical researchers decide to use Hattie's methods for ranking influences on people's health. Current results are:

Self-report health (expectation): 1.44
Doctor/patient relationship: 0.72
Home environment: 0.57
Socio/economic status: 0.57
Number of beds in ward: 0.30
Home visitation: 0.29
Doctor/patient ratio: 0.21
Doctor training: 0.11
Govt versus private hospital: 0.03
Intensive care: -1.99

A Teacher's Lament:

Gabbie Stroud resigned from her teaching position and wrote:
'Teaching – good teaching - is both a science and an art. Yet in Australia today [it]… is considered something purely technical and methodical that can be rationalised and weighed.

But quality teaching isn't borne of tiered 'professional standards'. It cannot be reduced to a formula or discrete parts. It cannot be compartmentalised into boxes and 'checked off'. Good teaching comes from professionals who are valued. It comes from teachers who know their students, who build relationships, who meet learners at their point of need and who recognise that there's nothing standard about the journey of learning. We cannot forget the art of teaching – without it, schools become factories, students become products and teachers: nothing more than machinery.'
Whilst it may be simpler and easier to see teaching as a set of discrete influences, the evidence shows that these influences interact in ways that no one, as yet, can quantify. It is the combining of influences in a complex way that defines the 'art' of teaching.

