Monday, 18 January 2016

An investigation of the evidence John Hattie presents in Visible Learning

"Our discipline needs to be saturated with critique of ideas; and it should be welcomed. Every paradigm or set of conjectures should be tested to destruction and its authors, adherents, and users of the ideas should face public accountability." (John Hattie, 2017, p 428).
The peer reviews are saturated with detailed critiques of Hattie's work, but most educators do not seem to be aware of them.

My aim is to raise awareness of these critiques and investigate Hattie's claims in the spirit of Tom Bennett, the founder of researchEd,
'There exists a good deal of poor, misleading or simply deceptive research in the ecosystem of school debate...
Where research contradicts the prevailing experiential wisdom of the practitioner, that needs to be accounted for, to the detriment of neither but for the ultimate benefit of the student or educator.' The School Research Lead (p9).
The first level of critique by scholars is to determine:

Has Hattie faithfully represented the research?

Most people assume he does, but a brief look at Hattie's representation of the Class Size research should raise some questions! (more details via the page links in the right menu).

In Hattie's collaboration with Pearson (2015) - What Doesn't Work in Education: The Politics of Distraction - he names 'reducing class size' as one of the major distractions!

In previous presentations, he consistently labelled 'reducing class size' as a disaster or as going backwards (2005 ACER Lecture & 2008 Nuthall lecture).

The commercial arm of Hattie's Visible Learning, the company Corwin, continues (as of September 2018) to promote Hattie's original claim that "reducing class size does not work!"

However, if you look at the studies Hattie referenced for these claims, you will find a different story, a story more consistent with teachers' experience. Hattie's and Corwin's polemic is easily challenged!

For example, one of the three meta-analyses that Hattie used was by Gene Glass and Mary Lee Smith (1979); they summarise their data in a graph and table:

[Graph and table: Glass & Smith (1979), class size versus achievement, with separate curves for well-controlled and poorly-controlled studies]
The trend and the difference between good and poor quality research are clearly displayed. Gene Glass and Mary Lee Smith conclude (p15),
'The curve for the well-controlled studies then, is probably the best representation of the class-size and achievement relationship...
A clear and strong relationship between class size and achievement has emerged... There is little doubt, that other things being equal, more is learned in smaller classes.'




Hattie calculated one average, d = 0.09, from the above table (3rd column) and used this to represent the whole meta-analysis (this is incorrect; the average of that column is d = 0.25). But even if Hattie had calculated correctly, there would still be major issues with his interpretation.

I contacted Prof Glass to ensure I had interpreted his study correctly, and he kindly replied,
'Averaging class size reduction effects over a range of reductions makes no sense to me. 
It's the curve that counts. 
Reductions from 40 to 30 bring about negligible achievement effects. From 20 to 10 is a different story.  
But Teacher Workload and its relationship to class size is what counts in my book.'
Bergeron (2017) reiterates,
'Hattie computes averages that do not make any sense.'
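To make the point concrete, here is a minimal sketch in Python, using hypothetical effect sizes (the Glass & Smith table appears above only as an image, so these numbers are illustrative, not theirs): averaging across very different class-size reductions flattens exactly the curve that Glass says is the finding.

```python
# Hypothetical effect sizes for different class-size reductions,
# shaped like the Glass & Smith curve: cuts among large classes do
# little; cuts down to genuinely small classes do a lot.
effects = {
    (40, 30): 0.02,  # "negligible achievement effects" (Glass, above)
    (30, 20): 0.10,
    (20, 15): 0.22,
    (20, 10): 0.35,  # "a different story"
    (15, 5):  0.55,
}

# One unweighted average, Hattie-style, hides the curve entirely.
single_average = sum(effects.values()) / len(effects)
print(f"one average for 'class size': d = {single_average:.2f}")

# The curve is the finding: the effect depends on which reduction you buy.
for (larger, smaller), d in sorted(effects.items()):
    print(f"  {larger} -> {smaller} pupils: d = {d:.2f}")
```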
Prof Peter Blatchford, in Class Size: Eastern and Western Perspectives (2016), states,

'Given the important influence these [Hattie & others] reports seem to be having in government and regional education policies, they need to be carefully scrutinised in order to be sure about the claims that are made' (p93).
Prof Scott Eacott (2017) in School Leadership and the cult of the guru: the neo-Taylorism of Hattie, describes the uncritical worship of Visible Learning as,
'a tragedy for Australian school leadership' (p1).
Prof Adrian Simpson's detailed analysis of the calculation of effect sizes, The misdirection of public policy: comparing and combining standardised effect sizes, states (p451),
'The numerical summaries used to develop the toolkit (or the alternative ‘barometer of influences’: Hattie 2009) are not a measure of educational impact because larger numbers produced from this process are not indicative of larger educational impact. Instead, areas which rank highly in Marzano (1998), Hattie (2009) and Higgins et al. (2013) are those in which researchers can design more sensitive experiments. 
As such, using these ranked meta-meta-analyses to drive educational policy is misguided.'
Prof Dylan Wiliam in Leadership for teacher learning, concludes that, 
'…right now meta‐analysis is simply not a suitable technique for summarizing the relative effectiveness of different approaches to improving student learning…'
Again, in Getting educational research right, Prof Wiliam writes,
'Teachers, leaders and policymakers all need to be critical consumers of research.'
McKnight & Whitburn (2018), in Seven reasons to question the hegemony of Visible Learning, write under the heading 'Visible Learning courts fascism':
'Should a single doctrine ever govern teachers, classrooms and schools? If we, the authors, were to develop a challenge to Visible Learning, should we be delighted to trademark it, copyright its materials and instigate a counter army of Not-so-Visible Learning acolytes? How often do professional learning programs on Visible Learning include critiques of Visible Learning? We have sought and found no evidence of reflexivity' (p15).
Prof Terry Wrigley (2015), in Bullying by Numbers, critiquing the EEF in particular but also Hattie, writes:
'Teachers need to speak back to power, and one useful tool is to point to flaws in the use of data' (p3).
'Bullying by numbers has a restrictive effect on education, leads to superficial learning, and is seriously damaging teachers’ lives' (p6).
Meta-analysis in education:
"I think you’ll find it’s a bit more complicated than that" (Goldacre, 2008).

The Aim of this Blog:

To be a critical consumer of research and to contest the evidence that Hattie presents in his 2009 book Visible Learning [VL], using independent peer reviews and analysing the studies Hattie referenced.


Over 40 peer reviews with detailed critiques of Hattie's work have been collected so far - see References.


The blog is broken up into different pages (menu on the right) designed so you can easily go to what interests you most.


Firstly, a critique of Hattie's methodology - Effect Size, Student Achievement, CLE and other errors, and A Year's Progress???


Then an analysis of particular influences. I would recommend starting with his formerly highest-ranked influence, Self Report Grades, and then looking at the controversial Class Size.


In his interview with Hanne Knudsen (2017), John Hattie: I'm a statistician, I'm not a theoretician, Hattie states,

'What I find fascinating is that since I first published this back in the 1990s, no one has come up with a better explanation for the data... 
I am updating the meta-analysis all the time; I am up to 1400 now. I do that because I want to be the first to discover the error, the mistake' (p7). 
I find these comments hard to reconcile since, as you will see, many scholars have published peer reviews identifying significant problems in Hattie's work and calling his entire model into question.

I also recommend teachers look at the section A Year's Progress? It analyses what I think is Hattie's most dangerous idea: that an effect size of 0.4 = one year's student progress.


Contributions are welcome. Many of the controversial influences only have 1-3 meta-analyses to read.



Summary:


The peer reviews have documented significant issues with Hattie's work, ranging from flawed methodology, calculation errors, and misrepresentation to questionable inference and interpretation.


Simpson (2017) and Bergeron (2017) detail methodological differences showing that the effect size for the SAME experiment can differ enormously (from 0 to infinity!) depending on how it is calculated. So comparing effect sizes across different studies is meaningless!


Glass (1977) and Slavin (2016) also show this, with Prof Slavin concluding,

'These differences say nothing about the impact on children, but are completely due to differences in study design.'
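A minimal sketch of the mechanics, using made-up sample data: one experiment, one raw mean difference, but three choices of standardiser (all of which appear in the literature) give three very different 'effect sizes'.

```python
import statistics

# One hypothetical experiment: the raw results never change below.
treatment = [52, 55, 58, 60, 63, 66, 70]
control   = [50, 52, 54, 56, 58, 60, 62]

mean_diff = statistics.mean(treatment) - statistics.mean(control)

sd_control = statistics.stdev(control)  # Glass's delta uses this
sd_pooled = (((len(treatment) - 1) * statistics.variance(treatment) +
              (len(control) - 1) * statistics.variance(control)) /
             (len(treatment) + len(control) - 2)) ** 0.5  # Cohen's d uses this
# A restricted (more homogeneous) sample shrinks the SD and inflates d:
# Simpson's point about sensitive designs ranking highly.
sd_restricted = sd_control / 2  # hypothetical restriction of range

for name, sd in [("pooled SD", sd_pooled),
                 ("control-group SD", sd_control),
                 ("restricted-sample SD", sd_restricted)]:
    print(f"d using {name}: {mean_diff / sd:.2f}")
```

Same students, same scores; only the denominator changed.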
Misrepresentation, calculation errors, and questionable inference and interpretation occur in a variety of ways. The most serious is Hattie's use of studies that do not measure what he claims they do. This occurs in three ways:

Firstly, many studies do not measure achievement but something else, e.g., IQ, hyperactivity, behavior, and engagement. See Student Achievement for more details.


Secondly, most studies do not compare groups of students that control for the particular influence that Hattie claims. There is a litany of examples, e.g., self-report grades, reducing disruptive behavior, welfare, diet, Teacher Training, Mentoring, etc.



Bergeron (2017) insightfully identifies this problem,
'in addition to mixing multiple and incompatible dimensions, Hattie confounds two distinct populations: 
1) factors that influence academic success and 
2) studies conducted on these factors.' 
Lervåg & Melby-Lervåg (2014) also raise this issue,
'Hattie has not investigated how a concrete measure tested in school affects the students' skills, but the connection between different relationships.'
Thirdly, Hattie used ONE average to represent each meta-analysis, yet each meta-analysis summarised anywhere from 4 up to over 4,000 studies (Marzano).



But, apart from the problem of giving equal weight to each average, the big question is: what does ONE average mean? (No pun intended.)
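A minimal sketch with invented numbers: three hypothetical meta-analyses of the same influence, one tiny and two large. The unweighted 'average of averages' lands somewhere that reflects how the literature happens to be packaged, not what happened in classrooms.

```python
# Hypothetical meta-analyses of one influence: (mean d, number of studies).
metas = [(0.80, 4), (0.30, 120), (0.15, 4000)]

# Equal weight per meta-analysis, whether it summarises 4 studies or 4,000.
unweighted = sum(d for d, _ in metas) / len(metas)

# Weighting by study count tells a rather different story.
weighted = sum(d * n for d, n in metas) / sum(n for _, n in metas)

print(f"unweighted mean of means: d = {unweighted:.2f}")  # 0.42
print(f"study-weighted mean:      d = {weighted:.2f}")    # 0.15
```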


Nepper Larsen (2014), in Know thy impact – blind spots in John Hattie's evidence credo, writes:

'there is (or seems to be) an intrinsic conflict - perhaps even a logical contradiction and a paradox - in the set-up of Visible Learning for Teachers (Hattie 2012), and in the book’s backbone arguments and pedagogical advice. Proudly and stoutly as a devoted king of statistics Hattie presents his overwhelming 240 million data analyses, but a vigilant reader will notice that he is essentially a practitioner and a thinker who proclaims that each teacher must have an eye for the unique student.
Hattie does not see and does not want to know that the life and thought of this very student cannot be generalized and transformed into a best-practice-induced ideal type. Therefore the student is rather likely to disappear the more the meta-studies accumulate and pile up – and the more they get transformed into universal clues and keys for many nations’ educational political actions' (p6-7).
Terry Wrigley (2018), in The power of ‘evidence’: Reliable science or a set of blunt tools?, writes:
'What now stands proxy for a breadth of evidence is statistical averaging. This mathematical abstraction neglects the contribution of the practitioner’s accumulated experience, a sense of the students’ needs and wishes, and an understanding of social and cultural context... 
When ‘evidence’ is reduced to a mean effect size, the individual person or event is shut out, complexity is lost and values are erased' (p360).
Wrigley goes on to quote Gene Glass, 

'Indeed, Gene Glass, who originated the idea of meta-analysis, issued this sharp warning about heterogeneity: "Our biggest challenge is to tame the wild variation in our findings not by decreeing this or that set of standard protocols but by describing and accounting for the variability in our findings. The result of a meta-analysis should never be an average; it should be a graph."(Robinson, 2004: 29)' (p367).
The next major problem is the moderating variables:

Prof Dylan Wiliam casts significant doubt on Hattie's entire model by arguing that the age of the students and the time over which each study runs are important components contributing to the effect size.


He summarises,

'the effect sizes proposed by Hattie are, at least in the context of schooling, just plain wrong. Anyone who thinks they can generate an effect size on student learning in secondary schools above 0.5 is talking nonsense.'
The massive data collected to construct the United States Department of Education's effect-size benchmarks support Prof Wiliam's contention.

These show a huge variation in effect sizes from younger to older students, which demonstrates that age is a HUGE moderating variable: in order to compare effect sizes, studies need to control for the age of the students and the time over which the study ran. Otherwise, differences in effect size can be due simply to the age of the students measured!
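A minimal sketch of why this matters for Hattie's d = 0.4 hinge point, using rounded values that are indicative of the pattern in the US benchmarks (not the published figures): the same effect size corresponds to wildly different amounts of 'typical annual growth' at different ages.

```python
# Rounded, illustrative annual-growth effect sizes by grade transition:
# young children gain far more per year, in effect-size terms, than
# older students (treat these as indicative, not as published figures).
annual_gain = {"K-1": 1.5, "3-4": 0.4, "8-9": 0.25, "11-12": 0.06}

d = 0.40  # Hattie's hinge point, billed as one year's progress
for grades, gain in annual_gain.items():
    print(f"grades {grades}: d = {d} is {d / gain:.1f} years of typical growth")
```

On these numbers, d = 0.4 is under a third of a year's growth for the youngest students and several years' growth for the oldest.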



Hattie's Aim:

'The model I will present... may well be speculative, but it aims to provide high levels of explanation for the many influences on student achievement as well as offer a platform to compare these influences in a meaningful way... I must emphasise that these ideas are clearly speculative' (VL, p4).

Hattie uses the Effect Size (d) statistic to interpret, compare and rank educational influences.

The effect size is supposed to measure the change in student achievement, itself a controversial topic (there are many totally different concepts of what achievement is - see here).
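For reference, the basic form of the statistic (a sketch of the idea; as the critiques above show, Hattie's source meta-analyses differ in exactly which standard deviation they divide by):

d = (mean of the intervention group - mean of the control group) / standard deviation of scores

So d = 0.4 means the average student in the intervention group scored 0.4 of a standard deviation above the average student in the comparison group.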



NEWS FLASH:


Hattie admits his rankings are misleading and no longer ranks influences! See Ollie Lovell's interview with Hattie in June 2018 (1hr 21min 45sec) here.


Examples of Peer Reviews:


Blatchford et al (2016) state that Hattie's comparison of effect sizes,

'is not really a fair test' (p96).
Wecker et al (2016),
'the methodological claims arising from Hattie's approach, and the overall appropriateness of this approach suggest a fairly clear conclusion: a large proportion of the findings are subject to reasonable doubt' (p35).
Prof Pierre-Jérôme Bergeron,
'When taking the necessary in-depth look at Visible Learning with the eye of an expert, we find not a mighty castle but a fragile house of cards that quickly falls apart...

To believe Hattie is to have a blind spot in one’s critical thinking when assessing scientific rigour. To promote his work is to unfortunately fall into the promotion of pseudoscience. Finally, to persist in defending Hattie after becoming aware of the serious critique of his methodology constitutes willful blindness.'
Prof Terry Wrigley (2015) in Bullying by Numbers,
'Its method is based on stirring together hundreds of meta-analyses reporting on many thousands of pieces of research to measure the effectiveness of interventions.  
This is like claiming that a hammer is the best way to crack a nut, but without distinguishing between coconuts and peanuts, or saying whether the experiment used a sledgehammer or the inflatable plastic one that you won at the fair' (p5).
Dr. Neil Hooley, in his review of Hattie, discusses the complexity of classrooms and the difficulty of controlling variables:
'Under these circumstances, the measure of effect size is highly dubious' (p44).
Schulmeister & Loviscach (2014) Errors in John Hattie’s “Visible Learning”,
'To think that didactics can be presented as a clear ranking order of effect sizes is a dangerous illusion. To an extreme degree, the effect of a specific intervention depends on the circumstances. By focusing on mean effect sizes, ignoring their considerable variation, and condensing the data into a seemingly exact ranking order, Hattie pulls the wool over his audience's eyes.'
Rømer (2016) in Criticism of Hattie's theory about Visible learning,
'On the whole, Visible Learning is not a theory of learning in its own right, nor is it an educational theory. Visible learning, on the other hand, is what happens when pedagogy and learning are exposed to a relatively unexplained evaluation theory' (p1, translated from Danish).
Nepper Larsen (2014), in Know thy impact – blind spots in John Hattie's evidence credo,
'The first among several flaws in Hattie's book: what is an effect?
John Hattie never explains what the substance of an effect is. What is an effect’s ontology, its way of being in the world? Does it consist of something as simple as a correct answer on a multiple-choice task, the absence of arithmetic and spelling errors? And may all the power of teaching and learning processes (including abstract and imaginative thinking, history of ideas and concepts, historical knowledge, dedicated experiments, hands-on insights, sudden lucidity, social and language criticism, profound existential discussions, social bonding, and personal, social, and cultural challenges) all translate into an effect score without loss? Such basic and highly important philosophical and methodological questions do not seem to concern the evidence preaching practitioner and missionary Hattie.
Figures taken out of contexts say absolutely nothing, and Hattie never contextualizes his procedures' (p3-4);
'Teachers get identified as the primary and indispensable learning factor and thereby as a public, expensive, and untrustworthy potential enemy. This amounts to scapegoat projection par excellence... ' (p11).
'The concluding remark must be that the advantage of John Hattie’s evidence credo is that it is so banal, mundane and trivial that even educational planners and economists can understand it' (p12).
Prof John O'Neill wrote a detailed letter to the New Zealand Education minister - see here,
'At the very least, the problems below should give you and your officials pause for thought rather than unquestioningly accepting Professor Hattie’s research at face-value, as appears to have been the case.'
Nilholm, Claes (2017) in Is John Hattie in Stormy Weather? (that is, in trouble; translated from Swedish),
'Hattie provides very scarce information about his approach. This makes it very difficult to replicate his analyses. The ability to replicate an analysis is considered by many as a crucial determinant of scientific work... 
there is some evidence that his thoughts lead in many ways in the wrong direction' (p3).
Dr. Mandy Lupton on Problem Based Learning,
'The studies have different effect sizes for different contexts and different levels of schooling, thus averaging these into one metric is meaningless.'
Poulsen (2014), in John Hattie: A Revolutionary Educational Researcher?, writes:
'Do I believe in Hattie's results? No! I dare not' (p6, translated from Danish).
Schulmeister & Loviscach (2014) Errors in John Hattie’s “Visible Learning”,
'If one corrects the errors mentioned above, list positions take big leaps up or down. Even more concerning is the absurd precision this ranking conveys. It only shows the averages of effect sizes but not their considerable variation within every group formed by Hattie and even more so within every individual meta-analysis.' 
Dr. Jim Thornton, Professor of Obstetrics and Gynaecology at Nottingham University, said,
'To a medical researcher, it seems bonkers that Hattie combines all studies of the same intervention into a single effect size. Why should “sitting in rows”, for example, have the same effect on primary children as on university students, on maths as on art teaching, on behaviour outcomes as on knowledge outcomes? In medicine it would be like combining trials of steroids to treat rheumatoid arthritis, effective, with trials of steroids to treat pneumonia, harmful, and concluding that steroids have no effect! I keep expecting someone to tell me I’ve misread Hattie.' 

Why has Hattie become so popular?


In his excellent analysis in School Leadership and the cult of the guru: the neo-Taylorism of Hattie, Professor Scott Eacott says,

'Hattie’s work has provided school leaders with data that appeal to their administrative pursuits' (p3). 
'The uncritical acceptance of his work as the definitive word on what works in schooling, particularly by large professional associations such as ACEL, is highly problematic' (p11).
McKnight & Whitburn (2018) concur with Eacott,
'In speaking to teachers, we have found that many have concerns that are similar to ours, but that they are silenced by senior staff in their schools, who have hitched their own branding to particular bandwagons. There are dangers to educational freedoms and to teacher professionalism when schools have paid for pedagogy' (p20).
Professor Gunn Imsen (2011) is also concerned about this,
'The Hattie fever is sustained by equally keen politicians, municipal bureaucrats and leaders who strive to achieve quantitative results in their target-management systems, which are part of a paper mill that is stifling Norwegian schools. The best medicine against the fever is for Norwegian teachers to take back faith in themselves, their own judgment and trust in their own skills in the work of good teaching for the students - and for the school authorities to support them in this.'
Professor Thomas Rømer (2016) concurs in the Danish context (p4).

The Rise of the Policy Entrepreneur:


Science begins with skepticism; however, in the hierarchical leadership structures of educational institutions, skeptical teachers are not valued, although, ironically, the skeptical skills of questioning and analysis are valued in students. This paves the way for the many 'snake oil' remedies and the rise of policy entrepreneurs who 'shape and benefit from school reform discourses'.


Professor John O'Neill, in analysing Hattie's influence on New Zealand education policy, describes the process well:

'public policy discourse becomes problematic when the terms used are ambiguous, unclear or vague' (p1). 
'[The] discourse seeks to portray the public sector as "ineffective, unresponsive, sloppy, risk-averse and innovation-resistant" yet at the same time it promotes celebration of public sector "heroes" of reform and new kinds of public sector "excellence".
Relatedly, Mintrom (2000) has written persuasively in the American context of the way in which "policy entrepreneurs" position themselves politically to champion, shape and benefit from school reform discourses' (p2).
Hattie's recent public appearance in the TV documentary Revolution School confirms Professor O'Neill's analysis. Dan Haesler reports that Hattie's remedy cost the school around $60,000.

McKnight & Whitburn (2018) in Seven reasons to question the hegemony of Visible Learning are also concerned about Hattie's portrayal in this TV series as,

'the potential saviour of public education and redeemer of recalcitrant teachers' (p2-3).
They also question the financial conflict of interest of Visible Learning,
'Where are the flows of capital around Visible Learning? Where is capital and what kinds of capital are accruing for those producing “Visible Learning” as a brand? What material and financial benefits flow on to teachers and students?' (p6).
Professor Ewald Terhart (2011, p434) writes,
'A part of the criticism on Hattie condemns his close links to the New Zealand Government and is suspicious of his own economic interests in the spread of his assessment and training programme (asTTle).'
Professor Gene Glass, with 20 other distinguished academics, also concurs with John O'Neill in 50 Myths and Lies That Threaten America's Public Schools: The Real Crisis in Education,
'The mythical failure of public education has been created and perpetuated in large part by political and economic interests that stand to gain from the destruction of the traditional system. There is an intentional misrepresentation of facts through a rapidly expanding variety of organizations and media that reach deep into the psyche of the nation's citizenry. These myths must be debunked. Our method of debunking these myths and lies is to argue against their logic, or to criticize the data supporting the myth, or to present more credible contradictory data' (p4).

We need to move from evidence to QUALITY of evidence:


There must now be at least some hesitation in accepting Hattie's work as the definitive statement on teaching.

Beng Huat See, in her paper Evaluating the evidence in evidence-based policy and practice: Examples from systematic reviews of literature, suggests the direction in which educational research must now go,
'This paper evaluates the quality of evidence behind some well-known education programmes... It shows that much of the evidence is weak, and fundamental flaws in research are not uncommon. This is a serious problem if teaching practices and important policy decisions are made based on such flawed evidence.

Lives may be damaged and opportunities missed.

...funders of research and research bodies need to insist on quality research and fund only those that meet the minimum quality criteria.'
The debate must now shift from Evidence to Quality of Evidence.

The US Dept of Education has done this and has developed clearly defined quality criteria in its What Works Clearinghouse.

Most of the meta-analyses that Hattie used would NOT satisfy these quality criteria - see here.

News Flash:

Medical researchers decide to use Hattie's methods for ranking influences on people's health. Current results are:

Influence                             d
Viagra                            10.00
Prozac                             9.10
Surgery                            4.00
Vitamins                           3.66
Self-report health (expectation)   1.44
Feedback                           0.73
Doctor/patient relationship        0.72
Home environment                   0.57
Socio/economic status              0.57
Number of beds in ward             0.30
Home visitation                    0.29
Doctor/patient ratio               0.21
Doctor training                    0.11
Govt versus Private Hospital       0.03
Steroids                          -0.05
Physiotherapy                     -0.06
Acupuncture                       -1.08
Intensive Care                    -1.99


A Teacher's Lament:


Gabbie Stroud resigned from her teaching position and wrote:

'Teaching – good teaching - is both a science and an art. Yet in Australia today [it]… is considered something purely technical and methodical that can be rationalised and weighed.

But quality teaching isn't borne of tiered 'professional standards'. It cannot be reduced to a formula or discrete parts. It cannot be compartmentalised into boxes and 'checked off'. Good teaching comes from professionals who are valued. It comes from teachers who know their students, who build relationships, who meet learners at their point of need and who recognise that there's nothing standard about the journey of learning. We cannot forget the art of teaching – without it, schools become factories, students become products and teachers: nothing more than machinery.'
Whilst it may be simpler and easier to see teaching as a set of discrete influences, the evidence shows that these influences interact in ways that no one, as yet, can quantify. It is the combining of influences in a complex way that defines the 'art' of teaching.
