Monday, 18 January 2016

A synthesis of critiques of Visible Learning

Hattie states the aim of his 2009 book, Visible Learning [VL]: "The model I will present ... may well be speculative, but it aims to provide high levels of explanation for the many influences on student achievement as well as offer a platform to compare these influences in a meaningful way... I must emphasise that these ideas are clearly speculative" (p4).

Hattie uses two statistics, Effect Size (d) and Common Language Effect Size (CLE), to interpret, compare and rank educational influences. However, subsequent peer reviews showed that he calculated all of the CLEs incorrectly. As a result, he now focuses on an effect size of d = 0.4 as his hinge point, claiming this is equivalent to a year’s progress. There are, however, significant problems with this interpretation.
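For readers unfamiliar with the CLE, here is a minimal sketch of the standard conversion from d (McGraw and Wong, 1992), assuming two independent groups with normally distributed scores and equal variances. It is not a reconstruction of Hattie's own calculations. The CLE is a probability, the chance that a randomly chosen score from one group beats a randomly chosen score from the other, so any correctly calculated value must lie between 0 and 1. The function name cle_from_d is purely illustrative.

```python
import math

def cle_from_d(d: float) -> float:
    """Common Language Effect Size for two independent groups, assuming
    normally distributed scores with equal variances.

    CLE = Phi(d / sqrt(2)): the probability that a randomly chosen score from
    the first group exceeds a randomly chosen score from the second group.
    """
    z = d / math.sqrt(2)
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

for d in (-0.5, 0.0, 0.40, 1.44):
    # A probability, so every valid CLE lies strictly between 0 and 1.
    print(f"d = {d:5.2f}  ->  CLE = {cle_from_d(d):.3f}")
```

On this conversion, the hinge point d = 0.4 corresponds to roughly a 61% chance that a student in the intervention group outscores one in the comparison group.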

The effect size is supposed to measure the change in student achievement. However, each study measured achievement differently, and many studies did not measure achievement at all but something else, e.g., IQ, hyperactivity, behaviour or engagement. This is the classic problem of comparing apples with oranges, and it has led many scholars to question the validity and reliability of Hattie's effect sizes and rankings. Higgins and Simpson (2011), for example, write: “We argue the process by which this number has been derived has rendered it practically meaningless” (p199). They advise exercising "mega-caution" (p200). Blatchford et al (2016) state that Hattie's comparison of effect sizes "is not really a fair test" (p96).
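One concrete reason the apples-and-oranges problem matters: an effect size divides the difference in means by the standard deviation of whatever test was used, so the same raw gain produces a different d on a different instrument. The sketch below uses invented scores (hypothetical, not from any study) and an illustrative helper, cohens_d, with a single SD for simplicity.

```python
def cohens_d(mean_treatment: float, mean_control: float, sd: float) -> float:
    """Standardised mean difference: (treatment mean - control mean) / SD."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (mean_treatment - mean_control) / sd

# HYPOTHETICAL: the same 5-point gain measured on two different tests.
narrow_test = cohens_d(55.0, 50.0, sd=10.0)  # narrowly focused test, small spread of scores
broad_test = cohens_d(55.0, 50.0, sd=20.0)   # broad standardised test, large spread of scores
print(narrow_test, broad_test)
```

Identical learning gains yield d = 0.5 on one test and d = 0.25 on the other, which is why averaging effect sizes across very different measures is so problematic.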

Yet Hattie’s rankings have taken on “gospel” status due to heavy promotion by politicians, administrators and principals (it is in their interest, e.g., class size), very little contesting by teachers (they don't have the time, and who is going to challenge the principal?) and limited access to scholarly critiques.

And yet he himself prefaced the book with significant doubt: "I must emphasise that these ideas are clearly speculative" (p4).

Nevertheless, his early reservation has since given way to an authority and certainty that is at odds with the caution most of the authors of his studies recommend, e.g., on class size and ability grouping. Their caution stems from the lack of quality studies, the inability to control variables, major differences in how achievement is measured and the many confounding variables. Scholars have also identified many errors in Hattie's work, from major calculation errors and excessive inference to misrepresentation of studies, e.g., Higgins and Simpson (2011).

Hattie limits himself to one particular study design, the meta-analysis, and does not include other designs, e.g., case studies. In many ways a meta-analysis uses 'broad brush strokes', while a case study goes into more detail and is more consistent with teachers' classroom experience. As a result, he prefaces VL with the limitation:

"It is not a book about classroom life, and does not speak to the nuances and details of what happens within classrooms."

However, many influences such as class size, teacher subject knowledge, teacher training, ability grouping, student control, mentoring, teacher immediacy, problem-based learning, exercise, welfare and homework are very much about classroom life, yet Hattie gives them a low ranking. In his presentations he also describes many of these low-ranked influences as DISASTERS!

This seems to DEFY widespread teacher experience.

At the 2005 ACER conference (p14) Hattie said, "We must contest the evidence – as that is the basis of a common understanding of progression." Then in VL he quotes Karl Popper: "Those amongst us unwilling to expose their ideas to the hazard of refutation do not take part in the scientific game" (p4).

Is Hattie’s evidence stronger than that of other researchers, or than widespread teacher experience?

A summary of the major issues scholars have found with Hattie's work (details are on the linked pages on the right):

  • Hattie misrepresents studies by including irrelevant studies under particular influences, e.g., peer evaluation in 'self-report' and studies on emotionally disturbed students in 'reducing disruptive behaviour'.
  • Hattie often reports the opposite conclusion to that of the actual authors of the studies he reports on, e.g. 'class-size', 'teacher training', 'diet' and 'reducing disruptive behaviour'.
  • Hattie jumbled together and averaged the effect sizes of different measurements of student achievement, e.g., teacher tests, IQ, standardised tests and physical tests such as rallying a tennis ball against a wall.
  • Hattie jumbled together and averaged effect sizes for studies that do not measure achievement but something else, e.g., hyperactivity in the Diet study. That is, he uses these as proxies for achievement, which he advised NOT to do in his 2005 ACER presentation.
  • The studies are often about non-school or abnormal populations, e.g., doctors, nurses, university students, tradesmen, pre-school children, and 'emotionally/behaviorally' disturbed students.
  • The US Education Department's benchmark effect sizes per year level indicate another layer of complexity in interpreting effect sizes: studies need to control for the age of students as well as the time over which the study runs. Hattie does not do this.
  • Related to the US benchmarks is Hattie's use of d = 0.40 as the hinge point for judging whether an influence is 'good' or 'bad'. The US benchmarks show this is misleading.
  • Most of the studies Hattie uses are not high-quality randomised controlled trials but much, much poorer-quality correlation studies.
  • Most scholars are cautious about, or doubtful of, attributing causation to separate influences in the precise, surgical way that Hattie does. This is because of the unknown effect of outside influences, or confounds.
  • Hattie makes a number of major calculation errors.
  • Professor John O'Neill is critical of Hattie's public presentations: "public policy discourse becomes problematic when the terms used are ambiguous, unclear or vague" (p1). The "discourse seeks to portray the public sector as ‘ineffective, unresponsive, sloppy, risk-averse and innovation-resistant’ yet at the same time it promotes celebration of public sector 'heroes' of reform and new kinds of public sector 'excellence'. Relatedly, Mintrom (2000) has written persuasively in the American context, of the way in which ‘policy entrepreneurs’ position themselves politically to champion, shape and benefit from school reform discourses" (p2).
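The benchmark point can be made concrete with a little arithmetic. If a typical year's growth is itself an effect size that varies with age, then the same d represents very different amounts of progress at different year levels, and a single hinge point of 0.40 cannot mean "one year" for everyone. The benchmark values below are HYPOTHETICAL placeholders, not the published US figures; they only loosely mimic the pattern that typical yearly growth is larger for younger students. The helper years_of_progress is illustrative.

```python
def years_of_progress(d: float, annual_gain: float) -> float:
    """Express an effect size as a multiple of a typical one-year gain (both in SD units)."""
    if annual_gain <= 0:
        raise ValueError("annual gain must be positive")
    return d / annual_gain

# HYPOTHETICAL annual-gain benchmarks, NOT the published US figures.
hypothetical_benchmarks = {
    "early primary": 1.00,
    "upper primary": 0.40,
    "senior secondary": 0.10,
}

d = 0.40  # Hattie's hinge point, which he equates with one year's progress
for level, gain in hypothetical_benchmarks.items():
    print(f"{level}: d = {d:.2f} is about {years_of_progress(d, gain):.1f} years' progress")
```

Under these made-up benchmarks, d = 0.40 would be less than half a year's growth for young children but several years' growth for senior students, which is why the age of the students has to be controlled for.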

Whilst it may be simpler and easier to see teaching as a set of discrete influences, the evidence shows that these influences interact in ways that no one, as yet, can quantify. It is the combining of influences in complex ways that defines the 'art' of teaching.

Gabbie Stroud resigned from her teaching position and wrote:
"Teaching – good teaching - is both a science and an art. Yet in Australia today [it]… is considered something purely technical and methodical that can be rationalised and weighed.

But quality teaching isn't borne of tiered 'professional standards'. It cannot be reduced to a formula or discrete parts. It cannot be compartmentalised into boxes and 'checked off'. Good teaching comes from professionals who are valued. It comes from teachers who know their students, who build relationships, who meet learners at their point of need and who recognise that there's nothing standard about the journey of learning. We cannot forget the art of teaching – without it, schools become factories, students become products and teachers: nothing more than machinery."

Now, Hattie asks us to "contest the evidence" so we need to investigate the quality of the evidence that Hattie presents and scrutinise his subsequent interpretation.

Generally, Hattie dismisses the need for quality and makes the astonishing caveat that there is "... no reason to throw out studies automatically because of lower quality" (p11).

Emeritus Professor Ivan Snook, et al: "Hattie says that he is not concerned with the quality of the research ..., of course, quality is everything. Any meta-analysis that does not exclude poor or inadequate studies is misleading, and potentially damaging if it leads to ill-advised policy developments. He also needs to be sure that restricting his data base to meta-analyses did not lead to the omission of significant studies of the variables he is interested in" (p2).

Professor John O'Neill wrote a significant letter to the NZ Education Minister about the poor quality of Hattie's research, in particular the overuse of studies of university, graduate or pre-school students, and the danger of making classroom policy decisions without consulting other forms of evidence, e.g., case and naturalistic studies: "The method of the synthesis and, consequently, the rank ordering are highly problematic" (p7).

The class size meta-analyses are an example of Hattie's misrepresentation: he interprets a meta-analysis differently from its actual authors. I was surprised to find this a common issue in VL.

For example, Glass and Smith (1979), one of the three studies that Hattie uses for class size, summarise their data in a graph and a table:

The trend and the difference between good and poor quality research are clearly displayed. The authors conclude,

"A clear and strong relationship between class size and achievement has emerged... There is little doubt, that other things being equal, more is learned in smaller classes" (p15).

Hattie uses an average of d = 0.09 from the above table (averaging is another issue, discussed elsewhere), although the average appears to be closer to d = 0.25. Hattie concludes that class size has minimal impact on student learning. In fact, he goes further: in his 2005 ACER presentation (using this research) he calls class size a DISASTER! At other times he interprets d < 0.40 as "going backwards"!

Another example is Hattie's interpretation of the studies used in 'decreasing disruptive behaviour'. Hattie uses Reid et al (2004) and interprets their effect size of d = -0.69 to mean that decreasing disruptive behaviour REDUCES student achievement by almost 0.7 of a standard deviation! How can this be?
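Because the Glass and Smith table is not reproduced here, the sketch below uses INVENTED numbers purely for illustration; they are not Glass and Smith's (1979) figures, and the sketch makes no claim about how Hattie arrived at d = 0.09. It only shows how one set of effect sizes can give quite different headline averages depending on whether each row counts equally or is weighted by the number of comparisons it summarises.

```python
# HYPOTHETICAL (effect size, number of comparisons) rows, invented for illustration;
# these are NOT Glass and Smith's (1979) figures.
rows = [(0.05, 40), (0.10, 30), (0.50, 10), (0.60, 5)]

# Every row counts equally.
unweighted_mean = sum(d for d, _ in rows) / len(rows)

# Each row is weighted by how many comparisons it summarises.
weighted_mean = sum(d * n for d, n in rows) / sum(n for _, n in rows)

print(f"unweighted: {unweighted_mean:.2f}, weighted: {weighted_mean:.2f}")
```

With these made-up rows the two choices give roughly 0.31 and 0.15, a difference large enough to flip an influence from one side of a 0.40 "hinge" comparison to a very different verbal label.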

Reid et al (2004) compared the achievement of students labelled with 'emotional/behavioural' disturbance (EBD) with that of a 'normative' group. They used a range of measures to identify EBD, e.g., students currently in programs for severe behaviour problems, such as those in psychiatric hospitals (p132). The effect size was calculated as (EBD achievement - Normative achievement) / SD (p133).

The negative effect size indicates the EBD group performed well below the normative group. The authors conclude: "students with EBD performed at a significantly lower level than did students without those disabilities across academic subjects and settings" (p130).
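A short sketch makes the point about the sign: with the formula Reid et al describe, a negative d simply records which of two existing groups scored lower. No intervention is being tested. The scores below are HYPOTHETICAL, chosen only so that the result matches the reported d = -0.69, and group_difference_d is an illustrative helper.

```python
def group_difference_d(mean_group: float, mean_reference: float, sd: float) -> float:
    """(group mean - reference mean) / SD: a comparison of two existing groups."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (mean_group - mean_reference) / sd

# HYPOTHETICAL scores: an EBD group scoring below a normative group.
d = group_difference_d(mean_group=93.1, mean_reference=100.0, sd=10.0)
print(f"d = {d:.2f}")  # negative because the EBD group scored lower
```

Swapping the order of the groups flips the sign, which shows the number describes a gap between populations rather than the effect of decreasing disruptive behaviour.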

Hattie clearly misrepresents this study, as it is not investigating 'decreasing disruptive behaviour' as a teaching strategy or influence. 

The U.S. Department of Education has set up the National Center for Education Research, which focuses on investigating the quality of educational research; nearly 12,000 research papers have been reviewed. Their results are published in the What Works Clearinghouse. They also publish a Teacher Practice Guide whose recommendations differ markedly from Hattie's results (see Other Researchers).

Importantly, they focus on the QUALITY of the research and reserve their highest rating for research that randomly divides students into a control group and an experimental group. Where students are divided non-randomly into control and experimental groups, which they term a quasi-experiment, a moderate rating is used; however, the two groups must show some measure of equivalence before the intervention. A low rating is used for other research designs, e.g., correlation studies.

Given that most of the research Hattie uses is correlational, he has skilfully managed to sidestep the quality debate within school circles (but not within the academic community; see References).
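The three-tier rating scheme described above can be summarised as a small decision rule. This is a deliberately simplified sketch of the rules as stated in this post, not the What Works Clearinghouse's actual standards, which apply many further criteria; the Design names and the evidence_rating function are hypothetical.

```python
from enum import Enum

class Design(Enum):
    RANDOMISED = "randomised control and experimental groups"
    QUASI_EXPERIMENT = "non-random control and experimental groups"
    CORRELATIONAL = "correlation study"

def evidence_rating(design: Design, baseline_equivalence: bool = False) -> str:
    """Simplified rating rule: randomised -> high; quasi-experiment with groups
    shown to be equivalent before the intervention -> moderate; anything else -> low."""
    if design is Design.RANDOMISED:
        return "high"
    if design is Design.QUASI_EXPERIMENT and baseline_equivalence:
        return "moderate"
    return "low"

for design in Design:
    print(design.value, "->", evidence_rating(design, baseline_equivalence=True))
```

Under this rule, the correlation studies that dominate Hattie's data base can never rise above a low rating, however large their effect sizes.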

“Extraordinary claims require extraordinary evidence.” Carl Sagan

Hattie concludes that the ‘best’ influence is self-reported grades, with d = 1.44, which he interprets as advancing student achievement by 3+ years!
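Taking the hinge point at face value (d = 0.40 as one year's progress), the "3+ years" figure presumably follows from a simple division. This is a reconstruction of the arithmetic, assuming progress scales linearly from the hinge point:

```latex
\text{years of progress} \approx \frac{d}{0.40} = \frac{1.44}{0.40} = 3.6
```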

This is an AMAZING claim if true: that merely predicting your grade somehow magically improves your achievement to that extent. I hope my beloved “under-achieving” Australian football team, the St Kilda Saints, is listening: “boys, you can make the finals next year just by predicting you will; you don't need to do all that hard training!”

Professor Dylan Wiliam agrees: "... the effect sizes proposed by Hattie are, at least in the context of schooling, just plain wrong. Anyone who thinks they can generate an effect size on student learning in secondary schools above 0.5 is talking nonsense." The US national effect size benchmarks support Professor Wiliam's contention.

A more thorough analysis of the studies that Hattie uses will be helpful, starting with the background to the research: Effect Size, Student Achievement, CLE, a Year's Progress and Validity & Reliability. Then come the two highest-ranked influences, self-reported grades and Piagetian programs, compared with the classic maths teacher's technique of worked examples.

Other influences will be added as I, or someone else, read the studies. So far: Class Size, Concentration, Creativity, Peer Influences, Teacher Immediacy, Mentoring, Diet, Ability Grouping, Student Control, Teacher Training and Behavior.

I'm particularly looking for maths teaching strategies, e.g., problem-solving and time on task. If you can help and contribute, please let me know.

A four-part series about Kambrya College, called Revolution School, aired on Australian TV. The school used John Hattie as a consultant, yet it seemed somewhat contradictory that the school used strategies that rank low on Hattie's scale of influence.

John Oliver gives a funny overview of the problems with Scientific Studies: