Class Size

Effect Size d= 0.21  (Hattie's Rank=132)

Prof Peter Blatchford, the lead author of one of the most comprehensive peer reviews of class size so far, Class Size Eastern and Western perspectives (2016), states,
'One reason for the prevalence of the unimportant view are several highly influential reports which have set in motion a set of messages that have generated a life of their own, separate from the research evidence, and have led to a set of taken for granted assumptions about class size effects.
Given the important influence these reports seem to be having in government and regional education policies, they need to be carefully scrutinised in order to be sure about the claims that are made' (p93).
Blatchford names Hattie's interpretation & summary of some studies as the major source of the evidence provided by these reports.

Interestingly, later in the same book Hattie concedes,
'The evidence is reasonably convincing - reducing class size does enhance student achievement' (p113).
However, in the TV series Revolution School (part 3, 1min 20sec) Hattie claims reducing class size does not make a difference to the quality of education!

Worse, in Hattie's many public presentations from 2005-2015, he promoted the view that 'reducing class size' is a "disaster" and a "distraction" - see details below. 

Yet, most teachers would agree with Eddie Woo, the teacher who was named Australian of the Year, who said (video here - start at 41min):

Don't tell me that class size does not make a difference!

Hattie's "disaster" claims were based on the following three meta-analyses:

Authors | Year | No. studies | Students | Mean (d) | CLE | Variable
Gene V Glass & Mary Lee Smith | 1979 | 77 | 520,899 | 0.09 | 6% | Class size
McGiverin et al | 1999 | 10 | | 0.34 | 24% | Class size
Goldstein, Yang, Omar, & Thompson | 2000 | 9 | 29,440 | 0.20 | 14% | Class size

Does Hattie Misrepresent the 3 studies?

1. Gene Glass and Mary Lee Smith (1979) investigate a range of class size comparisons, from 40 versus 30 through to 1 versus 40.

Hattie calculates an average by combining all class size reductions to get a low value of d = 0.09.

However, this is another Hattie error, as the average is 0.25 (see table below).

But, the class size reductions are totally different, so what does this ONE average mean?

If you look at this meta-analysis in more detail a totally different picture emerges, which is not represented by using this one average (Hattie only uses the one incorrect average).

The summary table on page 11, shows more detail:

Then on page 15-

A key finding from the above graph is the difference between well and poorly controlled studies.

Mary Lee Smith and Gene Glass conclude (p15):
'The curve for the well-controlled studies then, is probably the best representation of the class-size and achievement relationship...
A clear and strong relationship between class size and achievement has emerged... There is little doubt, that other things being equal, more is learned in smaller classes.'
In response to the effect of tutoring (a class size of 1) that may skew the relationship, Glass and Smith say (p15),
'When all those comparisons for which S = 1 were removed, the curve ... for well-controlled studies was even steeper than that shown; this finding is contrary to the claim that tutoring studies skewed the curve unnaturally.'
They also detail (p13):
'The class size and achievement relationship seems consistently stronger in the secondary grades than in the elementary grades.'
I contacted Prof Glass to ensure I had interpreted his study correctly. He kindly replied,
'Averaging class size reduction effects over a range of reductions makes no sense to me. 
It's the curve that counts. 
Reductions from 40 to 30 bring about negligible achievement effects. From 20 to 10 is a different story. 
But Teacher Workload and its relationship to class size is what counts in my book.'
Wrigley (2018) supports this by quoting Gene Glass,
'Indeed, Gene Glass, who originated the idea of meta-analysis, issued this sharp warning about heterogeneity: "Our biggest challenge is to tame the wild variation in our findings not by decreeing this or that set of standard protocols but by describing and accounting for the variability in our findings. The result of a meta-analysis should never be an average; it should be a graph."(Robinson, 2004: 29)' (p367).
Bergeron (2017) reiterates,
'Hattie computes averages that do not make any sense.'
In a recent interview with Hanne Knudsen (2017), 'John Hattie: I’m a statistician, I’m not a theoretician', Hattie said,
'If, for example, a meta-analysis came out that showed e.g. that class size had a huge effect on learning, my model is wrong. I worry all the time about falsifiability' (p7).
Yet, it is ironic that the author of the class size study, Professor Gene Glass, who also invented the meta-analysis methodology, wrote a book with 20 other distinguished academics contradicting Hattie, '50 Myths and Lies That Threaten America's Public Schools: The Real Crisis in Education'.

In Myth #17: Class size does not matter; reducing class sizes will not result in more learning, the 21 academics collaboratively say,
'Fiscal conservatives contend, in the face of overwhelming evidence to the contrary, that students learn as well in large classes as in small... So for which students are large classes okay? Only the children of the poor?'
Thibault (2017), in 'Is John Hattie's Visible Learning so visible?', also questions Hattie's method of using one average to represent a range of studies (translated into English),
'We are entitled to wonder about the representativeness of such results: by wanting to measure an overall effect for subgroups various with various characteristics, this effect does not faithfully represent any of the subgroups that it encompasses! 
... by combining all the data as well as the particular context that is associated with each study, we eliminate the specificities of each context, which for many give meaning to the study itself!'
2. McGiverin et al (1989) state that the lack of experimental control and diverse definitions of large and small are among the reasons cited for inconsistent findings regarding class size (p49).

In addition, they are critical of the Glass (1979) study for not using pragmatic class sizes. As a result, their study focused on second-year students with properly controlled studies using experimental and control groups (although not randomly assigned). They decided a more pragmatic definition of a large class size is about 26 and a small class size is about 19 (p49).

They introduce a caveat by quoting Berger (1981, p49). 
'Focusing on class size alone is like trying to determine the optimal amount of butter in a recipe without knowing the nature of the other ingredients.'
Whilst they get a reasonably high d = 0.34, they advise caution in the interpretation of this result (p54). They also make special mention of the confounding variables: the Hawthorne effect, novelty, and self-fulfilling prophecy.

3. Goldstein et al (2000) state their aim: 
'The present paper focuses more on the methodology of meta-analyses than on the substantive issues of class size per se.'
For a more detailed discussion on class size, they recommend looking at their previous papers (p400).

Summary of results from page 401:

Study | Year | No. students | Effect size (d) | Experiment type | Small class | Normal class
1 | | 1251 | | non random | 15 | 30
2 | | 4256 | | ? | 16, 23 | 30, 37
9 STAR | | 12644 | | randomised | 13-17 | 22-25
Weighted ave | | | 0.23 | | |

A comparison of the studies shows different definitions for small and normal classes, e.g. study 2 defines 23 as a small class whereas in study 9 it is a normal class. 

Nielsen & Klitmøller (2017) in 'Blind spots in Visible Learning - Critical comments on "Hattie revolution"', discuss the disparate definitions of large and small classes in different studies (p7).

So comparing the effect size from different studies is not comparing the same thing!

The authors comment on another problem we have seen throughout VL (p403), 
'we have the additional problem that different achievement tests were used in each study and this will generally introduce further, unknown, variation.'
From Goldstein et al (2003)
'A reduction in class size from 30 to 20 pupils resulted in an increase in attainment of approximately 0.35 standard deviations for the low attainers, 0.2 standard deviations for the middle attainers, and 0.15 standard deviations for the high attainers' (p17).

So once again, the detail of the study is lost when Hattie uses ONE averaged effect size d value to represent that study.
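
To make the lost detail concrete, here is a minimal Python sketch using the three effect sizes quoted above from Goldstein et al (2003). It assumes, purely for illustration, that the three attainment groups are the same size, so a simple unweighted mean stands in for the single figure; the variable names are my own and the calculation is not taken from Goldstein et al or from Hattie.

```python
from statistics import mean

# Effect sizes (in standard deviations) quoted above from Goldstein et al (2003)
# for a class size reduction from 30 to 20 pupils.
effects_by_group = {
    "low attainers": 0.35,
    "middle attainers": 0.20,
    "high attainers": 0.15,
}

# ASSUMPTION (illustration only): the groups are equally sized,
# so an unweighted mean is used as the single summary figure.
single_average = mean(effects_by_group.values())

# How far apart the group-level effects are; this is what one number hides.
spread = max(effects_by_group.values()) - min(effects_by_group.values())

print(f"single average d = {single_average:.2f}")
for group, d in effects_by_group.items():
    print(f"{group}: d = {d:.2f} ({d - single_average:+.2f} vs the average)")
print(f"range hidden by the average = {spread:.2f}")
```

Under that equal-size assumption, the single figure sits well below the effect for low attainers and above the effect for high attainers, which is exactly the information a policy maker would want to see.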

Hattie's Interpretation:

In his recent collaboration with Pearson (2015), 'What Doesn't Work in Education: The Politics of Distraction', he names class size as one of the major distractions. In previous presentations, he consistently labelled class size a "disaster" or as "going backwards" (Hattie's 2005 ACER presentation):

Yet, in another article in 2015 responding to critiques of his work he concludes:
'The main message remains, be cautious, interpret in light of the evidence, search for moderators, take care in developing stories, ... '
Using polemic language like 'disasters' is not being very cautious!

Corwin, the commercial arm of Hattie's Visible Learning, continues to promote "reduction of class size does not work" (Sept 2018).

Yet, in a July 2018 interview with Ollie Lovell, when pressed about administrators who use effect sizes for comparisons, Hattie retorts,
"that's too simplistic I wouldn't do that"
Interview segment - here.

But for many years Hattie has done exactly that: in his rankings, in his public lectures claiming that class size is a 'disaster', and in Corwin's continued promotion of the message that 'class size does not work.'

Some of the most dangerous, simplistic comments by Hattie are,
'I would go further and claim that those students who do not achieve at least a 0.40 improvement in a year are going backwards...' (VL, p250). 
When unchallenged, he takes this one step further,
'teachers who only attain up to .40 are those “below average” in their influence' (Hattie 2010, p87).
At least, in what I think is the most comprehensive peer review of class size so far, Class Size Eastern and Western perspectives (2016), Hattie retreats from his polemic and concedes,
'The evidence is reasonably convincing - reducing class size does enhance student achievement' (p113).
He should instruct Corwin to do the same!

But, Hattie then cleverly shifts the debate,
'Why is the (positive) effect so small?' (p105).
In the interview with Ollie Lovell, Hattie answers his own question when he admits standardised tests are too narrow and get low effect sizes. Strangely, he forgets that his class size studies used standardised tests!

Interview segment - here.

Prof Adrian Simpson also raises the standardised test issue in 'The misdirection of public policy: comparing and combining standardised effect sizes', showing standardised tests get low effect sizes. Simpson also goes further and shows that changing the test can yield effect sizes from 0 to infinity for the same intervention!

Simpson poses this as one of the reasons influences like 'feedback' have high effect sizes (they use specific tests) while influences like 'class size' have low effect sizes (they use narrow standardised tests).

Simpson also details that sampling from smaller populations is another major reason why effects of influences such as 'feedback', 'meta-cognition', etc are high while the effects for whole school influences - 'class size', 'summer school', etc are low (p463),
'One cannot compare standardised mean differences between sets of studies which tend to use restricted ranges of participants with researcher designed, tightly focussed measures and sets of studies which tend to use a wide range of participants and use standardised tests as measures.'
Bergeron (2017) and Slavin (2016) also confirm Simpson's analysis. Prof Slavin has a blog devoted to this question here.
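
Simpson's point about restricted ranges can be sketched with the standardised mean difference itself: d is the raw difference in means divided by the standard deviation. In the hypothetical Python example below, the same raw gain of 5 marks is divided by two different standard deviations, a small one (a restricted range of participants) and a large one (a wide range of participants). All numbers and names are invented for illustration; they are not taken from Simpson, Hattie or any study.

```python
def standardised_mean_difference(mean_treatment, mean_control, sd):
    """Effect size d: raw difference in means divided by the standard deviation."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (mean_treatment - mean_control) / sd


# HYPOTHETICAL numbers: the raw gain is 5 marks in both settings.
raw_control, raw_treatment = 50.0, 55.0

# Restricted range of participants: scores bunch together (small SD).
d_narrow = standardised_mean_difference(raw_treatment, raw_control, sd=5.0)

# Wide range of participants: scores spread out (large SD).
d_wide = standardised_mean_difference(raw_treatment, raw_control, sd=25.0)

print(f"restricted range: d = {d_narrow:.2f}")
print(f"wide range:       d = {d_wide:.2f}")
```

The intervention "worked" equally well in raw marks in both cases, yet the two d values differ by a factor of five, which is why ranking influences by d alone compares different things.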

Another answer to why class size effect sizes are low is fairly obvious when you look at the tables above. First, Hattie's lowest effect size of 0.09 is derived incorrectly. Second, when you average very small effect sizes for reductions from 40 down to 30 with large effect sizes for reductions from 20 down to 15, you get a low average.

Hattie's Interpretation Is Used by Politicians for Public Policy:

'Hattie’s work has provided school leaders with data that appeal to their administrative pursuits' Eacott (2017, p3).
In 2015, the Australian Government used Hattie's work to block significant funding, proposed by the Gonski Review, to redress the socio-economic imbalance in Australian schools.

Professor Blatchford comments about this,
'When Christopher Pyne [the then Australian Education Minister] talked about prioritising teacher quality, rather than reducing class sizes, he set up a false and simplistic dichotomy' (p16, AEU News).
A similar example comes from New Zealand, where Professor John O'Neill wrote a significant letter to the NZ Minister of Education on the problem of using Hattie's research for class size policy.

Further, in 'Material fallacies of education research evidence and public policy advice', Professor O'Neill states,
'... the Minister of Education declined to rule out increases in class size. In short, this was because the ‘independent observation’ of Treasury and the research findings of an influential government adviser, Professor John Hattie, were that schooling policy should instead focus on improving the quality of teaching.'
Writing about Hattie's class size research O'Neill warns that,
'Much of the terminology is ambiguous and inconsistently used by politicians, officials and academic advisers. The propositions are not demonstrably true – indeed, there is evidence to suggest they are false in crucial respects. The conclusion is, at best, uncertain because it does not take into account confounding evidence that larger classes do adversely affect teaching, learning and student achievement' (p2).
I am concerned about the unwavering confidence that Hattie displays when he talks about class size, given the caution and reservations expressed by the authors of each of his 3 studies, as well as by other reputable scholars around the world. These reservations stem from the lack of quality studies, the inability to control variables, the major differences in how achievement is measured, major confounding variables and benchmark effect sizes.

The Largest Analysis and Peer Review of the Class Size Research (so far):

Class Size Eastern and Western perspectives (2016), edited by Prof Blatchford et al. Note: Prof Blatchford has a website dedicated to class size research.

The editors state,
'there are in fact relatively few high-quality dedicated studies of class size and this is odd and unfortunate given the public profile of the class size debate and the need for firm evidence based on purposefully designed research fit for purpose' (p275).
'What often gets overlooked in debates about class size is that CSR is not in itself an educational initiative like other interventions with which it is often (and in a sense unfairly) compared, for example, reciprocal teaching, teaching metacognitive strategies, direct instruction and repeated reading programmes; it is just a reduction of the number of pupils in a classroom' (p276).
Prof Blatchford warns again about correlation studies, 
'Essentially the problem is the familiar one of mistaking correlation for causality. We cannot conclude that a relationship between class size and academic performance means that one is causally related to the other' (p94).
The editors conclude, 
'the chapters in this book are only a start and much more research is needed on ways in which class size is related to other classroom processes. This has implications for research methods: we need more systematic studies, e.g. which use systematic classroom observations, but also high-quality multi-method studies, in order to capture these less easily measured factors.

There is some disagreement about which groups are involved but often studies find it is low attaining and disadvantaged students who benefit the most. Blatchford et al (2011) found evidence that smaller classes helped low attaining students at secondary level in terms of classroom engagement. Hattie (Chapter 7) develops the view that we might expect low attaining students to benefit from small classes in terms of developing self regulation strategies' (p278).
Blatchford concludes, 
'The aim is move beyond the rather tired debates about whether class size affects pupil performance and instead move things on by developing an integrative framework for better understanding the relationships between class size and teaching, with important practical benefits for education world wide' (p102).
Hattie's contribution to the book (Chapter 7):

Hattie appears to be an outlier in this book. Of the 17 scholars who have contributed to the book ONLY Hattie myopically uses the effect size statistic to fully interpret the research. All the others use contextual and detailed features of the research to reach the conclusion that class size is important and significant.

At least the weight of scholarship has caused Hattie to retreat from his polemic on reducing class size as 'a disaster' and 'going backwards' and he finally concedes, 
'The evidence is reasonably convincing - reducing class size does enhance student achievement' (p113).
But, Hattie cleverly reframes the issue to 
'Why is the (positive) effect so small?' (p105).
Given the significant amount of critique of Hattie's methodology (the lack of quality studies, the use of disparate measures of student achievement, the use of university students or pre-school children, correlation, the inconsistent definition of small and large class sizes, indiscriminate averaging, benchmark effect sizes, etc.), I was disappointed that Hattie did not address any of these issues. Instead, he focused on attacking Dr. David Zyngier's meta-review,
'Zyngier's review misses the elephant in the room' (p106).
But if Zyngier misses the elephant in the room, then so do all the other 16 researchers contributing to the book. For example, in the following chapter (8), Finn & Shanahan display what they believe to be significant findings (p124):

Hattie once again sidesteps the SIGNIFICANT issues raised by Zyngier (+ many others), e.g., the control of variables, such as the differing definitions of large and small classes. Studies also differ on how to measure class size: some use a student/teacher ratio (STR), which includes many non-teaching staff such as the principal, welfare staff, library staff, etc.
'Past research has too often conflated STR with class size' (p4).
Blatchford, et al (2016), also comment on this STR problem, 
'they are not a valid measure of the number of pupils in a class at a given moment' (p95).
Hattie just re-states that meta-analyses provide a reasonably robust estimate and myopically focuses on the effect size statistic, but he provides no defence against the validity issues. He concedes that STR and class size are different, yet he does not resolve the validity problem of using these disparate measures. Instead, he fobs off the argument with a red herring, that STR and class size are related (p112), and he provides no evidence for this claim.

Given the importance of class size research, STR and Class size need to be MORE than just related.

They need to be the SAME!!!!

Hattie adds a 4th study to his effect size average: Shin and Chung (2009), effect size d = 0.20. But he conveniently does not inform the reader that this study re-analysed the same data (the Tennessee STAR study) as the previous meta-analyses that he used.

Ironically, Shin and Chung warn against creating an effect size from repeated use of the same data, 
'If a study has multiple effect sizes, the same sample can be repeatedly used. Repeated use of the same sample is, however, a violation of the independent assumption' (p14).
They also warn, 
'we found too many Tennessee STAR studies... We worry about the dependence issue' (p15).
It seems to me Hattie's strategy is to take the focus off the scrutiny of his evidence and re-direct our attention elsewhere - a strategy for politicians, NOT for researchers!

Join the group 'Class Size Matters' - here.

Teacher Morale:

Blatchford et al (2016) comment on the associated issue of teacher morale and class size,
'Virtually all class size studies report that teacher morale is higher in small classes than in larger classes. The personal preference for small classes was demonstrated by STAR third-grade teachers interviewed at the end of the school year. Teachers were asked whether they would prefer a small class with 15 students or a $2,500 salary increase. Seventy percent of all teachers and 81 percent of those who had taught small classes chose the small class option over a salary increase' (p129).
Prof Gene Glass agrees, 
'Teacher Workload and its relationship to class size is what counts in my book.'

Blatchford et al challenge the statements of the head of PISA, Andreas Schleicher,
'there was reference to ten myths of education, as expressed by Andreas Schleicher, one of which was the myth that smaller classes benefited academic performance. The editors of this book tend to side with Berliner and Glass (2014) who address what they see as the 50 myths and lies which threaten American public schools. Myth no. 17 in their list is the belief that reducing class size will not result in more learning' (p275).

Other Commentary

The Australian Education Union has published a comprehensive analysis of the class size research. They summarise that reducing class size does seem to improve student outcomes. Also, they highlight the problems with Hattie's methodology:
'The critics have cited the methodological problem of synthesising a whole range of meta-studies each with their own series of primary studies. There is no quality control separating out the good research studies from the bad ones. The different assumptions, definitions, study conditions and methodologies used by these primary studies mean that Hattie’s meta-analysis of the meta-analyses is a homogenisation which may distort the evidence (comparing apples with oranges)' (p13).

'The 0.21 effect he claims for class size is an average so that some studies may have found a significantly higher effect than that. For example, ‘gold standard’ primary research studies (using randomised scientific methodology) such as the Tennessee STAR project recorded a range of effect sizes including some at 0.62, 0.64 and 0.66, clearly well above the ‘hinge-point’ and the same as most variables which Hattie regards as very important' (p14).
Professor John O'Neill's AMAZING letter quotes from a detailed case/naturalistic study by Blatchford (2011),
"Professor Blatchford makes the point that class size effects are ‘multiple’. For children at the beginning of schooling, there are significant potential gains in reading and maths in smaller classes. Children from ethnic minorities and children who start behind their peers benefit most. There is also a positive effect on behaviour, engagement and achievement, particularly for low achievers, where classes are smaller in the lower secondary school" (p10).
Leading researcher, Professor Dylan Wiliam states that the evidence is pretty clear that if you teach smaller classes you get better results. The problem is smaller classes cost a lot more (7min into full lecture).

Also, many scholars point out the irony in Hattie's view, that class size is a distraction - because the number of students in a class limits the ability of teachers to implement the kinds of changes that Hattie shows have the biggest effect, e.g., formative evaluation, micro teaching, behavior, feedback, teacher-student relationships, etc.

For example,  Dr. David Zyngier in his meta-review - 
'The strongest hypothesis about why small classes work concerns students’ classroom behaviour. Evidence is mounting that students in small classes are more engaged in learning activities, and exhibit less disruptive behaviour' (p17).

Each of these studies also discusses its limitations. In particular, Goldstein et al (2000) emphasise an issue that has emerged throughout Hattie's synthesis:
'... we have the additional problem that different achievement tests were used in each study, and this will generally introduce further, unknown, variation' (p403).
Goldstein et al (2003) go into detail about the problems of comparing correlation studies with random controlled experiments;
'… correlational studies that ... examined relationships between class size and children’s achievements at one point in time, are difficult to interpret because of uncertainties over whether other factors (e.g., non-random allocation of pupils to classes) might confound the results' (p3).
Goldstein et al (1998) point out another major confounding variable: 
'There is a tendency for schools to allocate lower achieving children to be in smaller classes. This bias means a considerable number of large cross-sectional studies (correlational) need to be ignored due to validity requirements' (p256).
Robert Slavin, in 'Best-Evidence Synthesis: An Alternative to Meta-Analytic and Traditional Reviews' (1986), also discusses this issue: a “best evidence synthesis” of any education policy should encourage decision makers to favor results from studies with high internal and external validity, that is, randomized field trials involving large numbers of students, schools, and districts. Note: Glass and Smith graph the difference between high and low-quality studies (see above).

Slavin also discusses the major issue of the disparate ways in which achievement is measured; one achievement test consisted of rallying a tennis ball against a wall as many times as possible in 30 seconds. Other studies used a total treatment time of only 30 minutes. Other studies used only post-secondary students (p7).

Dr. David Zyngier has published an excellent meta-review on class size, 'Class size and academic results, with a focus on children from culturally, linguistically and economically disenfranchised communities'.
'Noticeably, of the papers included in this review, only three authors supported the notion that smaller class sizes did not produce better outcomes to justify the expenditure' (p3).

'The highly selective nature of the research supporting current policy advice to both state and federal ministers of education in Australia is based on flawed research. The class size debate should now be more about weighing up the cost-benefit of class size reductions, and how best to achieve the desired outcomes of improved academic achievement for all children, regardless of their background. Further analysis of the cost-benefit of targeted CSR is therefore essential' (p16).
'Recognised in the education research community as the most reliable and valid research on the impact of class size reductions at that time, the Tennessee STAR project was a large series of randomised studies, followed up in Wisconsin by the SAGE project. After four years, it was clear that smaller classes did produce substantial improvement in early learning and cognitive studies, and that the effect of small class size on the achievement of minority children was initially about double that observed for majority children' (p7).
Zyngier concludes:
'Findings suggest that smaller class sizes in the first four years of school can have an important and lasting impact on student achievement, especially for children from culturally, linguistically and economically disenfranchised communities' (p1).
Professor Ivan Snook et al, in their peer review of Hattie, also comment in detail about class size. They also discuss the STAR study, noting that its effect sizes reached 0.66. They conclude:
'The point of mentioning these studies is not to 'prove' that Hattie is 'wrong' but to indicate that drawing policy conclusions about the unimportance of class size would be premature and possibly very damaging to the education of children particularly, young children and lower ability children. A much wider and in depth debate is needed' (p10).
Dr. Neil Hooley, in his review of Hattie, 'Making judgments about John Hattie's effect size', talks about the complexity of classrooms and the difficulty of controlling variables. On the issue of class size he says,
'Under these circumstances, the measure of effect size is highly dubious' (p44).
Dan Haesler has a detailed look at class size and other issues.

Kelvin Smythe provides a detailed review of Hattie's research on Class size.


  1. And my meta review of research also concludes that CSR had an enormous impact on student outcomes, in particular for children from disadvantaged communities. See ANZSOG journal online Evidence Base

    1. Thanks David, your evidence about the quality of teaching and learning as it relates to class size is very relevant. I've included some of your findings and insights into the commentary about class size. Thank you.

  2. All teachers should read this! Thank you.

  3. I always thought Hattie was wrong on this, every teacher knows class size makes a difference.