Class Size

Effect Size d = 0.21 (Hattie's Rank = 132)

Prof Peter Blatchford, the lead author of one of the most comprehensive peer reviews of class size so far, Class Size Eastern and Western perspectives (2016), states,
'One reason for the prevalence of the unimportant view are several highly influential reports which have set in motion a set of messages that have generated a life of their own, separate from the research evidence, and have led to a set of taken for granted assumptions about class size effects.
Given the important influence these reports seem to be having in government and regional education policies, they need to be carefully scrutinised in order to be sure about the claims that are made' (p93).
Blatchford names Hattie's meta-analyses as a major source of the evidence provided by these reports.

Interestingly, later in the same book Hattie concedes,
'The evidence is reasonably convincing - reducing class size does enhance student achievement' (p113).
However, in VL and in his presentation with Pearson (2015) he seemed to have a different view (he called class size a disaster and a distraction), based on the following three meta-analyses:

Authors | Year | No. studies | Students | Mean (d) | CLE | Variable
Gene V Glass & Mary Lee Smith | 1979 | 77 | 520,899 | 0.09 | 6% | Class size
McGiverin et al | 1999 | 10 |  | 0.34 | 24% | Class size
Goldstein, Yang, Omar, & Thompson | 2000 | 9 | 29,440 | 0.20 | 14% | Class size

Does Hattie Misrepresent the 3 studies?

1. Gene Glass and Mary Lee Smith (1979) investigate a range of class size comparisons, from classes of 40 versus 30 to classes of 1 versus 40.

Hattie calculates an average by combining all class size reductions to get a low value of d = 0.09.

However, this is another Hattie error, as the average is 0.25 (see table below).

But, given that the class size reductions are totally different, the question must be asked: what does this average mean?

If you look at this meta-analysis in more detail, a totally different picture emerges, one that is not represented by this single average (Hattie uses only the one incorrect average).

The summary table on page 11 shows more detail:

Then on page 15:

A key finding from the above graph is the difference between well and poorly controlled studies.

Mary Lee Smith and Gene Glass conclude (p15):
'The curve for the well-controlled studies then, is probably the best representation of the class-size and achievement relationship...
A clear and strong relationship between class size and achievement has emerged... There is little doubt, that other things being equal, more is learned in smaller classes.'
In response to the effect of tutoring (a class size of 1) that may skew the relationship, Glass and Smith say (p15),
'When all those comparisons for which S = 1 were removed, the curve ... for well-controlled studies was even steeper than that shown; this finding is contrary to the claim that tutoring studies skewed the curve unnaturally.'
They also detail (p13):
'The class size and achievement relationship seems consistently stronger in the secondary grades than in the elementary grades.'
I contacted Prof Glass to ensure I had interpreted his study correctly. He kindly replied,
'Averaging class size reduction effects over a range of reductions makes no sense to me. 
It's the curve that counts. 
Reductions from 40 to 30 bring about negligible achievement effects. From 20 to 10 is a different story. 
But Teacher Workload and its relationship to class size is what counts in my book.'
Wrigley (2018) supports this by quoting Gene Glass,
'Indeed, Gene Glass, who originated the idea of meta-analysis, issued this sharp warning about heterogeneity: "Our biggest challenge is to tame the wild variation in our findings not by decreeing this or that set of standard protocols but by describing and accounting for the variability in our findings. The result of a meta-analysis should never be an average; it should be a graph."(Robinson, 2004: 29)' (p367).
Bergeron (2017) reiterates,
'Hattie computes averages that do not make any sense.'
In a recent interview with Hanne Knudsen (2017), 'John Hattie: I’m a statistician, I’m not a theoretician', Hattie said,
'If, for example, a meta-analysis came out that showed e.g. that class size had a huge effect on learning, my model is wrong. I worry all the time about falsifiability' (p7).
Yet, it is ironic that the author of the class size study, Professor Gene Glass, who also invented the meta-analysis methodology, wrote a book with 20 other distinguished academics contradicting Hattie, '50 Myths and Lies That Threaten America's Public Schools: The Real Crisis in Education'.

In Myth #17: Class size does not matter; reducing class sizes will not result in more learning, the 21 academics collaboratively say,
'Fiscal conservatives contend, in the face of overwhelming evidence to the contrary, that students learn as well in large classes as in small... So for which students are large classes okay? Only the children of the poor?'
Thibault (2017), in 'Is John Hattie's Visible Learning so visible?', also questions Hattie's method of using one average to represent a range of studies (translation to English),
'We are entitled to wonder about the representativeness of such results: by wanting to measure an overall effect for various subgroups with various characteristics, this effect does not faithfully represent any of the subgroups that it encompasses!
... by combining all the data as well as the particular context that is associated with each study, we eliminate the specificities of each context, which for many give meaning to the study itself!'
2. McGiverin et al (1989) state that the lack of experimental control and diverse definitions of large and small are among the reasons cited for inconsistent findings regarding class size (p49).

In addition, they are critical of the Glass (1979) study for not using pragmatic class sizes. As a result, their study focused on second-year students with properly controlled studies using experimental and control groups (although not randomly assigned). They decided a more pragmatic definition of a large class size is about 26 and a small class size is about 19 (p49).

They introduce a caveat by quoting Berger (1981, p49). 
'Focusing on class size alone is like trying to determine the optimal amount of butter in a recipe without knowing the nature of the other ingredients.'
Whilst they get a reasonably high d = 0.34, they advise caution in the interpretation of this result (p54). They also make special mention of the confounding variables: the Hawthorne effect, novelty, and self-fulfilling prophecy.

3. Goldstein et al (2000) state their aim: 
'The present paper focuses more on the methodology of meta-analyses than on the substantive issues of class size per se.'
For a more detailed discussion on class size, they recommend looking at their previous papers (p400).

Summary of results from page 401:

Study | Year | No. students | Effect size (d) | Experiment type | Small class | Normal class
1 |  | 1251 |  | non random | 15 | 30
2 |  | 4256 |  | ? | 16, 23 | 30, 37
9 STAR |  | 12644 |  | randomised | 13-17 | 22-25
Weighted ave |  |  | 0.23 |  |  |

A comparison of the studies shows different definitions for small and normal classes, e.g. study 2 defines 23 as a small class whereas in study 9 it is a normal class. 

Nielsen & Klitmøller (2017) in 'Blind spots in Visible Learning - Critical comments on "Hattie revolution"', discuss the disparate definitions of large and small classes in different studies (p7).

So comparing the effect size from different studies is not comparing the same thing!

The authors comment on another problem we have seen throughout VL (p403), 
'we have the additional problem that different achievement tests were used in each study and this will generally introduce further, unknown, variation.'
From Goldstein et al (2003):
'A reduction in class size from 30 to 20 pupils resulted in an increase in attainment of approximately 0.35 standard deviations for the low attainers, 0.2 standard deviations for the middle attainers, and 0.15 standard deviations for the high attainers' (p17).

So once again, the detail of the study is lost when Hattie uses ONE averaged effect size d value to represent that study.

Hattie's Interpretation:

In his recent collaboration with Pearson (2015) - "What Works in Schools - the politics of distraction", he names class size as one of the major distractions. In previous presentations, he consistently labelled class size a "disaster" or as "going backwards" (Hattie's 2005 ACER presentation):

Yet, in another article in 2015 responding to critiques of his work he concludes:
'The main message remains, be cautious, interpret in light of the evidence, search for moderators, take care in developing stories, ... '
Using polemic language like 'distractions' is not being very cautious!

Yet, in what I think is the most comprehensive peer review of class size so far, Class Size Eastern and Western perspectives (2016), Hattie retreats from the above polemic and concedes,
'The evidence is reasonably convincing - reducing class size does enhance student achievement' (p113).
Hattie then cleverly shifts the debate,
'Why is the (positive) effect so small?' (p105).
One of the answers to that question is pretty obvious when you look at the table above where Hattie derives his lowest effect size of 0.09. When you average very small effect sizes from class sizes of 40 down to 30 with large effect sizes of 20 down to 15 you get a low average.
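The arithmetic behind this point can be sketched in a few lines of Python. The effect sizes below are invented purely for illustration (they are NOT taken from Glass & Smith's tables), and the helper name average_for_mix is my own. The sketch shows that a single average depends heavily on how many comparisons of each kind happen to be in the pool, and that it matches none of the individual reductions.

```python
# Hypothetical effect sizes (d) for four class size reductions.
# These numbers are invented for illustration; they are NOT from Glass & Smith.
effects_by_reduction = {
    "40 -> 30": 0.05,
    "30 -> 20": 0.15,
    "20 -> 15": 0.40,
    "20 -> 10": 0.60,
}


def average_for_mix(effects, counts):
    """Average d when each reduction appears counts[reduction] times in the pool."""
    total = sum(effects[key] * counts[key] for key in effects)
    return total / sum(counts[key] for key in effects)


# One comparison of each kind.
even_mix = {key: 1 for key in effects_by_reduction}
# Same effects, but the pool is dominated by 40 -> 30 comparisons.
large_class_heavy = {**even_mix, "40 -> 30": 6}

overall_even = average_for_mix(effects_by_reduction, even_mix)
overall_heavy = average_for_mix(effects_by_reduction, large_class_heavy)

print(f"average, even mix:        d = {overall_even:.2f}")
print(f"average, mostly 40 -> 30: d = {overall_heavy:.2f}")
for reduction, d in effects_by_reduction.items():
    print(f"{reduction}: d = {d:.2f}")
```

With these made-up inputs the average drops from 0.30 to about 0.16 simply because more large-class comparisons are included, even though no individual effect has changed. This is one way to read Glass's remark that the result should be a graph, not an average.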

Prof Adrian Simpson also insightfully explains in 'The misdirection of public policy: comparing and combining standardised effect sizes', that sampling from smaller populations is a major reason why effects of influences such as feedback, meta-cognition, etc are high while the effects for whole school influences - class size, summer school, etc are low (p463),
'One cannot compare standardised mean differences between sets of studies which tend to use restricted ranges of participants with researcher designed, tightly focussed measures and sets of studies which tend to use a wide range of participants and use standardised tests as measures.'
Bergeron (2017) and Slavin (2016) also confirm Simpson's analysis. Prof Slavin has a blog devoted to this question here.
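Simpson's point about restricted ranges follows directly from how a standardised mean difference is defined: the raw gain is divided by a standard deviation. A minimal sketch, using hypothetical numbers of my own choosing (the raw gain and both SDs are assumptions, not figures from Simpson's paper):

```python
def standardised_mean_difference(raw_gain, sd):
    """Effect size d = raw difference in means divided by a standard deviation."""
    return raw_gain / sd


# Hypothetical values, chosen only to illustrate the mechanism.
raw_gain = 5.0    # the same raw improvement in test points in both settings
sd_wide = 15.0    # SD for a wide range of participants on a broad standardised test
sd_narrow = 6.0   # SD for a restricted sample on a tightly focussed measure

d_wide = standardised_mean_difference(raw_gain, sd_wide)
d_narrow = standardised_mean_difference(raw_gain, sd_narrow)

print(f"wide range of participants: d = {d_wide:.2f}")
print(f"restricted range:           d = {d_narrow:.2f}")
```

Under these assumptions the identical raw gain yields d of about 0.33 in the wide sample and about 0.83 in the restricted one, so ranking the two "influences" by d would say more about the samples and measures than about the interventions.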

Another reason the effect size could be low is that each researcher used a different standard deviation (SD) in the effect size calculation. For example, Prof Gene Glass mostly used the control group SD or the within-class SD, and he warned about comparing studies which use different SDs in his seminal paper, Integrating Findings: The Meta-Analysis of Research (1977).

Glass shows that since the effect size is calculated by dividing by the standard deviation (see formulas above) the standard deviation that is chosen can change the effect size in a significant way!

Glass gives this example (p370):
'The definition of ES appears uncomplicated, but heterogeneous group variances cause substantial difficulties. Suppose that experimental and control groups have means and standard deviations as follows:
The measure of experimental effect could be calculated either by use of Se or Sc or some combination of the two, such as an average or the square root of the average of their squares or whatever. The differences in effect sizes ensuing from such choices are huge:
The third basis of standardization—the average standard deviation—probably should be eliminated as merely a mindless statistical reaction to a perplexing choice. It must be acknowledged that both the remaining 1.00 and 0.20 are correct; neither can be ruled out as false... However, the control group mean is only one-fifth standard deviation below the mean of the experimental group when measured in control group standard deviations; thus, the average experimental group subject exceeds 58 percent of the subjects in the control group. These facts are neither contradictory nor inconsistent; rather they are two distinct features of a finding which cannot be captured by one number.'
Note: A few years after Gene Glass wrote this, Cohen (1988) added another method to calculate the standard deviation: the 'pooled standard deviation', which averages the variances first and then takes the square root. This seems to be the accepted method now and, assuming equal group sizes, using it would give d = 0.28.

As can be seen in this example, the effect size can be 0.20, 0.28, 0.33 or 1 for the same data!
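The effect of the choice of standard deviation can be checked with a short Python sketch. The means and SDs below are assumptions: they are values I have picked because they reproduce the effect sizes of 1.00 and 0.20 quoted from Glass, and the pooled version additionally assumes equal group sizes.

```python
import math

# Hypothetical group statistics, chosen to reproduce the quoted
# effect sizes of 1.00 and 0.20 (assumed, not copied from Glass's table).
mean_e, sd_e = 52.0, 2.0    # experimental group
mean_c, sd_c = 50.0, 10.0   # control group
difference = mean_e - mean_c

d_experimental_sd = difference / sd_e                       # 1.00
d_control_sd = difference / sd_c                            # 0.20
d_average_sd = difference / ((sd_e + sd_c) / 2)             # about 0.33
# Pooled SD for equal group sizes: square root of the mean of the variances.
d_pooled_sd = difference / math.sqrt((sd_e**2 + sd_c**2) / 2)   # about 0.28


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Share of the control group that the average experimental subject exceeds,
# measured in control group SDs (assumes a normal distribution).
share_exceeded = normal_cdf(d_control_sd)                   # about 0.58

for label, d in [
    ("experimental SD", d_experimental_sd),
    ("control SD", d_control_sd),
    ("average SD", d_average_sd),
    ("pooled SD (equal n)", d_pooled_sd),
]:
    print(f"{label}: d = {d:.2f}")
print(f"share of control group exceeded: {share_exceeded:.0%}")
```

With unequal group sizes the pooled figure would move, because the larger group's variance gets more weight, which is one more way the same data can yield a different d.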

If comparing effect sizes across studies, as Hattie does, then Gene Glass warns,
'If some attempt is not made to deal with this problem, a source of inexplicable and annoying variance will be left in a group of effect-size measures' (p372).
Hattie does not do this.

Hattie's Interpretation Is Used by Politicians for Public Policy:

'Hattie’s work has provided school leaders with data that appeal to their administrative pursuits' Eacott (2017, p3).
In 2015 the Australian Government used Hattie to block significant funding intended to redress the socio-economic imbalance in Australian schools (the funding called for by the Gonski Review).

Professor Blatchford comments about this,
'When Christopher Pyne [the then Australian Education Minister] talked about prioritising teacher quality, rather than reducing class sizes, he set up a false and simplistic dichotomy' (p16, AEU News).
A similar example comes from New Zealand, where Professor John O'Neill wrote a significant letter to the NZ Minister of Education on the problem of using Hattie's research for class size policy.

Further, in 'Material fallacies of education research evidence and public policy advice', Professor O'Neill states,
'... the Minister of Education declined to rule out increases in class size. In short, this was because the ‘independent observation’ of Treasury and the research findings of an influential government adviser, Professor John Hattie, were that schooling policy should instead focus on improving the quality of teaching.'
Writing about Hattie's class size research O'Neill warns that,
'Much of the terminology is ambiguous and inconsistently used by politicians, officials and academic advisers. The propositions are not demonstrably true – indeed, there is evidence to suggest they are false in crucial respects. The conclusion is, at best, uncertain because it does not take into account confounding evidence that larger classes do adversely affect teaching, learning and student achievement' (p2).
I am concerned about the unwavering confidence that Hattie displays when he talks about class size, given the caution and reservations expressed by the authors of each of his 3 studies as well as by other reputable scholars around the world. These reservations stem from the lack of quality studies, the inability to control variables, the major differences in how achievement is measured, major confounding variables and benchmark effect sizes.

The Largest Analysis and Peer Review of the Class Size Research (so far):

Class Size Eastern and Western perspectives (2016), edited by Prof Blatchford et al. Note: Prof Blatchford has a website dedicated to class size research.

The editors state,
'there are in fact relatively few high-quality dedicated studies of class size and this is odd and unfortunate given the public profile of the class size debate and the need for firm evidence based on purposefully designed research fit for purpose' (p275).
'What often gets overlooked in debates about class size is that CSR is not in itself an educational initiative like other interventions with which it is often (and in a sense unfairly) compared, for example, reciprocal teaching, teaching metacognitive strategies, direct instruction and repeated reading programmes; it is just a reduction of the number of pupils in a classroom' (p276).
Prof Blatchford warns again about correlation studies, 
'Essentially the problem is the familiar one of mistaking correlation for causality. We cannot conclude that a relationship between class size and academic performance means that one is causally related to the other' (p94).
The editors conclude, 
'the chapters in this book are only a start and much more research is needed on ways in which class size is related to other classroom processes. This has implications for research methods: we need more systematic studies, e.g. which use systematic classroom observations, but also high-quality multi-method studies, in order to capture these less easily measured factors.

There is some disagreement about which groups are involved but often studies find it is low attaining and disadvantaged students who benefit the most. Blatchford et al (2011) found evidence that smaller classes helped low attaining students at secondary level in terms of classroom engagement. Hattie (Chapter 7) develops the view that we might expect low attaining students to benefit from small classes in terms of developing self regulation strategies' (p278).
Blatchford concludes, 
'The aim is move beyond the rather tired debates about whether class size affects pupil performance and instead move things on by developing an integrative framework for better understanding the relationships between class size and teaching, with important practical benefits for education world wide' (p102).
Hattie's contribution to the book (Chapter 7):

Hattie appears to be an outlier in this book. Of the 17 scholars who have contributed to the book ONLY Hattie myopically uses the effect size statistic to fully interpret the research. All the others use contextual and detailed features of the research to reach the conclusion that class size is important and significant.

At least the weight of scholarship has caused Hattie to retreat from his polemic on reducing class size as 'a disaster' and 'going backwards' and he finally concedes, 
'The evidence is reasonably convincing - reducing class size does enhance student achievement' (p113).
But, Hattie cleverly reframes the issue to 
'Why is the (positive) effect so small?' (p105).
Given the significant amount of critique of Hattie's methodology (the lack of quality studies, the use of disparate measures of student achievement, the use of university students or pre-school children, correlation, the inconsistent definition of small and large class sizes, indiscriminate averaging, benchmark effect sizes, etc.), I was disappointed that Hattie did not address any of these issues but rather focused on attacking Dr. David Zyngier's meta-review:
'Zyngier's review misses the elephant in the room' (p106).
But if Zyngier misses the elephant in the room, then so do all the other 16 researchers contributing to the book. For example, in the following chapter (8) Finn & Shanahan, display what they believe to be significant findings (p124):

Hattie once again sidesteps the SIGNIFICANT issues raised by Zyngier (+ many others), e.g., the control of variables and the differing definitions of large and small classes. Studies also differ on how to measure class size: some use a student/teacher ratio (STR), which includes many non-teaching staff like the principal, welfare staff, library staff, etc.
'Past research has too often conflated STR with class size' (p4).
Blatchford, et al (2016), also comment on this STR problem, 
'they are not a valid measure of the number of pupils in a class at a given moment' (p95).
Hattie just restates that meta-analyses provide a reasonably robust estimate and myopically focuses on the effect size statistic, but he provides no defence against the validity issues. He concedes that STR and class size are different, yet he does not resolve the validity problem of using these disparate measures. Instead he fobs off the argument with a red herring, that STR and class size are related (p112), and he provides no evidence for this claim.

Given the importance of class size research, STR and Class size need to be MORE than just related.

They need to be the SAME!!!!

Hattie adds a 4th study to his effect size average, Shin and Chung (2009), with effect size d = 0.20. But he conveniently does not inform the reader that this study re-analysed the same data (the Tennessee STAR study) as the previous meta-analyses that he used.

Ironically, Shin and Chung warn against creating an effect size from repeated use of the same data, 
'If a study has multiple effect sizes, the same sample can be repeatedly used. Repeated use of the same sample is, however, a violation of the independent assumption' (p14).
They also warn, 
'we found too many Tennessee STAR studies... We worry about the dependence issue' (p15).
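The dependence problem can be illustrated with a small Python sketch. All sample names other than STAR and all effect sizes here are hypothetical, invented only to show the mechanism. Counting every estimate from a re-used sample as if it were independent lets that one sample dominate the average, whereas collapsing to one value per sample first does not.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (sample, effect size) pairs, invented for illustration.
# The STAR sample appears four times, as if re-analysed by four reports.
estimates = [
    ("STAR", 0.20),
    ("STAR", 0.22),
    ("STAR", 0.18),
    ("STAR", 0.20),
    ("Other study A", 0.05),
    ("Other study B", 0.10),
]

# Naive approach: treat all six estimates as independent.
naive_average = mean(d for _, d in estimates)

# Alternative: average within each sample first, then across samples.
by_sample = defaultdict(list)
for sample, d in estimates:
    by_sample[sample].append(d)
per_sample = {sample: mean(values) for sample, values in by_sample.items()}
collapsed_average = mean(per_sample.values())

star_share = len(by_sample["STAR"]) / len(estimates)

print(f"naive average over all estimates: d = {naive_average:.2f}")
print(f"one value per sample:             d = {collapsed_average:.2f}")
print(f"share of estimates from STAR:     {star_share:.0%}")
```

The direction of the shift depends entirely on the made-up numbers. The point is only that two thirds of the "evidence" in the naive average comes from a single group of students.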
It seems to me Hattie's strategy is to take the focus off the scrutiny of his evidence and re-direct our attention elsewhere - a strategy for politicians, NOT for researchers!

Join the group 'Class Size Matters' - here.

Teacher Morale:

Blatchford et al (2016) comment on the associated issue of teacher morale and class size,
'Virtually all class size studies report that teacher morale is higher in small classes than in larger classes. The personal preference for small classes was demonstrated by STAR third-grade teachers interviewed at the end of the school year. Teachers were asked whether they would prefer a small class with 15 students or a $2,500 salary increase. Seventy percent of all teachers and 81 percent of those who had taught small classes chose the small class option over a salary increase' (p129).
Prof Gene Glass agrees, 
'Teacher Workload and its relationship to class size is what counts in my book.'

Blatchford et al, challenge the statements of the head of PISA, Andreas Schleicher,
'there was reference to ten myths of education, as expressed by Andreas Schleicher, one of which was the myth that smaller classes benefited academic performance. The editors of this book tend to side with Berliner and Glass (2014) who address what they see as the 50 myths and lies which threaten American public schools. Myth no. 17 in their list is the belief that reducing class size will not result in more learning' (p275).

Other Commentary

The Australian Education Union has published a comprehensive analysis of the class size research. They summarise that reducing class size does seem to improve student outcomes. Also, they highlight the problems with Hattie's methodology:
'The critics have cited the methodological problem of synthesising a whole range of meta-studies each with their own series of primary studies. There is no quality control separating out the good research studies from the bad ones. The different assumptions, definitions, study conditions and methodologies used by these primary studies mean that Hattie’s meta-analysis of the meta-analyses is a homogenisation which may distort the evidence (comparing apples with oranges)' (p13).

'The 0.21 effect he claims for class size is an average so that some studies may have found a significantly higher effect than that. For example, ‘gold standard’ primary research studies (using randomised scientific methodology) such as the Tennessee STAR project recorded a range of effect sizes including some at 0.62, 0.64 and 0.66, clearly well above the ‘hinge-point’ and the same as most variables which Hattie regards as very important' (p14).
From Professor John O'Neill's AMAZING letter. O'Neill quotes from a detailed case/naturalistic study by Blatchford (2011),
"Professor Blatchford makes the point that class size effects are ‘multiple’. For children at the beginning of schooling, there are significant potential gains in reading and maths in smaller classes. Children from ethnic minorities and children who start behind their peers benefit most. There is also a positive effect on behaviour, engagement and achievement, particularly for low achievers, where classes are smaller in the lower secondary school" (p10).
Leading researcher, Professor Dylan Wiliam states that the evidence is pretty clear that if you teach smaller classes you get better results. The problem is smaller classes cost a lot more (7min into full lecture).

Also, many scholars point out the irony in Hattie's view, that class size is a distraction - because the number of students in a class limits the ability of teachers to implement the kinds of changes that Hattie shows have the biggest effect, e.g., formative evaluation, micro teaching, behavior, feedback, teacher-student relationships, etc.

For example,  Dr. David Zyngier in his meta-review - 
'The strongest hypothesis about why small classes work concerns students’ classroom behaviour. Evidence is mounting that students in small classes are more engaged in learning activities, and exhibit less disruptive behaviour' (p17).

Each of these studies also discusses its limitations. In particular, Goldstein et al (2000) emphasise an issue that has emerged throughout Hattie's synthesis:
'... we have the additional problem that different achievement tests were used in each study, and this will generally introduce further, unknown, variation' (p403).
Goldstein et al (2003) go into detail about the problems of comparing correlation studies with random controlled experiments;
'… correlational studies that ... examined relationships between class size and children’s achievements at one point in time, are difficult to interpret because of uncertainties over whether other factors (e.g., non-random allocation of pupils to classes) might confound the results' (p3).
Goldstein et al (1998) point out another major confounding variable: 
'There is a tendency for schools to allocate lower achieving children to be in smaller classes. This bias means a considerable number of large cross-sectional studies (correlational) need to be ignored due to validity requirements' (p256).
Robert Slavin, Best-Evidence Synthesis: An Alternative to Meta-Analytic and Traditional Reviews (1986) also discusses this issue; a “best evidence synthesis” of any education policy should encourage decision makers to favor results from studies with high internal and external validity—that is, randomized field trials involving large numbers of students, schools, and districts. Note: Glass and Smith graph the difference between high and low-quality studies below.

Slavin also discusses the major issue of the disparate ways in which achievement is measured; one achievement test consisted of rallying a tennis ball against a wall as many times as possible in 30 seconds. Other studies used a total treatment time of only 30 minutes. Other studies used only post-secondary students (p7).

Dr. David Zyngier has published an excellent meta-review on class size: Class size and academic results, with a focus on children from culturally, linguistically and economically disenfranchised communities.
'Noticeably, of the papers included in this review, only three authors supported the notion that smaller class sizes did not produce better outcomes to justify the expenditure' (p3).

'The highly selective nature of the research supporting current policy advice to both state and federal ministers of education in Australia is based on flawed research. The class size debate should now be more about weighing up the cost-benefit of class size reductions, and how best to achieve the desired outcomes of improved academic achievement for all children, regardless of their background. Further analysis of the cost-benefit of targeted CSR is therefore essential' (p16).
'Recognised in the education research community as the most reliable and valid research on the impact of class size reductions at that time, the Tennessee STAR project was a large series of randomised studies, followed up in Wisconsin by the SAGE project. After four years, it was clear that smaller classes did produce substantial improvement in early learning and cognitive studies, and that the effect of small class size on the achievement of minority children was initially about double that observed for majority children' (p7).
Zyngier concludes:
'Findings suggest that smaller class sizes in the first four years of school can have an important and lasting impact on student achievement, especially for children from culturally, linguistically and economically disenfranchised communities' (p1).
Professor Ivan Snook et al, in their peer review of Hattie, also comment in detail about class size. They also discuss the STAR study, noting that its reported effect sizes reached 0.66. They conclude:
'The point of mentioning these studies is not to 'prove' that Hattie is 'wrong' but to indicate that drawing policy conclusions about the unimportance of class size would be premature and possibly very damaging to the education of children particularly, young children and lower ability children. A much wider and in depth debate is needed' (p10).
Dr. Neil Hooley, in his review of Hattie, 'Making judgments about John Hattie's effect size', talks about the complexity of classrooms and the difficulty of controlling variables. On the issue of class size he says,
'Under these circumstances, the measure of effect size is highly dubious' (p44).
Dan Haesler has a detailed look at class size and other issues.

Kelvin Smythe provides a detailed review of Hattie's research on Class size.


  1. And my meta review of research also concludes that CSR had an enormous impact on student outcomes, in particular for children from disadvantaged communities. See ANZSOG journal online Evidence Base

    1. Thanks David, your evidence about the quality of teaching and learning as it relates to class size is very relevant. I've included some of your findings and insights into the commentary about class size. Thank you.

  2. All teachers should read this! Thank you.

  3. I always thought Hattie was wrong on this, every teacher knows class size makes a difference.