Hattie has divided this into the separate categories of:

Decreasing Disruptive Behavior - d = 0.34 (Rank = 80)
Classroom Management - d = 0.52 (Rank = 42)
Classroom Cohesion - d = 0.53 (Rank = 39)
Classroom Behavioral - d = 0.80 (Rank = 6)

There is a lot of overlap between these categories, which raises the problem of confounding variables. Teacher experience would suggest that improving behavior would increase achievement, and some of these results verify this. However, the result for 'decreasing disruptive behavior' is below Hattie's hinge point of d = 0.40.

Hattie has used polemical labels for influences below the hinge point of 0.40, e.g., 'also-rans', 'disasters', 'going backward' and 'distractions' (see Hattie's lecture and slides from the 2005 ACER Conference, and his 2008 Nuthall lecture).

Decreasing Disruptive Behavior:

Mary Hudson, in her best-selling book Public Education’s Dirty Secret, details extensive examples of teaching in New York and concludes,
"Disruptive students must be removed from the classroom, not to punish them but to protect the majority of students who want to learn."
Hattie used the following 3 meta-analyses to get his average, d = 0.34. Note: the CLE is a probability, so the negative value of -49% is impossible; this is a major mistake now admitted by Hattie.
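| Authors (Disruptive Behaviour) | Year | # studies | # students | Mean (d) | CLE | Variable |
|---|---|---|---|---|---|---|
| Skiba & Casey | 1985 | 41 | 883 | 0.93 | 66% | disruptive behaviour |
| Stage & Quiroz | 1997 | 99 | 5057 | 0.78 | 55% | decreasing disruptions |
| Reid, Gonzalez, Nordness, et al. | 2004 | 25 | 2486 | -0.69 | -49% | behavioural disturbance |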
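To see why a negative CLE cannot be right, here is a minimal sketch. It assumes the usual definition of the common language effect size for two equal-variance normal distributions, CLE = Φ(d/√2), which always lies between 0% and 100%. The source does not say how the CLE column in the table was produced. The helper `z_score_as_percent` is hypothetical and is included only because d/√2, written as a percentage, happens to reproduce the three reported figures.

```python
import math

def cle(d: float) -> float:
    """Common language effect size, assuming two normal groups with equal variance.

    CLE = Phi(d / sqrt(2)), and Phi(x) = 0.5 * (1 + erf(x / sqrt(2))),
    which simplifies to 0.5 * (1 + erf(d / 2)). The result is a probability.
    """
    return 0.5 * (1.0 + math.erf(d / 2.0))

def z_score_as_percent(d: float) -> float:
    """Hypothetical helper: d / sqrt(2) written as a percentage (NOT a probability)."""
    return 100.0 * d / math.sqrt(2.0)

# (label, mean d, CLE as reported in the table above)
rows = [
    ("Skiba & Casey (1985)", 0.93, 66),
    ("Stage & Quiroz (1997)", 0.78, 55),
    ("Reid et al. (2004)", -0.69, -49),
]

for label, d, reported in rows:
    print(
        f"{label}: d = {d:+.2f}, reported CLE = {reported}%, "
        f"d/sqrt(2) as % = {z_score_as_percent(d):.0f}%, "
        f"Phi(d/sqrt(2)) = {cle(d):.0%}"
    )
```

Under this assumption, a negative d gives a CLE below 50% but never below 0%, so a value of -49% cannot be a probability.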

Reid et al. (2004) seem to contradict the other 2 studies by indicating that decreasing disruptive behavior decreases student achievement. 

How can 'reducing disruptive behavior' decrease achievement to that extent?

Reid, et al., compared the achievement of students labeled with 'emotional/behavioral' disturbance (EBD) with a 'normative' group. They used a range of measures to determine EBD, e.g., students who are currently in programs for severe behavior problems e.g., psychiatric hospitals (p. 132).

The effect size was calculated as [(EBD achievement) - (Normative achievement)] / SD (p. 133).

The negative effect size indicates the EBD group performed well below the normative group. The authors conclude: 
'students with EBD performed at a significantly lower level than did students without those disabilities across academic subjects and settings' (p. 130).
Hattie clearly misrepresents this study, as it is not investigating 'decreasing disruptive behavior' as a teaching strategy. Also, the sample is from a population of 'abnormal' students. Therefore this meta-analysis should not be included in Hattie's work.

If this study were removed, the average effect size would be d = 0.86, which would rocket its ranking up to #6.
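The arithmetic is easy to check. The sketch below assumes a simple unweighted mean of the three reported effect sizes, which reproduces the d = 0.34 in the table; the function name `average_d` is my own.

```python
from statistics import mean

# Mean effect sizes as listed in the table for 'decreasing disruptive behavior'
effect_sizes = {
    "Skiba & Casey (1985)": 0.93,
    "Stage & Quiroz (1997)": 0.78,
    "Reid et al. (2004)": -0.69,
}

def average_d(sizes, exclude=()):
    """Unweighted mean of the effect sizes, skipping any study named in `exclude`."""
    kept = [d for name, d in sizes.items() if name not in exclude]
    return mean(kept)

all_three = average_d(effect_sizes)
without_reid = average_d(effect_sizes, exclude={"Reid et al. (2004)"})

print(f"All three meta-analyses: d = {all_three:.2f}")
print(f"Without Reid et al.:     d = {without_reid:.3f}")
```

The first figure comes out at about 0.34 and the second at about 0.855, which is the 0.86 quoted above.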

Hattie's lack of consistency in interpreting effect sizes is a major problem. For example, in Frazier et al. (2007) (see below), students were identified as having ADHD and proxies for achievement were used: GPA, class and parent rating, etc. (p. 51).

The ADHD group was the control group and the normative group was the experimental group. So the effect size was calculated using (Normative - ADHD), getting a large positive result.

This is the REVERSE of the Reid et al. (2004) study, which calculated the effect size using (EBD - Normative), getting a large negative result.

Prof Adrian Simpson identifies this problem in 'The misdirection of public policy: comparing and combining standardised effect sizes',
'the experimental condition in some studies and meta-analyses is the comparison condition in others' (p. 455).
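A minimal sketch of the reversal, using invented numbers: the means and standard deviation below are hypothetical and are not taken from Reid et al. or Frazier et al. The only point is that swapping which group counts as 'experimental' flips the sign of the same underlying gap.

```python
def standardized_mean_difference(mean_a: float, mean_b: float, pooled_sd: float) -> float:
    """Effect size of the form (mean_a - mean_b) / pooled_sd."""
    if pooled_sd <= 0:
        raise ValueError("pooled_sd must be positive")
    return (mean_a - mean_b) / pooled_sd

# Hypothetical values, for illustration only
clinical_mean = 93.0    # e.g. a group labeled EBD or ADHD
normative_mean = 100.0  # comparison group
pooled_sd = 10.0

# Reid et al. style: (clinical - normative) gives a negative effect size
d_reid_style = standardized_mean_difference(clinical_mean, normative_mean, pooled_sd)

# Frazier et al. style: (normative - clinical) gives a positive effect size
d_frazier_style = standardized_mean_difference(normative_mean, clinical_mean, pooled_sd)

print(f"(clinical - normative) / SD = {d_reid_style:+.2f}")
print(f"(normative - clinical) / SD = {d_frazier_style:+.2f}")
```

Both numbers describe the same gap between the two groups. Averaging one convention into a ranking as a negative and the other as a positive treats identical findings as opposite results.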
Skiba & Casey (1985) investigated reinforcement and feedback as strategies for reducing behavioral problems. They measured indicators of behavior, like noncompliance, off-task behavior and withdrawal; they did not measure achievement.

They conclude,
'Results indicated that both reinforcement and feedback type procedures are highly effective in the remediation of classroom behavior problems across a variety of behaviors, settings, and administrative arrangements' (p. 472).
So the effect size is an indicator of behavior NOT achievement.

Stage & Quiroz (1997) also measured disruptive behavior, not achievement,
'the treated students reduced their disruptive behavior compared to nontreated students' (from abstract).
Hattie should have adjusted for this inconsistency. If this were done, the average effect size would rocket up in Hattie's rankings to #6. If other studies were removed, then the ranking would rise further to #3.

This would then be consistent with teachers' experience.

A key tenet of the scientific method is reliability; this simple analysis demonstrates how unreliable Hattie's rankings are.

Professor Adrian Simpson summarises,
'Using unequal comparisons or using unspecified ones makes it impossible to compare or combine effect sizes meaningfully' (p. 455).
'As such, using these ranked meta-meta-analyses to drive educational policy is misguided' (p. 451).

Classroom Management:

Hattie uses 1 study, Marzano (2003), 'Classroom management that works', to get d = 0.52.

Note: an excellent blog detailing the issues with Marzano's research can be found here.

The study does measure achievement and aims to have randomised control and experimental groups, so the result is worthwhile. However, there are only 553 students used (p. 10) - Hattie does not mention this.

Classroom Cohesion:

Hattie uses 3 meta-analyses to get his average, d = 0.53.

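| Authors (Classroom cohesion) | Year | # studies | # students | Mean (d) | CLE | Variable |
|---|---|---|---|---|---|---|
| Haertel et al. | 1980 | 12 | 17805 | 0.17 | 12% | classroom climate |
| Evans & Dion | 1991 | 27 | | 0.92 | 65% | group cohesion |
| Mullen & Copper | 1994 | 49 | 8702 | 0.51 | 36% | group cohesion |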

Haertel et al. (1980) state,
'The socio-psychological environment is typically measured by asking students to agree on a three-to-five point scale to such items as "The students enjoy their work in the class" and "The goals of the class are clear." The purpose of the study is to estimate the magnitude of this relationship between learning and the environment of the classroom, and the relation of its variability across grades, subject areas and aspects of the learning environment' (p. 113).  
'Learning outcomes and gains, including student achievement, performance and self-concept, were found to be positively associated with student perceived Cohesiveness, Satisfaction, Task Difficulty, Formality, Goal Direction, Democracy and Material Environment. Negative associations were found with Friction, Cliqueness, Apathy and Dis-organisation' (p. 114).
They warn however that, 'given the correlational basis of much of the research, however rigorously controlled by conventional statistical methods, the next steps in the research should emphasise continued true experimentation' (p. 114).

Evans & Dion (1991) found a large effect size relating cohesion to group performance. However, the studies were mostly on small sports teams and military units. Also, they state performance criteria for these types of groups are simple, e.g., win/loss record of a sports team. In contrast, the performance criteria for normal work groups are not easily identified (p. 696).

They summarise, 
'Given the nature of the studies used here, caution is suggested in generalising these results to "real" work groups' (p. 690).
Mullen & Copper (1994): once again, the studies were mostly on very small military groups and sports teams. They used many of the studies that Evans & Dion used, thus introducing bias into Hattie's work. They isolated the factor of 'commitment to task' as the most influential aspect of cohesion, rather than interpersonal attraction or group pride (p. 210).

They conclude that the effect was larger in small groups than in large groups, and larger in correlational studies than in experimental studies (p. 210).

One wonders how relevant these are to the classroom given they are about small sports and military groups.

Classroom Behavioral:

Hattie uses 3 meta-analyses to get his average, d = 0.80.

Mullen & Copper (1994), from 'classroom cohesion', state the general rule for meta-analyses: subjects should not be sampled from abnormal populations (p. 215). Yet all of the following studies are from abnormal populations: students diagnosed with ADHD or a learning disability.

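| Authors (Classroom behavioral) | Year | # studies | # students | Mean (d) | CLE | Variable |
|---|---|---|---|---|---|---|
| Bender & Smith | 1990 | 25 | | 1.101 | 78% | disabled behavior |
| DuPaul & Eckert | 1997 | 63 | | 0.58 | 41% | ADHD behavior |
| Frazier et al. | 2007 | 72 | | 0.71 | 50% | ADHD behavior |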

Bender & Smith (1990): this meta-analysis did not measure classroom behaviour as an influence on achievement, as Hattie implies. Rather, it compares learning disabled students with non-disabled students on a number of behavioural measures. They conclude,
'Results showed that both methodologically strong and weak studies demonstrated significant behavioural deficits of children with learning disabilities compared to their non-disabled peers in each of five overall areas: on-task behaviour, off-task behaviour, conduct disorders, distractibility, and shy/withdrawn behaviour. Both observational and teacher rating data demonstrated these differences. Effect sizes for both groups of studies seemed to cluster around 1 standard deviation, suggesting noticeable and educationally significant impairment in the behaviour of children with disabilities' (p. 298).
They caution, 
'that drastic increases of handicapped students in certain mainstream classes may result in a more negative classroom climate, as mainstream teachers attempt to deal with increased behaviour problems' (p. 305).
Their summary table of results (p. 301) contains no achievement measures that I can find.

DuPaul & Eckert (2012) [note: I have not been able to find their original 1997 study but have found this updated study]. The authors state,
'The purpose of the present meta-analysis is to provide a quantitative review of school-based intervention studies for students with ADHD' (p. 389).
They conclude, 
'The results of this meta-analysis indicate that school-based interventions for students with ADHD yield moderate to large effects for both behavioural and academic outcomes' (p. 401).
DuPaul & Eckert comment on the broader issues of meta-analysis that have been covered in other sections. Regarding proper experiments:  
'randomised control trials are considered the scientific gold standard for evaluating treatment effects ... the lack of such studies in the school-based intervention literature is a significant concern' (p. 408).
Regarding comparing different studies: 
'It is difficult to compare effect size estimates across research design types. Not only are effect size estimates calculated differently for each research design, but there appear to be differences in the types of outcome measures used across designs' (p. 408).
Frazier et al. (2007): once again, participants were identified as having ADHD and proxies for achievement were used: GPA, class ranking, parent/teacher rating, etc. (p. 51).

The ADHD group was the control group and the normative group was the experimental group. This was done to get positive effect sizes (p. 51). So the result d = 0.71 means the ADHD group underperformed the normal group. 

Note: Reid et al. (2004) above reversed the calculation getting a negative result.

They summarise that there is a moderate to large discrepancy in academic achievement between individuals with ADHD and those without ADHD (p. 59).

These 3 studies simply compared students with ADHD or a learning disability with normal students and found that they achieved less.


  1. Hattie's entire approach of creating a league table of effect sizes is totally flawed. There seems to be a growing acceptance of this in education: effect sizes for different types of studies are simply not comparable, and they cannot be ranked in the way Hattie would like us to believe. Typically, the higher the quality of the study, the smaller the effect size; the poorer the quality of the study, the larger the effect size. Good on you for dissecting this lot though.

    1. Thanks Derek, I totally agree with you. The more I read the studies, the more misrepresentation I see. It is amazing this research is used to decide educational policy.