Professor O'Neill also extended these arguments in a 2012 publication, 'Material fallacies of education research evidence and public policy advice'.
I will quote full sections of Prof O'Neill's letter pertaining to the quality of Hattie's synthesis:
Prof O'Neill calls for Hattie to remove inappropriate studies and re-rank influences (p. 4):
'Professor Hattie’s research comprised a synthesis of more than 800 meta-analyses relating to achievement. These meta-analyses cover early childhood education, schooling and college (tertiary) level education. It is important to note, therefore, that some of the studies included in (i) the synthesis; (ii) calculations of the average effect size of the studies within a topic category; and (iii) the rank order of effect sizes, are not in fact studies of schooling.
This creates two policy problems. First, the synthesis contains studies that have no proven relevance to the schooling sector and schooling policy decisions; and second, the inclusion of these studies skews the stated average effect size for a particular topic and, as a consequence, its overall position in Professor Hattie’s rank order.
If as Minister of Education you wish to use the Visible Learning synthesis as evidence to inform policy decisions in the schooling sector then I would point out that, minimally, all the studies unrelated to schooling need to be removed and the remaining average effect sizes recalculated and re-ranked.'

Hattie claims he has only used studies relating to students aged 4-20.
But as Prof O'Neill has identified, and I have confirmed, this is not so. There are many examples: Hattie used studies of adults, mechanics, army personnel, doctors, nurses, and masters and PhD university students. In October 2018, Hattie added two studies on class size involving pre-school children.
Meta-analyses do not uncover the details of what happens in the classroom (p. 5):
'The synthesis has no interest in uncovering interaction or mediating effects (e.g. what happens in school classrooms when class sizes are reduced and teachers and learners interact differently, or the curriculum is changed). This is problematic for educators at all levels not least because real classrooms are all about interactions among variables, and their effects. Professor Hattie implicitly acknowledges this shortcoming when he states that ‘a review of non-meta-analytic studies could lead to a richer and more nuanced statement of the evidence’ (p. 255).
He also explicitly acknowledges that when different teaching methods or strategies are used together their combined effects may be much greater than their comparatively small effect measured in isolation (p. 245).
Let me state the basic shortcoming more bluntly. The non-meta-analytic and qualitative or mixed methods studies Professor Hattie has excluded are precisely the research investigations that do make visible not only (a) that class size matters to student achievement, but also (b) what the observed effects of different class sizes are on classroom teaching and learning practices as a whole, and furthermore (c) which sub-groups of students are most materially affected by larger or smaller class sizes and the attendant changes in classroom processes they require.'
Prof O'Neill urges some quality control of the studies that Hattie uses (p. 7):
'While Visible Learning has been described in popular media internationally as ‘teaching’s Holy Grail’, and has anecdotally proved very influential in New Zealand government circles, the method of the synthesis and, consequently, the rank ordering are highly problematic for the teachers and policy makers whose practical decisions it is intended to inform.
... scrutinise the references and begin to establish whether the sources used in the synthesis are
(a) school-specific or should be discarded for the present purpose;
(b) quality assured or not – I discarded unpublished conference papers but retained doctoral theses;
(c) studies of general or specific populations of students such as those with learning disabilities, or of specific learning areas.'
Prof O'Neill's analysis of some of the research used for particular influences (p. 8):
'At the very least, the problems below should give you and your officials pause for thought rather than unquestioningly accepting Professor Hattie's research at face-value, as appears to have been the case.
(i) The ‘micro-teaching’ influence (average effect size 0.88, rank 4) must be discounted as the synthesis provides no evidence that it has had any effect on school students’ achievement, only on that of pre-service teachers;
(ii) the ‘professional development’ average effect size (0.62, rank 19) should be recalculated as one of the studies discussed provides no evidence of student effects; another cites the general effect size not the lower student achievement effect. Recalculation gives an average effect size of 0.49 and drops the ‘influence’ to 48 in the rank order. This is a considerable difference which both illustrates the overall fragility of the ranking, and suggests extreme caution in its use as a simplistic policy ‘takeaway menu’.
(iii) ‘providing formative evaluation [to teachers]’ (average effect size 0.9, rank 3) is based on two meta-analyses only, both involving students with special educational needs and therefore is not obviously generalisable to all schools, classrooms and teachers;
(iv) similarly ‘comprehensive interventions for learning disabled students’ (average effect size 0.77, rank 7) does not have demonstrated general applicability;
(v) the ‘feedback’ influence (average effect size 0.73, rank 10) is significantly increased by inclusion of one meta-analysis on the use of music as an education reinforcement (effect size 2.87). The meta-analysis contains a high proportion of studies with participants who have severe learning and/or developmental delays, in both school and out of school settings, and includes both adults and children. If this one source is excluded, the average drops to 0.63 (rank 19). (It should be noted that feedback is one of the few teaching influence domains where there is a sufficient number of studies to indicate more clearly which single aspects of feedback are likely to have the most general practical effect on student achievement (e.g. ‘immediacy of teacher feedback’) and which least (e.g. ‘teacher praise’);
(vi) the influence ‘spaced vs. massed practice’ (average effect size 0.71, rank 12) includes two meta-analyses specifically on the learning of motor skills with an average effect size of 0.96. If these are discarded on the grounds that they are not of general relevance to most learning areas of the curriculum, the influence of spaced practice drops to 0.46 (rank 53);
(vii) the general importance and ranking accorded to ‘meta-cognition strategies’ (average effect size 0.69, ranking 13) must also be questioned on the basis that the two meta-analyses both refer to reading interventions only;
(viii) the findings for ‘problem-solving teaching’ (average effect size 0.61, rank 20) are derived from six meta-analyses, three of which are unpublished doctoral studies and one an unpublished conference paper. The average effect size of the two peer-reviewed journal meta-analyses (one in mathematics, the other science) is 0.46 (this would give a reduced rank of 53);
(ix) the commentary (p. 201) on the influence ‘teaching strategies’ (average effect size 0.6, rank 23) lists numerous possible strategies for inclusion in teachers’ pedagogical repertoires but gives no policy or practice guidance on which should be used with which learners, in which subjects, under what conditions and in which sequence or combination, nor for how long or with what frequency. Equally, the author comments that ‘most of these meta-analyses relate to special education or students with learning difficulties’ (p. 200). Their general applicability for all school students has not been demonstrated;
(x) the ranking of ‘co-operative vs. individualistic learning’ (average effect size 0.59, rank 24) must also be recalculated because the studies include one of adults (effect size 0.68) and one unpublished conference paper (effect size 0.88). If these are excluded the average effect size falls to 0.4 (rank 64);
(xi) in contrast, for study skills, (average effect size 0.59, rank 25), if the five college level meta-analyses are excluded, the average effect size of the remaining meta- analyses rises markedly to 0.74 (rank 9);
(xii) finally, for mastery learning (average effect size 0.58, rank 29) the meta-analysis with the largest effect size is an unpublished conference paper. If this is excluded, the average effect size is reduced slightly to 0.55 (rank 35) but even so this reduces its measured effect on student achievement to less than those of the home environment or socio-economic circumstances influences which Professor Hattie says at the outset cannot be influenced in schools.'
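O'Neill's recalculations above all rest on the same simple arithmetic: Visible Learning averages the effect sizes of the meta-analyses under each influence, so one inflated or irrelevant source can drag the average (and hence the rank) up substantially. The sketch below illustrates this with the 'feedback' example, assuming an unweighted mean and assuming, purely for illustration, that the remaining meta-analyses all sit near 0.63; the real figures depend on the exact number and values of the meta-analyses Hattie pooled.

```python
def mean_effect(sizes):
    """Unweighted mean effect size, as used to average meta-analyses
    under a single influence in Visible Learning."""
    return sum(sizes) / len(sizes)

# Hypothetical illustration of O'Neill's 'feedback' point: 22 meta-analyses
# near 0.63 plus one outlier of 2.87 (the music-reinforcement study he flags).
sizes = [0.63] * 22 + [2.87]

print(round(mean_effect(sizes), 2))       # average with the outlier included
print(round(mean_effect(sizes[:-1]), 2))  # average with the outlier excluded
```

Excluding the single outlier shifts the average from roughly 0.73 to 0.63, which in Hattie's table is the difference between rank 10 and rank 19. This is the fragility O'Neill is pointing at: the rank order is highly sensitive to which studies are admitted into each average.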
Many other academics reiterate O'Neill's concerns. For example, Professor Pierre-Jérôme Bergeron points out:
'Hattie talks about success in learning, but within his meta-analyses, how do we measure success? An effect on grades is not the same as an effect on graduation rates. An effect on the perception of learning or on self-esteem is not necessarily linked to “academic success” and so on. A study with a short time-span will not measure the same thing as a study spanning a year or longer. And, of course, we cannot automatically extend observations based on elementary school students to secondary school or university students.'

Hattie is aware of these issues, as he excluded some meta-analyses because they were:
'mainly based on doctoral dissertations, ..., with mostly attitudinal outcomes, many were based on adult samples ... and some of the sample sizes were tiny' (VL, p. 196).
Most of the other peer reviews support and expand on John O'Neill's arguments - see References.