Wisniewski, Zierer & Hattie (2020), "The Power of Feedback Revisited", finally attempts to address the key issues raised by the mounting peer-review critique of Hattie’s methods & claims.
They report a much-reduced effect size (ES) of 0.48 and compare,
“The average weighted effect size differs considerably from the results of meta-synthesis (d = 0.79, Hattie and Timperley, 2007).” (p. 11)
Many peer reviews have consistently questioned the reliability of Hattie’s effect sizes (see Effect Size), and the results of this study confirm that suspicion.
As a result, Hattie’s major claim in Visible Learning (VL) that the ES determines "what works best" is also called into question, as is his notion of "High Impact" strategies (see HITs).
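For readers unfamiliar with how an effect size is derived, here is a minimal sketch of the standard Cohen's d calculation, comparing an experimental group with a control group. The scores below are hypothetical illustrations only, not data from any study discussed on this page.

```python
# Minimal sketch of a Cohen's d effect size, using hypothetical scores.
# None of these numbers come from Hattie or Wisniewski, Zierer & Hattie (2020).
from statistics import mean, stdev

def cohens_d(experimental, control):
    """Standardised mean difference using the pooled standard deviation."""
    n1, n2 = len(experimental), len(control)
    s1, s2 = stdev(experimental), stdev(control)
    pooled_sd = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    return (mean(experimental) - mean(control)) / pooled_sd

# Hypothetical test scores: a class given feedback vs. one without.
feedback_group = [72, 75, 68, 80, 77, 74]
control_group = [70, 69, 66, 73, 71, 68]
print(round(cohens_d(feedback_group, control_group), 2))
```

A d of 0.48 (the revised figure) means the average student in the treatment group scores about half a standard deviation above the average control student.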
The Key Issues That They Attempt to Address
Discriminating studies with different definitions of feedback. (p. 7)
Discriminating between different outcomes (achievement, motivation & behavioural). (p. 7)
Removal of studies not in an Educational context, although they don’t appear to deal with the large proportion of studies on adults who are NOT in a classroom setting. (p. 3)
Weighting studies. (p. 11)
Adjusting for the huge duplication of individual studies that occurred in VL. (p. 11)
Adjusting for different methods of ES calculation. (p. 3)
Removal of studies that did not have a control versus experimental group. (p. 3)
They focused on one method of ES calculation, switching from the fixed-effect model used in VL to a random-effects model (see the sketch after this list for the difference). (p. 2)
The admission that a meta-analysis is better than a META-meta-analysis: “Therefore, a meta-analysis is likely to produce more precise results.” (p. 3) This subtle difference is important to understand.
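Since the switch from a fixed-effect to a random-effects model is one of the main methodological changes, here is a rough sketch of why the two models can give different pooled effect sizes. The effect sizes and variances used are hypothetical placeholders, and the DerSimonian-Laird estimator shown is only one common way of fitting a random-effects model; it is not claimed to be the exact procedure used in the paper.

```python
# Sketch: fixed-effect vs. random-effects pooling of effect sizes.
# All numbers are hypothetical, not values from either paper.

def fixed_effect(effects, variances):
    """Inverse-variance weighted mean (fixed-effect model)."""
    weights = [1 / v for v in variances]
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)

def random_effects(effects, variances):
    """DerSimonian-Laird random-effects pooling: an estimate of the
    between-study variance (tau^2) is added to each study's variance,
    so very large studies dominate the average less."""
    k = len(effects)
    weights = [1 / v for v in variances]
    fe = fixed_effect(effects, variances)
    q = sum(w * (e - fe) ** 2 for w, e in zip(weights, effects))
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - (k - 1)) / c)
    re_weights = [1 / (v + tau2) for v in variances]
    return sum(w * e for w, e in zip(re_weights, effects)) / sum(re_weights)

effects = [1.2, 0.9, 0.3, 0.2]        # hypothetical study effect sizes
variances = [0.01, 0.02, 0.05, 0.08]  # hypothetical sampling variances
print(f"fixed: {fixed_effect(effects, variances):.2f}, "
      f"random: {random_effects(effects, variances):.2f}")
```

In this hypothetical example the random-effects estimate sits noticeably lower than the fixed-effect one; in practice the direction and size of the difference depend on the data, and the revision from 0.79 to 0.48 also reflects the removal, de-duplication and re-weighting of studies.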
As a result of addressing the above areas, Hattie & his colleagues conclude,
"...the significant heterogeneity in the data shows that feedback cannot be understood as a single consistent form of treatment." (p. 1)
The table below shows the details of the original 23 studies & why they were removed - note, another 9 new studies have also been added - adapted from p. 11.
One of the early peer-review critiques of Hattie’s work was Snook et al. (2009), which detailed major problems with the quality of the studies Hattie used.
"Hattie says that he is not concerned with the quality of the research... of course, quality is everything. Any meta-analysis that does not exclude poor or inadequate studies is misleading, and potentially damaging if it leads to ill-advised policy developments. He also needs to be sure that restricting his data base to meta-analyses did not lead to the omission of significant studies of the variables he is interested in." (p. 2)
Hattie retorted,
'Where a meta-analysis itself was poor quality, I was not backward in saying so… Thus, claims that the studies were not appraised for their validity are misleading and incorrect. One of the very powers of meta-analysis is to deal with this issue. Readers and policy makers can have assurance that the conclusions I made are based on "studies, the merits of which have been investigated"'. (Hattie, 2010, p. 88)
Snook et al. (2010) responded,
'We are rather surprised that Hattie did not comment on what is perhaps the major problem in using meta-analysis in educational research... the variables being studied are often poorly conceptualised and the studies often far from rigorous. How does one clearly distinguish for research purposes between a classroom that is "teacher centred" and one which is "student centred" and, in comparing them, how can one control all the variables in a noisy and busy classroom with perhaps more than 30 participants?' (p. 96)
In VL Hattie claimed,
"Nearly all studies in the book are based on real students in front of real teachers in real schools." (VL, preface ix)
Hattie reiterated in many of his presentations,
"nearly all of it is based on what happens in regular classrooms by regular teachers... 99.+% is based on classrooms run by ordinary teachers, not like in Psychology where they use under-graduate students, they bring in outsiders and this kinda stuff" (Hattie, 2017, researchED talk, Melbourne, @ 9 minutes)
Yet, previously Hattie & Timperley (2007) cautioned about their best feedback study - Kluger & De Nisi,
"Many of their studies were not classroom based." (p. 85)
And NOW, "The Power of Feedback Revisited" confirms that most of the other studies Hattie cited were NOT about classrooms.
Also, Snook et al.’s critique, that poor-quality studies have been included and good-quality studies overlooked, has been confirmed, and this calls Hattie’s defense above into question.
Hattie & Timperley (2007) - The Power of Feedback
As mentioned in the CLE and Other Errors page, this is one of the most widely cited educational research papers, yet it seems to contain some very basic errors.
Firstly, they head their main table as 12 meta-analyses, yet there are clearly 13. This prompted me to check the average, which I found to be 0.53, NOT 0.79 (as they published).
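To make the arithmetic concrete, here is a minimal sketch of the difference between a simple unweighted mean of meta-analysis effect sizes and a study-weighted mean. The values and study counts are placeholders, not the actual entries of Hattie & Timperley's Table 1.

```python
# Illustration only: placeholder effect sizes and study counts,
# not the actual entries of Hattie & Timperley's (2007) Table 1.
effect_sizes = [0.9, 0.7, 0.5, 0.4, 0.3]
num_studies = [10, 25, 40, 60, 15]

unweighted = sum(effect_sizes) / len(effect_sizes)
weighted = sum(e * n for e, n in zip(effect_sizes, num_studies)) / sum(num_studies)
print(f"unweighted mean: {unweighted:.2f}, study-weighted mean: {weighted:.2f}")
```

Which mean is quoted, and what it is weighted by, can shift the headline number considerably, which is why this unweighted check and the weighting introduced in the 2020 paper both matter.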
See (2017) also questions how these effect sizes were calculated,
"How Hattie arrived at the effect sizes that he did in his paper was not explained." (p. 8)
I've adjusted the table of meta-analyses to indicate the studies removed or partially removed in Wisniewski, Zierer & Hattie (2020).
In addition, I've highlighted in blue the effect sizes reported differently in VL, e.g. Skiba et al. is reported as 1.24 in 2007 but as 0.68 in VL in 2008.
See (2017) also investigated the research cited and found many anomalies,
"Going back to the studies that Hattie cited in his paper, we could not locate the effect sizes listed in the summary table (Table 1, p. 83). For example, the review states that the effect size of 54 studies in Lysakowski and Walberg (1982) was +1.13 whereas the original paper reports it at 0.97 (study-weighted). The figure 1.13 appears nowhere in their paper. In Hattie (1992) and repeated subsequently, it is said that ‘Skiba et al. (1985–1986) used 315 effect-sizes (35 studies) to investigate the effects of some form of reinforcement or feedback and found an effect-size of 1.88’, but the later 2007 paper reports this review as having 35 effect sizes not studies, and an effect size of +1.24." (p. 8)
Robert Slavin (2018) in his blog John Hattie is Wrong, also provides another pertinent example of more Hattie errors,
"A meta-analysis by Rummel and Feinberg (1988), with a reported effect size of +0.60, is perhaps the most humorous inclusion in the Hattie & Timperley (2007) meta-meta-analysis. It consists entirely of brief lab studies of the degree to which being paid or otherwise reinforced for engaging in an activity that was already intrinsically motivating would reduce subjects’ later participation in that activity. Rummel & Feinberg (1988) reported a positive effect size if subjects later did less of the activity they were paid to do. The reviewers decided to code studies positively if their findings corresponded to the theory (i.e., that feedback and reinforcement reduce later participation in previously favored activities), but in fact their “positive” effect size of +0.60 indicates a negative effect of feedback on performance.
I could go on (and on), but I think you get the point. Hattie’s meta-meta-analyses grab big numbers from meta-analyses of all kinds with little regard to the meaning or quality of the original studies, or of the meta-analyses."
So once again most of these studies do not have anything to do with feedback in the classroom.
The Focus of Feedback in English Schools:
The poor research studies used as the basis for deciding that "Feedback" is a high-impact strategy may account for the lack of success of focusing on it as an initiative.
Wiliam (2019) notes,
"the EEF’s emphasis on feedback as the single most cost-effective intervention justified 'additional pressures on teachers from inspectors that are ultimately not productive' even though few, if any, of the studies that the EEF included in its review looked at the effects of marking in school."Cohen (2019) also comments about this lack of success quoting Christodoulou (2016),
"drawing on similar foundational assumptions, takes on the issue of feedback in the context of the English school system. She deals with the puzzling failure of Assessment for Learning. AfL was a government programme for rolling out feedback strategy based on strong evidence from a range of scholarly sources, including experimental evidence. It commanded strong support among policymakers and a great deal of the teaching profession. It was successfully implemented at least to the extent that teachers in England now provide a great deal more feedback than before, and more than teachers in other countries. Yet, the theorised improved student outcomes did not materialise.
Christodoulou (2016) uses a range of evidence to argue that part of the reason for this is a failure to differentiate formative and summative assessment."
Feedback is True but Useless
Christodoulou & Ashman discuss Wiliam's famous quote, 'feedback is true but useless' in their podcast here @ 18 minutes.
They talk about Wiliam's observation of a science class where students hand in a report and the teacher provides written feedback: "you need to be more systematic in your investigation."
Wiliam then asks the student, "What are you going to do next?"
The student replies, "I don't know; if I'd known how to be more systematic, I would have done that."
They continue with other examples, like telling a comedian who is not funny, "you need to be more funny."
Christodoulou then recommends that, for a student to improve a certain skill, they need to be given an activity that they can repeat and practice.
Workload & Feedback:
Glenn Pearsall (2017), one of the most popular teaching experts in Australia, links teacher workload with inefficient feedback practices. In his TER podcast, he said,
"Great feedback which the kid does not act on is a waste of both the kid and the teacher's time!"
Internal Feedback Using Comparisons
Prof David Nicol has done some exciting work involving students using comparisons & peer reviewing to get internal feedback about their work. Nicol claims this is more powerful than external feedback.
Nicol talks about this being a more natural process which is highly engaging, with the added benefit of reducing teacher workload.
PISA and Feedback
He also pointed out that even though most researchers regard feedback as an important teaching strategy, PISA data show feedback NEGATIVELY correlated with Science performance.
The Negative Influences
Price, Handley, Millar & O'Donovan (2010), in "Feedback: all that effort, but what is the effect?", confirm that feedback is complex and add that relationships are important, e.g.,
"Measuring ‘effectiveness’ requires clarity about the purpose of feedback. Unless it is clear what feedback is trying to achieve, its success cannot be judged...
Although a frequently used term, feedback does not have clarity of meaning. It is a generic term which disguises multiple purposes which are often not explicitly acknowledged. The roles attributed to feedback fall broadly into five, but not entirely delineated discrete, categories: correction, reinforcement, forensic diagnosis, bench-marking and longitudinal development (feed-forward)...
Accurate measurement of feedback effectiveness is difficult and perhaps impossible. Furthermore, the attempt to measure effectiveness using simple indicators – such as input measures or levels-of-service – runs the risk of producing information which is misleading or invalid and which may lead to inappropriate policy recommendations."
Prof Paul Kirschner (2018) details the problems with feedback.
Other Worthwhile Commentary on Feedback
Michael Pershan (2019) writes an excellent and insightful blog on feedback. His basic argument is that the evidence is poor and the notion of feedback is too general to be of any help to teachers.
Buckingham & Goodall (2019), in the Harvard Business Review article "The Feedback Fallacy", detail arguments that feedback is a complex phenomenon.