VisibleLearning: Feedback

Effect Size d= 0.73 (Hattie's Rank=10).

The 23 meta-analyses Hattie cities are:

While all these studies are GENERALLY about FEEDBACK, they totally differ on the specific aspect of feedback. For example, they range from extrinsic vs intrinsic rewards (money & chocolate vs pleasure & achievement), background music, feedback to the teacher vs to the student, computer feedback, timing of feedback, etc.

These studies highlight the major problem with Hattie's method of the combining disparate studies to get ONE effect size.

Hattie borrows from Gene Glass,

"A common criticism is that it combines 'apples with oranges' and such combining of many seemingly disparate studies is fraught with difficulties. It is the case, however, that in the study of fruit nothing else is sensible" (VL, p. 10).

Hattie's response cleverly avoids the need to justify specific studies, e.g., why is Standley (1996) included when it is about background music and not about the sorts of feedback strategies used by teachers?

Ashman (2018c) also detailed similar problems with the EEF's top strategy, meta-cognition, which Ashman says,

"appears to be a chimera; a monster stitched together from quite disparate things."

Sundar & Agarwal (2021) analyse Hattie and also warn of this,

"If learning strategies included in the meta-analysis are not consistent and logical, then beware! For example, if you find a meta-analysis that groups together “feedback” strategies including teacher praise, computer instruction, oral negative feedback, timing of feedback, and music as reinforcement, does that sound consistent to you?" (p. 6)

Ekecrantz (2015), Mannion (2017), Mannion (2020) & Fletcher-Wood (2021) also cast doubt about Hattie's representation of Feedback.

The details in the peer review summarised in this blog, show over & over again, that the way Hattie defines, selects & groups studies is highly questionable.

Dylan Wiliam had a humorous response to this issue,

"This underscores the importance of adequate theorization, identified by Kelley (1927). The "jingle fallacy" is assuming that two things with the same name are in fact the same, while the "jangle fallacy" is assuming that things with different names, are, in fact different."

Are The Studies About Real Kids in Real Classrooms?

Hattie consistently claimed,

"nearly all of it is based on what happens in regular classrooms by regular teachers... 99.+% is based on classrooms run by ordinary teachers, not like in Psychology where they use under-graduate students, they bring in outsiders and this kinda stuff." (2017, ResearchEd)

The peer review investigating the details of the studies show this is clearly NOT the case. In addition, they show many of the effect sizes measured behavior NOT achievement - details below.

The EEF Have Changed Their Method!

As a result of this significant critique the EEF have completely removed their original 7 meta-analysis that they had cited (click here for those studies) and have replaced them with INDIVIDUAL studies.

Unfortunately Hattie has not made the same changes.

The Detail for Some of the Studies Hattie Cites:

Standley (1996) - Music as Feedback. The highest ES = 2.87

Schulmeister & Loviscach (2014) detail the problem of Hattie averaging many different studies on different target groups (teachers and students) using very different feedback mechanisms, e.g., Standley (1996) is about the impact of music on behavioral interventions. They conclude,

"Only in a very broad sense has this study something to do with feedback; it is behavioristic reinforcement" (p. 9).

Some details from Standley (1996) below, show the studies are about adults in production lines or nursing homes and the ES measured some sort of behavior NOT achievement!

Rummel & Feinberg (1988). The detrimental effects of extrinsic rewards on intrinsic motivation.

Slavin (2018) in his blog "John Hattie is Wrong", gives a pertinent example of Hattie's use of feedback studies,

"A meta-analysis by Rummel and Feinberg (1988), with a reported effect size of +0.60, is perhaps the most humorous inclusion in the Hattie & Timperley (2007) meta-meta-analysis. It consists entirely of brief lab studies of the degree to which being paid or otherwise reinforced for engaging in an activity that was already intrinsically motivating would reduce subjects’ later participation in that activity. Rummel & Feinberg (1988) reported a positive effect size if subjects later did less of the activity they were paid to do. The reviewers decided to code studies positively if their findings corresponded to the theory (i.e., that feedback and reinforcement reduce later participation in previously favored activities), but in fact their “positive” effect size of +0.60 indicates a negative effect of feedback on performance.

I could go on (and on), but I think you get the point. Hattie’s meta-meta-analyses grab big numbers from meta-analyses of all kinds with little regard to the meaning or quality of the original studies, or of the meta-analyses."

Janson (2018) also confirms this,

"the positive effect size found indicates a negative effect of that feedback on the performance of that activity."

Some studies that Rummel and Feinberg (1988) used are shown below and Slavin is correct,

The regulatory body in mine and Hattie's jurisdiction, The Victorian Institute of Teaching (2018), publish dismissal proceedings for teachers they deem unfit to teach.

"The evidence of possible serious misconduct or lack of fitness to teach ... was:

Inappropriate gift buying, described by students as bribery, such as sweets and other items."

Skiba, Casey & Center (1985) - Non Aversive Procedures in the Treatment of Classroom Behavior Problems.

The authors list their inclusion criteria (p. 463) -

They then detail the study design (pp. 466-467 )

Once again this study is not about the type of feedback that Hattie claims, "where to next", etc. The dependent variable is Behavior, so the ES represents a change in Behavior NOT Achievement. Also, Hattie & Timperley (2007) report this ES as 1.24 yet in VL, Hattie reports the ES as 0.68. The discrepancy let alone the misrepresentation is huge.

Skiba et al. reported that the ES for both Feedback and Reinforcement were high but the standard deviations (variation) was also high, so they concluded,

"that although reinforcement strategies are generally highly effective, no single strategy is uniformly effective across all subjects or behaviors." (p. 474)

This also does not support Hattie's claim that praise dilutes feedback.

Hattie's Feedback Model

Hattie details his model with 3 major feedback questions as,

Where am I going? How am I going? and Where to next? (VL, p. 176).

Hattie continues to promote this model - see Hattie Presentations. Yet, I can't find those particular aspects of feedback defined or measured in any of the meta-analyses that Hattie cites.

Hattie seems to have got the "where to next" from a student survey he did and not from the meta-analyses he cites - see here @40mins. Then later in this presentation he says the most important aspect is feedback to the teacher @42.50mins.

However, Hattie previously stated,

"All the meta-analyses on the relation of the quality of teaching to learning come from student ratings of teachers by college and university students. It appears that student rating of the quality of teachers and teaching is related to learning outcomes, although the feedback that is provided to teachers rarely leads to improvements in their teaching or the effectiveness of the courses." (VL, p. 115)

Hattie then muddies the water further by claiming recently,

"I'm not even sure there is a concept such as formative or summative assessment." (@ 30mins)

Yet, in Visible Learning he published "providing formative evaluation" with one of the largest ES = 0.90 (VL, p. 162)

Hattie Obscures the True Nature of Feedback

It is ironic that one of the studies that Hattie used, Rummel & Feinberg (1988), warned of the major problem of combining disparate studies as Hattie has done,

"It is argued that by including studies that claim to examine this theory but are in reality, not adequately operationalizing the theoretical propositions, this only serves to obscure the true nature of this area" (p. 160).

Prof Terry Wrigley (2015) in Bullying by Numbers, gave an English humorist critique of Hattie's method,

"Its method is based on stirring together hundreds of meta-analyses reporting on many thousands of pieces of research to measure the effectiveness of interventions.

This is like claiming that a hammer is the best way to crack a nut, but without distinguishing between coconuts and peanuts, or saying whether the experiment used a sledgehammer or the inflatable plastic one that you won at the fair" (p. 5).

Hattie Finally & Quietly Concedes?

Hattie & Hamilton (2020) in "Real Gold Vs Fool's Gold" continue with this 'fruit' defense and once again do not address the significant issues of disparate studies,

"Any literature review involves making balanced judgements about diverse studies. A major reason for the development of meta- analysis was to find a more systematic way to join studies, in a similar way that apples and oranges can make fruit salad. Meta-analysis can be considered to ask about “fruit” and then assess the implications of combining apples and oranges, and the appropriate weighting of this combination." (p. 3)

Hattie & Hamilton (2020) continue,

"Unlike traditional reviews, meta-analyses provide systematic methods to evaluate the quality of combinations, allow for evaluation of various moderators, and provide excellent data for others to replicate or recombine the results. The key in all cases is the quality of the interpretation of the combined analyses. Further, as noted above, the individual studies can be evaluated for methodological quality." (p. 4)

Once again, Hattie cleverly implies that he has addressed all these issues, but the peer review examples above show that this is not the case.

Yet, around the same time in Wisniewski, Zierer & Hattie (2020) "The Power of Feedback Revisited", Hattie admits these differences, or heterogeneity of definitions, is a major problem.

"...the significant heterogeneity in the data shows that feedback cannot be understood as a single consistent form of treatment." (p. 1)

While not directly acknowledging the wide variety of issues in VL, Hattie and his co-authors completely excluded 8 of the original 23 meta-analyses and partially excluded a further 11 meta-analyses, resulting in a substantially reduced ES of 0.48 - details in Feedback Revisited.

This is an amazing turn around given Hattie's consistent claims that all of the studies he used were about regular students in regular classrooms, e.g. Hattie in 2017, Melbourne, ResearchEd address,

"nearly all of it is based on what happens in regular classrooms by regular teachers... 99.+% is based on classrooms run by ordinary teachers, not like in Psychology where they use under-graduate students, they bring in outsiders and this kinda stuff." (@9mins)

Wisniewski, Zierer & Hattie (2020) also recommend moving away from the combining of different meta-analyses as Hattie did in VL, to a focus on individual studies (p. 3).

Yet, Hattie and his commercial partner Corwin continued to combine disparate meta-analyses.

More Detail on the Nature of these Feedback Studies

Ruiz-Primo & Li (2013) Examining formative feedback in the classroom context: New research perspectives, reviewed over 9,000 studies on feedback (most of the studies Hattie used) and decided only 238 were of high enough quality to use (p. 217). But they also warn that,

"only 131 studies, or 4%, were considered appropriate for reaching some type of valid conclusion based on their selection criteria" (p. 218).

They detail that studies differed in many respects (p. 217):

They conclude,

"Clearly, the range of feedback definitions is wide... how is it possible to argue for feedback effects without considering the nuances and differences among the studies?" (p. 217).

Many other scholars also show Hattie just combines all these different studies together without regard for their differences, e.g.,

Proulx (2017) in "Critical Essay on the Work of John Hattie for the Teaching of Mathematics", observes that, Hattie's definition of feedback is not consistent with the collection of feedback studies he cites and therefore has nothing to do with his aim of "what works best".

Proulx also expressed his concern that even though Hattie compares studies with totally different definitions of feedback, Hattie constantly reminded us "feedback" is one of the best approaches to teaching.

Nielsen & Klitmøller (2017) in "Blind spots in Visible Learning - Critical comments on the 'Hattie revolution'", discuss in detail the many problems of Hattie's synthesis of feedback studies. They start with Hattie's definition of feedback,

"... feedback is information provided by an agent (e.g., teacher, peer, book, parent, or one’s own experience) about aspects of one’s performance or understanding. For example, a teacher or parent can provide corrective information, a peer can provide an alternative strategy, a book can provide information to clarify ideas, a parent can provide encouragement, and a learner can look up the answer to evaluate the correctness of a response. Feedback is a 'consequence' of performance" (VL, p. 174).

"In summary, feedback is what happens second, is one of the most powerful influences on learning, occurs too rarely..." (VL, p. 178).

They then detail a significant problem with Hattie's work in general but with the influence of feedback in particular, i.e., the different definitions of variables,

"it is our assessment that in four of the five 'heaviest' surveys that mentioned in connection with Hattie's cover of feedback, it is conceptually unclear whether they are operates with a feedback term that is identical with Hattie's" (p. 11, translated from Danish).

Furthermore, they state (p. 10),

"The breadth of the phenomenon of feedback varies clearly in the meta-analyses used."

They then go into more detail,

"... we will come closer to look at five of the meta-analyses that Hattie builds his calculation on... Hattie's feedback area consists of 23 meta-analyses including 67,931 people, 5 of them are a special heavy because they include 62,761 people corresponding to 92% of the total sample" (p. 11).

This also shows the major issue of how to weight studies and how different weightings derive totally different effect sizes.

They define their criteria for examination of the studies (p. 11):

1. Are the surveys valid, do they measure what Hattie says?

2. Are the meta-analyses transparent, so it is possible to examine the quality of the individual studies in the meta-analyses?

3. The use of randomized control group studies? Studies which use control groups are of higher quality.

See their summary table here.

They conclude,

"...our analysis shows that the transparency is low in two out of five studies, also only one of the five studies is consistently working with a control group design.

...the study by Kluger and DeNisi (1996), that Hattie (VL, p175) denotes 'the most systematic study addressing the effects of various types of feedback' has an effect size of d = 0.38 - i.e., a much lower impact assessment than the 0.73 ...other than that 38 percent of the surveys that are included in Kluger and DeNisi's study, a negative effect on the learning process - which is moreover contrary to Hattie's assumption that 'almost everything works'.

Kluger and DeNisi (1996) therefore denote feedback as a two-fold sword that both can lead to the student either learning significantly more or significantly less" (p. 11).

One of their pertinent observations is that many of the studies produce negative effects. They quote Shute (2008) "Within this large body of feedback research, there are many conflicting findings and no consistent pattern of results."

Barwe and Dahlström (2013) also argue it is better to look at a good meta-analysis rather than average a whole bunch of disparate meta-analyses as Hattie does.

"If one would have to choose between Hattie's average measure of the effect size of a certain influencing factor or the measure that a high-quality meta-analysis presents, we recommend the later. For example, Hattie takes up a systematic, high-quality meta-study of Kluger & DeNisi (1996) which deals with feedback and where the effect size is = 0.38 (p. 175).

Maybe you can have more confidence in that study and its value than the average Hattie produces (= 0.73)." (p. 27).

The Best Feedback Study - Kluger & DeNisi (1996)?

Hattie claimed the highest quality study is Kluger & DeNisi (1996) with an effect size of 0.38, which is below Hattie's magic hinge point of 0.40. Also, 32% of effects were negative meaning they decreased achievement!

David Didau (2016) briefly summarises the Kluger and DeNisi study (listen around 24 mins).

Professor Dylan Wiliam confirms 32% of studies show feedback has a negative effect (see video below).

Prof Richard E. Clark goes through what we can use from the Kluger & Denisi study here @ 49 minutes.

Busch & Watson (2019). The Science of Learning: 77 Studies That Every Teacher Needs to Know. Also detail that, the negative result of feedback, in this study, is one of the most important findings in the research (go to 1hr, 35 mins).

So there is strong counter to Hattie's disparaging claim that,

"When teachers claim that they are having a positive effect on achievement or when a policy improves achievement, this is almost always a trivial claim: Virtually everything works.

One only needs a pulse and we can improve achievement" (VL, p. 16).

Prof Wiliam details more problems with the feedback research - most of the studies are on university students and 85% of the feedback is ONE event lasting minutes!!! He then goes further and says that if you compare these different studies on feedback (as Hattie does),

"Your results are meaningless."

Yet Hattie says, 95-97% of things we do enhances achievement!

But There are Doubts About Kluger & DeNisi (1996)

Fletcher-Wood (2021) in Research: Is all our evidence all it’s cracked up to be? casts doubt on this studies' relevance to school age students, as most of the studies are on adults and university students.

"First, Kluger and DeNisi focused on the way feedback affects behaviour – not how it affects learning. Later authors extended Kluger and DeNisi’s conclusions to argue that feedback has powerful effects on learning – but this isn’t fully justified by the original research.

Second, Kluger and DeNisi included a range of studies – including those testing the effect of feedback on workers’ use of ear protection, hockey players’ body checks, and people’s extra-sensory perception (apparently feedback helps). Only nineteen of the 131 studies included were in schools and most focused on changing classroom behaviour – not learning."

So once again we have the "apples vs oranges" problem and also another problem we have mentioned many times - see Student Achievement, of Hattie mixing & combining studies that do not measure achievement but something else, in this case behaviour.

More Critique of Hattie's Method

Wrigley (2015) in Bullying by Numbers, critiquing the old EEF method in particular but also Hattie,

"Specifically, on Feedback, the Toolkit provides some more specific references to back up its very general claims, but many of these are over 20 years old and currently unobtainable. Seven more detailed references are given, each with an ‘effect size’, but these range from .97 to .20. Which is to be believed? Summaries follow, in highly technical language, mostly without indicating which stage or subject, what kind of learning, what kind of feedback, which countries the research took place in, and so on. Some of the sources are very critical of particular types of feedback...

Meta-analyses are used in Medicine to enable researchers to complement the reading of other research, though not to substitute for it; for example, if experiments have been based on small samples, averaging the results can suggest a general trend.

But the medical literature contains serious warnings against the misuse of meta-analysis. Statisticians are warned not to mix together different treatments, types of patient or outcome measures – the ‘apples and pears’ problem. If the original results differ strongly, they are advised to highlight the difference, not provide a misleading average. This is exactly what has not happened in the Toolkit, which should never have provided an average score for “Feedback” since the word has so many meanings" (p. 6).

The Problem With Most Feedback Studies

Ruiz-Primo & Li (2013) detail the quality issues which we consistently see in Hattie's synthesis,

"A high percentage of papers investigating the impact of feedback did so without using a control group...

Confounded effects, rarely mentioned in the synthesis and meta-analyses, pose another threat to validity when interpreting results of feedback studies" (p. 218).

"most of the studies do not provide information about the reliability and validity of the instruments used to measure the effects of feedback on the selected outcomes. The validity of feedback studies is threatened by a failure to attend to the technical characteristics of the instruments used to measure learning outcomes... Given these measures with ambiguity in technical soundness, can we fully trust results reported in synthesis and meta-analysis studies?

... there is an issue of ecological validity. For a research study to possess ecological validity and its results to be generalizable, the methods, materials, and setting of the study must sufficiently approximate the real-life situation that is under investigation. Most of the studies reported are laboratory-based or are conducted in classrooms but under artificial conditions (e.g., students were asked to identify unfamiliar uses of familiar objects)...

Furthermore, a high percentage of the studies focus on written feedback, and only a few on oral or other types of feedback, although oral feedback is more frequently observed in teachers’ daily assessment practices (see Hargreaves et al., 2000).

...we argue that formative feedback, when studied in the classroom context, is far more complex than it tends to appear in most studies, syntheses, or meta-analyses. Feedback practice is more than simply giving students feedback orally or in written form with externally or self-generated information and descriptive comments. We argue that feedback that is not used by students to move their learning forward is not formative feedback. We thus suggest that feedback needs to be examined more closely in the classroom setting, which should ultimately contribute to an expanded and more accurate and precise definition" (p. 219).

Some interesting findings,

"Research has made clear that students hardly read teachers’ written feedback or know how to interpret it (Cowie, 2005a, 2005b)" (p. 225).

"most of the publications on formative assessment and feedback include examples of strategies and techniques that teachers can use. Most of them, however, do not provide empirical evidence of the impact of these strategies on student learning; nor do they link them to contextual issues that may affect the effectiveness of the strategies...

there is a lack of studies conducted in real classrooms—the natural setting—where it would be important to see evidence that feedback strategies have substantive impact. Moreover, few studies have focused on feedback over extended periods or on factors that can moderate or mediate the effectiveness of feedback. Therefore, we cannot generalize what we know from the literature to classroom practices...

Rather than persisting with our common belief that feedback is something doable for teachers, we should strive to study formative assessment practices in the classroom, including feedback, to help teachers and students to do better. Given these unanswered questions, we need different and more trustworthy strategies of inquiry to acquire firsthand knowledge about feedback in the classroom context and to systematically study its effects on student learning" (p. 226).

VisibleLearning

Feedback

1 comment:

Blog Archive

About Me