VisibleLearning: Collective Teacher Efficacy

Around 2017, Hattie identified Collective Teacher Efficacy (CTE) as having the highest average effect size in his research, reporting an effect size of d = 1.57.

Hattie's Claim

Hattie & Hamilton (2020) in Real Gold vs Fool's Gold stated,

"From the research, we know that the following things matter the most:

1. Achieving teacher collective efficacy.

...The research tells us that where teachers have this shared belief, great things happen." (p. 22)

Yet, they cautioned,

"...a key point of caution is that there is currently only one meta-analysis of 26 research studies. The evidence base is still too small to form anything more than speculative conclusions." (p. 19)

Hattie's rating (Feb 2023) - OVERALL CONFIDENCE, rates CTE very LOWLY as 2 out of 5.

So, from these contradictory ratings, it is difficult to know if CTE is real gold or fool's gold!

Hattie & Donohoo make Dubious Claims!

Given the limitations of this study and many other issues raised (below), Hattie & Donohoo (2018b), continue with extreme claims,

"collective teacher efficacy is greater than three times more powerful and predictive of student achievement than socioeconomic status. It is more than double the effect of prior achievement and more than triple the effect of home environment and parental involvement. It is also greater than three times more predictive of student achievement than student motivation and concentration, persistence, and engagement."

For Donohoo's presentation with Corwin, click here.

The Evidence

Hattie originally used one meta-analysis to determine his effect size, a PhD thesis:

Eells, Rachel Jean. (2011) Meta-Analysis of the Relationship Between Collective Teacher Efficacy (CTE) and Student Achievement. Loyola University of Chicago.

However, a 2nd meta-analysis was added in 2018, Çoğaltay & Karadağ (2017), which reports a correlation of r = 0.52 (p. 221).

Hattie then converts this correlation to an effect size (d) = 1.22.

Correlation Studies

Hattie's conversion of a correlation to an effect size, then comparing this with different methods of effect size calculation has been widely critiqued in the peer review - see Correlation.

Duplication of Studies

Çoğaltay & Karadağ (2017) used 35 studies, while Eells (2011) used 26.

Hattie's method of averaging many meta-analyses to get ONE effect size has also come under significant peer review critique (Shannahan, 2017 & Wecker et al., 2017) as meta-analyses often include the SAME studies. So Hattie's averaging counts the same study twice, or even multiple times. This causes significant bias to the effect size calculation - see Effect Size.

In this regard, Çoğaltay & Karadağ (2017) used the same 22 out of 26 studies that Eells (2011) used.

Removing duplicate studies is a major protocol of the meta-analysis method, as shown by Eells below. But, Hattie had ignored this and many other protocols for over 15 years.

However, due to the weight of peer review critique, Hattie has finally admitted that this is a major problem in his work and has re-examined his feedback research, Wisniewski, Zierer & Hattie (2020) "The Power of Feedback Revisited", subsequently getting totally DIFFERENT results,

"...a source of distortion when using a synthesis approach results from overlapping samples of studies. By integrating a number of meta-analyses dealing with effects of feedback interventions without checking every single primary study, there is a high probability that the samples of primary studies integrated in these meta-analyses are not independent of each other...Therefore, these would have to be considered as duplets–primary studies that are included in the result of the synthesis more than once–and consequently cause a distortion." (p. 2)

Dr Stephen Vainker also reports on this duplicate problem in Hattie's work here,

In 2020, Hattie acknowledged the issue of duplicate studies in his work. Despite this, he included Çoğaltay and Karadağ's meta-analysis, which significantly overlapped with Elle's research, sharing 22 out of 26 studies. As of January 2025, this meta-analysis still appears in Hattie's table without any qualifying remarks.

Comparison With Other Evidence Organisations

A key question for teachers is, why are the evidence summaries from different organizations so different and contradictory?

For example, the dominant UK organization, The Education Endowment Foundation (EEF), used a similar method to Hattie, the Meta-meta analysis. Yet, CTE does not appear in any of their recommendations.

Another example is the largest education evidence organization, the USA, What Works Clearing House (WWC). It focuses on the highest quality of evidence - the individual study, based on random allocation of students. Once again, CTE does not appear in any of their recommendations.

Another example is the large English organization Evidence-Based Education. In their Great Teaching Toolkit, they don't mention CTE either and they warn about these types of correlation studies,

"Much of the available research is based around correlational studies; in these the relationships between two variables is measured. While interesting, the conclusions drawn from them are limited. We cannot tell if the two have a causal relationship – does X cause Y, or does Y cause X? Or might there be a third variable, Z? Therefore, while we may find a positive correlation between a teaching practice and student outcomes, we do not know if the practice caused the outcome." (p. 11)

Summary of the Peer Review

Hoogsteen (2020 & 2020b) argues that the correlational nature of studies means we can not ascribe improved achievement to CTE, since it is equally likely that causation could be the reverse or bi-directional, i.e., Achievement increases CTE. He also casts doubt on the validity of the different CTE questionnaires. He concludes,

"...although results of meta-analysis have led collective efficacy to be named the top influence on student achievement, this review has shown that meta-analyses may not be the most reliable format to, especially as it relates to CTE, make this claim. In contrast to the comments by Donohoo (2018) that policy makers, system, and school leaders should aim to develop collective efficacy throughout reforms, maybe in light of everything presented in this review, school and system leaders should instead take heed to the words of Wiliam (2016) and not put a heavy emphasis on meta-analysis to identify priorities for the development of teachers. In doing so they risk spending a great deal of time helping teachers improve aspects of their practice that do not benefit students." (p. 583-584)

Dylan Wiliam (2019b) mounts a similar argument for bi-directionality or the opposite effect.

"It's not teacher collective efficacy that causes high achievement, it's high achievement that causes teacher collective efficacy." (Wiliam, 2019b, @3mins)

Klassen et al. (2011) conclude,

"This review investigates the state of teacher self- and collective efficacy research conducted from 1998 to 2009... Continuing problem areas were a lack of attention to the sources of teacher efficacy, continued measurement and conceptual problems, a lack of evidence for the links between teacher efficacy and student outcomes, and uncertain relevance of teacher efficacy research to educational practice..." (p. 21)

Zhao (2019) concludes,

"Measurement issues and congruence with established theory are a severe problem affecting collective efficacy research (Klassen et al., 2011). The illegitimacy of the teacher collective efficacy measurements cast a shadow on their results and might lead to inaccurate conclusions of the research." (p. 80)

Professor Nazım Çoğaltay kindly responded to my question about his view of Hattie's use of his research,

"The interpretation is completely subjective and contrary to the spirit of meta-analysis studies. However, my study was done in 2017. So it is a quantitative combination of the studies done until 2017. Eels' work is from 2011. Of course, there will be an intersection set in the works we combine. However, each researcher will discuss his own style in his research while doing meta-analysis. Therefore, interpretation of the correlation value was not correct without understanding how I did it."

Prof Çoğaltay clarified that he did not want his comments to be used polemically.

I responded that I have put together a summary of a large number of peer reviews, that form a strong critique of Hattie's methods, but not polemical.

Given the dominance of Hattie's views in Education Policy and his claim that CTE is the #1 influence in education & more than 3 times more effective than family life, socioeconomic status & student motivation (Hattie & Donohoo, 2018b) I believe a strong critique is necessary and warranted.

The Eells PhD

Eells reports an average correlation r = 0.617 between Collective Teacher Efficacy (CTE) and Student Achievement. (p. 111)

Hattie then converts this to an effect size of d = 1.57. But this conversion has major problems and reputable evidence organizations do not accept correlation studies as good evidence (e.g., EEF & WWC) - See Correlation for details.

Bergeron & Rivard (2017) in their critique of Hattie, published an example of the danger of converting a correlation to an effect size:

The beneficial effects of ice cream on intelligence - a delicious correlation (r = 0.7):

If we convert r to an effect size = 1.96!

Much larger than for Hattie’s Collective Teacher Efficacy!

From: https://www.economist.com/blogs/graphicdetail/2016/04/daily-chart

Many peer reviews, Lipsey et al. (1993, 2012), Bakker et al. (2019) the problem of converting correlation into an effect size, e.g., Kraft (2018),

"Effect sizes from studies based on correlations or conditional associations do not represent credible causal estimates."

Eells herself, confirms the problems of correlational studies,

"Correlational research can only go so far... correlation is not sufficient to determine causation... but this study is limited because it cannot address causation." (p. 122)

What is Collective Teacher Efficacy?

Eells definition (p. 8),

"Collective Teacher Efficacy is an emergent group level property referring to the perceptions of teachers in a school that the faculty as a whole will have a positive effect on the students (Goddard, Hoy, & Woolfolk Hoy, 2000)."

How was CTE measured?

Eells reports by 2 methods both self-report questionnaires - the CE scale and the CTBS.

The CE questionnaire, from p. 155:

The CTBS questionnaire, from p. 156:

Eells explains these self-reports are (p. 75),

"...then aggregated so that the school yields one score."

Eells also reports a major problem with using 2 different measures (p. 109),

"Another important moderator was the tool used to measure collective teacher efficacy. Studies that used the short form of the CE-SCALE ... yielded higher effect sizes than studies that used the CTBS..."

Hoogsteen (2020) also discusses this in his critical review, (note Hoogsteen uses the Correlation (r) and calls this the effect size, but it is different from Hattie's effect size (d),

"...Studies that used the long version of the Collective Efficacy Scale had an effect size of .605, the short form produced an effect size of 0.645, those that used the Collective Efficacy belief scale had an average effect of 0.464, and studies that used a different measure had an effect size of .455..." (p. 583).

Converting these correlations (r) to effect sizes (d), which as discussed before, is in itself problematic, we get the wildly different effect sizes of: 1.52, 1.69, 1.05, 1.02

Hattie & Donohoo (2018b) cite Tschannen-Moran & Barr (2004) who also discuss the importance of the questionnaire, yet Hattie & Donohoo fail to mention this,

"A new measure of collective teacher efficacy was developed for this study because of concerns that the existing measure developed by Goddard et al. (2000) artificially drives down the collective efficacy scores of schools in more challenging environments by its explicit measurement of task difficulty." (p. 199)

Hoogsteen (2020) discusses the problem of self-report,

"The one-time data collection in these studies presents the limitation of only measuring the relationship between self-efficacy and collective efficacy at one point in time and does not measure changes in either variable. Furthermore, the studies’ data were confined to self report surveys, and as Cansoy and Parlar (2017) admit, respondents’ appraisals of their collective efficacy may be highly influenced by their self-efficacy, particularly those in key managerial and leadership roles. Currently, at best, one can claim that self-efficacy and collective teacher efficacy co-vary, meaning when one is strong, so is the other, and vice-versa." (p. 579)

Zhao (2019) also discusses the problem of the different questionnaires used to measure CTE,

"...there is the measurement issue of collective efficacy. The most commonly used collective teacher efficacy measures are variations of Goddard et al.’s 21-item Collective Teacher Efficacy Scale or its revised 12-item short version. Goddard’s work has laid the foundation for collective efficacy research and made an outstanding contribution to the development of this research area. Nonetheless, Klassen et al. (2011) pointed out, in the literature review of teacher self- and collective efficacy research from 1998 to 2009, that some of the content of Goddard’s measures 'displays a lack of congruence with theory’' (p. 35). Several items were orientated toward external determinants, and others focused on teachers’ current abilities rather than on the 'more theoretically congruent forward-looking capabilities' (p. 35).

The scale Somech and Drach-Zahavy (2000) used in their study included items that was incongruent with efficacy theory, such as 'The teachers of this school have excellent job skills' and 'Team teachers that can perform their jobs as well as this team are rare' (p. 653), which are also present focused. However, a few measures investigated in Klassen et al.’s review were more congruent with efficacy theory. For example, several studies used Tschannen-Moran and Barr’s (2004) collective efficacy scale, a 12-item scale focusing on teachers’ collective capabilities, e.g., 'How much can teachers in your school do to produce meaningful student learning?' (p. 196), displaying a closer congruence to collective efficacy theory." (p. 79)

Hattie's Recent New Interpretation & Presentation of CTE

In his Corwin webinar (mid-2021) Hattie changed the influence of "collective teacher efficacy" to "teachers working together as evaluators of their impact".

Hattie's presentation does not faithfully represent the original studies and their definition of CTE.

How was Student Achievement Measured?

"School achievement is typically measured with standardized tests, but studies that use other measurements were included..." (p. 76).

An example of the range of tests used (p. 92), note the timing of CTE measure is also important, as Wiliam discusses below.

Bergeron & Rivard (2017) warn of the disparate measures of achievement that Hattie averages together,

"Hattie talks about success in learning, but within his meta-analyses, how do we measure success? An effect on grades is not the same as an effect on graduation rates. An effect on the perception of learning or on self-esteem is not necessarily linked to 'academic success' and so on. A study with a short time-span will not measure the same thing as a study spanning a year or longer. And, of course, we cannot automatically extend observations based on elementary school students to secondary school or university students."

Quality Control

Eells's flowchart from p. 79,

This flowchart is useful as it displays the protocol for meta-analyses and also illustrates the reason why many scholars (e.g., Shannahan (2017) & Wecker et al. (2017)) have asked Hattie to remove duplicate studies from his averages.

Eells also details another criterion (p. 75),

"Any studies conducted at the post-secondary level were excluded, since a college education is not compulsory, and the student population is comprised of self-selected individuals."

Does CTE cause High Achievement, or does High Achievement cause CTE - Bi-directionality?

Hoogsteen (2020) reports on this,

"...Donohoo (2018), notes that caution should be taken regarding the directionality of included variables because causal direction was determined in advance. This is the case in many studies, and in many instances conclusions are drawn that can be misleading. For example, both Bandura (1993) and Goddard (2001) investigated and confirmed the link between collective teacher efficacy and student achievement, however, in both cases achievement was measured prior to CTE." (p. 576)

So the timing of the CTE questionnaire, before or after the school rating on achievement, is significant. Yet, none of the studies control for this variable.

I've just witnessed Tom Brady and his team win the Super Bowl in 2021. If we could have measured their Collective Efficacy before the Win as compared to After the Win and also measured the team that Lost, I wonder what that would have told us?

The Bi-directionality of CTE is also a key component of the Hoogsteen (2020) critical review,

"The concept of bi-directionality as it relates to collective teacher efficacy is illustrated in Tschannen-Moran & Barr (2004) when they state that there is a reciprocal relationship between collective teacher efficacy and student achievement. The relationship is such that the school environment can affect teachers’ belief in their collective efficacy to improve student achievement and increased student achievement can increase teachers’ sense of collective efficacy (p. 196). Eells (2011) proffers a similar explanation when she notes that CTE influences cultural norms when belief leads to action, and cultural norms influence CTE when action transforms belief (p. 65). In fact, inquiry into the directionality of collective efficacy can be found in oft-studied areas linked to CTE, student achievement, teacher outcomes, leadership, and professional learning communities. It should be noted, definitive evidence regarding directionality has not been found." (p. 576)

Hoogsteen (2020) concludes,

"...conclusive evidence has not been found regarding the directionality of collective efficacy, no matter the area of study relating to CTE. Differing results and/or inconsistencies because of study methodology and timing of data collection have occurred. Even so, what has often happened is that researchers have made strong conclusions that are only partially supported because of the correlational nature of the studies, and the inconsistencies have led them to theorize and infer about reasons for their results so that they align with social cognitive theory. With that said, school and district leaders as well as policy makers should take a cautious approach when seeking to develop collective teacher efficacy as a primary strategy to increase achievement, improve outcomes such as innovation and behavior management, implement professional learning communities, and the other school-related outcomes discussed in this section. Hattie and most of the researchers he cites, use the theoretical work of Bandura (1986, 1997) who postulated four sources of efficacy-shaping information: mastery experience, vicarious experience, social persuasion, and affective state." (p. 580)

Implementation in Schools

Hoogsteen (2021) provides wise advice on the implementation of CTE in Schools,

"...it may be time to cease viewing collective efficacy as a shaper of the normative environment of a school (Goddard et al., 2000) and a characteristic to build. Instead, school leaders and researchers may need to view CTE as an indicator of the health of the organization and result of having implemented and performed effective school practices." (p. 83)

However, Hoogsteen (2020) also warned,

"As Loughland and Ryan (2020) note, with this declaration comes an uncritical adoption of the collective efficacy bandwagon when effect sizes are used in support of professional learning consultancies that are motivated by profit." (p. 575)

Limitations of this research

Eells details several limitations of her synthesis (p. 122) which are echoed regularly by the peer reviews of Hattie's work,

"A related concern is the apples and oranges issue: the mix of studies synthesized may be too dissimilar to yield meaningful overall results (Lipsey and Wilson, 2001). Within this meta-analysis, this problem became apparent when examining how the variables were defined. Student achievement was operationalized in numerous ways, and CTE, in a few ways. Additionally, there was some discrepancy concerning the timing of the measures. As previously stated, although the component studies addressed the correlation between CTE and achievement, some were clearer when distinguishing prior achievement from subsequent achievement. This posed a problem because that inconsistency reduces the certainty of any conclusions drawn about how CTE can predict achievement.

Another limitation for this meta-analysis was the small number of studies sampled. This line of research is less than 20 years old, and relies on a handful of studies to demonstrate the relationship in question...

Correlational research can only go so far. The effect size for this meta-analysis came from Pearson product-moment correlations, which quantify the strength of a relationship. While there will always be a correlation between cause and effect, correlation is not sufficient to determine causation. This research makes it clear that CTE and school achievement vary together: They are strongly and positively correlated, but this study is limited because it cannot address causation."

Then finally on p. 124,

"The confounding of moderator variables was a particular limitation for this study."

Teacher Self-Efficacy and Pupil Achievement: Much Ado About Nothing? International Evidence from TIMSS - here

Full Tweet thread here

Examples of Different Definitions of Collective Teacher Efficacy

Carmel Patterson (2021),

"Teacher collective efficacy (CE) is often misrepresented as collaborative task generation or cooperative marking or reviewing of student work. CE entails the work of teaching teams to garner collective contributions for:

-developing understanding of observations on learning,
-critiquing task requirements,
-assessing student work samples,
-creating reasoned strategies to implement and evaluation in context,
-expanding and clarifying individual teacher thinking with colleagues, and
-collectively developing practice in context."

VisibleLearning

Collective Teacher Efficacy

No comments:

Post a Comment

Blog Archive

About Me