Correlation



Hattie's top three ranked influences are all based on correlation studies: Collective Teacher Efficacy, Self-Report Grades and Piagetian Programs.

Bergeron (2018b) illustrates the problem with Hattie's method, using an example that converts the correlation between ice cream sales and achievement to an effect size:

The beneficial effects of ice cream on intelligence - a delicious correlation (r = 0.7): 

Converting r = 0.7 to an effect size gives d = 1.96, larger than Hattie's best intervention so far!



(Chart from: https://www.economist.com/blogs/graphicdetail/2016/04/daily-chart)
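The conversion behind these numbers is the standard r-to-d formula, d = 2r / √(1 − r²) (Borenstein et al., 2009, quoted in the Bergeron & Rivard passage below). A minimal Python sketch (the helper name r_to_d is my own) reproduces Bergeron's ice cream figure:

```python
import math

def r_to_d(r):
    """Convert a Pearson correlation r to Cohen's d: d = 2r / sqrt(1 - r^2)."""
    return 2 * r / math.sqrt(1 - r ** 2)

print(round(r_to_d(0.7), 2))  # 1.96 -- the "delicious" ice cream effect size
```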

The Largest Correlation - Intelligence & School Achievement

The largest correlation, relevant to education, that I've come across is from Deary, Strand, Smith, & Fernandes (2007). They summarise,
"This 5-year prospective longitudinal study of 70,000 + English children examined the association between psychometric intelligence at age 11 years and educational achievement in national examinations in 25 academic subjects at age 16. The correlation between a latent intelligence trait (Spearman's g from CAT2E) and a latent trait of educational achievement (GCSE scores) was 0.81."
When this correlation is converted to an effect size, we get ES = 2.76. Once again, this is larger than any effect size in Hattie's rankings in Visible Learning.
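Plugging Deary et al.'s correlation into the same conversion formula reproduces this figure (a sketch of the arithmetic, not their computation):

```python
r = 0.81  # latent g vs. latent GCSE achievement (Deary et al., 2007)
print(round(2 * r / (1 - r ** 2) ** 0.5, 2))  # 2.76
```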

Cautions on Converting Correlations

"Garbage in, Gospel out." Dr Gary Smith (2014, p. 25),

Higgins & Simpson (2011), Bakker et al. (2019) and Kraft (2020) all warn against Hattie's combining of correlational studies with experimental/intervention studies, e.g., Kraft counsels,
"Knowing whether an effect size represents a causal or correlational relationship matters for interpreting its magnitude. Comparing meta-analytic reviews that incorporate effect size estimates from observational studies (e.g., Hattie, 2009; Lipsey & Wilson, 1993) to those that only include experimental studies (e.g., Hill et al., 2008; Lipsey et al., 2012; Lortie-Forgues & Inglis, 2019) illustrates how correlational relationships are, on average, substantially larger than causal effects. It is incumbent on researchers reporting effect sizes to clarify which type their statistic describes, and it is important that research consumers do not assume effect sizes inherently represent causal relationships." (p. 3).
Sue Cowley cleverly writes about educational authorities using the high correlation between vocabulary size and achievement to direct teacher practice, and shows it's more complicated than that.

Brady et al. (2023) warn of the trend in educational research of inferring causality from correlational studies, as Hattie does.

The largest, most reputable independent evidence organisations in the world - the USA's What Works Clearinghouse (WWC) and England's Education Endowment Foundation (EEF) - consider correlation studies to be of very low quality.

Another example is the large English organisation Evidence Based Education. In their Great Teaching Toolkit, they warn about these types of correlation studies,

"Much of the available research is based around correlational studies; in these the relationships between two variables is measured. While interesting, the conclusions drawn from them are limited. We cannot tell if the two have a causal relationship – does X cause Y, or does Y cause X? Or might there be a third variable, Z? Therefore, while we may find a positive correlation between a teaching practice and student outcomes, we do not know if the practice caused the outcome." (p. 11)
Gilmore et al. (2021), in their report to the Association of Mathematics Teachers regarding the misuse of research, warn of pedagogical conclusions drawn from correlation studies,
"correlational evidence does not tell us the direction of the relationship nor whether other un-measured factors cause the relationship." (p. 36)
This, in part, accounts for the large differences in their evidence when compared to Hattie - see Other Researchers.


Many peer reviews - Lipsey et al. (1993, 2012), Bakker et al. (2019) - detail the problem of converting correlations into effect sizes, e.g., Kraft (2018),
"Effect sizes from studies based on correlations or conditional associations do not represent credible causal estimates."
Many scholars have asked Hattie to remove these low-quality studies. 

However, Hattie ignores this with an astonishing caveat - there is

"no reason to throw out studies automatically because of lower quality" (VL, p. 11).
Snook et al. (2009, p. 2),
"Hattie says that he is not concerned with the quality of the research... of course, quality is everything. Any meta-analysis that does not exclude poor or inadequate studies is misleading, and potentially damaging if it leads to ill-advised policy developments. He also needs to be sure that restricting his data base to meta-analyses did not lead to the omission of significant studies of the variables he is interested in."
Terhart (2011),
"It is striking that Hattie does not supply the reader with exact information on the issue of the quality standards he uses when he has to decide whether a certain research study meta-analysis is integrated into his meta-meta-analysis or not. Usually, the authors of meta-analyses devote much energy and effort to discussing this problem because the value or persuasiveness of the results obtained are dependent on the strictness of the eligibility criteria" (p. 429).
See Collective Teacher Efficacy for an example of quality control.

But Hattie constantly boasts that he has the largest set of studies, as if sheer quantity somehow overrides the quality issue.

Bergeron & Rivard (2017) on Hattie's huge numbers:
"We cannot allow ourselves to simply be impressed by the quantity of numbers and the sample sizes; we must be concerned with the quality of the study plan and the validity of collected data."
Larsen (2014),
"the megalomaniac additive annexation of all sorts of meta-analyses is not concerned with methodologically critical self-reflections, nor with validity claims, i.e., it does not specify the limits to what can be said and made commensurable. The risk is that knowledge in the collected empirical data piles disappears when it is formalised in a second-, third-, and-fourth-order perspective" (p. 6).
Prof Terry Wrigley (2015), in Bullying by Numbers, gives a detailed analysis of this problem.

Also, Wrigley (2018, p. 365), in The power of ‘evidence’: Reliable science or a set of blunt tools?, highlights the problem of Hattie's use of correlation, quoting Hubert and Wainer (2013, p. 119),
"One might go so far to say that if only the value of rXY is provided and nothing else, we have a prima facie case for statistical malpractice."
Bergeron & Rivard (2017) show how r is converted to d:
"Hattie confounds correlation and causality when seeking to reduce everything to an effect size. Depending on the context, and on a case by case basis, it can be possible to go from a correlation to Cohen’s d (Borenstein et al., 2009):
but we absolutely need to know in which mathematical space the data is located in order to go from one scale to another. This formula is extremely hazardous to use since it quickly explodes when correlations lean towards 1 and it also gives relatively strong effects for weak correlations. A correlation of .196 is sufficient to reach the zone of desired effect in Visible Learning... 
It is with this formula that Hattie obtains, among others, his effect of creativity on academic success (Kim, 2005), which is in fact a correlation between IQ test results and creativity tests. It is also with correlations that he obtains the so-called effect of self-reported grades, the strongest effect in the original version of Visible Learning. However, this turns out to be a set of correlations between reported grades and actual grades, a set which does not measure whatsoever the increase of academic success between groups who use self-reported grades and groups who do not conduct this type of self-examination."
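Both of Bergeron & Rivard's numerical observations are easy to verify with a short sweep of the formula (a sketch; apart from 0.196, the r values are my own, chosen to show the explosion near 1):

```python
# d = 2r / sqrt(1 - r^2): a weak correlation already clears Hattie's
# d >= 0.40 "zone of desired effects", and d explodes as r approaches 1.
for r in (0.196, 0.5, 0.9, 0.99, 0.999):
    d = 2 * r / (1 - r ** 2) ** 0.5
    print(f"r = {r:<5} -> d = {d:.2f}")
# r = 0.196 -> d = 0.40
# r = 0.5   -> d = 1.15
# r = 0.9   -> d = 4.13
# r = 0.99  -> d = 14.04
# r = 0.999 -> d = 44.69
```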
I've created an example of the problem with correlation here using a class of 10 students.


A moderate correlation of r = 0.69 gets converted into d = 1.91, one of the largest effect sizes in Hattie's book - this would rank #1 on Hattie's list.


A weak correlation of r = 0.29 gets converted into an effect size of d = 0.61 - this would rank #20.
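The same conversion reproduces both figures from the 10-student example:

```python
for r in (0.69, 0.29):
    print(f"r = {r} -> d = {2 * r / (1 - r ** 2) ** 0.5:.2f}")
# r = 0.69 -> d = 1.91  (would rank #1 in Visible Learning)
# r = 0.29 -> d = 0.61  (~ rank #20)
```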

Blichfeldt (2011) on Hattie's correlation,
"correlations or correspondence do not provide grounds for causation. Hattie mentions that correlations should not be confused with causal analyzes. It is striking that the book is first and foremost presented so that it is read easily as causal analyzes, of "what works" or leading to good test results and not that he ranks the 138 variables thereafter - as a list of disconnected factors."
DuPaul & Eckert (2012, p. 408) - behaviour:
"randomised control trials are considered the scientific "gold standard" for evaluating treatment effects... the lack of such studies in the school-based intervention literature is a significant concern."
Kelley & Camilli (2007, p. 33) - Teacher Training. Studies use different scales (not linearly related) for coding identical amounts of education. This limits confidence in the aggregation of the correlational evidence.

Studies inherently involve comparisons of nonequivalent groups; often random assignment is not possible. Inevitably, this creates some uncertainty in the validity of the comparison (p. 33).

The correlation analyses are inadequate as a method for drawing precise conclusions (p. 34).

Research should provide estimates of the effects via effect size rather than correlation (p. 33).

Breakspear (2014, p. 13) states,

"Too often policy makers fail to differentiate between correlation and causation."
Blatchford (2016, p. 94) commenting on Hattie's class size research,
"Essentially the problem is the familiar one of mistaking correlation for causality. We cannot conclude that a relationship between class size and academic performance means that one is causally related to the other."
We are constantly warned that correlation does not imply causation! Yet, Hattie confesses: 
"Often I may have slipped and made or inferred causality" (p. 237).
Lind (2013, p. 2) also questions Hattie's use of correlation and accuses Hattie of displaying the correlation, not the effect size, when it suits him. This makes the effect appear small, but when converted it is large. The example Lind gives is VL, p. 197, where Hattie cites r = 0.64 for kinesthetic learning, which converts to d = 1.67. This is a huge effect! But Hattie rejected this study,
"It is difficult to contemplate that some of these single influences... explain more of the variance of achievement that so many of the other influences in this book."
Sue Cowley,
"It sounds so wonderfully simple doesn’t it? All you have to do to become ‘smarter’ is to know more words. And this ties so perfectly into the learning is memory narrative – memorise more words and hey presto! You are smart....

But you can’t work backwards like that from research. It makes a nonsense of the vast complexity of the process. Correlation, as we should never tire of saying, is not causation. Sure, there’s a link, but you can’t put the cart in front of the horse. Knowing more words didn’t happen first – you can’t use it as a substitute for best practice in EYFS because it came as a result of something else. Which, in the case of early child development, is what we call ‘serve and return’ conversations, where loving and attentive caregivers pay careful attention to small children in order to support them, within rich and imaginative environments that enable learning. And there ain’t nothing simple about that."
