CLE & Other Errors

The effect size (d) is equivalent to a 'Z-score' of a standard normal distribution.

The Common Language Effect Size (CLE) is closely related to Pr (Z> x) and is used to interpret an effect size by converting an effect into a probability.

Hattie borrowed the CLE from McGaw and Wong (1992, p. 361) which they defined as,

"the probability that a score sampled at random from one distribution will be greater than a score sampled from some other distribution." Hattie used the example from McGaw and Wong (1992) of the differences in height between men and women (p. 362). If a man and women are sampled randomly, the chance the man will be taller than the women is pr (z > -1.41) or around 92% as shown below:

However, Hattie has calculated CLE probability values of between -49% and 219%. The researchers, Higgins and Simpson (2011), Topphol (2011) and Bergeron (2017) identified that these values are not possible.

Bergeron (2017) states,

"To not notice the presence of negative probabilities is an enormous blunder to anyone who has taken at least one statistics course in their lives. Yet, this oversight is but the symptom of a total lack of scientific rigor, and the lesser of reasoning errors in Visible Learning."
Topphol (2011) states,
"I will show that Hattie makes a systematic mistake in his calculations of CLE. In addition, his judgement, on how a calculated CLE should be interpreted in the context, suggests that he misunderstands what information CLE provides." (p. 463, Translated).
As a result, Hattie finally has now admitted that he calculated all CLE incorrectly. Although, he now says the calculation was not important. However, originally he did say that,
"in all examples in this book, the CLE is provided to assist in the interpretation of the effect size" (p. 9). 
Given Hattie's mantra that interpretations are the important aspect of his synthesis, this is a significant mistake. 

Topphol (2011), details that Hattie's explanations of particular CLE's are also incorrect, e.g., an effect size d = 0.29 must create a CLE of at least 50%. Excerpt from VL p. 9,

Topphol (2011) shows this interpretation is not correct on a number of levels,
"On page 9 Hattie give an example to help the reader interpret the CLE. “Now, using the example above, consider the d = 0.29 from introducing homework […]. The CLE is 21 per cent […]” First, 21 per cent is not correct, it should be approximately 58 per cent. The example continues “[…] so that in 21 times out of 100, introducing homework into schools will make a positive difference, or 21 per cent of students will gain in achievement compared to those not having homework”. This interpretation is not correct, and it would not have been even with the correct percentage.  
The CLE does not tell us how many students that gain in achievement, with a CLE of 58 per cent it is in fact possible that every student gain in achievement. In this context a correct interpretation would be that a CLE of 21 per cent (58 per cent) means that if you draw a random pair of students, one from the population with homework and the other from the population without (otherwise identical populations), there would be a probability of 21 per cent (58 per cent) that the student with homework achieves higher than the one without. This is not equivalent with Hattie’s explanation."
Ecological Fallacy, Topphol again,
'Hattie continues his explanation: “Or, if you take two classes, the one using homework will be more effective 21 out of a 100 times”. His interpretation of the percentages is now correct, but he seems to make what is often referred to as an ecological fallacy.'

It is not obvious what he means by one class is more effective than the other class. Nevertheless, what he writes indicates strongly that he means that an effect size, here a CLE, on an individual level can be transferred unchanged to the class level. First he writes about an effect on student level, the on the class level, using the same effect size value. This is not correct. The effect size on class level will, in normal circumstances, be larger than on student level, often noticeable larger.'
Topphol provides the following simulation,

'A computer program has been used to draw random samples of student scores from given population distributions using a Monte Carlo method (see e.g. Mooney 1997 or Rubinstein & Kroese 2008). In figure (a) the results for one million individual students are shown. Half of the students were randomly chosen to constitute a control group and the rest went into a treatment group. The two curves, solid and dashed, show how the students’ results were distributed. The difference between the two distributions corresponds to an effect size d = 0.30 .  
The students were then randomly grouped together in classes of 20 students each. For every class a class score were calculated as the arithmetic mean of the individual student scores. The class score distributions are plotted in figure (b), both control and treatment classes. The difference between the mean values is the same in (a) and (b), but the standard deviations are substantially reduced in (b), as expected from the central limit theory. Calculating the class effect size gives d = 1.34 , a factor of 4.46 larger than the individual effect size, also as expected; a substantial increase in effect size. Converted to CLE we get 58 per cent on student level and 82 per cent on class level. is nevertheless a mistake to directly transfer an effect size from student to class level. When Hattie in his example uses the same value of CLE on class level as on student level, it is an ecological fallacy.' 
More Errors

Then again on page 42, Hattie makes the same error citing d = 0.67 has a CLE of 48%. But 48% would indicate a negative d value. See table 1 from Bergeron (2017),

TABLE 1. Correspondence between Cohen’s d and CLE equivalents

Professor Arne Kare Topphol, who published that Hattie had calculated CLE's incorrectly in his paper, 'Can we count on the statistics use in education research?' had a dialogue with Hattie:

"My criticism of the erroneous use of statistical methods will thus probably not affect Hattie’s scientific conclusions. However, my point is, it undermines the credibility of the calculations and it supports my conclusion and the appeal I give at the end of my article; when using statistics one should be accurate, honest, thorough in quality control and not go beyond one's qualifications.  
My main concern in this article is thus to call for care and thoroughness when using statistics. The credibility of educational research relies heavily on the fact that we can trust its use of statistics. In my opinion, Hattie’s book is an example that shows that we unfortunately cannot always have this trust.  
Hattie has now given a response to the criticism I made. What he writes in his comment makes me even more worried, rather than reassured." 
Topphol was referring to Hattie's response that he used a slightly different formula and that resulted in these strange values.

Hattie stated,
"Yes, I did use a slightly different notion to McGraw and Wong. I struggled with ways of presenting the effect size data and was compelled by their method – but … I read another updated article that was a transformation of their method – but see I did NOT say this in the text. So Topphol’s criticism is quite reasonable. I did not use CLE exactly as they described it!" Read full dialogue here.
Hattie (2015) in a later defence of his work, contradicts the explanation he gives in 2012 to Topphol above, i.e., that he used a different calculation to Topphol:
"at the last minute in editing I substituted the wrong column of data into the CLE column and did not pick up this error; I regret this omission." 
But this does not explain Hattie's incorrect use of the CLE probability statistic in VL (2009, p. 9&42) as mentioned by Topphol above.

Hattie no longer promotes the CLE as a way of understanding his effect sizes, he uses a value of d = 0.40 as being equivalent to 1 year's progress. But as already stated, this creates other more significant problems.

Hattie mixes up the X\Y Axis:

Hattie uses a funnel plot (p. 21) to show that publication bias does not affect his research. But, Higgins and Simpson (2011) show that Hattie has mixed the X/Y axis and if drawn correctly the funnel plot does, in fact, show publication bias (p. 198).

Yelle et al. (2016) 'What is visible from learning by problematization: a critical reading of John Hattie's work', say there is an implied underestimation of the publication bias in Hattie's synthesis.

Dr. Ben Goldacre on the Funnel Plot

For more information see -

Standard Error:
"In his "Effect Barometers" Hattie also gives a standard error for the determined effect size. However, their calculation is flawed (see also Pant 2014 a, p. 96, note 4, 2014b, p. 143, FN 4). As a rule, the specified value is the arithmetic mean of the specified standard errors of the individual first-stage meta-analyzes. For example, in the case of inquiry-based teaching, the given standard error of 0.092 is the arithmetic mean of the two standard errors of 0.154 and 0.030 given two of the four meta-analyzes. However, the precision of the estimate from both meta-analyzes can not be less than that of the individual meta-analyzes; in fact, the standard error of an effect magnitude estimate from these two overlap-free meta-analyzes in the primary studies is 0.029...

The reconstruction of Hattie's approach in detail using examples thus shows that the methodological standards to be applied are violated at all levels of the analysis. As some of the examples given here show, Hattie's values are sometimes many times too high or low. In order to be able to estimate the impact of these deficiencies on the analysis results, the full analyzes would have to be carried out correctly, but for which, as already stated, often necessary information is missing. However, the amount and scope of these shortcomings alone give cause for justified doubts about the resilience of Hattie's results" Wecker et al. (2016, p. 30). 
Schulmeister & Loviscach (2014) Errors in John Hattie’s "Visible Learning".
"Hattie’s method to compute the standard error of the averaged effect size as the mean of the individual standard errors ‒ if these are known at all ‒ is statistical nonsense." 

No comments:

Post a Comment