Teacher Training

Effect Size d = 0.11 (Hattie's Rank = 124), but Professional Development (PD) d = 0.62

In his 2008 Nuthall lecture, Hattie called Teacher Education (a label he later changed to Teacher Training) a disaster!

Hattie continued to use these slides to 2012:



Teacher Training is another controversial finding as the effect size appears to contradict the Professional Development effect size and PISA analysis.

It also contradicts the practice of one of the highest performing educational systems in the world, the Finnish system. In Finnish Lessons 2.0, Director General Pasi Sahlberg outlines their priority,

Teacher Training both in subject knowledge and didactics (p. 77).

Hattie used the following meta-analyses:

Authors/Teacher Training | Year | # studies | Students | Mean (d) | CLE | Variable
Wu, Becker & Kennedy | 2002 | 24 | | 0.08 | 6% | certified v alternative
Wu, Becker & Kennedy | 2002 | 24 | | 0.14 | 10% | trad v emergency
Hacke | 2010 | 21 | 1989761 | 0.09 | | NBC vs Non NBC
Kelley & Camilli | 2007 | 32 | | 0.15 | | Early child teacher Ed
Sparks | 2004 | 5 | | 0.12 | 8% | trad v emergency

Hacke (2010) is a PhD dissertation. She compares teachers holding the most common form of certification in the U.S.A., National Board Certified (NBC) teachers, with non-certified (Non-NBC) teachers.

Certification costs between $18,000 and $31,000 per teacher and takes about 400 hours of work (p. 113). One of Hacke's aims was to determine whether certification is worth the cost and time.

The low effect size indicates NO. 

Note: this contradicts Hattie's use of NBC research - see below.

NBC is a teaching certificate for teachers with a teaching degree plus 3 years' experience.

Certification consists of four components:
-written assessment of content knowledge,
-reflection on student work samples,
-video and analysis of teaching practice,
-documented impact and accomplishments as a teaching professional.

The certification is expensive and takes a lot of time to prepare, so many experienced teachers do not go through the process. Also, school districts in poorer areas do not require certification as there is major turnover and a shortage of teachers. Harris and Sass (2009) report these are MAJOR confounding variables (p. 7).

Sparks (2004) is also a PhD dissertation, which compares certified with alternatively certified or non-certified teachers. Sparks reports a range of correlational effect sizes and advises against averaging them,
"The disparate definitions of certification do not permit effect size estimates to be combined" (p. 89).
Yet Hattie goes ahead to get an average of 0.12.
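
Sparks's caution can be illustrated with a minimal sketch. The effect sizes below are hypothetical values chosen for illustration (they are NOT Sparks's data, and the function name is my own); the point is only that a single mean can hide how widely the underlying estimates differ when they measure different things.

```python
from statistics import mean

# Hypothetical effect sizes from studies that each define "certification"
# differently. Illustrative only; NOT taken from Sparks (2004).
effect_sizes = [-0.30, -0.05, 0.10, 0.25, 0.60]


def summarise(values):
    """Return the mean together with the smallest and largest value."""
    return {
        "mean": round(mean(values), 2),
        "min": min(values),
        "max": max(values),
    }


summary = summarise(effect_sizes)
print(summary)
```

In this made-up case the mean looks like a small positive effect, yet the individual estimates run from clearly negative to moderately positive, which is the kind of information an average discards.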

Sparks reports a major problem with how student achievement is represented: school or state-level means are used in place of individual student data (p. 101). For example, in one of the 5 studies used,
"student achievement was represented by a single state-level mean and paired with the proportion of certified teachers in the state" (p. 102).
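
Why this matters can be shown with a small, entirely hypothetical data set (three made-up "states" with four students each; none of these numbers come from Sparks or the studies she reviews). It is a sketch of the general aggregation problem, not a reconstruction of any study: the same data give a weak correlation at the student level and a perfect one once each state is collapsed to a single mean.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


# Hypothetical data: each state has a centre value, and students scatter
# around it with no relationship between the two measures within a state.
state_centres = [0, 1, 2]
offsets = [(-2, -2), (-2, 2), (2, -2), (2, 2)]

states = [[(c + dx, c + dy) for dx, dy in offsets] for c in state_centres]

# Student-level correlation: every individual pair is used.
students = [pair for state in states for pair in state]
student_r = pearson([x for x, _ in students], [y for _, y in students])

# State-level correlation: each state is reduced to one pair of means.
state_means = [
    (sum(x for x, _ in s) / len(s), sum(y for _, y in s) / len(s))
    for s in states
]
state_r = pearson([x for x, _ in state_means], [y for _, y in state_means])

print(round(student_r, 3), round(state_r, 3))
```

With these invented numbers the student-level correlation is about 0.14, while the state-level correlation is 1.0, so an effect size derived from aggregated means can look far stronger than the individual data support.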
Wu, Becker & Kennedy (2002) is a conference presentation mostly comparing NBC with Non-NBC. 

Kelley & Camilli (2007) is a study of early childhood (pre-school) teachers; it compares Bachelor degrees with Diplomas.


Hattie's Arbitrary Interpretations of NBC Research (VL, Ch 11):

Hattie uses only Bond, Smith, Baker & Hattie (2000), which compares 65 NBC with 40 Non-NBC teachers.

Hattie makes the arbitrary judgement that NBC certified teachers are 'Experienced Experts' while Non-NBC teachers are 'Experienced Non Experts' (VL, p. 259).

He does not use Standardised Student Achievement (SSA) but rather arbitrary criteria (see the graph below). Hattie's justification for not using SSA (Bond, Smith, Baker, Hattie, 2000) is,
"such a comparison ignores other powerful influences on standardized measures such as socio-economic status. Even if one could control for such effects, a simple mean comparison would still be inappropriate. It is simply unrealistic to expect student performance on off-the-shelf, multiple-choice achievement tests to be measurably sensitive to differences between teachers in a single academic year. This is especially so since the relationship between what teachers are teaching and the content of various standardized tests varies widely from state to state, district to district, school to school, and indeed from classroom to classroom."
Podgursky (2001), in his critique of Bond, Smith, Baker, Hattie (2000), writes,
'No study, however, has ever shown that National Board-certified teachers are any better than other teachers at raising student achievement. Nothing has changed with the release of this report. The National Board’s researchers rejected the use of student test scores as a measure of teacher performance, claiming, “It is not too much of an exaggeration to state that such measures have been cited as a cause of all of the nation’s considerable problems in educating our youth. . . . It is in their uses as measures of individual teacher effectiveness and quality that such measures are particularly inappropriate.”' (p. 2).
Later, Podgursky described these non-achievement criteria as
"nebulous standards"
How did Hattie's group identify Non-NBC teachers?

"In order to investigate the extent to which National Board Certified teachers differ from non-certified teachers in the amount and type of professional activity, we designed and administered an extensive telephone interview protocol to a sample of 40 MC/Gen and EA/ELA candidates from across the United States." (p. 11).

Hattie concludes that expert teachers (NBC) outperform Non-NBC teachers on almost every criterion (VL, p. 260), although Professor Gore, who delivered the Dean's Lecture at Hattie's school at Melbourne University, disagrees with him.



Harris and Sass (2009) report that the National Board for Professional Teaching Standards (NBPTS), which administers the NBC, generates around $600 million in fees each year (p. 4). Harris and Sass's much larger study, 'covering the universe of teachers and students in Florida for a four-year span' (p. 1), contradicts Hattie's conclusion,
"we find relatively little support for NBC as a signal of teacher effectiveness" (p. 25).
It is interesting that much of Hattie's consulting work for schools involves measuring teachers on the arbitrary categories listed on the graph above; a significant omission is Teacher Subject Knowledge.

Another example of inconsistent results when using different outcomes:

Hacke (2010) pinpoints the issue central to all of Hattie's work,
"identifying effective teachers hinges on how it is defined and measured" (p. 32).
This is a good example of inconsistency: on the one hand, the NBC research (using arbitrary outcomes) is used to demonstrate a SPECTACULAR difference between expert and experienced teachers; on the other, when student achievement is measured, it is used to demonstrate that Teacher Training is a disaster.

Hacke (2010) goes further, illustrating the inconsistency of using different types of tests: criterion-referenced tests are intended to measure how well a person has learned a specific body of knowledge and skills, whereas norm-referenced tests are developed to compare students with each other and are designed to produce a variance in scores.

She cites a unique study by Harris and Sass (2007), who examined the influence of teacher certification (NBC) using two different types of assessment data from the state of Florida, which administers both norm-referenced and criterion-referenced tests. Harris and Sass compared the results, which revealed that the effect of NBC was negative for both reading and mathematics on the norm-referenced test, YET positive on the criterion-referenced assessments (p. 109).


A more detailed look at the studies:


Hacke (2010) is a dissertation comparing National Board Certified (NBC) teachers with Non-NBC teachers (p. 8).

Hacke's dissertation is of high quality and very thorough. Her major inclusion criteria are: studies must be done in the USA on year 3-12 students, and
"Student achievement is defined as end-of-year or end-of-instruction test score gains on standardised tests in reading and mathematics" (p. 20).
She also agrees with Wu et al. in identifying the major confounds,
"The on-going debate over what an effective teacher is and does make measuring teacher effectiveness elusive, as there is no generally accepted method for doing so" (p. 28).
Wu, Becker & Kennedy (2002) is a paper presented at the annual meeting of the American Educational Research Association. I have not been able to get a copy of the full presentation, but I contacted Professor Kennedy and she sent me a summary. In the introduction they state, 
"Our synthesis will focus on studies conducted in the United States since 1960 ... We decided to limit ourselves to the U.S. K-12 context on the premise that the factors involved in the training and hiring of teachers at other educational levels and in other countries may differ functionally and culturally from those at play in the U.S. K-12 system. We also are examining studies of teachers in the workplace - that is, we will not include studies of pre-service teachers because we presume that they are still learning to teach and the relationships we might observe between qualifications and teaching outcomes for this population might not be reliable or stable. Also, studies examining whether in-service programs make better teachers are omitted."
They detail the "inclusion/exclusion" criteria used to select studies for their synthesis. It is interesting that many of Hattie's studies would fail on many of these criteria, e.g., BIAS. They state,
"When reviewers do not describe how studies are selected, the reader is left to wonder whether personal predilections or biases led the reviewer to select studies favouring his or her viewpoint. A major goal of data collection for systematic reviews is to have thorough and replicable search and selection procedures."
They compare alternate routes to teacher certification with traditional routes. Alternate routes often attract older recruits who are qualified in another profession (career changers) and involve an internship or teacher training "on the job". Emergency routes are often created by school districts in response to teacher shortages (the obvious confounding variable here is that these school districts are often in poor areas with low achieving schools!).

They identify another major confounding variable: the terms 'qualifications' and 'quality' are used differently by researchers and are measured differently. Some consider a college education to be a qualification and teacher assessments to be a measure of quality, while others use teacher test scores as indications of qualifications and student achievement as quality. Once again, this is the old meta-analysis problem of comparing apples with oranges. Their summary of Teacher Qualifications and Quality:



Teacher Qualifications | Number of Dissertations | Number of Other Sources | Total
Educational Background* | 96 | 90 | 186
Certification | 36 | 55 | 91
Subject Matter Knowledge | 6 | 13 | 19
Verbal Ability | 0 | 7 | 7
Other Test Score | 21 | 32 | 53
Teaching Experience | 44 | 62 | 106




Quality of Teaching | Number of Dissertations | Number of Other Sources | Total
Student Achievement | 110 | 157 | 267
Observed Classroom Practice | 17 | 34 | 51
Teacher/School Effectiveness | 13 | 1 | 14
Performance Assessments | 12 | 0 | 12

Note the use of 'Subject Matter Knowledge', where they use many of the studies that Hattie uses for the different influence 'Teacher Subject Matter Knowledge'. This partially accounts for the similar effect sizes. (However, using the same data across different influences is poor scholarship, as it leads to bias.)


Also, Professor Becker has published some details of their results which show wide variation (p. 12):




Becker warns, 

"The literature tells us very little about the exact nature of these programs, or about the comparison (traditional) programs" (p. 14).
and concludes with a caution: 
"Because we are still obtaining sources, the data we present here is tentative;"

Kelley & Camilli (2007) compare pre-school teachers with and without a BA who teach students from 3-5 years of age. They conclude:
"The analysis indicated that effects on quality outcomes from teachers with a bachelor’s degree (the treatment group) were significantly different from those teachers with less education (the comparison group). In standard deviation units, the average effect was .16 standard deviations ... There are, however, two caveats. First, the effect size is relatively small, though significant ... Second, the research underlying this effect size is correlational in nature. Thus, it is possible that any number of factors, aside from having a bachelor’s degree, cause this effect" (p. 1).
However, pre-school studies fall outside the U.S. K-12 context to which Wu, Becker & Kennedy (2002) above limited their synthesis. They also excluded studies of pre-service teachers,
"we will not include studies of pre-service teachers because we presume that they are still learning to teach and the relationships we might observe between qualifications and teaching outcomes for this population might not be reliable or stable."
So this is another example of the problem with meta-analysis in comparing 'apples with oranges'.

However, Kelley & Camilli (2007) give an excellent summary of the protocols used in meta-analyses and highlight many of the issues of this methodology:
"Correlations were transformed into comparative effect sizes when sufficient information was given (e.g., point-biserials were transformed to ES). When studies failed to report such information, the Pearson correlation coefficient r was used as the primary effect size measure" (p.18).
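
As a rough illustration of what such a transformation involves, here is a minimal sketch of one commonly used conversion between a correlation r and a standardised mean difference d, namely d = 2r / sqrt(1 - r^2). This formula assumes two groups of roughly equal size; whether Kelley & Camilli applied exactly this formula is my assumption, and the function names are my own.

```python
from math import sqrt


def r_to_d(r):
    """Convert a correlation r to a standardised mean difference d.

    Uses d = 2r / sqrt(1 - r^2), which assumes two equal-sized groups.
    """
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return 2 * r / sqrt(1 - r * r)


def d_to_r(d):
    """Inverse conversion under the same equal-group-size assumption."""
    return d / sqrt(d * d + 4)


# Round trip on an illustrative value (not taken from any study).
d = r_to_d(0.6)
print(d, d_to_r(d))
```

Note that a raw r and a converted d sit on different scales (here r = 0.6 corresponds to d = 1.5), so mixing the two in one average, as happens when r is used as a fallback effect size, needs care.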
They advocate a strengthening of the peer review process as it is the traditional safeguard for ensuring complete and accurate reporting (p. 35).


Sparks (2004), 'The Looming Danger of a Two-Tiered Professional Development System', is not a meta-analysis but rather a commentary on professional development, so it should NOT be included in this category. It is only 3 pages long and there is no mention of an effect size anywhere, so I'm not sure how Hattie gets d = 0.12.

Sparks comments on two tiers of professional development - NOT Teacher Training or qualifications. 
"The first tier is an emerging system that advocates the development of professional community and the exercise of professional judgement ... Conversely, the second tier of professional development is built on mandates, scripted teaching, and careful monitoring for compliance" (p. 304).
He states, 
"I have several concerns about this second tier of professional development. Far too many tier-two efforts begin and end with top down, highly prescriptive approaches, leaving the culture of schools untouched and teachers and students ill prepared to function much beyond the most rudimentary levels of performance. I am also concerned that demeaning and mind-numbing staff development will create a persistent aversion to professional learning and leave teachers feeling resigned to their fate and dependent on experts as the primary source for their development. And most important, because such forms of professional development are typically directed at those who teach our most vulnerable students, I believe that this approach will have long-term, deleterious consequences for poor and minority students" (p. 305)
Sparks also interviewed education guru Andy Hargreaves in 2004, on issues relevant to our discussion - 'Broader purpose calls for higher understanding.' Hargreaves states,  
"I come from England, where the professional culture was for many years based on a craft view of teaching in which teachers know best and researchers know little. Research was disparaged as irrelevant and esoteric with no relevance to the classroom. In moving to America, I found the opposite problem in which there's a tendency not only to respect but to revere research and researchers, to give them too much of their due, and not to challenge them enough from the wisdom of practice. Both of these extremes are undesirable. 
The challenge is to bring the wisdom of practice into critical dialogue with the wisdom of research" (p. 49).

PISA Analysis:


The highly respected Grattan Institute analysed the high performing international educational systems and concluded that one of the reforms responsible for improving student achievement across the four high-performing education systems in East Asia was 'providing high-quality initial teacher education' (p. 12). The high performing Finnish system introduced this reform in the late 1990s (see the interview with Pasi Sahlberg).

An example of a study on teacher training is in English schools.

Another blog questioning qualifications - https://teachingbattleground.wordpress.com/2017/11/11/why-all-the-research-on-teacher-qualifications-is-worthless/
