Developing New Measures of Teachers’ Instruction: Part 1

Morgan Polikoff
Monday, July 10, 2017
Teaching and Instruction

Over the last two years, I have been leading C-SAIL’s measurement work. This is the first of three posts in which I will describe what we’ve been doing and what we’ve found so far.

The heart of the C-SAIL project is the FAST program study, which begins at the start of the coming school year (we have been piloting the program during the 2016-17 school year). The FAST program, a teacher support intervention developed by C-SAIL researchers, represents a bold effort to bring standards-based reform to the classroom door: it provides teachers with detailed feedback about their instruction in order to improve instructional alignment with standards and raise student achievement.

In order to provide teachers feedback on their instructional alignment, we need high-quality measures of their instruction throughout the school year. We started from an existing survey instrument, the Surveys of Enacted Curriculum (SEC). This instrument has been used in several published studies to gauge the content of teachers’ instruction (examples here and here), but most previous studies have used SEC data only from semester-end or year-end surveys (for more on the history of the SEC, see here). Thus, we needed to revise the instrument in at least two ways: 1) we needed to ensure that the content included in the survey comprehensively covered what teachers were expected to teach under the new standards, and 2) we needed to create a log version that participating teachers could complete more frequently. We accomplished both tasks at a convening of content and survey experts at USC in 2015.

Once we developed the revised surveys and logs, we needed to pilot them to see if they were working as intended. Three key questions guided this work, questions that arose from our intended uses of the surveys and logs in the FAST program:

  1. What is the validity of teacher reports of their instruction for a semester?
  2. What is the validity of teacher reports of their instruction for a single lesson?
  3. How reliably can raters code the content of teachers’ assignments and assessments?

We conducted a series of validity studies of the new logs, and this post summarizes the results of the first of those studies.

To answer the first question above, we recruited a sample of teachers to complete the log surveys every two weeks over the course of approximately a semester (eight logs), as well as a semester-end survey covering the same period. We recruited 79 teachers from across the country and from all grades in mathematics and English language arts (ELA). In the end, we obtained satisfactory data from 52 of these, 26 from each subject. Based on the data we obtained, we identified several key findings (these will be described in more detail in a full report due out this fall):

  1. Teachers’ aggregated logs were related to their end-of-semester surveys, but there was not perfect alignment. Teachers are generally able to report the same sets of topics on the surveys as they report on the logs, but their estimates of the percent of time spent on each piece of content differ between the two sources. These differences are discussed in more detail below. 
  2. The end-of-semester surveys looked more like the aggregated logs in ELA than in mathematics. That is, ELA teachers appear better able to accurately report their instruction on the end-of-semester surveys.
  3. One of the main challenges with the end-of-semester surveys seems to be that teachers report less varied coverage across the levels of cognitive demand (e.g., procedures vs. analysis) on the surveys than they do on the logs. This could be because it is harder to remember which cognitive demands were covered when thinking back over a whole semester, or it could be that the end-of-semester task is simply more burdensome, so teachers put less effort into reporting accurately.
  4. In general, when teachers’ reports of cognitive demand erred on the end-of-semester survey, they erred toward higher levels of cognitive demand than the logs indicated. That is, when asked about the whole semester, teachers said they covered more cognitively complex skills than when they were asked every other week and those reports were added up over the semester.
  5. In general, the skills that were under-reported on the end-of-semester survey relative to the aggregated logs were more “fundamental” skills. These included “Add/subtract whole numbers and integers,” “Formulas, equations, and expressions,” and “Use of calculators” in mathematics, and “Comprehension of multi-paragraph texts,” “Drawing inferences and conclusions from texts,” and “Listening comprehension” in ELA.

Overall, we viewed these results as promising in some ways and disappointing in others. On the promising side, teachers were generally able to report very similar topic coverage on the logs and the surveys, giving us some confidence that teachers can recall their instruction across this length of time. On the disappointing side, the proportions of time reported for topic and cognitive demand coverage did not match up precisely between the two sources, so we have made some changes to the surveys (discussed below).
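
To make that mismatch concrete, here is a minimal sketch of the kind of comparison involved: aggregate the biweekly logs into semester-level proportions, do the same for the end-of-semester survey, and summarize the agreement with an index such as Porter’s alignment index, which is commonly used with SEC data. The post does not specify the exact statistic we used, and the topics, cognitive demands, and numbers below are purely hypothetical.

    # A minimal sketch (not the C-SAIL team's actual analysis code) of comparing
    # semester-aggregated log reports with an end-of-semester survey, summarized
    # with Porter's alignment index. All topics, demands, and numbers are hypothetical.

    def to_proportions(time_by_cell):
        """Normalize reported instructional time so the proportions sum to 1."""
        total = sum(time_by_cell.values())
        return {cell: t / total for cell, t in time_by_cell.items()}

    def porter_alignment(p, q):
        """Porter's index: 1 - (sum of absolute differences in proportions) / 2.
        Equals 1.0 when the two content profiles are identical, 0.0 when disjoint."""
        cells = set(p) | set(q)
        return 1.0 - sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in cells) / 2.0

    # Hypothetical (topic, cognitive demand) -> reported time, e.g., in lessons
    aggregated_logs = to_proportions({
        ("fractions", "procedures"): 6, ("fractions", "analysis"): 2,
        ("equations", "procedures"): 4, ("equations", "analysis"): 1,
    })
    semester_survey = to_proportions({
        ("fractions", "procedures"): 4, ("fractions", "analysis"): 4,
        ("equations", "procedures"): 3, ("equations", "analysis"): 3,
    })

    print(porter_alignment(aggregated_logs, semester_survey))  # about 0.73 here

An index near 1 would indicate that the two sources describe essentially the same content profile; the gap from 1 reflects exactly the kind of proportional disagreement described in findings 1, 4, and 5 above.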

The findings offered us guidance on how to get teachers to report more accurately on their instruction across the year. For instance, we have reduced the number of cognitive demand levels we ask mathematics teachers to report on. We also changed the response task: instead of first reporting topic coverage and then reporting cognitive demand coverage, teachers now report both at the same time (illustrated below). We think these changes will make it more likely that teachers can report accurately and consistently between the logs and year-end surveys, especially when it comes to proportional coverage across topics and cognitive demands. We are also confident that these changes will still give us the information we need to test the effects of the intervention on teachers’ instruction.
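
One way to picture that change in the response task, again with hypothetical labels and made-up numbers rather than the instrument’s actual layout: the old format elicited two separate distributions whose combination had to be inferred, while the revised format elicits a single joint distribution over topic-by-demand cells, the same kind of profile the alignment comparison above operates on.

    # An illustrative sketch of the change to the response task, with hypothetical
    # topic and demand labels and made-up percentages (not the instrument's layout).

    # Old task: topics and cognitive demands reported separately, each summing to 100%
    old_topics = {"fractions": 60, "equations": 40}
    old_demands = {"procedures": 70, "analysis": 30}

    # Revised task: one joint report over topic-by-demand cells, summing to 100%,
    # so each proportion is tied to a specific topic/demand combination from the start
    revised_joint = {
        ("fractions", "procedures"): 45, ("fractions", "analysis"): 15,
        ("equations", "procedures"): 25, ("equations", "analysis"): 15,
    }

    assert sum(old_topics.values()) == sum(old_demands.values()) == sum(revised_joint.values()) == 100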

In my next post, I will present results from the third research question above.