Assessing the Impact of College- and Career-Ready Standards: Findings, Challenges, and Future Directions

Mengli Song
Monday, March 19, 2018
College- and Career-Readiness
Common Core State Standards
NAEP
Standards Implementation


Since the early 1980s, standards-based reforms have been a crucial part of federal and state efforts to improve education. Adopted by all 50 states and the District of Columbia (DC) between 2007 and 2015, college- and career-ready (CCR) standards—the focus of the current wave of standards-based reform—differ from states’ previous standards in important ways. Most notably, prompted by concerns about high college remediation rates and the recognition that the rigor of states’ standards varied widely and declined in many states, the new CCR standards were explicitly designed around the goal of ensuring college and career readiness for all students upon high school graduation.

Another distinctive feature of the current standards-based reform movement is its strong emphasis on common standards across states. Spearheaded by the National Governors Association and the Council of Chief State School Officers, the Common Core (CC) State Standards Initiative was launched in 2009 to develop a common set of mathematics and English language arts (ELA) standards for all states, based on evidence of the knowledge and skills needed for college and career readiness upon high school graduation and internationally benchmarked to the world’s highest-performing countries. Released in June 2010, the CC standards were quickly adopted by 45 states and DC by the end of 2011, and by one more state (Washington) in 2012. This extraordinary initial response, however, was followed by contentious debates and a steady decline in support. As of March 2018, 11 states that originally adopted the CC standards have withdrawn from them. Nevertheless, with 36 CC-implementing states, the CC standards remain by far the most prominent form of CCR standards today.

Given the prominence of the CC standards and the highly politicized debates around the CC, it is no wonder that existing research on CCR standards has focused almost exclusively on the CC standards, particularly the implementation of the CC standards (e.g., Bay-Williams, Duffett, & Griffith, 2016; Rentner, 2013). Although the question “Are CCR/CC standards working?” is clearly on everyone’s mind, there has been very limited empirical research explicitly addressing this question. Below, I first summarize available research evidence on the impact of CCR/CC standards on student achievement, and then point out a few challenges related to estimating the impact of CCR/CC standards, which may explain the paucity of impact studies in this area. Finally, I offer a few suggestions for directions for future research.

What Have We Learned About the Impact of CCR/CC Standards on Student Achievement?

To the best of my knowledge, only four empirical studies have attempted to assess the impact of the CC standards on student achievement. The first three were conducted by Loveless (2014, 2015, 2016) as part of the Brown Center annual reports on American education. Based on state-level NAEP data for grade 8 math, Loveless (2014) compared 2009–2013 NAEP gains across three groups of states defined by a CC implementation index: strong implementers (n=19), medium implementers (n=26), and non-adopters (n=5). He found that strong implementers experienced a slightly larger gain in NAEP scores than non-adopters (a difference of 0.04 standard deviations [SD]).

In his 2015 study, Loveless replicated his 2014 analyses using NAEP data for grade 4 reading and conducted similar analyses with groups of states defined by an alternative CC implementation index.[1] Both sets of analyses suggest that the 2009–2013 gain in NAEP 4th-grade reading scores was slightly higher (by 0.03 to 0.04 SDs) in strong implementers than in non-adopting states. Loveless’ more recent analyses incorporating the 2015 NAEP data, however, revealed that the 2009–2015 gain in NAEP 4th-grade reading scores was slightly smaller (by 0.01 to 0.02 SDs) in strong implementers than in non-adopting states (Loveless, 2016). For 8th-grade reading, Loveless (2016) found that, relative to the gain in non-adopting states, the 2009–2015 NAEP gain in strong implementers was slightly smaller (by 0.003 SDs) based on one implementation index but slightly larger (by 0.02 SDs) based on the other. Taken together, the three Loveless studies suggest that there was very little systematic difference in 2009–2015 NAEP gains between states that were strong implementers of the CC and non-adopting states.

Findings from the studies conducted by Loveless, however, need to be interpreted with caution, as they are based on simple descriptive comparisons of group means between non-equivalent groups of states and thus reflect associations rather than causal effects. In particular, the comparison group in these studies consisted of a small set of non-adopting states (n=4 or 5), which were atypical given the almost nationwide adoption of the CC. These non-adopters therefore may not be an appropriate comparison group, as selection bias may be a serious concern. In addition, given the very small number of non-adopters, results from the analyses presented in Loveless (2014, 2015, 2016) were sensitive to substantial changes in NAEP scores in just one or two states, as the author acknowledges.

While the three studies conducted by Loveless included all 50 states, the latest study on the impact of the CC standards relied on data from a single state (Kentucky). Tracking three cohorts of students from grade 8 through grade 11, Xu and Cepa (in press) used difference-in-differences analyses and found that students exposed to the CC standards (i.e., students in the two more recent cohorts) scored significantly higher (by 0.03–0.04 SDs) on the ACT taken in 11th grade than similar students not exposed to the new standards (i.e., students in the earliest cohort).[2] The authors caution, however, that the observed differences between the cohorts may not be completely attributable to the CC standards, as cross-cohort differences in student achievement occurred both in the year before and in the year after the adoption of the new standards.
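For readers unfamiliar with the method, the cross-cohort logic of a difference-in-differences comparison can be sketched in a few lines of code. This is a deliberately simplified illustration, not the authors’ actual model: all numbers are invented, and the real analysis controls for student characteristics and uses multiple cohorts.

```python
# Hypothetical illustration of the difference-in-differences logic behind
# cross-cohort comparisons like Xu and Cepa's. All score values below are
# fabricated for illustration only.
cohort_means = {
    # cohort: (grade 8 mean, grade 11 mean) -- invented numbers
    "pre_CC_cohort":  (250.0, 270.0),   # cohort not exposed to the CC standards
    "post_CC_cohort": (251.0, 274.5),   # cohort exposed to the CC standards
}

def gain(cohort):
    """Within-cohort growth from grade 8 to grade 11."""
    before, after = cohort_means[cohort]
    return after - before

# The difference-in-differences estimate: the extra growth observed in the
# exposed cohort beyond the growth observed in the unexposed cohort, which
# nets out cohort-invariant influences on achievement growth.
did_estimate = gain("post_CC_cohort") - gain("pre_CC_cohort")
print(did_estimate)  # 3.5 with these made-up numbers
```

The key assumption, which the authors' caution about pre-adoption cohort differences speaks to, is that the cohorts would have shown the same grade 8-to-11 growth absent the new standards.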

In contrast to the four studies mentioned above, which all focused specifically on the CC standards, the ongoing C-SAIL Longitudinal Outcome Study is intended to assess the impact of CCR standards in general, and thus provides a broader assessment of the current wave of standards-based reform. Relying on state-level NAEP data from 1990 to 2015, we used comparative interrupted time series analyses to assess whether the adoption of CCR standards led to a larger improvement in student achievement in states with lower prior proficiency standards (where the CCR standards represented a stronger form of “treatment”) than in states with higher prior standards. Preliminary results show that the early effects of CCR standards on student achievement were generally very small, with effect sizes ranging from -0.09 to 0.06 SDs across subjects (math and reading), grades (4 and 8), and years (1, 3, and 5 years after adoption). None of the effects are statistically significant except for the negative effects for grade 4 reading observed 1 year and 3 years after CCR adoption. Updated findings that incorporate the latest 2017 NAEP data will be available later this year.
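The comparative interrupted time series logic can likewise be sketched with a stylized example. This is a minimal sketch under assumed data, not the C-SAIL model: the scores and years below are fabricated, and the actual analysis is far richer (multiple states per group, covariates, and inferential statistics).

```python
# Stylized sketch of comparative interrupted time series (CITS) logic.
# Assumed setup: annual average scores for a "treatment" group (states with
# lower prior standards) and a comparison group, with CCR adoption after
# year 5. All numbers are invented for illustration.

def fit_linear_trend(years, scores):
    """Ordinary least squares slope and intercept for one pre-adoption series."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(scores) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, scores))
    den = sum((x - mean_x) ** 2 for x in years)
    slope = num / den
    return slope, mean_y - slope * mean_x

def deviation_from_trend(pre_years, pre_scores, post_year, post_score):
    """How far the observed post-adoption score departs from the projected pre-trend."""
    slope, intercept = fit_linear_trend(pre_years, pre_scores)
    return post_score - (slope * post_year + intercept)

pre_years = [1, 2, 3, 4, 5]
treat_pre = [240.0, 241.5, 243.0, 244.5, 246.0]   # invented; trend of +1.5/year
comp_pre  = [255.0, 256.0, 257.0, 258.0, 259.0]   # invented; trend of +1.0/year

# Observed scores 3 years after adoption (year 8), also invented.
treat_dev = deviation_from_trend(pre_years, treat_pre, 8, 251.5)
comp_dev  = deviation_from_trend(pre_years, comp_pre, 8, 262.0)

# The CITS effect: the treatment group's departure from its own pre-adoption
# trend, net of the comparison group's departure from its trend.
cits_effect = treat_dev - comp_dev
print(round(cits_effect, 2))
```

Here the interruption (CCR adoption) is credited only with growth beyond what each group's own pre-adoption trajectory would predict, which is what makes the design stronger than a simple pre-post gain comparison.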

Why Have There Been So Few Studies Assessing the Impact of CCR Standards?

Given the potentially far-reaching impact of CCR standards (including the CC standards) on student learning across states and the unabated attention the highly politicized standards receive among policymakers, the body of empirical research on the impact of CCR standards is unusually small. This paucity may be attributable to a number of challenges in conducting research in this area. One major challenge concerns selection bias. For obvious reasons, it is not feasible to assess the impact of CCR standards—a state-level “treatment/intervention”—through a randomized controlled trial (RCT), the gold-standard design for eliminating selection bias and generating causally valid impact estimates. The best researchers can do is employ some type of quasi-experimental design (QED) based on comparisons of non-equivalent groups. Although a QED study can address selection bias to some extent by controlling for observed sample characteristics, it remains subject to bias from unobserved characteristics that differ between the groups compared and relate to the outcome of interest (i.e., “confounders”).[3] If not fully accounted for, such confounders (which could be state demographic characteristics or the harder-to-measure state political climate and policy context) would call into question the validity of the impact estimates based on a QED.

The selection bias issue is more difficult to address in QED studies of statewide CCR standards than in QED studies of interventions at lower levels (e.g., the school or teacher level). This is because some of the commonly used design strategies for removing selection bias in QEDs (e.g., matching) require a relatively large pool of potential comparison units from which well-matched units can be selected, and such strategies do not work well for a state-level intervention such as the CCR standards given the limited number of states.

Related to the level of intervention, another challenge for impact studies of state-level CCR standards concerns statistical power, which is likely to be constrained by the number of states included in the impact analysis. Power will be even more limited if the impact estimates are based on comparisons involving only a subset of states—as is the case with all existing studies of CCR/CC impact based on state-level data. If the sample size for the impact analysis based on a particular QED is too small, the estimates may not have sufficient precision, and as a result, the study may fail to detect the true impact of CCR standards, if there is one.

Where Should We Go From Here?

The challenges mentioned above are not meant to deter future research on the impact of CCR standards. On the contrary, more research in this area is needed, particularly given the heightened focus on CCR standards among state legislatures in recent years and the paucity of research-based evidence that policymakers can rely on to inform their decisions.[4] There is a strong need for timely research-based information on CCR standards to help state policymakers make better-informed decisions, which may have far-reaching implications not only for state standards governing the content students should learn, but also for numerous related programs and policies (e.g., state assessments and textbook adoptions). So, as researchers, we should not let the perfect be the enemy of the good and should strive to expand the evidence base for CCR standards with empirical studies as rigorously designed as possible. Below I offer a few suggestions concerning the design, outcome measures, and dissemination of findings for future research in this area.

Design

Future QED studies of CCR standards need to pay careful attention to selection bias: identify potential confounders based on empirical data and/or theory, and incorporate adequate statistical controls for those confounders in the impact analysis. Doing so will not only help reduce selection bias but also improve the precision of the estimates and hence the statistical power to detect true impacts. Given that it is virtually impossible to eliminate all selection bias in a QED, results based on a QED are likely to be more sensitive to how the impact model is specified than results based on an RCT. It is therefore advisable that QED studies routinely supplement the main impact analyses with carefully designed sensitivity analyses to check the robustness of their results to alternative model assumptions and specifications.

Outcome Measures

With the exception of Xu and Cepa (in press), all existing impact studies of CCR standards focus on the impact of the new standards on student achievement as measured by state-level NAEP scores. Admittedly, NAEP scores are a reasonable choice of outcome measure for these studies: they are readily available for all states across many years, and there is substantial overlap between the NAEP item pool and the CCR standards (Daro, Hughes, & Stancavage, 2015). However, as Polikoff (2017) aptly asks: “The CCS are billed as college- and career-readiness standards—might we not want to evaluate their impact on outcomes of this sort?” (p. 2). I think the answer is obvious: we should definitely go beyond NAEP scores as the only outcome measure for studies of CCR standards. In fact, the C-SAIL Longitudinal Outcome Study plans to assess the impact of CCR standards on high school graduation and college enrollment as additional student outcomes, drawing on relevant datasets compiled by the National Center for Education Statistics. We encourage other researchers to explore these and other data sources to identify appropriate measures of college and career readiness and to expand the evidence base for CCR standards beyond NAEP-based evidence.

Dissemination of Findings

CCR standards have been an area where research activity has lagged behind legislative activity. Clearly, there is a need for more research in this area, and there is also a need for findings from such research to reach policymakers in a timelier fashion, as policymakers will not refrain from making decisions just because sufficient evidence is not yet available. Thus, to better inform policy, researchers should consider dissemination channels that deliver their findings to policymakers and other interested audiences much more quickly (e.g., blogs, policy briefs, and newsletters) than traditional journal publications, which may take several years to materialize and thus miss the opportunity to inform important policy at critical time points. Besides being timelier, research findings disseminated through non-traditional channels are often presented in accessible language in short pieces, which increases the chance that they are read by policymakers for whom a 30-page journal article full of technical jargon may be too daunting.

Regardless of the specific channel through which findings about the impact of CCR standards are disseminated, such findings are likely to attract a great deal of attention and scrutiny, given the dearth of research-based evidence on the highly politicized CCR standards. Researchers thus need to be extra careful in communicating their findings about the impact of CCR standards. In particular, they should be transparent about the assumptions underlying their methods (which may or may not be tenable) and the limitations of their research—even in short pieces with limited space, so that their findings can be properly interpreted and will not be overgeneralized or misused.

References

Bay-Williams, J., Duffett, A., & Griffith, D. (2016). Common Core math in the K-8 classroom: Results from a national teacher survey. Washington, DC: Thomas B. Fordham Institute. Retrieved January 9, 2017, from https://edexcellence.net/publications/common-core-math-in-the-k-8-classroom-results-from-a-national-teacher-survey

Daro, P., Hughes, G. B., & Stancavage, F. (2015). Study of the alignment of the 2015 NAEP mathematics items at grades 4 and 8 to the Common Core State Standards (CCSS) for mathematics. Washington, DC: American Institutes for Research.

Loveless, T. (2014). 2014 Brown Center report on American education: How well are American students learning? Part III: A progress report on the Common Core. Washington, DC: The Brookings Institution. Retrieved January 15, 2017, from https://www.brookings.edu/research/a-progress-report-on-the-common-core/

Loveless, T. (2015). 2015 Brown Center report on American education: How well are American students learning? Part II: Measuring effects of the Common Core. Washington, DC: The Brookings Institution. Retrieved January 15, 2017, from https://www.brookings.edu/research/measuring-effects-of-the-common-core/

Loveless, T. (2016). 2016 Brown Center report on American education: How well are American students learning? Part I: Reading and math in the Common Core era. Washington, DC: The Brookings Institution. Retrieved January 15, 2017, from https://www.brookings.edu/research/2016-brown-center-report-on-american-education-how-well-are-american-students-learning/

Polikoff, M. (2017). Is Common Core “working”? And where does Common Core research go from here? AERA Open, 3(1), 1–6. Retrieved March 1, 2018, from http://journals.sagepub.com/doi/abs/10.1177/2332858417691749

Rentner, D. S. (2013). Year 3 of implementing the Common Core State Standards: An overview of states’ progress and challenges. Washington, DC: Center on Education Policy. Retrieved January 8, 2017, from http://www.cep-dc.org/displayDocument.cfm?DocumentID=421

Xu, Z., & Cepa, K. (in press). Getting college-ready during state transition toward the Common Core State Standards. Teachers College Record, 120(6).

Footnotes


[1] The CC implementation index used in both the 2014 and 2015 studies conducted by Loveless is based on a 2011 survey of state education agencies; the alternative CC implementation index used in Loveless (2015) is based on a 2013 survey of state education agencies. 

[2] An earlier version of the paper is available at: https://caldercenter.org/sites/default/files/WP%20127_0.pdf

[3] The study by Xu and Cepa (in press) and the C-SAIL Longitudinal Outcome Study both incorporate controls for sample characteristics; the studies by Loveless (2014, 2015, 2016) did not.

[4] According to the National Conference of State Legislatures, the number of CCR-related bills introduced by state legislatures jumped from 43 in 2011 to 293 in 2013; it soared further to 774 in 2015 and was on track to surpass 800 in 2017.