The Coalition Is Pleased with GAO’s Confirmation of the Top Tier Initiative’s Adherence to Rigorous Standards and Overall Transparency
The Coalition is pleased with the GAO report’s key findings that the Top Tier initiative’s criteria conform to general social science research standards (pp. 15-23), and that its process is mostly transparent (pp. 9-15). We also agree with its observation that the Top Tier initiative differs from common practice in its strong focus on randomized experiments, and would add that this was the initiative’s goal from the start. Indeed, its stated purpose is to identify interventions meeting the top tier standard set out in recent Congressional legislation: “well-designed randomized controlled trials [showing] sizeable, sustained effects on important … outcomes” (e.g., Public Laws 110-161 and 111-8).
Consistent with our initiative’s unique focus on helping policymakers distinguish the relatively few interventions meeting this top evidentiary standard from the many that claim to, we have – as noted in the GAO report – identified 6 interventions as Top Tier out of the 63 reviewed thus far. The value of this process to policymakers is evidenced by the important impact these findings have already had on federal officials and legislation. For example, the initiative’s findings for the Nurse-Family Partnership (NFP) have helped to spur the Administration and Congress’ proposed national expansion of evidence-based home visitation. (The NFP study results are cited in the President’s FY 2010 budget.) Similarly, the initiative’s findings for the Carrera Adolescent Pregnancy Prevention program and Multidimensional Treatment Foster Care (MTFC) have helped inform the Administration and Congress’ proposed evidence-based teen pregnancy prevention program. (The MTFC study results are cited in the Senate’s FY10 Labor-HHS-Education Appropriations Committee report.1)
In fact, OMB Director Peter Orszag recently posted on the OMB website a summary of the Administration’s “two-tiered approach” to home visitation and teen pregnancy, which links to the Coalition’s website.2 The approach includes (i) funding for programs backed by strong evidence, which he identifies as “the top tier;” and (ii) additional funding for programs backed by “supportive evidence,” with a requirement for rigorous evaluation that, if positive, could move them into the top tier.
Consistent with this Administration approach, we recognize (and agree with GAO) that nonrandomized studies provide important value – for example, in (i) informing policy decisions in areas where well-conducted randomized experiments are not feasible or have not yet been conducted; and (ii) identifying interventions that are particularly promising, and therefore ready to be evaluated in more definitive randomized experiments. We think the GAO report somewhat overstates the confidence one can place in nonrandomized findings alone, given (i) a recent National Academies recommendation that evidence of effectiveness generally "cannot be considered definitive" without ultimate confirmation in well-conducted randomized experiments, "even if based on the next strongest designs;"3 and (ii) evidence that findings from nonrandomized studies are often overturned in definitive randomized experiments (see attachment, below). But the important and complementary value of well-conducted nonrandomized studies as part of an overall research agenda is a central theme of the Coalition's approach to evidence-based policy reform.
In conclusion, we appreciate GAO's thoughtful analysis, and will use its valuable observations to strengthen our initiative as it goes forward. Although the Congressionally-established top tier standard itself was not a main focus of the GAO report (which concentrated instead on our review process), we have attached some brief background on the standard and the reasons we support its use as an important element of appropriate policy initiatives (see below).
The Congressionally-established Top Tier evidence standard is based on a well-established concept in the scientific community, and strong evidence regarding the importance of random assignment
Congress’ Top Tier standard is based on a concept well-established in the scientific community – that when results of multiple (or multisite) well-conducted randomized experiments, carried out in real-world community settings, are available for a particular intervention, they generally constitute the most definitive evidence regarding that intervention’s effectiveness. The standard further recognizes a key concept articulated in a recent National Academies recommendation: although many research methods can help identify effective interventions, evidence of effectiveness generally “cannot be considered definitive” without ultimate confirmation in well-conducted randomized experiments, “even if based on the next strongest designs.”3
Although promising findings in nonrandomized quasi-experimental studies are valuable for decisionmaking in the absence of stronger evidence, too often such findings are overturned in subsequent, more definitive randomized experiments. Reviews in medicine, for example, have found that 50-80% of promising results from phase II (mostly quasi-experimental) studies are overturned in subsequent phase III randomized trials.4 Similarly, in education, eight of the nine major randomized experiments sponsored by the Institute of Education Sciences since its creation in 2002 have found weak or no positive effects for the interventions being evaluated – interventions which, in many cases, were based on promising, mostly quasi-experimental evidence (e.g., the LETRS teacher professional development program for reading instruction).5 Systematic “design replication” studies comparing well-conducted randomized experiments with quasi-experiments in welfare, employment, and education policy have also found that many widely used and accepted quasi-experimental methods produce unreliable estimates of program impact.6
Thus, we support use of the Top Tier standard as a key element of policy initiatives seeking to scale up interventions backed by the most definitive evidence of sizeable, sustained effects, in areas where such proven interventions already exist. The standard has a strong basis in scientific authority and evidence, as reflected, for example, in the recent National Academies recommendation.
2 Peter Orszag’s summary of the Administration’s two-tiered approach is posted at http://www.whitehouse.gov/omb/blog/09/06/08/BuildingRigorousEvidencetoDrivePolicy/
3 National Research Council and Institute of Medicine. (2009). Preventing Mental, Emotional, and Behavioral Disorders Among Young People: Progress and Possibilities. Committee on Prevention of Mental Disorders and Substance Abuse Among Children, Youth and Young Adults: Research Advances and Promising Interventions. Mary Ellen O’Connell, Thomas Boat, and Kenneth E. Warner, Editors. Board on Children, Youth, and Families, Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press. Recommendation 12-4, page 371 [linked here].
4 John P. A. Ioannidis, “Contradicted and Initially Stronger Effects in Highly Cited Clinical Research,” Journal of the American Medical Association, vol. 294, no. 2, July 13, 2005, pp. 218-228. Mohammad I. Zia, Lillian L. Siu, Greg R. Pond, and Eric X. Chen, “Comparison of Outcomes of Phase II Studies and Subsequent Randomized Control Studies Using Identical Chemotherapeutic Regimens,” Journal of Clinical Oncology, vol. 23, no. 28, October 1, 2005, pp. 6982-6991. John K. Chan et al., “Analysis of Phase II Studies on Targeted Agents and Subsequent Phase III Trials: What Are the Predictors for Success,” Journal of Clinical Oncology, vol. 26, no. 9, March 20, 2008.
5 The Impact of Two Professional Development Interventions on Early Reading Instruction and Achievement, Institute of Education Sciences, NCEE 2008-4031, September 2008, http://ies.ed.gov/ncee/pubs/20084030/.
6 Howard S. Bloom, Charles Michalopoulos, and Carolyn J. Hill, “Using Experiments to Assess Nonexperimental Comparison-Groups Methods for Measuring Program Effects,” in Learning More From Social Experiments: Evolving Analytic Approaches, Russell Sage Foundation, 2005, pp. 173-235. Thomas D. Cook, William R. Shadish, and Vivian C. Wong, “Three Conditions Under Which Experiments and Observational Studies Often Produce Comparable Causal Estimates: New Findings from Within-Study Comparisons,” Journal of Policy Analysis and Management, vol. 27, no. 4, 2008, pp. 724-750. Steve Glazerman, Dan M. Levy, and David Myers, “Nonexperimental versus Experimental Estimates of Earnings Impact,” The Annals of the American Academy of Political and Social Science, vol. 589, September 2003, pp. 63-93.