Standardizing an approach to the evaluation of implementation science proposals



The fields of implementation and improvement sciences have experienced rapid growth in recent years. However, research that seeks to inform health care change may have difficulty translating core components of implementation and improvement sciences within the traditional paradigms used to evaluate efficacy and effectiveness research. A review of implementation and improvement sciences grant proposals within an academic medical center using a traditional National Institutes of Health framework highlighted the need for tools that could assist investigators and reviewers in describing and evaluating proposed implementation and improvement sciences research.


We operationalized existing recommendations for writing implementation science proposals as the ImplemeNtation and Improvement Science Proposals Evaluation CriTeria (INSPECT) scoring system. The resulting system was applied to pilot grants submitted to a call for implementation and improvement science proposals at an academic medical center. We evaluated the reliability of the INSPECT system using Krippendorff’s alpha coefficients and explored the utility of the INSPECT system to characterize common deficiencies in implementation research proposals.


We scored 30 research proposals using the INSPECT system. Proposals received a median cumulative score of 7 out of a possible score of 30. Across individual elements of INSPECT, proposals scored highest for criteria rating evidence of a care or quality gap. Proposals generally performed poorly on all other criteria. Most proposals received scores of 0 for criteria identifying an evidence-based practice or treatment (50%), conceptual model and theoretical justification (70%), setting’s readiness to adopt new services/treatment/programs (54%), implementation strategy/process (67%), and measurement and analysis (70%). Inter-coder reliability testing showed excellent reliability (Krippendorff’s alpha coefficient 0.88) for the application of the scoring system overall and demonstrated reliability scores ranging from 0.77 to 0.99 for individual elements.


The INSPECT scoring system presents new scoring criteria with a high degree of inter-rater reliability and utility for evaluating the quality of implementation and improvement sciences grant proposals.


The recognition that experimental efficacy studies alone are insufficient to improve public health [1] has led to the rapid expansion of the fields of implementation and improvement sciences [2,3,4,5]. However, studies that aim to identify strategies that facilitate adoption, sustainability, and scalability of evidence may not translate well within traditional efficacy and effectiveness research paradigms [6].

The need for new tools to aid investigators and research stakeholders in implementation science became clear during evaluation of grant submissions to the Evans Center for Implementation and Improvement Sciences (CIIS) at Boston University. CIIS was established in 2016 to promote scientific rigor in new and ongoing projects aimed at increasing the use of evidence and improving patient outcomes within an urban, academic, safety net medical center. As part of CIIS’s goal to foster rigorous implementation and improvement methods, CIIS established a call for pilot grant applications for implementation and improvement sciences [7]. Proposals were peer-reviewed using traditional National Institutes of Health (NIH) scoring criteria [8]. Through two cycles of grant applications, proposal reviewers identified a need for improved evaluation criteria capable of identifying specific strengths and weaknesses in order to rate the potential impact of implementation and/or improvement study designs.

We describe the development and evaluation of ImplemeNtation and Improvement Science Proposal Evaluation CriTeria (INSPECT): a tool for the standardized evaluation of implementation and improvement research proposals. The INSPECT tool seeks to operationalize criteria proposed by Proctor et al. as “key ingredients” that constitute a well-crafted implementation science proposal, which operate within the NIH proposal scoring framework [6].


Assessment of need

CIIS released requests for pilot grant applications focused on implementation and improvement sciences in April 2016 and April 2017 [7]. The request for applications described an opportunity for investigators to receive up to $15,000 for innovative implementation and improvement sciences research on any topic related to improving the processes and outcomes of health care delivery in safety net settings. CIIS funds pilot grants with the goal of providing investigators with the opportunity to obtain preliminary data for further research. Proposals were required to include a specific aims page and a three-page research plan structured within the traditional NIH framework with subheadings for significance, innovation, approach, environment, and research team. The NIH framework was required because it corresponds with the grant proposal structure required by the NIH. A study budget and justification as well as research team biographical sketches were required with no page limit restrictions. CIIS received 30 pilot grant applications covering a broad array of content areas, such as smoking cessation, hepatitis C, diabetes, cancer, and neonatal abstinence syndrome.

Six researchers with experience in implementation and improvement sciences served as grant reviewers. Four reviewers scored each proposal. Reviewers evaluated the quality of pilot study proposals, assigning numerical scores from 1 to 9 (1 = exceptional, 9 = poor) for each of the NIH criteria (significance, innovation, investigators, approach, environment, overall impact) [8]. CIIS elected to use the NIH criteria to evaluate the pilot grant applications because the criteria are those used by the NIH peer review systems to evaluate the scientific and technical merit of grant proposals. The CIIS grant review team held a “study section” to review and discuss the proposals. However, during that meeting, reviewers provided feedback that the NIH evaluation criteria, based in the traditional efficacy and effectiveness research paradigm, did not offer sufficient guidance for evaluating implementation and improvement science proposals, nor did it provide enough specificity for the proposal writers who are less experienced in implementation research. Grant reviewers requested new proposal evaluation criteria that would better inform score decisions and feedback to proposal writers on specific aspects of implementation science including measuring the strength of implementation study design, strategy, feasibility, and relevance.

Despite the challenges of using the traditional NIH evaluation criteria, the review panel used those criteria to score all of the grants received during the first 2 years of proposal requests. CIIS pilot grant funding was awarded to applications that received the lowest (best) scores under the NIH criteria and received positive feedback from the review panel.

The request for more explicit implementation science evaluation criteria prompted the CIIS research team to conduct a qualitative needs assessment of all 30 pilot study applications in order to determine how the proposals described study designs, implementation strategies, and other aspects of proposed implementation and improvement research. Three members of the CIIS research team (MLD, AJW, DB) independently open-coded pilot proposals to identify properties related to core implementation science concepts or efficacy and effectiveness research [9]. The team identified common themes in the proposals, including an emphasis on efficacy hypotheses, descriptions of untested interventions, and the absence of implementation strategies and conceptual frameworks. The consistent lack of features identified as important aspects of implementation science reinforced the need for criteria that specifically addressed implementation science approaches to guide both proposal preparation and evaluation.

Operationalizing scoring criteria

We identified Proctor et al.’s “ten key ingredients” for writing implementation research proposals [6] as an appropriate framework to guide and evaluate proposals. We operationalized the “ingredients” into a scoring system. To construct the scoring system, a four-point scale (0–3) was created for each element. In general, a score of 3 was given for an element if all of the criteria requirements for the element were fully met; a score of 2 was given if the criteria were somewhat, but not fully addressed; a score of 1 was given if the ingredient was mentioned but not operationalized in the proposal or linked to the rest of the study; and a score of 0 was given if the element was not addressed at all in the proposal. Table 1 illustrates the INSPECT scoring system for the 10 items, in which proposals receive one score for each of the 10 ingredients, for a cumulative score between 0 and 30.
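The scoring arithmetic described above can be sketched in a few lines. This is only an illustration of the 0–3 scale and the 0–30 cumulative score as stated in the text; the function name, the constant names, and the example scores are hypothetical and are not part of the INSPECT materials.

```python
# Meaning of each score level, paraphrased from the description in the text.
SCORE_LABELS = {
    3: "all criteria requirements for the element fully met",
    2: "criteria somewhat, but not fully, addressed",
    1: "element mentioned but not operationalized or linked to the study",
    0: "element not addressed at all",
}

N_ITEMS = 10  # one score per "key ingredient"


def cumulative_score(item_scores):
    """Validate ten item scores (each 0-3) and return their sum (0-30)."""
    scores = list(item_scores)
    if len(scores) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item scores, got {len(scores)}")
    for score in scores:
        if score not in SCORE_LABELS:
            raise ValueError(f"item score must be 0, 1, 2, or 3, got {score!r}")
    return sum(scores)


# Hypothetical proposal: strong on one element, weak or silent on most others.
example_scores = [3, 0, 0, 1, 0, 0, 2, 1, 0, 0]
example_total = cumulative_score(example_scores)
```

Keeping the ten item scores alongside the total preserves the element-level detail that reviewers use to identify specific strengths and weaknesses.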

Table 1 Implementation and Improvement Science Proposal Evaluation Criteria


We used the pilot study proposals submitted to CIIS to develop and evaluate the utility and reliability of the INSPECT scoring system. Initially, two research team members (ELC, DB) independently applied the 10-element criteria to 7 of the 30 pilot grant proposals. Four team members (MLD, AJW, ELC, DB) then met to discuss these initial results and achieve consensus on the scoring criteria. Two team members (ELC, DB) then independently scored the remaining 23 pilot study applications using the revised scoring system. Both reviewers recorded brief justifications for each of the ten scores assigned to individual study proposals. The two coders (ELC, DB) then met to compare scores, share scoring justifications, and determine the final item-specific scores for each proposal using group consensus.

Inter-coder reliability with the scoring protocol was measured using Krippendorff’s alpha to assess observed and expected disagreement between the two coders’ initial individual item scores [10, 11]. An alpha coefficient of 0.70 was deemed a priori as the lowest acceptable level of agreement to establish reliability of the new scoring protocol [10, 11]. Frequency analyses were conducted to determine the distribution of final element-specific scores (0–3) across all proposals. We calculated a correlation coefficient to assess the association between proposal scores assigned using the NIH framework and scores assigned using INSPECT. All calculations were performed in R version 3.3.2 [12].
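For readers unfamiliar with the statistic, the following is a minimal sketch of Krippendorff's alpha, computed as one minus the ratio of observed to expected disagreement. It is not the authors' R code. The article does not state which difference metric was used, so the interval metric in the example is an assumption, and all function names and the example scores are hypothetical.

```python
from collections import Counter
from itertools import permutations


def nominal_distance(a, b):
    """Categories either match or they do not."""
    return 0.0 if a == b else 1.0


def interval_distance(a, b):
    """Squared difference; treats the 0-3 scores as evenly spaced (assumption)."""
    return float((a - b) ** 2)


def krippendorff_alpha(units, distance):
    """Krippendorff's alpha for complete or incomplete ratings.

    units: iterable of tuples, each holding the ratings given to one unit
           (here, one proposal-item pair) by the coders who rated it.
    distance: function giving the disagreement between two rating values.
    """
    # Coincidence matrix: every ordered pair of ratings within a unit,
    # weighted by 1 / (number of ratings in the unit - 1).
    coincidences = Counter()
    for ratings in units:
        m = len(ratings)
        if m < 2:
            continue  # a unit rated by a single coder cannot be paired
        for a, b in permutations(ratings, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per rating value and overall number of pairable values.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w * distance(a, b) for (a, b), w in coincidences.items()) / n
    expected = sum(
        totals[a] * totals[b] * distance(a, b) for a in totals for b in totals
    ) / (n * (n - 1))
    if expected == 0:
        raise ValueError("ratings show no variation; alpha is undefined")
    return 1.0 - observed / expected


# Hypothetical initial item scores from two coders for ten proposal items.
coder_a = [3, 0, 0, 1, 2, 0, 1, 0, 0, 2]
coder_b = [3, 0, 1, 1, 2, 0, 1, 0, 0, 1]
alpha = krippendorff_alpha(list(zip(coder_a, coder_b)), interval_distance)
meets_threshold = alpha >= 0.70  # the a priori cutoff stated in the text
```

Passing the distance function as a parameter makes the choice of metric explicit, which matters because nominal, ordinal, and interval metrics can yield different alpha values for the same scores.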


Iterative review of the 30 research proposals using Proctor et al.’s “ten key ingredients” resulted in the development and testing of the INSPECT system for assessing implementation and improvement science proposals.

Figure 1 displays the right-skewed distribution of cumulative proposal scores, with most proposals receiving low overall scores. Out of a possible cumulative score of 30, proposals had a median score of 7 (IQR 3.3–11.8).

Fig. 1 Distribution of cumulative proposal scores assigned using ImplemeNtation and Improvement Science Proposal Evaluation CriTeria (INSPECT)

Table 2 presents the distribution of cumulative and item-specific scores assigned to proposals using the INSPECT criteria. Across individual elements, proposals scored highest for criteria describing care/quality gaps in health services. Thirty-six percent of proposals received the maximum score of 3 for meeting all care or quality gap element requirements, including using local setting data to support the existence of a gap, including an explicit description of the potential for improvement, and linking the proposed research to funding priorities (i.e., safety net setting).

Table 2 Distribution of ImplemeNtation and Improvement Science Proposal Evaluation CriTeria (INSPECT) Scores

Proposals generally scored poorly for other criteria. As shown in Table 2, most study proposals received scores of 0 in the categories of evidence-based treatment to be implemented (50%), conceptual model and theoretical justification (70%), setting’s readiness to adopt new services/treatment/programs (53%), implementation strategy/process (67%), and measurement and analysis (70%). For example, reviewers gave scores of 0 for the “evidence-based intervention to be implemented” element because the intervention was not evidence-based and the project sought to establish efficacy, rather than to examine uptake of an established evidence-based practice. Similarly, proposals that only sought to study effectiveness and did not assess any implementation outcomes [13] (e.g., adoption, fidelity) received scores of 0 for “measurement and analysis.” None of the study proposals primarily aiming to assess effectiveness outcomes expressed the dual research intent of a hybrid design. Scores of 0 for other categories were given when applications lacked any description relevant to the category, such as no conceptual model, no implementation strategy, or no research team skills relevant to implementation or improvement science.

Table 2 displays the assessed rates of inter-coder reliability in applying INSPECT to the 30 pilot study proposals. An overall alpha coefficient of 0.88 was observed between the coders. Rates of inter-coder reliability in applying each of the 10 items to the proposals ranged from 0.77 to 0.99, all above the 0.70 reliability threshold.

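Additionally, we observed a moderate inverse correlation (r = −0.62, p < 0.01) between the proposal scores initially assigned using the NIH framework and the scores assigned using INSPECT. The correlation is inverse because the two scales run in opposite directions: lower NIH scores indicate stronger proposals (1 = exceptional, 9 = poor), whereas higher INSPECT scores do.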
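The sign of this coefficient can be illustrated with a small sketch. The article reports only "a correlation coefficient", so the use of Pearson's r here is an assumption, as is the choice of which NIH score is paired with the INSPECT total. The function name and all data values below are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for five proposals.
nih_scores = [2, 3, 5, 6, 8]          # NIH scale: 1 = exceptional, 9 = poor
inspect_totals = [18, 12, 9, 7, 2]    # INSPECT scale: 0 = lowest, 30 = highest
r = pearson_r(nih_scores, inspect_totals)  # negative when the two systems agree
```

Because a low NIH score and a high INSPECT score both indicate a stronger proposal, agreement between the two systems appears as a negative coefficient.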


We developed a reliable proposal scoring system that operationalizes Proctor et al.’s “ten key ingredients” for writing an implementation research grant [6]. Previous research analyzing peer-review grant processes has highlighted a need to improve scoring agreement between peer reviewers [14]. High levels of disagreement in assessors’ interpretation of grant scoring criteria result in unreliable peer-review processes and funding decisions based more on chance than scientific merit [14]. Measuring rates of inter-rater reliability is a standard approach for evaluating the utility of existing proposal scoring criteria and assessing efforts to improve the criteria [15, 16]. Application of the INSPECT system demonstrated high inter-rater reliability overall and within each of the 10 items. The high degree of reliability measured for INSPECT may be related to the specificity of its design as scoring criteria for implementation and improvement science. A review of scoring rubrics reported in the scientific literature suggests that topic-focused criteria contribute to increased scoring reliability [17]. Additionally, the moderate correlation between scores assigned using the NIH framework and scores assigned using INSPECT suggests validity of the INSPECT criteria in evaluating proposal quality. Proctor et al.’s “ten key ingredients” for grant writers were developed to map onto the existing NIH criteria. Our operationalized version of the ingredients as scoring criteria demonstrated that proposals that scored poorly under NIH criteria also scored poorly under INSPECT.

Applying the INSPECT system to proposed implementation and improvement science research at an academic medical center improved proposal reviewers’ ability to identify specific strengths and weaknesses in implementation approach. Overall, proposals only received high scores for identifying the care gap or quality gap. Since efficacy and implementation or improvement research may use similar techniques to establish the significance of the study questions [18], proposals may score well on describing the quality gap, even if they later described efficacy hypotheses that received overall low scores from the INSPECT system. Further studies should explore techniques for describing care and quality gaps that highlight implementation or improvement research questions.

Consistently low scores in four areas—defining the evidence-based treatment to be implemented, conceptual model and theoretical justification, setting’s readiness to adopt new programs, and measurement and analysis—suggest that many investigators seeking to conduct implementation research may have misconceptions about the fundamental goals of this field. One misconception may relate to a sole focus on evaluating an intervention’s effectiveness rather than studying the processes and outcomes of implementation strategies. The majority of study proposals evaluated using INSPECT neither aimed to improve uptake of any evidence-based practice nor included any implementation measures such as acceptability, adoption, feasibility, fidelity, penetration, or sustainability [19]. Inadequate and inconsistent descriptions of implementation strategies and outcomes represent major challenges to overall implementation study success [20]. In addition to guidance provided by the INSPECT criteria, recent efforts to develop implementation study reporting standards [21] may assist proposal writers in describing planned research.

Several proposals addressed treatments or practices with low evidence for the potential to improve healthcare. Although hybrid studies, which study both effectiveness and implementation outcomes, are practical approaches to establishing the effectiveness of evidence-informed practices while measuring implementation efforts [18], none of the study proposals expressed this dual research intent or were conceived as hybrid designs.

Our findings also suggest low familiarity with and use of resources to evaluate the strength of evidence (such as the Grading Quality of Evidence and Strength of Recommendations system [22] and the Strength of Recommendation Taxonomy grading scale [23]) for implementation science research. A more systematic evaluation of the strength of evidence [24,25,26,27] necessary to warrant implementation efforts may help to differentiate implementation science from efficacy or effectiveness research and improve understanding of the utility hybrid studies offer [28].

Expanding access to implementation science training in universities as part of the core health services research curriculum and enhancing access to professional development opportunities that focus on conceptual and methodological implementation skills in a content agnostic way would aid in building capacity for the next generation of implementation science researchers. Additionally, training programs provide an opportunity to provide guidance on both writing and evaluating the quality of implementation science grant applications.

Strengths of our study include the application of INSPECT to study proposals submitted by investigators with a wide range of implementation and improvement science-specific experience and covering a variety of content areas. However, our results are limited in that they characterize one academic institution’s familiarity with implementation and improvement science research, and the INSPECT system requires validation in other settings and over a broader range of proposal ratings. Additionally, we measured a high degree of inter-rater reliability for INSPECT when it was applied to a sample of low-scoring proposals. INSPECT’s inter-rater reliability may decrease when it is applied to a sample of higher quality proposals and reviewers are required to discriminate between gradations of quality (i.e., scores of 1–3) rather than mostly scoring the absence of key items (i.e., scores of 0). Future research should test the validity of INSPECT by comparing INSPECT-assigned scores to ratings assigned to approved proposals by the NIH Dissemination and Implementation Research in Health study section. Future research should also assess the relationship between INSPECT score assignments and successful study completion to determine the utility of INSPECT as a mechanism for ensuring the quality and impact of funded research. To aid in these prospective research efforts, forthcoming proposal calls from CIIS will specifically use INSPECT as the proposal evaluation criteria.

Although multiple tools exist to aid researchers in writing implementation science proposals [6, 29, 30], few resources exist to support grant reviewers. This study identified additional functionality of Proctor et al.’s “ten key ingredients” as a guide for writers by developing it into a detailed checklist for proposal reviewers. The current research makes a substantive contribution to implementation and improvement sciences by demonstrating the utility and reliability of a new tool designed to aid grant reviewers in identifying high-quality research.


In conclusion, we operationalized an implementation and improvement research-specific scoring system to provide guidance for proposal writers and grant reviewers. We demonstrated the utility and reliability of the new INSPECT scoring system in evaluating the quality of implementation and improvement sciences research proposed at one academic medical center. The prevalence of low scores across the majority of INSPECT criteria suggests a need to promote education about the goals of implementation and improvement science, including the conceptual and methodological distinctions from efficacy and effectiveness research.



CIIS: Center for Implementation and Improvement Sciences

INSPECT: ImplemeNtation and Improvement Science Proposal Evaluation CriTeria

NIH: National Institutes of Health


  1. Glasgow RE, Lichtenstein E, Marcus AC. Why don’t we see more translation of health promotion research to practice? Rethinking the efficacy-to-effectiveness transition. Am J Public Health. 2003;93:1261–7. Accessed 1 Mar 2018

  2. Neta G, Sanchez MA, Chambers DA, Phillips SM, Leyva B, Cynkin L, et al. Implementation science in cancer prevention and control: a decade of grant funding by the National Cancer Institute and future directions. Implement Sci. 2015;10:4.

  3. Purtle J, Peters R, Brownson RC. A review of policy dissemination and implementation research funded by the National Institutes of Health, 2007–2014. Implement Sci. 2015;11(1).

  4. Tinkle M, Kimball R, Haozous EA, Shuster G, Meize-Grochowski R. Dissemination and implementation research funded by the US National Institutes of Health, 2005-2012. Nurs Res Pract. 2013;2013:909606.

  5. Smits PA, Denis J-L. How research funding agencies support science integration into policy and practice: an international overview. Implement Sci. 2014;9:28.

  6. Proctor EK, Powell BJ, Baumann AA, Hamilton AM, Santens RL. Writing implementation research grant proposals: ten key ingredients. Implement Sci. 2012;7:96.

  7. Center for Implementation and Improvement Sciences. Pilot Grant Program; 2016. Accessed 2 Nov 2017.

  8. National Institutes of Health. Definitions of criteria and considerations for research project grant (RPG/X01/R01/R03/R21/R33/R34) Critiques | 2016. Accessed 26 Oct 2017.

  9. Green J, Thorogood N. Qualitative methods for health research. 3rd ed. Thousand Oaks: SAGE Publications Ltd; 2013.

  10. Hayes AF, Krippendorff K. Answering the call for a standard reliability measure for coding data. Commun Methods Meas. 2007;1:77–89.

  11. Krippendorff K. Content analysis: an introduction to its methodology. Thousand Oaks, CA: SAGE Publications; 2004.

  12. R Core Team. R: a language and environment for statistical computing. 2016.

  13. Proctor E, Silmere H, Raghavan R, Hovmand P, Aarons G, Bunger A, et al. Outcomes for implementation research: conceptual distinctions, measurement challenges, and research agenda. Admin Pol Ment Health. 2011;38:65–76.

  14. Marsh HW, Jayasinghe UW, Bond NW. Improving the peer-review process for grant applications: reliability, validity, bias, and generalizability. Am Psychol. 2008;63:160–8.

  15. Sattler DN, McKnight PE, Naney L, Mathis R. Grant peer review: improving inter-rater reliability with training. PLoS One. 2015;10:e0130450.

  16. Demicheli V, Di Pietrantonj C. Peer review for improving the quality of grant applications. Cochrane Database Syst Rev. 2007:MR000003.

  17. Jonsson A, Svingby G. The use of scoring rubrics: reliability, validity and educational consequences. Educ Res Rev. 2007;2:130–44.

  18. Inouye SK, Fiellin DA. An evidence-based guide to writing grant proposals for clinical research. Ann Intern Med. 2005;142:274–82. Accessed 28 Feb 2018

  19. Proctor EK, Landsverk J, Aarons G, Chambers D, Glisson C, Mittman B. Implementation research in mental health services: an emerging science with conceptual, methodological, and training challenges. Adm Policy Ment Heal Ment Heal Serv Res. 2009;36:24–34.

  20. Proctor EK, Powell BJ, McMillen JC. Implementation strategies: recommendations for specifying and reporting. Implement Sci. 2013;8:139.

  21. Pinnock H, Barwick M, Carpenter CR, Eldridge S, Grandes G, Griffiths CJ, et al. Standards for reporting implementation studies (StaRI): explanation and elaboration document. BMJ Open. 2017;7:e013318.

  22. Oxman AD. Grading quality of evidence and strength of recommendations. BMJ. 2004;328:24–34. Accessed 28 Feb 2018

  23. Ebell MH, Siwek J, Weiss BD, Woolf SH, Susman J, Ewigman B, et al. Strength of recommendation taxonomy (SORT): a patient-centered approach to grading evidence in the medical literature. Am Fam Physician. 2004;69:548–56. Accessed 28 Feb 2018.

  24. Rycroft-Malone J, Seers K, Titchen A, Harvey G, Kitson A, McCormack B. What counts as evidence in evidence-based practice? J Adv Nurs. 2004;47:81–90.

  25. Rycroft-Malone J, Seers K, Chandler J, Hawkes CA, Crichton N, Allen C, et al. The role of evidence, context, and facilitation in an implementation trial: implications for the development of the PARIHS framework. Implement Sci. 2013;8:28.

  26. Prasad V, Ioannidis JP. Evidence-based de-implementation for contradicted, unproven, and aspiring healthcare practices. Implement Sci. 2014;9(1).

  27. McCaughey D, Bruning NS. Rationality versus reality: the challenges of evidence-based decision making for health policy makers. Implement Sci. 2010;5:39.

  28. Bernet AC, Willens DE, Bauer MS. Effectiveness-implementation hybrid designs: implications for quality improvement science. Implement Sci. 2013;8(Suppl 1):S2.

  29. Brownson RC, Colditz GA, Dobbins M, Emmons KM, Kerner JF, Padek M, et al. Concocting that magic elixir: successful grant application writing in dissemination and implementation research. Clin Transl Sci. 2015;8:710–6.

  30. University of Colorado Implementation Science Program. Ten key ingredients to writing successful d&i research proposals. 2018. Accessed 4 Apr 2018.



We would like to thank the investigators who submitted pilot grant applications to the Center for Implementation and Improvement Sciences Pilot Grant Program in 2016 and 2017. Creating the Implementation and Improvement Science Proposals Evaluation Criteria would not have been possible without their submissions. We also appreciate the feedback from CIIS grant application reviewers, which was instrumental in identifying the need for new scoring criteria. The CIIS team appreciates the ongoing guidance, interest, and support from David Coleman. Thanks also to Kevin Griffith for his feedback on measures of reliability.


This research was supported with funding from the Evans Medical Foundation Inc.

Availability of data and materials

The datasets generated and/or analyzed during the current study are not publicly available because they represent study proposals prepared by individual investigators. Proposal scoring data are available from the corresponding author on reasonable request.

Author information


DB, CGA, AJW, and MD conducted the initial thematic analysis. MD and DB created the original scoring criteria. ELC revised the scoring criteria. ELC, DB, AJW, and MD reviewed and finalized the scoring criteria. ELC and DB piloted the use of the scoring criteria and analyzed the score data. ELC drafted and revised the manuscript based on comments from coauthors. AJW, MD, DB, and EKP provided manuscript comments and revisions. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Erika L. Crable.

Ethics declarations

Ethics approval and consent to participate

This study was reviewed and determined to not qualify as human subjects research by the Boston University Medical Campus Institutional Review Board (reference number H-37709).

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Crable, E.L., Biancarelli, D., Walkey, A.J. et al. Standardizing an approach to the evaluation of implementation science proposals. Implementation Sci 13, 71 (2018).
