
The Society for Implementation Research Collaboration Instrument Review Project: A methodology to promote rigorous evaluation

A Correction to this article was published on 03 January 2020




Identification of psychometrically strong instruments for the field of implementation science is a high priority underscored in a recent National Institutes of Health working meeting (October 2013). Existing instrument reviews are limited in scope, methods, and findings. The Society for Implementation Research Collaboration Instrument Review Project’s objectives address these limitations by identifying and applying a unique methodology to conduct a systematic and comprehensive review of quantitative instruments assessing constructs delineated in two of the field’s most widely used frameworks, adopt a systematic search process (using standard search strings), and engage an international team of experts to assess the full range of psychometric criteria (reliability, construct and criterion validity). Although this work focuses on implementation of psychosocial interventions in mental health and health-care settings, the methodology and results will likely be useful across a broad spectrum of settings. This effort has culminated in a centralized online open-access repository of instruments depicting graphical head-to-head comparisons of their psychometric properties. This article describes the methodology and preliminary outcomes.


The seven stages of the review, synthesis, and evaluation methodology include (1) setting the scope for the review, (2) identifying frameworks to organize and complete the review, (3) generating a search protocol for the literature review of constructs, (4) conducting the literature review of specific instruments, (5) developing evidence-based assessment rating criteria, (6) extracting data and rating instrument quality by a task force of implementation experts to inform knowledge synthesis, and (7) creating a website repository.


To date, this multi-faceted and collaborative search and synthesis methodology has identified over 420 instruments related to 34 constructs (total 48 including subconstructs) that are relevant to implementation science. Despite numerous constructs having greater than 20 available instruments, which implies saturation, preliminary results suggest that few instruments stem from gold standard development procedures. We anticipate identifying few high-quality, psychometrically sound instruments once our evidence-based assessment rating criteria have been applied.


The results of this methodology may enhance the rigor of implementation science evaluations by systematically facilitating access to psychometrically validated instruments and identifying where further instrument development is needed.


Identification of psychometrically strong instruments for the field of implementation science is a high priority in the United States, as underscored in a recent National Institutes of Health working meeting (October 2013; Rabin et al., unpublished).a Reliable and valid instruments are critical to scientific advancement because they allow for careful collection, expression, and comparison of the results of observation and experimentation [1]. Unfortunately, poor-quality instruments have slowed the discovery and application of evidence-based implementation strategies for supporting widespread delivery of evidence-based care. Many new fields face instrumentation challenges until consensus builds around high-quality measures of key constructs. Without that consensus, progress toward informative, applicable instrumentation will remain slow, hindered by duplicative efforts and incommensurable results. For an in-depth discussion of instrumentation issues in implementation science, see Martinez et al. [2].

Existing instrument review efforts within the field of Dissemination and Implementation Science (DIS) have focused on individual constructs such as readiness for change (e.g., [3]), on constructs that predict specific implementation outcomes such as adoption [4], and on broader reviews of multi-level domains [5]. Other efforts, such as the Grid-Enabled Measures Project (GEM; [6]), engage researchers and stakeholders in populating and evaluating an online repository of measures. Thus far, review efforts reveal that few instruments have undergone systematic development and are psychometrically strong. These reviews represent important contributions: they characterize the state of measurement quality in the field and underscore a significant need for additional research in this area.

Despite these instrument review efforts, three important gaps remain. First, no existing instrument reviews include a comprehensive array of constructs relevant for DIS. A comprehensive review of constructs is important to guide instrument selection and development and then to facilitate identification of constructs that are implicated in successful implementation. Second, existing methodologies for instrument reviews are narrowly focused and only provide limited psychometric assessments of the instruments. Specifically, Chaudoir et al.’s instrument review focused only on predictive validity [5]. Although predictive validity is critical to the identification of key constructs, in the absence of also establishing reliability and/or content and construct validity, predictive validity is only marginally informative. Further, Chor et al.’s work provided dichotomous (yes/no) conclusions about the psychometric validation of instruments without providing an indication of the process for this determination [4]. These limitations of existing instrument review methodologies must be addressed to support quality measurement in this field. Third, no protocol exists to systematically develop a compendium or repository of instruments for widespread use. An open-source resource would facilitate simultaneous access to instruments and comparison between instruments with respect to their psychometric strength. A centralized online database that is searchable and provides head-to-head comparisons of instrument psychometric properties would be a significant step forward for the field.

The current project: aims and objectives

The Society for Implementation Research Collaboration (SIRC; formerly known as the Seattle Implementation Research Collaborative)b Instrument Review Project (IRP) has established a methodology for instrument review to address these gaps by a) conducting a systematic and comprehensive review of quantitative instruments assessing constructs delineated in two of the field’s most highly cited frameworks, the Consolidated Framework for Implementation Research (CFIR; [7]) and the Implementation Outcomes Framework (IOF; [8]); b) adopting and applying a systematic search process (using standard search strings); c) engaging an international team of experts to assess the full range of psychometric criteria (reliability, construct validity, and criterion validity); and d) building a centralized online, open access, evolving repository of instruments depicting graphical head-to-head comparisons of their psychometric properties. Existing instrument review and repository efforts are summarized and compared in a separate manuscript that highlights their unique contributions and the gaps in the field that the SIRC IRP seeks to fill (see Rabin et al., unpublished). In this article, we describe the SIRC IRP methodology and summarize preliminary results of the 420+ instruments that have been identified according to the following:

  • the number of instruments identified for each of the 48 DIS constructs (including the 13 subconstructs; CFIR and IOF),

  • the rigor underlying instrument development,

  • whether the construct was explicitly defined in the original article,

  • the year and field in which the instrument was created,

  • the stakeholder targeted by the instrument,

  • settings in which the instrument has been used, and

  • the number of published studies reporting use of the instrument (bibliometric data).

The findings from this methodology will inform a pressing research agenda by identifying priorities for measurement development. Moreover, the online repository will position those invested in advancing the field of implementation science (e.g., researchers and stakeholders: agency leaders, purveyors, decision makers in service provider organizations) to engage in rigorous evaluation of their implementation initiatives by providing online access to instruments, associated peer reviewed articles, and information regarding their psychometric properties. Although the resulting repository is geared towards implementation of psychosocial interventions in mental health and health-care settings to be consistent with the focus of SIRC, the repository is designed to promote the use of instruments across disciplines that will be useful to researchers and stakeholders implementing evidence-based practices across a broad spectrum of settings.


Step 1: defining the scope of the project

The instrument review protocol and the development of the repository focus on quantitative instruments used in the implementation of evidence-based practices or innovations in mental health, health care, and school settings. To adhere to this scope, we developed the following two criteria for identifying relevant instruments: a) if the instrument assesses some aspect of implementation science with regard to settings where mental health interventions are used, it is regarded as relevant; and b) if an instrument can be easily adapted to make its subject pertinent to the mental health field (e.g., only the name of the intervention, population, or setting would need to be changed within the instrument), it is deemed relevant.

Step 2: selecting theoretical frameworks to guide the review

Our team prioritized identifying a theoretical framework that could guide identification and organization of the instruments according to key DIS constructs. Although there are over 60 guiding frameworks for DIS ([9]; e.g., PARiHS [10], DoI [11], PRISM [12]), there is little agreement and little empirical evidence on which constructs are most important for planning and evaluation [13]. Few theoretical frameworks come close to comprehensively outlining the diverse array of constructs and domains implicated. However, two of the most highly cited frameworks were selected to categorize and organize instruments: (1) the CFIR [7] and (2) the IOF [8].

The CFIR was an obvious first choice as it fits with our goal to be as comprehensive as possible. Specifically, the CFIR is a meta-theoretical framework generated to address the lack of uniformity in the DIS theory landscape: it minimizes overlap and redundancies in available frameworks, separates ideas that had formerly been seen as inextricable, and creates a uniform language for the domains and constructs of DIS. Our team conceptualizes the CFIR constructs as potential predictors, moderators, and mediators or “drivers” of DIS outcomes. Despite the fairly comprehensive nature of the CFIR, it is limited in that clearly defined outcomes for DIS are missing. DIS outcomes are distinct from clinical treatment and service system outcomes. Implementation outcomes are typically measured in implementation activities, can advance understanding of implementation processes, enhance efficiency in implementation research, and pave the way for studies of the comparative effectiveness of implementation strategies [8]. To address this limitation, our team adopted a second framework, put forth by Proctor et al., that delineates “implementation outcomes” [8]. The isolation and concrete operationalization of implementation outcomes, separate from service and client outcomes, was a unique and important addition to the literature (Table 1). This added focus may be critical in future research seeking to understand the temporal relations between constructs. Our team conceptualizes implementation outcomes, such as penetration and sustainability, as dependent variables in a DIS process and, therefore, as integral constructs warranting inclusion in a comprehensive review of DIS instruments. A detailed review of the theories and frameworks summarized here can be found elsewhere [9].

Table 1 Listing of included and excluded constructs from the organizing frameworks

In sum, by combining the two frameworks, the resulting repository would include instruments based on a comprehensive listing of constructs implicated at the inception of an implementation project, throughout the early stages of an implementation, as well as those thought to contribute to the success of an implementation initiative. Constructs are defined here as factors inside domains that predict, moderate, or mediate DIS as well as implementation outcomes. The following domains guide review of the DIS instrument literature: characteristics of the intervention, outer setting, inner setting, characteristics of the individuals involved in implementation, process, implementation outcomes, and client outcomes (see Table 1).

Step 3: generating a search protocol for the literature review of constructs

Utilizing the CFIR and IOF, a broad scoping review of the DIS literature was conducted in search of instruments and related articles that purportedly measured each of the 48 constructs (including subconstructs). Scoping reviews are a useful first step to inform the parameters of subsequent systematic reviews [14]. In our scoping review process, we completed searches of PsycINFO and Web of Science to explore the landscape of DIS instruments and identify those relevant to mental health. This first pass of the literature on DIS constructs resulted in the identification of 105 instruments.

This exploratory stage was integral to setting search parameters to guide the subsequent review. This task was undertaken with the help of a trained information specialist. From this scoping review, a publication date parameter was set to include only articles published after 1985, to maximize the relevance of identified instruments given how recently the science of dissemination and implementation has emerged. Drawing upon the work of Straus et al. [15], McKibbon et al. [16], and Powell et al. [17], who published helpful search strings for DIS literature reviews, a core set of search strings that reflected the parameters of the project was identified (see core search strings in Additional file 1). Titles and abstracts were examined to exclude obviously irrelevant articles. Articles that survived the title and abstract review were then reviewed more thoroughly, with special attention paid to the articles’ method sections. In addition, the articles’ references were reviewed, and articles that appeared likely to yield new instruments were accessed.
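As an illustration, combining core DIS search strings with construct-specific terms can be sketched as follows. The term lists and the `PY>` year-limit syntax below are hypothetical placeholders for illustration only; the project's actual strings appear in Additional file 1.

```python
# Illustrative sketch of assembling boolean queries from core DIS search
# strings and construct-specific terms. All terms here are placeholders,
# not the actual strings used in the SIRC IRP.

CORE_STRINGS = ['"implementation"', '"dissemination"', '"evidence-based practice"']

def build_query(construct_terms, start_year=1985):
    """Join construct-specific terms (OR) with the core DIS strings (OR),
    restricted to publications after the cutoff year."""
    construct_clause = " OR ".join(construct_terms)
    core_clause = " OR ".join(CORE_STRINGS)
    return f"({construct_clause}) AND ({core_clause}) AND PY>{start_year}"

# Hypothetical query for the readiness-for-change construct
query = build_query(['"readiness for change"', '"organizational readiness"'])
```

In practice, each database's own field tags and year-limit syntax would replace the placeholder `PY>` clause.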

Once an instrument was identified as relevant, it was sent to the project leads (i.e., C.C.L., C.S., R.G.M., and B.J.W.) for verification. Disagreements were resolved through careful review and consensus among our core workgroup. Disagreements were most often a result of issues of homonymy and synonymy as described in Martinez et al. [2], failure of the author to define the construct of interest, and misalignment (or multiple alignment) of the targeted construct with the constructs delineated by the organizing frameworks. In each case, at least two core workgroup members reviewed all available material and took one of the following actions: place the instrument within its most relevant construct, place the instrument within multiple constructs for ease of access, or exclude the instrument altogether.

The initial construct reviews were replicated by a team of research assistants (RAs) at a second site. Each instrument author was contacted to obtain the full-length instrument in the event it was not included in the original article and to request permission to post the instrument under the password protection of the SIRC website for members to access. This process sought to improve the yield of available instruments for populating the developing repository.

Concurrent with the review of published literature, a snowball sampling email procedure was used to locate instruments in preparation or otherwise unpublished instruments. This was particularly important for preventing the creation of redundant instruments and extends this methodology beyond that of a typical systematic review. The snowball sampling technique accessed DIS stakeholders through relevant email LISTSERVs (e.g., SIRC membership; the Association of Behavioral and Cognitive Therapies Dissemination and Implementation Science Special Interest Group) and personal contacts. DIS-related websites across disciplines, with a particular focus on mental health and health care, were also reviewed for instruments or related papers, and authors were subsequently contacted. Stakeholders who received emails from our group were encouraged to share the request for DIS instruments with colleagues in the field.

Step 4: the literature review of specific instruments—extending beyond a systematic review

In the instrument review phase, we systematically compiled all information regarding each identified instrument, particularly with respect to the development of psychometrics and any data relevant to the evidence-based assessment (EBA) criteria described below in step 5. This step is a significant deviation from a typical systematic review protocol, but a necessary and effective innovation for our methodology to evaluate and synthesize the literature and produce a decision aid for researchers and stakeholders. As with the construct reviews, PsycINFO and Web of Science served as the primary databases for the instrument review. The instrument name written in quotations (e.g., “Treatment Acceptability Rating Form”) served as the primary search string; the search was then limited by drawing upon the core set of search terms outlined in Additional file 1. Specific instrument reviews were replicated by a second RA. When completed, all documents pertaining to a single instrument were compiled and combined into a single PDF (henceforth referred to as a packet) in preparation for the quality assessment phase in step 6: data extraction and rating.

Step 5: development of the evidence-based assessment rating criteria

In order to ensure that all identified instruments are evaluated for their psychometric qualities using a relevant, standardized system that is amenable to a large-scale collaborative effort, we developed evidence-based assessment (EBA) rating criteria. These criteria were derived from Hunsley and Mash’s earlier EBA criteria for standardized patient outcome measures [18] and from the work of Terwee et al. [19]. To reduce rater subjectivity and enhance inter-rater reliability, the criterion anchors needed to be especially concrete. The main modifications included increasing the number of anchors (from three to five) to promote variability in the ratings.

To maximize the utility and relevance of the EBA criteria for the purposes of DIS, the first draft was sent to 106 expert DIS scientists, members of the SIRC Network of Expertise. We obtained 60 responses containing rich conceptual (e.g., how to include DIS-specific criteria) and practical (e.g., how to improve the likelihood that anchors would be selected reliably) feedback. All 60 responses were reviewed and integrated by the project’s core workgroup. The second draft of the EBA rating criteria was then sent to local experts in classical test theory and test development. A third version of the EBA criteria emerged from further revising the anchors in accordance with the expert feedback. This final version of the EBA rating system included six criteria: norms, reliability, criterion (predictive) validity, construct (structural) validity, responsiveness (sensitivity to change), and usability (assessed by length). Each criterion included a five-point anchoring system ranging from “0” (no evidence) to “4” (excellent evidence) (see Table 2 for the final version of the EBA).

Table 2 Evidence-based assessment criteria
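The rating scheme summarized in Table 2 can be sketched as a simple data structure, assuming six equally weighted criteria scored 0–4 for a maximum of 24 points (matching the total shown in Figure 1). This is an illustration of the scheme, not the project's scoring code, and the criterion names are our shorthand.

```python
# Sketch of the six EBA criteria, each rated on a 0-4 anchor scale.
# Criterion names are shorthand; Table 2 holds the authoritative anchors.

EBA_CRITERIA = ["norms", "reliability", "criterion_validity",
                "construct_validity", "responsiveness", "usability"]
ANCHORS = {0: "none", 1: "minimal", 2: "adequate", 3: "good", 4: "excellent"}

def total_score(ratings):
    """Sum a complete rating profile; the maximum possible score is 24."""
    assert set(ratings) == set(EBA_CRITERIA), "every criterion must be rated"
    assert all(r in ANCHORS for r in ratings.values()), "ratings must be 0-4"
    return sum(ratings.values())

# A hypothetical instrument profile totals 18 of a possible 24 points.
profile = {"norms": 3, "reliability": 4, "criterion_validity": 2,
           "construct_validity": 3, "responsiveness": 2, "usability": 4}
```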

Step 6: data extraction and rating instruments

The data extraction phase is ongoing to capture the most up-to-date public information on the instruments included in the repository. In this phase, data are extracted by independent reviewers (RAs) using a standardized, piloted extraction procedure. Specifically, data referencing EBA-relevant information are highlighted and labeled by an RA for each article in every packet (which contains the instrument, the source article, and all associated peer-reviewed publications in which the instrument is used). The purpose is to have well-trained RAs systematically complete the data extraction to promote ease of rating by the volunteer task force members (i.e., expert implementation scientists). Each packet is randomly assigned to an in-house advanced RA (often a PhD-level research scientist) plus one task force member to be rated for its psychometric strength and usability using the EBA criteria. Modeled after the work of Terwee et al. [19], we employed a “worst score counts” methodology: an intentionally conservative approach that also facilitates reliability in the rating process. Cohen’s kappa is computed to assess inter-rater reliability, and rating discrepancies are resolved through consensus among the core workgroup.
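A minimal sketch of the "worst score counts" aggregation and an unweighted Cohen's kappa follows; it reflects our reading of the procedure as described, not SIRC's actual implementation.

```python
# "Worst score counts": when several studies provide evidence for a
# criterion, retain the lowest (most conservative) rating. Cohen's kappa
# then quantifies agreement between the two raters' criterion ratings.
from collections import Counter

def worst_score_counts(ratings_by_criterion):
    """For each criterion, keep the minimum rating observed across
    all studies reporting evidence for that criterion."""
    return {criterion: min(scores)
            for criterion, scores in ratings_by_criterion.items()}

def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length rating sequences."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical evidence: three studies rate reliability, two rate
# criterion validity; the worst score is retained for each criterion.
evidence = {"reliability": [4, 3, 4], "criterion_validity": [3, 1]}
conservative = worst_score_counts(evidence)
```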

Figure 1 illustrates the application of the EBA criteria and the resulting graphical displays of criterion scores. In this figure, two measures of evidence-based practice acceptability were evaluated according to the EBA rating process. As depicted in Figure 1, the Evidence-Based Practice Attitudes Scale (EBPAS), a 15-item self-report measure that assesses “mental health provider attitudes toward adoption of evidence-based practice” [20], is directly compared with Addis and Krasnow’s 17-item self-report measure of practitioners’ attitudes towards treatment manuals [21]. Using the worst score counts methodology and available data, the ratings reveal that the EBPAS is of high psychometric quality overall. Both instruments appear to have garnered strong psychometric properties, including established structural validity (i.e., EFA/PCA analyses have accounted for more than 50% of variance), available norms, and fewer than 50 items. However, readers can determine for themselves which qualities are most important (e.g., responsiveness versus predictive validity). The EBPAS has demonstrated stronger internal consistency and is more responsive (i.e., sensitive) to change. Conversely, Addis and Krasnow’s [21] measure appears to have more consistently predicted criterion measures. Notably, the EBPAS has demonstrated predictive validity in some previous studies (e.g., [22]) but not in all; this is a prime example of how the worst score counts methodology operates and affects the interpretation of instrument comparisons.

Figure 1

A head-to-head comparison of the evidence-based practice attitudes scale (EBPAS) and the practitioner’s attitudes towards treatment manuals scale psychometric properties. Total possible score equals 24. Criteria rated 0 to 4: 0 = “none”, 1 = “minimal”, 2 = “adequate”, 3 = “good”, and 4 = “excellent” [20,21].

Step 7: population of the website repository

Once both sets of ratings are attained, data are converted into a head-to-head graphical comparison that depicts the relative and absolute psychometric strength of an instrument relative to others for that construct (see Figure 1). This information is contained in the website repository alongside the instrument and links to all relevant literature. This step is integral for researchers and other stakeholders to efficiently judge the state of instrumentation for each construct.

Preliminary results and discussion

Preliminary results

Despite identifying over 420 instruments across the 48 DIS constructs (including subconstructs), we uncovered critical gaps in DIS instrumentation. Preliminary results highlight constructs for which few to no instruments exist (see Table 3). Specifically, our review methodology revealed no instruments for the following constructs, many of which fall within CFIR’s outer setting domain: complexity of the intervention, intervention design quality and packaging, intervention source, external policies and incentives, peer pressure, tension for change, goals and feedback, formally appointed internal implementation leaders, and engaging champions. Many other constructs appear to have only one or two instruments available (e.g., compatibility, relative priority). These preliminary results suggest that there is a great need for instrument development to advance DIS, particularly in the critical domain of outer setting. In the absence of outer setting measures, the field will be challenged to identify the role that these constructs play in successful implementation across different contexts. Interestingly, despite the recently renewed NIH program announcement explicitly highlighting interest in instrument-related proposals, NIH has received few proposals centered on instrument development (David Chambers DPhil, personal communication, October 24, 2013).

Table 3 Summary of preliminary results

Numerous constructs have 20 or more available instruments (e.g., acceptability, adoption, organizational context, culture, implementation climate, knowledge and beliefs about the intervention, other personal attributes, planning, reflecting, and evaluating), suggesting saturation. However, without readily available information on what already exists and on instruments’ psychometric properties, and without associated decision-making tools, DIS researchers and stakeholders may continue to develop instruments in these seemingly saturated areas or select poorly constructed instruments that will hinder scientific progress. It is important for researchers and stakeholders to carefully consider the applicability of available instruments to promote cross-study comparisons, a necessary process for building the DIS knowledge base.

Figure 2 depicts the timeline across which identified instruments were developed (“year developed” is based on the year in which the original article was published). Based on our search parameter (i.e., beginning in 1985), less than one quarter (23.17%) of all identified instruments were developed before 1999 (a 14-year period), whereas one quarter (25.61%) have been developed since 2009 (a 4-year period), reflecting the growth of DIS in recent years. Notably, and perhaps not surprisingly, over one third (34.90%) of instruments for implementation outcomes have been developed since the seminal paper by Proctor et al. was published [8]. Proctor et al. articulated a research agenda for DIS outcome evaluations that appears to have positively influenced instrument development.
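The tabulation behind such percentages amounts to binning instruments by source-article year and computing each period's share. A sketch with fabricated years, for illustration only:

```python
# Sketch of the timeline tabulation: bin instruments by the publication
# year of their source article and compute the share per period.
# The years below are fabricated illustrations, not the project's data.

def share_in_period(years, start, end):
    """Fraction of instruments whose source article appeared in [start, end]."""
    in_period = sum(start <= y <= end for y in years)
    return in_period / len(years)

years = [1987, 1995, 2001, 2004, 2010, 2011, 2012, 2013]
early = share_in_period(years, 1985, 1999)   # analogous to the pre-1999 share
recent = share_in_period(years, 2009, 2013)  # analogous to the post-2009 share
```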

Figure 2

Timeline of instrument development.

Table 4 summarizes the six discrete fields from which the instruments emerged. The majority of instruments tapping implementation outcomes emerged from subfields of Psychology. Instruments tapping intervention characteristics stem from Psychology and Public Health or Government research. Inner setting instruments emerged from the previously mentioned fields, though most substantially from the Organizational, Workplace, and Business literatures. Instruments tapping characteristics of individuals, process, and client outcomes were generated from a range of fields including those listed previously but also Medicine and Education. The breakdown of fields from which the identified instruments were generated suggests that Psychology and its subfields have contributed immensely to the evaluation of DIS, representing a higher average number of instruments than any other field across constructs (M = 3.91). Notably, the constructs each discipline measured were consistent with that field’s strengths.

Table 4 Fields from which instruments originated

Tables 5 and 6 reflect the stakeholders targeted by each instrument and the contexts in which the instruments have been used, respectively. Across domains, the majority of instruments were developed to target the service provider rather than the service director, supervisor, or consumer. However, measures of intervention characteristics and process targeted stakeholders in the “other” category, encompassing a range of general staff as well as researchers. In line with the field from which the instruments originated and the scope of the review, the majority of instruments have since been used in mental health settings.

Table 5 Stakeholders targeted by instruments
Table 6 Contexts in which instruments have been used

Bibliometric data available for each of the identified instruments (see Table 3) make it possible to infer which instruments have been viewed favorably by researchers conducting DIS via publication counts for each instrument. This information is, of course, confounded by the year in which the instrument was developed and should therefore be interpreted with caution. To date, instruments tapping inner setting are the most frequently used and published. Notably, compatibility instruments have an average of 11 publications, followed by combined instruments (e.g., culture and climate; an average of 9.22 publications). External change agent instruments have an average of 10 published articles. Implementation outcomes are receiving greater attention in the literature; despite having far fewer publications, they show steady growth in recent years.

With data extraction and psychometric ratings ongoing (step 6), we can nevertheless provide a preliminary account of the quality of the identified instruments. Across the 48 constructs (including subconstructs), an average of 71% of instruments included explicit construct definitions. This suggests that the construct validity of nearly one third of the instruments, which rests on careful operationalization of constructs according to their theoretical underpinnings, is questionable. In the absence of explicit construct definitions, use of identified instruments by other teams requires investigators to make assumptions about an instrument’s construct validity based on available items, which may be challenging given the potentially overlapping nature of constructs within domains (e.g., the construct of appropriateness is often used synonymously with perceived fit, relevance, compatibility, suitability, usefulness, and practicability; [8]). Until consensus among constructs and terms is achieved [23], this practice may compromise the generalizability of study findings.

A second set of preliminary results suggests that, in general, the identified DIS instruments are of poor quality. Specifically, we developed a coding system to rate the stages of systematic development through which each instrument should progress. Eight stages were identified based on the seminal work of Walsh and Betz [24]: (1) the construct is defined, (2) initial items are generated by a group of experts, (3) items are pilot tested with a representative sample, (4) validity and reliability tests are conducted based on pilot testing, (5) the instrument is refined based on pilot results, (6) the refined instrument is administered to the targeted sample, (7) validity and reliability tests are performed, and (8) psychometric properties are reported. Each instrument was coded such that 1 point was assigned for each aforementioned stage through which the instrument progressed, as reported in the original articles. Table 3 indicates that, on average, the instruments identified did not even pass through three (of a possible eight) full stages of “proper” instrument development based on our coding system. These preliminary results suggest that the systematic development and psychometric characteristics of the body of instruments available in DIS are weak at best. However, these findings need to be substantiated by our rigorous psychometric evaluation, which is currently underway, in order to place confidence in these observations.

A comparison of SIRC’s methodology to existing reviews and repositories

To date, using this multi-faceted and collaborative search, synthesis, and evaluation methodology, SIRC’s IRP has identified over 420 instruments tapping 48 constructs (including subconstructs) relevant to DIS. This methodology, which combines systematic review techniques with email snowball sampling (to identify instruments in progress) and ongoing review of the latest publications, has produced a more comprehensive DIS instrument database than previous efforts. Specifically, although Chaudoir et al. [5] conducted a systematic review of key DIS domains (i.e., structural, organizational, provider, patient, and innovation, as opposed to constructs, e.g., intervention adaptability, external policy, and incentives), they identified only 62 instruments, substantially fewer than the 420+ revealed by the SIRC methodology. We posit that the lower yield of Chaudoir et al. reflects their exclusion of instruments that assess implementation outcomes, arguably the most critical domain of DIS constructs to date, as well as the smaller number of domains included in their review.

Moreover, our review methodology is unique in that, unlike previous reviews, all literature pertaining to each instrument has been identified, enabling accurate conclusions about individual instrument quality. Previous efforts to employ a collaborative instrument review process, notably the GEM [6], do not systematically locate all available literature on which to base quality ratings. Rather, the GEM approach encourages website users to provide their own ratings regardless of their familiarity with the extant literature.


This multi-faceted methodology has potential implications for DIS in both the short and long term. Upon creation of the repository, researchers and stakeholders will have a relevant and useful resource for identifying available and psychometrically sound DIS instruments, thereby reducing the need to create “homegrown” instruments (i.e., instruments built for one-time use; [8]) to evaluate their DIS efforts. We anticipate that access to the repository will encourage repeated use of the same high-quality instruments to measure similar constructs across settings, reduce instrument redundancy, and increase the potential for the DIS field to evolve more rapidly. In addition to being a resource for existing DIS instruments, the repository may stimulate new areas of research and instrument development, given that some constructs are saturated whereas others lack instrumentation. Our preliminary results also signal a need for new instrumentation targeting non-provider stakeholders such as leaders and external change agents (e.g., implementation practitioners or intermediaries), particularly in light of research identifying the role they play in implementation success (e.g., [25]). The ongoing application of our evidence-based assessment rating criteria leads us to anticipate a dearth of high-quality, psychometrically sound instruments, which would signal a need for higher-quality instrument development.

Although the above suppositions represent more short-term implications, the long-term implications of this review are at least twofold. First, application of the EBA rating criteria described in step 6 will aid in identifying psychometrically strong instruments and a potential consensus battery of high-quality, essential DIS instruments as a basic resource for researchers and stakeholders to advance cross-study comparisons. Second, it is our intention that the SIRC repository be a dynamic resource. That is, the repository will grow with the evidence base to incorporate newly developed and/or tested instruments, as well as instruments identified via the methodologies of colleagues completing relevant research (e.g., crowdsourcing methods). We believe this dynamic process will improve the efficiency and rigor of implementation science evaluations as a whole.


There are several noteworthy limitations inherent in this methodology. To ensure the rigor and quality of the resulting repository, each step is meticulous, necessarily time-consuming, and must be replicated by a second party. As a result, the intensity of time, resources, and personnel required by this comprehensive and multi-faceted methodology is a potential limitation. Specifically, (a) initial literature reviews to identify instruments for targeted constructs take approximately 1.5–3 h, (b) cross-checking reviews takes an additional 45 min–1 h, (c) instrument-specific literature reviews take an average of 2.5–4 h, (d) cross-checking instrument-specific literature reviews adds 1–3 h, and (e) rating requires an average of 50 min to complete. Because of limited funding, these preliminary results have taken roughly 2 years to achieve. It is highly encouraging, however, that the careful creation of project protocols and the international support forthcoming for this project have allowed us to engage multiple core worksites and a large task force committed to realizing the goals of the SIRC IRP. Moreover, the lead authors (CCL, CS, and BJW) anticipate receiving grant funding from the National Institute of Mental Health to extend this work to include pragmatic ratings of instruments, a critical domain for advancing the practice of implementation in real-world settings [26]. Another potential limitation of our work centers on the specific frameworks used to guide construct selection. Basing our work on the CFIR [7] and the Implementation Outcomes Framework [8] provides a comprehensive conceptual foundation, yet it is clear that DIS investigators employ diverse frameworks delineating unique constructs not included in the SIRC IRP [2].
Nonetheless, we are hopeful that our thoughtful selection of these comprehensive and complementary frameworks will allow us to identify and make accessible a range of high-quality instruments relevant to the majority of interested researchers and stakeholders.
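As a back-of-envelope check on the workload figures listed above, summing the five per-step time estimates gives roughly 6.6 to 11.8 hours per review cycle. This sketch is a simplification of our own: it treats the 50-minute rating step as fixed and ignores that some steps recur per instrument rather than once per construct.

```python
# Tally of the review-time estimates quoted in the limitations above.
# Each entry is a (low, high) range in hours; the 50-minute rating
# step is treated as fixed. Simplification: some steps recur per
# instrument rather than once per construct.
steps_hours = {
    "construct literature review":      (1.5, 3.0),
    "cross-check of construct review":  (0.75, 1.0),   # 45 min-1 h
    "instrument-specific review":       (2.5, 4.0),
    "cross-check of instrument review": (1.0, 3.0),
    "quality rating":                   (50 / 60, 50 / 60),
}

low = sum(lo for lo, _ in steps_hours.values())
high = sum(hi for _, hi in steps_hours.values())
print(f"total: {low:.1f}-{high:.1f} hours")  # total: 6.6-11.8 hours
```

Multiplied across 48 constructs and 420+ instruments, even the low end of this range makes clear why the preliminary results required roughly 2 years to achieve under limited funding.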

Conclusions and future directions

This multi-faceted and collaborative methodology is perhaps the most comprehensive attempt to date to identify, evaluate, and synthesize DIS instruments. Moving forward, we will review literature as it is published to ensure that the repository evolves with developing research, hence the need for a website platform. We have assigned a research assistant to review the Implementation Network monthly e-newsletter for additional instruments relevant to our comprehensive review. In addition, Google Scholar alerts based on our search strings will be set up to review newly published research on a weekly basis and add relevant instruments and literature to our database.

In collaboration with our webmaster, we will design functionality that enables researchers and stakeholders to access, share (upload), and interact with the content. Researchers and stakeholders who are SIRC membersc will be able to access, contribute to, and track the dynamic expansion of the repository; receive notifications when new instruments are rated and added; and be encouraged to engage with the development efforts. In addition, the repository will have built-in functionality inviting researchers and stakeholders who access instruments to share their data with the “community”. Our long-term goal is to build a large, open-access dataset ripe for more complex analyses of the instruments’ psychometric properties. The results of the rigorous SIRC Instrument Review Project methodology will position the field to engage in careful evaluation of DIS efforts. The resulting open-access decision aid, with head-to-head graphical comparisons of instrument qualities, will facilitate instrument identification and selection, position researchers and stakeholders to employ psychometrically validated instruments, and support focused instrument development efforts.


aImplementation science refers to the scientific study of strategies used to integrate evidence into real-world settings [27]. Implementation practice is the act of integrating evidence into real-world settings [28]. Instrument, in the case of this project, refers to quantitative tools, surveys, or measures that can be administered to individuals to obtain perspectives or information regarding their experience. Psychometric properties refer to outcomes of psychological testing of an instrument that reflect how well it measures a construct of interest with respect to reliability and validity.

bInstrument Review Task Force members listed in alphabetical order: Drs. Gregory Aarons, Cassidy Arnold, Melanie Barwick, Rinad Beidas, Helen Best, Elisa Borah, Craig Bryan, Adam Carmel, Mark Chaffin, Kate Comtois, Laura Damschroder, Dennis Donovan, Shannon Dorsey, Michelle Duda, Julia Felton, Dean Fixsen, Howard Goldman, Carmen Hall, Rochelle Hanson, Petra Helmond, Amanda Jensen-Doss, Sarah Kaye, Meghan Keough, Sara Landes, Cara Lewis, Marsha Linehan, Aaron Lyon, Michael McDonell, Kate McHugh, Maria Mancebo, Shari Manning, Christopher Martell, Erin Miga, Brian Mittman, Sandra Naoom, Byron Powell, Raphael Rose, Lisa Ruble, Joe Ruzek, Anju Sahay, Sonja Schoenwald, Rebecca Selove, Jeffrey Smith, Cameo Stanick, Bradley Steinfeld, Phil Ullrich, Elizabeth A. Wells, and Shannon Wiltsey Stirman.

cAnyone can register to be a SIRC member at and thus have access to the repository.

Change history

  • 03 January 2020

    Following publication of the original article [1] the authors reported an important acknowledgement was mistakenly omitted from the ‘Acknowledgements’ section. The full acknowledgement is included in this Correction article:



Abbreviations

CFIR: Consolidated Framework for Implementation Research

DIS: Dissemination and Implementation Science

DOI: Diffusion of innovations

EBA: Evidence-based assessment

GEM: Grid-Enabled Measures Project

NIH: National Institutes of Health

PARIHS: Promoting Action in Health Services Framework

PRISM: Practical Robust Implementation and Sustainability Model

RA: Research assistant

SIRC: Society for Implementation Research Collaboration (formerly known as Seattle Implementation Research Conferences)

SIRC IRP: SIRC Instrument Review Project


References

1. Cook DA, Beckman TJ. Current concepts in validity and reliability for psychometric instruments: theory and application. Am J Med. 2006;119:166.e7.

2. Martinez RG, Lewis CC, Weiner BJ. Instrumentation issues in implementation science. Implement Sci. 2014;9:118.

3. Weiner B, Amick H, Lee S-Y. Conceptualization and measurement of organizational readiness for change: a review of the literature in health services research and other fields. Med Care Res Rev. 2008;65(4):379–436.

4. Chor KHB, Wisdom JP, Olin S-CS, Hoagwood KE, Horwitz SM. Measures for predictors of innovation adoption. Adm Policy Ment Health Ment Health Serv Res. 2014;1–29. doi:10.1007/s10488-014-0551-7.

5. Chaudoir SR, Dugan AG, Barr CH. Measuring factors affecting implementation of health innovations: a systematic review of structural, organizational, provider, patient, and innovation level measures. Implement Sci. 2013;8:22.

6. Rabin BA, Purcell P, Naveed S, Moser RP, Henton MD, Proctor EK, et al. Advancing the application, quality and harmonization of implementation science measures. Implement Sci. 2012;7:119.

7. Damschroder LJ, Aron DC, Keith RE, Kirsh SR, Alexander JA, Lowery JC. Fostering implementation of health services research findings into practice: a consolidated framework for advancing implementation science. Implement Sci. 2009;4:50.

8. Proctor EK, Landsverk J, Aarons G, Chambers D, Glisson C, Mittman B. Implementation research in mental health services: an emerging science with conceptual, methodological, and training challenges. Adm Policy Ment Health Ment Health Serv Res. 2009;36:24–34.

9. Tabak RG, Khoong EC, Chambers DA, Brownson RC. Bridging research and practice: models for dissemination and implementation research. Am J Prev Med. 2012;43:337–50.

10. Rycroft-Malone J. The PARIHS framework – a framework for guiding the implementation of evidence-based practice. J Nurs Care Qual. 2004;19:297–304.

11. Rogers EM. Diffusion of Innovations. New York: Free Press; 2003.

12. Feldstein AC, Glasgow RE. A practical, robust implementation and sustainability model (PRISM). Jt Comm J Qual Patient Saf. 2008;34:228–43.

13. Damschroder LJ, Goodrich DE, Robinson CH, Fletcher CE, Lowery JC. A systematic exploration of differences in contextual factors related to implementing the MOVE! weight management program in VA: a mixed methods study. BMC Health Serv Res. 2011;11:248.

14. Arksey H, O’Malley L. Scoping studies: towards a methodological framework. Int J Soc Res Methodol. 2005;8:19–32.

15. Straus S, Tetroe J, Graham ID. Knowledge Translation in Health Care: Moving from Evidence to Practice. New York: Wiley; 2013.

16. McKibbon KA, Lokker C, Wilczynski NL, Ciliska D, Dobbins M, Davis DA, et al. A cross-sectional study of the number and frequency of terms used to refer to knowledge translation in a body of health literature in 2006: a Tower of Babel. Implement Sci. 2010;5:16.

17. Powell BJ, McMillen JC, Proctor EK, Carpenter CR, Griffey RT, Bunger AC, et al. A compilation of strategies for implementing clinical innovations in health and mental health. Med Care Res Rev. 2012;69:123–57.

18. Hunsley J, Mash EJ. Introduction to the special section on developing guidelines for the evidence-based assessment (EBA) of adult disorders. Psychol Assess. 2005;17:251.

19. Terwee CB, Bot SD, de Boer MR, van der Windt DA, Knol DL, Dekker J, et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol. 2007;60:34–42.

20. Aarons GA. Mental health provider attitudes toward adoption of evidence-based practice: The Evidence-Based Practice Attitude Scale (EBPAS). Ment Health Serv Res. 2004;6:61–74.

21. Addis ME, Krasnow AD. A national survey of practicing psychologists’ attitudes toward psychotherapy treatment manuals. J Consult Clin Psychol. 2000;68:331.

22. Beidas RS, Edmunds JM, Marcus SC, Kendall PC. Training and consultation to promote implementation of an empirically supported treatment: a randomized trial. Psychiatr Serv. 2012;63:660–5.

23. Michie S, Fixsen D, Grimshaw JM, Eccles MP. Specifying and reporting complex behaviour change interventions: the need for a scientific method. Implement Sci. 2009;4:1–6.

24. Walsh WB, Betz NE. Tests and Assessment. Upper Saddle River: Prentice-Hall; 1995.

25. Aarons GA, Ehrhart MG, Farahnak LR. The implementation leadership scale (ILS): development of a brief measure of unit level implementation leadership. Implement Sci. 2014;9:45.

26. Glasgow RE. What does it mean to be pragmatic? Pragmatic methods, measures, and models to facilitate research translation. Health Educ Behav. 2013;40:257–65.

27. Eccles MP, Mittman BS. Welcome to implementation science. Implement Sci. 2006;1:1–3.

28. Weisz JR, Ng MY, Bearman SK. Odd couple? Reenvisioning the relation between science and practice in the dissemination-implementation era. Clin Psychol Sci. 2014;2:58–74.

Acknowledgements


The preparation of this manuscript was supported, in kind, through the National Institutes of Health R13 award entitled, “Development and Dissemination of Rigorous Methods for Training and Implementation of Evidence-Based Behavioral Health Treatments” granted to PI: KA Comtois from 2010 to 2015. Dr. Bryan J. Weiner’s time on the project was supported by the following funding: NIH CTSA at UNC UL1TR00083. We would also like to acknowledge the numerous undergraduate research assistants (RAs) who contributed countless hours to this project. Indiana University RAs listed in alphabetical order: Hayley Ciosek, Caitlin Dorsey, Dorina Feher, Sarah Fischer, Amanda Gray, Charlotte Hancock, Hilary Harris, Elise Hoover, Taylor Marshall, Elizabeth Parker, Paige Schultz, Monica Schuring, Theresa Thymoski, Lucia Walsh, Kaylee Will, Rebecca Zauel, Wanni Zhou, Anna Zimmerman, and Nelson Zounlome. University of Montana RAs (undergraduate and graduate) listed in alphabetical order: Kaitlyn Ahlers, Sarah Bigley, Melina Chapman, May Conley, Lindsay Crosby, Bridget Gibbons, Eleana Joyner, Samantha Moore, Julie Oldfield, Kinsey Owen, Amy Peterson, and Mark Turnipseed. University of North Carolina RAs: Emily Haines and Connor Kaine.

Author information



Corresponding authors

Correspondence to Cara C Lewis or Cameo F Stanick.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

CCL and KAC initially conceptualized the project as a priority product of SIRC. CCL, CS, and BJW are project co-PIs and collaborate on the design and coordination, while KAC provides project oversight. CCL and CS manage the undergraduate research assistants (URAs) working at Indiana University and the University of Montana, respectively. BJW leads a methods core group at UNC on the instrument quality rating process, where MK serves as the rating expert and trainer. RGM wrote the project protocols, trains and supervises the work of the URAs, drafted the introduction of the manuscript, and compiled appropriate references. CS drafted the “Methods” section of the manuscript. CCL drafted all other components of the manuscript. KAC, BJW, and MB all made significant contributions to the framing, editing, organization, and content of the manuscript. All authors read and approved the final manuscript.

Cara C Lewis and Cameo F Stanick contributed equally to this work.

Additional file

Additional file 1:

Search string parameters for literature review. Standard search string parameters developed for the construct literature reviews.

Rights and permissions

Open Access  This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit

The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Lewis, C.C., Stanick, C.F., Martinez, R.G. et al. The Society for Implementation Research Collaboration Instrument Review Project: A methodology to promote rigorous evaluation. Implementation Sci 10, 2 (2016).
