Evaluation of measures of sustainability and sustainability determinants for use in community, public health, and clinical settings: a systematic review
Implementation Science volume 17, Article number: 81 (2022)
Sustainability is concerned with the long-term delivery and subsequent benefits of evidence-based interventions. To further this field, we require a strong understanding and thus measurement of sustainability and what impacts sustainability (i.e., sustainability determinants). This systematic review aimed to evaluate the quality and empirical application of measures of sustainability and sustainability determinants for use in clinical, public health, and community settings.
Seven electronic databases, reference lists of relevant reviews, online repositories of implementation measures, and the grey literature were searched. Publications were included if they reported on the development, psychometric evaluation, or empirical use of a multi-item, quantitative measure of sustainability, or sustainability determinants. Eligibility was not restricted by language or date. Eligibility screening and data extraction were conducted independently by two members of the research team. Content coverage of each measure was assessed by mapping measure items to relevant constructs of sustainability and sustainability determinants. The pragmatic and psychometric properties of included measures were assessed using the Psychometric and Pragmatic Evidence Rating Scale (PAPERS). The empirical use of each measure was descriptively analyzed.
A total of 32,782 articles were screened from the database search, of which 37 were eligible. An additional 186 publications were identified from the grey literature search. The 223 included articles represented 28 individual measures, of which two assessed sustainability as an outcome, 25 covered sustainability determinants, and one explicitly assessed both. The psychometric and pragmatic quality was variable, with PAPERS scores ranging from 14 to 35 out of a possible 56 points. The Provider Report of Sustainment Scale, which measured sustainability as an outcome, had the highest PAPERS score (score=35). Among the measures of sustainability determinants, the School-wide Universal Behaviour Sustainability Index-School Teams had the highest PAPERS score (score=29).
This review can be used to guide selection of the most psychometrically robust, pragmatic, and relevant measure of sustainability and sustainability determinants. It also highlights that future research is needed to improve the psychometric and pragmatic quality of current measures in this field.
This review was prospectively registered with Research Registry (reviewregistry1097), March 2021.
Maintaining the delivery and health impact of evidence-based interventions (EBIs) over time is a challenge across a range of community, public health, and clinical settings [1,2,3]. A 2020 systematic review of 18 multi-component school-based public health interventions found that none of the interventions continued to be delivered in their entirety (i.e., all components) once active implementation support (i.e., provision of start-up funding and/or other resources) ceased. Similarly, in a recent systematic review, only seven of 18 evaluations found that clinical practice guidelines were sustained in a variety of healthcare settings following active implementation. Understanding why EBI implementation attenuates over time, and how best to support the long-term delivery of EBIs, is necessary to ensure that implementation investments are worthwhile. This long-term delivery, referred to as “sustainability,” is an important outcome in implementation science.
Similar to other emerging fields, definitions of concepts relating to sustainability have been varied and at times conflicting, underscoring the call for a common nomenclature in this field. More recently, however, a recommended definition of sustainability has been recognised: “the continued delivery of an innovation or intervention, potentially after adaptation, at a sufficient level to ensure the continued health impact and benefits of the intervention”. Sustainability determinants, in turn, are defined as “the characteristics or factors associated with the continued use and impact of an EBI” [8,9,10]. Several frameworks recognise and conceptualise the complex and dynamic nature of sustainability [2, 11,12,13]. The Integrated Sustainability Framework developed by Shelton and colleagues (2018) outlines recommendations on how sustainability should be conceptualised and measured. It also organises influential multi-level factors (i.e., determinants) into five domains (i.e., outer context, inner context, intervention characteristics, processes, and implementer and population characteristics) [2, 14].
Central to any field is measurement validity, or the ability to accurately measure relevant concepts, outcomes, and constructs. To achieve this, a measure should comprehensively and adequately cover the intended construct. This is known as content validity and is recognised as one of the most important measurement properties. For measures of sustainability as an outcome to have adequate content validity, they should encompass the features of a multi-component definition, such as that proposed by Moore et al., and reflect concepts of time, continued delivery of the EBI, maintained behaviour change, evolution and/or adaptation of the program, and continued health and other benefits. Measures should also demonstrate reliability and evidence of other forms of validity (e.g., concurrent validity) to ensure accuracy and reduce error. Finally, measures should exhibit important pragmatic qualities, including ease of access, use, scoring, and interpretation. Pragmatic qualities are less frequently evaluated but are essential to ensuring the uptake of reliable and valid measures.
Identifying and measuring sustainability, as well as factors related to sustainability (i.e., determinants), is complex given the diverse and dynamic settings being studied. Consequently, many existing measures have been used only once, illustrating limited standardisation in measurement. This makes it difficult to compare and synthesise findings across studies. Furthermore, there has been a lack of distinction between measures of sustainability determinants and measures of sustainability as an outcome [2, 9].
High-quality systematic reviews of available measures, their psychometric and pragmatic properties, and how they have been empirically used are essential for providing evidence-based recommendations on which measures to use, identifying gaps in measurement, and highlighting areas for future research. Two systematic reviews have explored measures of sustainability as an implementation outcome in health care settings focused on mental health and substance use [18, 20]. Overall, these reviews found that psychometric reporting was poor, with only one psychometric indicator, norms, reported for more than half of the identified sustainability measures. They also found that most (54%) measures were used only a single time. While these two reviews provide a thorough evaluation of sustainability measures, they were limited by a narrow focus on behavioral health settings and on a subset of psychometric and pragmatic properties. A third review, by Moullin et al., used snowball sampling to identify sustainment and sustainability measures across a broader range of community, public health, and clinical settings and offered general guidance about how and in what circumstances each measure could be used, but did not formally assess their quality.
Collectively, these three reviews offer an excellent foundation for a comprehensive systematic review and critical assessment of both the psychometric and pragmatic qualities of measures of sustainability (as an outcome) and sustainability determinants across a range of settings. This review addresses important gaps by allowing researchers to identify where robust and suitable measures exist, reducing unnecessary duplication, and providing practical guidance to end-users in selecting the most relevant measure for their setting.
Specifically, we aimed to:
Assess content validity by mapping the constructs covered by identified measures of: (a) sustainability as an outcome to the multidimensional definition of sustainability proposed by Moore et al.; and (b) sustainability determinants to the domains and constructs outlined by the Integrated Sustainability Framework
Assess the psychometric and pragmatic qualities of identified measures using a standardised assessment tool
Describe how each of the identified measures has been applied in empirical research.
This systematic review is reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist (see Additional file 1) and followed established procedures used by other systematic reviews of measures of implementation outcomes [18, 20, 22, 23]. It was registered prospectively with Research Registry (reviewregistry1097) before the final database search was conducted.
An extensive search strategy, informed by previous reviews of implementation measures [18, 24,25,26,27] and reviews of sustainability determinants, was used to identify eligible measures of sustainability and sustainability determinants. We searched the following electronic databases on 6 June 2021: the Cochrane Central Register of Controlled Trials (CENTRAL), MEDLINE, EMBASE, PsycINFO, ERIC, CINAHL, and SCOPUS. The search included keywords relevant to three levels of search terms: (i) terms relevant to or synonymous with the constructs of interest, sustainability and sustainability determinants (e.g., sustain*, implement*); (ii) psychometric properties (e.g., psychometric*, reliab*); and (iii) setting (e.g., public health, evidence-based medicine). See Additional file 2: Tables S1a to S1g for an example of the search strategy. Similar to previous reviews, we defined a measure as a multi-item survey, questionnaire, instrument, tool, or scale that is quantitatively scored. Reference lists of previous relevant reviews were also searched. New measures published after our search date and identified through journal alerts and snowball searching were also included. For aims 1 and 2, only full-text articles were eligible for inclusion; the authors of conference abstracts were contacted to obtain full-text articles.
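A database search with several levels of terms is commonly expressed as a Boolean query in which synonyms within a level are joined with OR and the levels are joined with AND. The sketch below illustrates that structure under stated assumptions: it uses a generic parenthesised syntax rather than any specific database's field tags, the term lists are only the example terms quoted above (not the full strategy in Additional file 2), and the helper name `build_query` is hypothetical.

```python
def build_query(levels: list[list[str]]) -> str:
    """Join terms with OR within each level and join levels with AND.

    Assumes a generic Boolean syntax; real databases (e.g. MEDLINE,
    EMBASE) each use their own field tags and truncation rules.
    """
    groups = []
    for terms in levels:
        # Quote multi-word phrases so they are searched as exact phrases.
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


# Example terms only; these are not the review's complete term lists.
construct_terms = ["sustain*", "implement*"]
psychometric_terms = ["psychometric*", "reliab*"]
setting_terms = ["public health", "evidence-based medicine"]

query = build_query([construct_terms, psychometric_terms, setting_terms])
print(query)
# (sustain* OR implement*) AND (psychometric* OR reliab*) AND ("public health" OR "evidence-based medicine")
```

Requiring every level to match (AND) narrows results to records that mention the construct, a psychometric property, and a relevant setting, while OR within a level keeps each level broad.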
Online repositories of implementation measures, including the “Society for Implementation Research Collaboration Instrument Repository” and the “Dissemination & Implementation Models in Health Research and Practice” web tool, were also searched. Finally, a forward literature search was undertaken for each relevant measure, whereby two researchers independently searched for the name of each identified measure in Google Scholar. The first 100 hits were checked for relevance, or hits were checked until relevant articles were no longer being identified. A citation search of the original development paper for each measure was conducted to identify empirical studies that used that measure. For measures without a specified name, only the citation search was conducted. These searches were conducted independently by pairs of researchers (either BM, AH, CG, SH, or KA) between April 2021 and May 2022. For the third aim, published scientific manuscripts, reports, abstracts, trial registrations, and protocol papers describing the empirical use of eligible measures were included.
Publications were included if they reported on the development, psychometric evaluation, or empirical use of a multi-item, quantitative, scored self-report measure of sustainability as an outcome or of sustainability determinants, designed for use in a community, public health, or clinical setting. Individual measures were the unit of interest, as the development and psychometric evaluation of a measure are usually reported across multiple publications. Empirical studies that applied the identified measures were included to allow an evaluation of how these measures have been used in the field. Only measures that assumed a reflective measurement model of sustainability or sustainability determinants were included (i.e., measures consisting of items that sought to reflect the underlying construct of sustainability or sustainability determinants, rather than alter or define the construct, as in an index). Publications in any language were included, and wherever possible, non-English publications were translated by colleagues or contacts proficient in the language of interest or via Google Translate. No restrictions were placed on health condition or target population. Published or unpublished full-text articles or papers were eligible. We excluded measures based on a formative measurement model (i.e., items define the underlying construct, as in an index), as such measures were not relevant to the constructs we were assessing, and different properties are used to assess their rigor. Unscored checklists and single-item tools were excluded, as these serve a different purpose from measures designed to quantify an underlying construct. Measures designed explicitly for a specific study and not for wider use in the field (i.e., one-time use measures) were excluded, as were qualitative measures.
The search results from the electronic databases were managed and duplicates identified using EndNote version X9.2 software (Thomson Reuters, PA, USA). The de-duplicated library was imported into Covidence, where article screening occurred. Both title and abstract screening and full-text screening were conducted independently by two members of the research team (either AS, BM, AH, NN, NI, NM, or KA). Conflicts were resolved by a third member of the research team (AH or AS).
The pragmatic and psychometric evidence of each eligible measure was assessed and scored using the Psychometric and Pragmatic Evidence Rating Scale (PAPERS) [17, 31]. PAPERS includes 14 items that assess nine psychometric properties and five pragmatic features (see Table 1). Each item is scored using a six-point Likert scale ranging from −1 (poor) to 4 (excellent) [17, 31]. The PAPERS criteria were applied to each individual measure, rather than to an individual study or publication, as multiple publications often report on different aspects of a measure’s pragmatic and psychometric properties. For measures with multiple reports of the same pragmatic or psychometric property (for instance, multiple studies assessing responsiveness), the median score was used. If the median was not an integer, the score was rounded down [18, 23, 27]. Data were only assessed against the PAPERS psychometric criteria if they were explicitly used to evaluate the psychometric properties of that measure. Because pragmatic indicators of measures are typically poorly reported, grey literature, such as scoring manuals, was reviewed to assess such qualities. The quality of empirical studies was not assessed, as we were only interested in describing the application and use of eligible measures, aspects that are not influenced by the rigour of the research design or potential bias.
Data were extracted independently by two trained members of the research team (either NN, ED, AH, or AS), using a pre-piloted data extraction tool developed specifically for this study (Additional file 3). The data extraction form was programmed using REDCap, an electronic data capture tool hosted on the Hunter New England Population Health server [74, 75]. An overview of the main fields programmed in the data extraction tool is shown in Additional file 3.
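The dual-screening rule described above (two independent decisions per record, with disagreements passed to a third reviewer) can be sketched as a small function. This is an illustrative sketch only: screening was done in Covidence, not in code, and the `screen` function, the `arbitrate` callback standing in for the third reviewer, and the record IDs are all hypothetical.

```python
from typing import Callable, Iterable

# Each record: (record_id, decision_by_reviewer_1, decision_by_reviewer_2),
# where True means "include" at this screening stage.
Record = tuple[str, bool, bool]


def screen(records: Iterable[Record],
           arbitrate: Callable[[str], bool]) -> tuple[list[str], int]:
    """Dual independent screening with third-reviewer arbitration.

    Agreed decisions stand; disagreements are passed to `arbitrate`,
    a hypothetical stand-in for the third team member.
    Returns the included record IDs and the number of conflicts.
    """
    included: list[str] = []
    conflicts = 0
    for record_id, first, second in records:
        if first == second:
            keep = first
        else:
            conflicts += 1
            keep = arbitrate(record_id)
        if keep:
            included.append(record_id)
    return included, conflicts


# Hypothetical decisions for three records; the arbiter includes r3.
records = [("r1", True, True), ("r2", False, False), ("r3", True, False)]
included, conflicts = screen(records, arbitrate=lambda rid: rid == "r3")
print(included, conflicts)  # ['r1', 'r3'] 1
```

Counting conflicts alongside decisions is a convenient way to report reviewer agreement, although the review does not state that such a count was reported.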
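The aggregation rule for repeated ratings of the same property (take the median, round a non-integer median down) is simple to state but easy to apply inconsistently by hand. Below is a minimal sketch, assuming that "rounded down" means `math.floor`, so a negative half-value such as −0.5 becomes −1; the text does not say how negative medians were handled. The function name `aggregate_property` is hypothetical.

```python
import math
from statistics import median

# PAPERS ratings are integers from -1 (poor) to 4 (excellent).
VALID_RATINGS = range(-1, 5)


def aggregate_property(ratings: list[int]) -> int:
    """Median of several ratings of one property for one measure.

    A non-integer median (possible with an even number of ratings) is
    rounded down with math.floor, e.g. 2.5 -> 2 and -0.5 -> -1
    (the handling of negative values is an assumption).
    """
    if not ratings:
        raise ValueError("no ratings available for this property")
    if any(r not in VALID_RATINGS for r in ratings):
        raise ValueError("ratings must be integers from -1 to 4")
    return math.floor(median(ratings))


# Hypothetical example: several studies rated one measure's internal consistency.
print(aggregate_property([1, 3, 4]))  # 3
print(aggregate_property([2, 3]))     # 2 (median 2.5 rounded down)
```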
To assess content coverage of the included measures, the items from each measure were mapped to constructs important to sustainability and sustainability determinants. For measures of sustainability (as an outcome), items were mapped to the five constructs outlined in Moore et al.'s comprehensive definition of sustainability (see the “Introduction” section). Items from measures of sustainability determinants were first mapped to the lower-level constructs that define the five higher-level domains proposed by the Integrated Sustainability Framework (i.e., outer context, inner context, intervention characteristics, processes, and implementer and population characteristics)  (see  for a more detailed description of the Integrated Sustainability Framework domains and constructs). Item mapping followed procedures similar to those of previous reviews [23, 76], whereby two research team members proficient in the content area of sustainability (AH and AS) independently extracted and mapped the items from each measure to the domains of the relevant frameworks outlined above. We classified a measure as incorporating components of a specific construct if at least one item was mapped to that construct. Discrepancies were resolved through discussion and input by two review members. We classified each measure as assessing either sustainability (as an outcome) or sustainability determinants based on the content of its items and the definition (see above) with which the items predominantly aligned.
Data were cleaned and summarised using SAS version 9.3. The constructs covered by each measure, according to Moore et al.'s definition of sustainability for measures assessing sustainability as an outcome and the five higher-level domains of the Integrated Sustainability Framework for measures of sustainability determinants, were summarised and organised in a table. Descriptive statistics were used to summarise the quality of each measure against the nine psychometric indicators and five pragmatic domains outlined by PAPERS. Where possible, total quality scores were calculated for the psychometric and pragmatic domains, as well as overall, for each measure by summing the relevant items. Total overall scores range from a possible −14 to 56 [17, 31]. Summary tables were produced that included information describing the characteristics of the measure, the specific setting, and any sub-groups in which the measure has evidence of validity. The use of each measure in empirical studies was summarised descriptively.
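The coverage rule above (a construct counts as covered if at least one item maps to it) can be sketched as a set operation over the agreed item-to-construct mapping. This is a minimal sketch, assuming the reviewers' mapping is available as a dictionary from item ID to the set of constructs it reflects; the `construct_coverage` function and the three-item measure are hypothetical, and the construct labels paraphrase the five concepts of Moore et al.'s definition listed in the Introduction. The same function would apply to the Integrated Sustainability Framework constructs by passing a different construct list.

```python
# Five concepts of Moore et al.'s definition, as listed in the Introduction.
MOORE_CONSTRUCTS = (
    "time",
    "continued delivery",
    "behaviour change",
    "evolution/adaptation",
    "continued benefits",
)


def construct_coverage(item_map: dict[str, set[str]],
                       constructs: tuple[str, ...] = MOORE_CONSTRUCTS) -> dict[str, bool]:
    """Flag each construct as covered if at least one item maps to it.

    `item_map` maps each item ID to the set of constructs the reviewers
    agreed it reflects (an empty set if it maps to none).
    """
    mapped = set().union(*item_map.values())
    return {c: c in mapped for c in constructs}


# Hypothetical three-item measure, not one of the reviewed instruments.
items = {
    "Q1": {"continued delivery"},
    "Q2": {"continued delivery", "behaviour change"},
    "Q3": set(),
}
coverage = construct_coverage(items)
print(sum(coverage.values()), "of", len(coverage))  # 2 of 5
print(all(coverage.values()))                     # False
```

Note that this rule records breadth only: one item is enough to mark a construct as covered, regardless of how thoroughly that item captures it.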
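The score ranges follow from simple arithmetic: 14 items rated from −1 to 4 give an overall range of 14 × (−1) = −14 to 14 × 4 = 56, with maxima of 9 × 4 = 36 for the psychometric domain and 5 × 4 = 20 for the pragmatic domain. The sketch below sums item ratings into these totals. It assumes that a property with no available evidence is entered as 0; the `papers_totals` name and the ratings in the example are hypothetical and do not correspond to any reviewed measure.

```python
PSYCHOMETRIC_ITEMS = 9  # e.g. internal consistency, structural validity, norms
PRAGMATIC_ITEMS = 5     # e.g. cost, ease of interpretation
MIN_RATING, MAX_RATING = -1, 4


def papers_totals(psychometric: list[int], pragmatic: list[int]) -> dict[str, int]:
    """Sum item ratings into domain totals and an overall PAPERS score."""
    if len(psychometric) != PSYCHOMETRIC_ITEMS or len(pragmatic) != PRAGMATIC_ITEMS:
        raise ValueError("expected 9 psychometric and 5 pragmatic ratings")
    for r in psychometric + pragmatic:
        if not MIN_RATING <= r <= MAX_RATING:
            raise ValueError(f"rating {r} outside -1..4")
    psych, prag = sum(psychometric), sum(pragmatic)
    return {"psychometric": psych, "pragmatic": prag, "overall": psych + prag}


n_items = PSYCHOMETRIC_ITEMS + PRAGMATIC_ITEMS
print(n_items * MIN_RATING, n_items * MAX_RATING)  # -14 56
print(PSYCHOMETRIC_ITEMS * MAX_RATING, PRAGMATIC_ITEMS * MAX_RATING)  # 36 20

# Hypothetical ratings; properties without evidence are entered as 0 (assumption).
print(papers_totals([4, 0, 3, 2, 0, 0, 1, 0, 0], [4, 3, 4, 1, 3]))
# {'psychometric': 10, 'pragmatic': 15, 'overall': 25}
```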
A total of 32,782 scientific articles were identified from the database search, from which 402 full texts were screened and 37 were included in the final review. An additional 186 relevant articles were identified from the grey literature search, resulting in 223 articles included in this review, representing 28 individual measures. See Additional file 2: Figure S1, for a summary of the article selection, and Additional file 2: Table S2 for a summary of exclusion reasons for measures included in previous reviews and repositories.
Overview of identified measures
Table 1 describes the characteristics of the included measures. Two measures assessed sustainability as an outcome, 25 assessed sustainability determinants, and one explicitly assessed both. Four measures were designed to assess constructs other than those directly related to sustainability or sustainability determinants. Twenty measures were based on a theory or framework, and 20 (of the 28 measures) included input from the target population during development.
Seventeen measures were developed or psychometrically evaluated in the USA, four in Australia, two in the Netherlands, and one each in Sweden and the UK. Three measures were developed and/or psychometrically evaluated in more than one country. All 28 measures were available in English, and only five were also available in a language other than English.
In relation to the scope of the identified measures, 11 were general measures designed to assess sustainability as an outcome or sustainability determinants in relation to any type of EBI within any setting. Four were general in terms of the target EBI but were restricted to a particular setting (e.g., clinical, public health, school). Seven could be used within any setting but were designed for a specific EBI or category of EBIs (e.g., health promotion programs, community-based programs, chronic disease prevention programs). Three were designed for a specific type of EBI or category of EBIs within a specific setting (e.g., depression care within a clinical/health care setting). Three were developed to assess determinants of sustainability for the same specific EBI, the School-wide Positive Behavioral Interventions and Supports program, which is delivered within the school setting.
Twenty measures were designed to be completed by both executive (e.g., supervisors, directors, administrators) and frontline staff (i.e., staff responsible for the day-to-day delivery of the EBI). Three measures were designed to be completed by executive staff only, and two by frontline staff only. Three were completed by researchers or purveyors responsible for monitoring or supporting the implementation of an EBI.
Content validity of identified measures
Table 2 describes the constructs covered by measures of sustainability according to Moore et al.'s definition. All three measures that assessed sustainability as an outcome covered the continued delivery of the EBI, while both the Provider Report of Sustainment Scale (PRESS) and the sustainment sub-scale of the Sustainment Measurement System Scale (SMSS) incorporated aspects of behavior change. Only one measure incorporated the concepts of time, evolution/adaptation, and continued benefits. None of the three measures incorporated all five main concepts related to sustainability as an outcome.
Table 3 describes the constructs covered by the 26 measures of sustainability determinants according to the higher-order domains of the Integrated Sustainability Framework. Ten measures covered aspects of all five higher-level domains. However, no measure covered all of the constructs that define these five multi-level domains (see Additional file 2: Tables S3 to S7). “Inner context factors” was the most frequently covered domain, with all but two measures (n=25) covering aspects of this domain. This was followed by “intervention characteristics” (n=23), “outer context” (n=18), and “processes” and “implementer and population characteristics” (n=17 measures each). At the level of the lower-level constructs that define the five higher-level domains, the “inner context factors” and “outer context factors” domains were the most broadly covered (Additional file 2: Tables S3 and S4). Conversely, the “interventionist and population” and “characteristics of the intervention” domains were the most sparsely covered, with only one measure and no measures, respectively, assessing all aspects of these domains (Additional file 2: Tables S6 and S7).
Psychometric and pragmatic qualities of identified measures
Table 1 details the overall PAPERS score for each measure, calculated by summing the ratings for the individual items assessing psychometric qualities (Table 4) and those assessing pragmatic qualities (Table 5). The PRESS measure, which measures sustainability as an outcome, was the highest-rated measure overall, with a total score of 35. Of the measures of sustainability determinants, the School-wide Universal Behavior Sustainability Index-School Teams (SUBSIST) obtained the highest PAPERS score (29), followed by the Clinical Sustainability Assessment Tool (CSAT) and the Sustainment Measurement System Scale (SMSS), each with a score of 28. The SUBSIST's higher overall score reflected a larger number of psychometric properties having been assessed compared with the CSAT and SMSS.
Table 4 details the median scores for the psychometric quality indicators from the PAPERS scale for each measure. Overall, PRESS was rated highest in psychometric quality, with a score of 18 out of a possible 36, followed by the SUBSIST with a score of 14. At the level of individual psychometric properties, internal consistency was the most frequently assessed (84%, n=26), with median scores ranging from 1 (minimal/emerging) to 4 (excellent). The second most frequently assessed psychometric property was structural validity (61%, n=19; median range: −1 to 4), followed by norms (55%, n=17; median range: −1 to 4). Few measures were assessed for responsiveness (n=1) or predictive validity (n=1). Additional file 2: Figure S2 provides a head-to-head comparison of the psychometric ratings of included measures.
Table 5 details the median scores for the pragmatic qualities assessed as part of the PAPERS rating scale for each measure. Overall, the Levels of Institutionalization (LoIn), CSAT, OPA Sustainability Assessment Tool, and Program Sustainability Assessment Tool (PSAT) were rated highest in pragmatic quality, each scoring 18 out of a possible 20. All four of these measures assessed determinants of sustainability. Of the three measures of sustainability as an outcome, the PRESS scored highest, with a total score of 17. All pragmatic items were scored for all measures, with most of the information obtained from grey literature sources, such as websites or publicly available scoring manuals. In terms of individual items, cost was the most highly rated, with all measures scoring excellent (score of 4), as they were freely available either publicly from a website, within a published manuscript, or via contact with the authors. The most poorly scored pragmatic quality was “ease of interpretation,” with only two measures receiving the highest rating of excellent and 17 scoring minimal/emerging (score of 1). Additional file 2: Figure S3 provides a comparison of the pragmatic ratings of included measures.
Empirical application of identified measures
Table 6 describes how each of the identified measures has been used in empirical research to date. Eleven measures have yet to be used in an empirical study, six of which were first published in 2020 or later. The most frequently used measure of sustainability as an outcome was the Stages of Implementation Completion (SIC) measure, which has been used in 27 studies. For measures of determinants of sustainability, the most frequently used was the Change Process Capability Questionnaire (CPCQ) (n=34), followed by the Normalisation Measure Development questionnaire (NoMAD) (n=29) and the Program Sustainability Assessment Tool (PSAT) (n=20). Geographically, the NoMAD was the most widely used, having been applied in 15 countries; all other measures have been used in six or fewer countries. Of the 16 measures that have been used in empirical research, six were used to assess constructs other than sustainability determinants or sustainability as an outcome. Eleven measures were adapted before use, even though only two (SIC and NoMAD) were explicitly designed for adaptation in primary research. The most common adaptations were removing items, adding items, changing the wording of items, changing the response scale, and deleting domains.
We identified a growing number of measures relating to sustainability determinants and, to a lesser extent, measures of sustainability as an outcome. Despite this increase, we found that the included measures had limited coverage of the key constructs of sustainability and were of variable quality, and that only a small number were consistently used in empirical studies. This review identifies areas where future research is warranted to improve the field while minimising research waste. It also provides important information that end-users can use to compare and select the most appropriate measure for their setting.
General considerations across all identified measures
Most of the measures identified were developed and/or psychometrically evaluated in the USA (20 out of 28), limiting evidence of their cross-cultural validity. This may also limit content coverage of constructs, as the outer context (related to the broader policy and social context) has been identified as an important determinant of sustainability. Only five of the 28 measures are available in languages other than English, and only one of these, the NoMAD, has been translated and psychometrically evaluated in several languages. Translation and validation of measures is an extensive and costly process that requires specialised expertise. This is a major limitation of the field with implications for equity, as it highlights the inadequate access that non-English-speaking populations and countries have to rigorous and standardised measures relating to sustainability. Without this access, researchers often create their own measures or, alternatively, translate and adapt existing measures without proper validation. Creating or leveraging existing research consortia that share resources across groups may help avoid this.
Only 11 of the 28 identified measures (two for sustainability as an outcome and nine for sustainability determinants) were designed for general use (see Table 1). Fortunately, simple changes to the referent in a measure (e.g., changing the referenced EBI) should not alter its psychometric properties. In at least five measures [36, 37, 41, 59, 61], the items appeared to have content specific to the EBI and/or setting (beyond simple referent values) that would require extensive adaptation, which may warrant new psychometric evidence. Generalised measures allow research to be standardised, enabling replication and comparability across studies while reducing the research waste caused by one-off measures. The need for more generalised measures is underscored by our finding that most measures were adapted before use in empirical studies, in ways that might compromise their psychometric evidence. However, it can be difficult to ensure that generalised measures are sensitive and informative, as the issues affecting sustainability can vary with the setting and EBI under investigation. Item banks, informed by item response theory, strike a balance between the generalisability and specificity of a measure. The resulting standardised measures include survey items tailored to specific characteristics, such as settings, populations, and/or EBIs, which have been calibrated to produce standardised scores that are comparable across the tailored items. The use of item banks for measures within implementation science is not a new concept and has been suggested by other reviews of implementation measures. Despite such calls, few efforts have been launched to create item banks for implementation science, which may be a focus for research consortia in the future.
The majority of the included measures (n=20) were designed to be completed by both executive/management staff, who oversee the implementation of an EBI, and frontline staff, who are responsible for the day-to-day delivery of an EBI (see Table 1). In most instances, both executive and frontline staff are required to respond to all items, regardless of their role in EBI delivery. Only the SIC, the Sustainable Implementation Scale (SIS), and the SUBSIST appear to distinguish between these two roles, with separate questions for each type of staff. The issues affecting sustainability exist at varying levels within organisations [2, 8, 59]. Therefore, staff in different roles may have limited understanding of some determinants or aspects of sustainability. For example, frontline staff may not be aware of budgetary constraints that administrators manage. Conversely, management may not possess the same level of day-to-day EBI implementation knowledge as frontline staff. If participants cannot accurately respond to a measure's items, the usefulness of the data collected is compromised. Different scales, or at least different items within a scale, may need to be completed by different types of staff to ensure that the full range of issues affecting sustainability is accurately captured.
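To make the item-bank idea concrete, the sketch below shows how items calibrated on one latent scale let respondents answer different, setting-tailored items yet still receive comparable scores. The review does not prescribe a model: the two-parameter logistic (2PL) IRT model, expected a posteriori (EAP) scoring with a standard normal prior, and all item names and parameter values are illustrative assumptions, and the functions `p_endorse` and `eap_score` are hypothetical.

```python
import math

# Hypothetical calibrated item bank: item ID -> (discrimination a, location b)
# under a two-parameter logistic (2PL) IRT model. All values are invented.
ITEM_BANK = {
    "core_1": (1.2, -0.5),    # general item asked in every setting
    "core_2": (1.0, 0.0),     # general item asked in every setting
    "clinic_1": (1.5, 0.4),   # item tailored to clinical settings
    "school_1": (0.8, -0.2),  # item tailored to school settings
}

# Quadrature grid over theta from -4 to 4 in steps of 0.1.
GRID = [i / 10 for i in range(-40, 41)]


def p_endorse(theta: float, a: float, b: float) -> float:
    """2PL probability that a respondent at level theta endorses an item."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def eap_score(responses: dict[str, int]) -> float:
    """Expected a posteriori (EAP) estimate of theta, standard normal prior.

    `responses` maps any subset of bank items to 1 (endorsed) or 0.
    Because every item is calibrated on the same theta scale, scores
    from different item subsets fall on one comparable metric.
    """
    numerator = denominator = 0.0
    for theta in GRID:
        weight = math.exp(-0.5 * theta * theta)  # unnormalised N(0, 1) prior
        for item, endorsed in responses.items():
            a, b = ITEM_BANK[item]
            p = p_endorse(theta, a, b)
            weight *= p if endorsed else 1.0 - p  # likelihood of this answer
        numerator += theta * weight
        denominator += weight
    return numerator / denominator


# Two respondents answer the shared core items plus a setting-specific item.
clinic_score = eap_score({"core_1": 1, "core_2": 1, "clinic_1": 0})
school_score = eap_score({"core_1": 1, "core_2": 1, "school_1": 1})
print(f"clinic: {clinic_score:.2f}, school: {school_score:.2f}")
```

In practice, item parameters would come from calibrating the bank on real response data; the point of the sketch is only that tailored items and shared items contribute to the same score scale.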
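One way to act on the recommendation that different staff complete different items is to tag each item with the roles expected to answer it accurately and to assemble each respondent's survey from those tags. This is a minimal sketch under that assumption: the `Item` class, the `survey_for` function, and the item wording are hypothetical and are not drawn from the SIC, SIS, or SUBSIST.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One survey item and the staff roles expected to answer it accurately."""
    text: str
    roles: frozenset[str]


# Hypothetical items; role labels follow the executive/frontline distinction.
ITEMS = [
    Item("Funding for the program is secured for the next budget cycle.",
         frozenset({"executive"})),
    Item("Program components are delivered as intended in daily practice.",
         frozenset({"frontline"})),
    Item("Leadership visibly supports the program.",
         frozenset({"executive", "frontline"})),
]


def survey_for(role: str, items: list[Item] = ITEMS) -> list[str]:
    """Return only the items a respondent in `role` is asked to complete."""
    return [item.text for item in items if role in item.roles]


for role in ("executive", "frontline"):
    print(role, survey_for(role))
```

Items shared by both roles (such as perceived leadership support) still allow responses from executive and frontline staff to be compared directly.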
Measures of sustainability as an outcome
Of the 28 included measures, only three were classified as measuring sustainability as an outcome. This may reflect the difficulty of using standardised self-report scales to validly capture the continued delivery and benefits of specific EBIs. Instead, it may be more appropriate to measure sustainability by other means. Options include a measure that asks directly about the continued delivery of the EBI, or observation. For instance, the SIC is an objective measure of the implementation process that records the timing and continued delivery of the main components of an EBI. It currently focuses predominantly on the earlier phases of implementation, but it is being extended to comprehensively cover the subsequent sustainability phase. Following such extensions and their rigorous psychometric evaluation, the SIC will be an appealing comprehensive measure of the implementation process, including the sustainability phase. However, the SIC may not be appropriate in some instances (e.g., where resources and time are limited), because it is more complicated to administer. It requires specific training, input from multiple data sources, and completion by researchers and purveyors over an extended period of time. Where direct measurement of EBI delivery cannot be obtained, a general standardised measure such as the PRESS may be suitable; the PRESS scored the highest of all measures on the PAPERS criteria. Importantly, despite its high relative rating, the PRESS still lacks evidence of important psychometric properties, including predictive validity, concurrent validity, and responsiveness. Furthermore, none of the three measures of sustainability covered all five domains of Moore et al.'s definition.
This is likely because most of the measures assess more specific constructs or aspects of sustainability, rather than the broader definition of sustainability used by Moore et al. For instance, sustainment has been recognised as a distinct concept, defined as the ongoing delivery of an evidence-based intervention [2, 8, 11, 32]. Sustainment was the focus of some of the measures included in this review, including the PRESS. Because we aimed to provide a comprehensive review of all quantitative measures related to sustainability, we adopted a broad definition and included any measures related to sustainability. When developing or selecting measures, it is essential to define the target construct clearly and to select a measure that aligns with that construct.
Measures of determinants of sustainability
Compared with measures of sustainability as an outcome, we identified a large number of measures that aligned with our definition of determinants of sustainability: 26 of the 28 measures. Eight of the 28 measures have been published since 2020, which highlights a recent increase in measure development, but several limitations exist. In terms of content validity, only 10 covered all five higher-level domains of the Integrated Sustainability Framework (see Table 3). Some measures (e.g., the Sustainment Leadership Scale) were designed to cover only specific domains of determinants, and the trade-off is a less comprehensive assessment of sustainability determinants. Few measures comprehensively covered all aspects of the “outer contextual factors” domain, which is a critical domain warranting multiple perspectives.
The psychometric and pragmatic qualities of these measures varied substantially, with PAPERS ratings ranging from 15 to 29 out of a possible 56. For psychometric properties, the largest gaps related to discriminant validity, predictive validity, and responsiveness, which highlights opportunities for future research. For the pragmatic criteria, all measures rated well on cost and language. However, ease of interpretation was rated as minimal/emerging for all but ten of the sustainability determinants measures (see Table 5). Very few measures provided explicit instructions on how to score and interpret them. In fact, only two measures provided explicit and detailed cut-off values and labels for classifying those at greater risk of not sustaining delivery of an EBI. These were the “National Health Service (NHS) Sustainability Model and Guide” and the “Office of Adolescent Health (OAH) Sustainability Assessment”. However, neither of these measures has undergone comprehensive psychometric evaluation, so the validity of their cut-points has not yet been examined.
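The PAPERS totals reported here are sums of criterion-level ratings, and the Python below sketches that arithmetic. It assumes 14 criteria (nine psychometric and five pragmatic), each rated on the PAPERS scale from −1 to 4. On that scale, −1 is poor, 0 is none, 1 is minimal/emerging, 2 is adequate, 3 is good and 4 is excellent, which gives a maximum total of 14 × 4 = 56. The criterion names are our paraphrase and should be checked against the published rating guide (Stanick et al.; Lewis et al.) before any real use. The example ratings are invented and do not describe any measure in this review.

```python
from typing import Dict, List, Tuple

# Assumed PAPERS criteria (paraphrased names; verify against the published rating guide).
PSYCHOMETRIC: Tuple[str, ...] = (
    "internal_consistency", "convergent_validity", "discriminant_validity",
    "known_groups_validity", "predictive_validity", "concurrent_validity",
    "structural_validity", "responsiveness", "norms",
)
PRAGMATIC: Tuple[str, ...] = (
    "brevity", "language", "cost",
    "assessor_burden_training", "assessor_burden_interpretation",
)
VALID_RATINGS = range(-1, 5)  # -1 poor, 0 none, 1 minimal/emerging, 2 adequate, 3 good, 4 excellent


def papers_scores(ratings: Dict[str, int]) -> Dict[str, object]:
    """Sum criterion ratings into psychometric, pragmatic, and total PAPERS scores."""
    expected = set(PSYCHOMETRIC) | set(PRAGMATIC)
    if set(ratings) != expected:
        raise ValueError(f"ratings must cover exactly: {sorted(expected)}")
    bad = {c: r for c, r in ratings.items() if r not in VALID_RATINGS}
    if bad:
        raise ValueError(f"ratings must be integers from -1 to 4; got {bad}")
    psychometric = sum(ratings[c] for c in PSYCHOMETRIC)
    pragmatic = sum(ratings[c] for c in PRAGMATIC)
    # Criteria rated minimal/emerging or worse (<= 1) flag evidence gaps.
    gaps: List[str] = sorted(c for c, r in ratings.items() if r <= 1)
    return {
        "psychometric": psychometric,
        "pragmatic": pragmatic,
        "total": psychometric + pragmatic,
        "max_total": 4 * len(expected),  # 14 criteria x 4 = 56
        "gaps": gaps,
    }


# Invented ratings for a hypothetical measure (not taken from this review's tables).
example = {
    "internal_consistency": 3, "convergent_validity": 2, "discriminant_validity": 0,
    "known_groups_validity": 1, "predictive_validity": 0, "concurrent_validity": 0,
    "structural_validity": 3, "responsiveness": 0, "norms": 2,
    "brevity": 4, "language": 4, "cost": 4,
    "assessor_burden_training": 3, "assessor_burden_interpretation": 1,
}
print(papers_scores(example))
```

Reporting the `gaps` list alongside the total makes patterns like those described above visible. A measure can rate well on cost and language yet still score minimal/emerging on ease of interpretation and on validity criteria such as responsiveness.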
Recommendations for use of current measures
Based on the evidence presented in this review, there are limitations to all identified measures of sustainability and determinants of sustainability. However, we recommend the following.
If objective measures of sustainability are not available or feasible, the PRESS should be considered as a measure of sustainability as an outcome, as it is the most psychometrically robust and pragmatic measure to date. Future research should strive to establish evidence of predictive validity and responsiveness for the PRESS to strengthen the evidence for its psychometric properties.
Among measures of determinants of sustainability, SUBSIST had the highest PAPERS score (29). When evaluating school-wide positive behavioral interventions and supports, SUBSIST should be considered as a measure of sustainability determinants for this EBI. However, it is not appropriate for other EBIs.
For other EBIs, the CSAT and SMSS both had an overall PAPERS rating of 28, indicating favourable psychometric and pragmatic qualities compared with other measures of sustainability determinants. We recommend that the CSAT be considered when assessing sustainability determinants in clinical settings, and the SMSS in other settings.
In general, researchers wishing to assess the determinants of sustainability should carefully evaluate the psychometric and pragmatic qualities of each measure. They should also consider the specific characteristics (e.g., setting and EBI) that each measure was designed to assess. The information provided in the tables within this paper should help end-users select the most robust and suitable measure for their context.
Furthermore, when selecting a measure, researchers should carefully consider the specific construct to be measured and select a measure that aligns with that construct.
There are limitations that should be considered when interpreting these results. First, we only included measures that were explicitly stated to be designed for broad, standardised use. This decision was made to avoid including one-off, study-specific measures, but it may have missed some relevant measures that could be used elsewhere. Second, we only included quantitative measures. We were interested in reflective measures that offered an efficient and comprehensive means of measuring and tracking both sustainability as an outcome and sustainability determinants. This decision resulted in the exclusion of several sustainability-related tools that can help support the planning and assessment of sustainability. Examples include RE-AIM and its extension focused on sustainability [82, 83], and the Long-Term Success Tool. While these tools are useful in planning for, or tracking aspects of, sustainability, they are not designed solely for quantitative measurement and were therefore beyond the scope of this review. These exclusions also highlight the difficulties that researchers and practitioners can face when attempting to select an appropriate, rigorous, and standardised quantitative measure of these concepts. Third, we classified a measure as covering a particular construct of interest if it included at least one item relating to that construct. This contrasts with other reviews that have used a criterion of at least two items [23, 76]. We used this more liberal approach to ensure that we did not underestimate the content coverage of current measures. We were mostly interested in whether measures incorporated any aspect of the specific constructs we focused on, even to a small extent. This approach may have overestimated the content validity of the identified measures, as a single item is usually insufficient to adequately cover an entire construct.
Fourth, we only searched the reference lists of relevant reviews rather than those of all eligible articles, which was a deviation from our original registered protocol. This deviation was due to the extensive volume of articles screened and identified. However, our search strategy was extensive. It included published and grey literature, the reference lists of previous reviews, snowball searching, and searches of online repositories of implementation measures. It is therefore unlikely that this deviation significantly affected our identification of eligible measures. Finally, we only evaluated the psychometric properties of measures using studies whose data were explicitly analysed for psychometric evaluation. Data analysed for other purposes were not considered when scoring a measure's psychometric properties. For example, we did not use an empirical study that assessed the association between the measure and another construct but had no a priori aim of assessing the measure's validity. We took this approach because psychometric evaluations should be pre-specified, and because it was the most manageable and conservative approach for a review of this size.
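A small sketch shows how the one-item and two-item coverage rules can diverge for the same measure. The item-to-construct mapping below is invented. The construct labels are shorthand loosely modelled on the Integrated Sustainability Framework's higher-level domains and are not items from any reviewed measure. The function simply counts how many distinct items map to each construct and keeps those at or above a chosen threshold.

```python
from collections import Counter
from typing import Dict, List, Set


def covered_constructs(item_map: Dict[str, List[str]], min_items: int = 1) -> Set[str]:
    """Return constructs mapped to at least `min_items` distinct items."""
    # set(...) stops an item tagged twice with the same construct from counting twice
    counts = Counter(c for constructs in item_map.values() for c in set(constructs))
    return {c for c, n in counts.items() if n >= min_items}


# Hypothetical five-item measure; each item may map to more than one construct.
ITEM_MAP: Dict[str, List[str]] = {
    "q1": ["inner_context"],
    "q2": ["inner_context", "processes"],
    "q3": ["outer_context"],
    "q4": ["intervention_characteristics"],
    "q5": ["inner_context"],
}

for threshold in (1, 2):
    print(threshold, sorted(covered_constructs(ITEM_MAP, min_items=threshold)))
```

Under a one-item rule, this hypothetical measure covers four constructs. Under a two-item rule, it covers only `inner_context`. This illustrates how the more liberal rule can make content coverage look broader than a stricter criterion would.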
This systematic review identified and evaluated the psychometric and pragmatic properties of standardised measures of sustainability as an outcome and sustainability determinants for use across community, public health, and clinical settings. It provides a comprehensive guide that researchers and stakeholders can use to select the most psychometrically robust, pragmatic, and relevant measure of sustainability and/or sustainability determinants available for their setting. It also highlights where future research is needed to improve the psychometric and pragmatic quality of the current measures in this field.
Availability of data and materials
Data and materials relating to this review are available from the corresponding author on reasonable request.
Scheirer MA, Dearing JW. An agenda for research on the sustainability of public health programs. Am J Public Health. 2011;101(11):2059–67.
Shelton RC, Cooper BR, Stirman SW. The sustainability of evidence-based interventions and practices in public health and health care. Annu Rev Public Health. 2018;39:55–76.
Wiltsey Stirman S, Kimberly J, Cook N, Calloway A, Castro F, Charns M. The sustainability of new programs and innovations: a review of the empirical literature and recommendations for future research. Implement Sci. 2012;7(1):1–19.
Herlitz L, MacIntyre H, Osborn T, Bonell C. The sustainability of public health interventions in schools: a systematic review. Implement Sci. 2020;15(4). https://doi.org/10.1186/s13012-019-0961-8.
Ament SMC, de Groot JJA, Maessen JMC, Dirksen CD, van der Weijden T, Kleijnen J. Sustainability of professionals' adherence to clinical practice guidelines in medical care: a systematic review. BMJ Open. 2015;5:e008073.
Proctor E, Luke D, Calhoun A, McMillen C, Brownson R, McCrary S, et al. Sustainability of evidence-based healthcare: research agenda, methodological advances, and infrastructure support. Implement Sci. 2015;10:88.
Moore JE, Mascarenhas A, Bain J, Straus SE. Developing a comprehensive definition of sustainability. Implement Sci. 2017;12(1):110.
Moullin JC, Sklar M, Green A, Dickson KS, Stadnick NA, Reeder K, et al. Advancing the pragmatic measurement of sustainment: a narrative review of measures. Implement Sci Commun. 2020;1:76.
Birken SA, Haines ER, Hwang S, Chambers DA, Bunger AC, Nilsen P. Advancing understanding and identifying strategies for sustaining evidence-based practices: a review of reviews. Implement Sci. 2020;15(88). https://doi.org/10.1186/s13012-020-01040-9.
Luke DA, Calhoun A, Robichaux CB, Elliott MB, Moreland-Russell S. The program sustainability assessment tool: a new instrument for public health programs. Prev Chronic Dis. 2014;11:E12.
Chambers DA, Glasgow RE, Stange KC. The dynamic sustainability framework: addressing the paradox of sustainment amid ongoing change. Implement Sci. 2013;8(1):1–11.
Hodge LM, Turner KMT. Sustained implementation of evidence-based programs in disadvantaged communities: a conceptual framework of supporting factors. Am J Community Psychol. 2016;58(1-2):192–210.
Schell SF, Luke DA, Schooley MW, Elliott MB, Herbers SH, Mueller NB, et al. Public health programs capacity for sustainability: a new framework. Implement Sci. 2013;8:15.
Shoesmith A, Hall A, Wolfenden L, Shelton RC, Powell BJ, Brown H, et al. Barriers and facilitators influencing the sustainment of health behaviour interventions in schools and childcare services: a systematic review. Implement Sci. 2021;16(62). https://doi.org/10.1186/s13012-021-01134-y.
Boateng GO, Neilands TB, Frongillo EA, Melgar-Quinonez HR, Young SL. Best practices for developing and validating scales for health, social, and behavioral research: a primer. Front. Public Health. 2018;6(149). https://doi.org/10.3389/fpubh.2018.00149.
Terwee CB, Prinsen CAC, Chiarotto A, Westerman MJ, Patrick DL, Alonso J, et al. COSMIN methodology for evaluating the content validity of patient-reported outcome measures: a Delphi study. Qual Life Res. 2018;27(5):1159–70.
Lewis CC, Mettert KD, Stanick CF, Halko HM, Nolen EA, Powell BJ, et al. The psychometric and pragmatic evidence rating scale (PAPERS) for measure development and evaluation. Implement Res Pract. 2021;2:1–6.
Mettert K, Lewis C, Dorsey C, Halko H, Weiner B. Measuring implementation outcomes: an updated systematic review of measures’ psychometric properties. Implement Res Pract. 2020;1:1–29.
Terwee CB, Mokkink LB, Knol DL, Ostelo RW, Bouter LM, de Vet HC. Rating the methodological quality in systematic reviews of studies on measurement properties: a scoring system for the COSMIN checklist. Qual Life Res. 2012;21(4):651–7.
Lewis CC, Fischer S, Weiner BJ, Stanick C, Kim M, Martinez RG. Outcomes for implementation science: an enhanced systematic review of instruments using evidence-based rating criteria. Implement Sci. 2015;10:155.
Page MJ, et al. PRISMA 2020 explanation and elaboration: updated guidance and exemplars for reporting systematic reviews. BMJ. 2021;372(n160). https://doi.org/10.1136/bmj.n160.
Allen P, Pilar M, Walsh-Bailey C, Hooley C, Mazzucca S, Lewis CC, et al. Quantitative measures of health policy implementation determinants and outcomes: a systematic review. Implement Sci. 2020;15(1):47.
Weiner BJ, Mettert KD, Dorsey CN, Nolen EA, Stanick C, Powell BJ, et al. Measuring readiness for implementation: a systematic review of measures’ psychometric and pragmatic properties. Implement Res Pract. 2020;1:1–29.
Clinton-McHarg T, Yoong SL, Tzelepis F, Regan T, Fielding A, Skelton E, et al. Psychometric properties of implementation measures for public health and community settings and mapping of constructs against the consolidated framework for implementation research: a systematic review. Implement Sci. 2016;11(1):148.
Khadjesari Z, Boufkhed S, Vitoratou S, Schatte L, Ziemann A, Daskalopoulou C, et al. Implementation outcome instruments for use in physical healthcare settings: a systematic review. Implement Sci. 2020;15(1):66.
Khadjesari Z, Vitoratou S, Sevdalis N, Hull L. Implementation outcome assessment instruments used in physical healthcare settings and their measurement properties: a systematic review protocol. BMJ Open. 2017;7(10):e017972.
Lewis CC, Mettert KD, Dorsey CN, Martinez RG, Weiner BJ, Nolen E, et al. An updated protocol for a systematic review of implementation-related measures. Syst Rev. 2018;7(1):66.
Society for Implementation Research and Collaboration. Sustainability Instruments. Available from: https://societyforimplementationresearchcollaboration.org/sustainability-measures/.
Grid-Enabled Measures Database. GEM. Available from: https://www.gem-measures.org/Login.aspx?ReturnURL=Public/Measurelist.aspx?cat=2.
Covidence systematic review software, Veritas Health Innovation, Melbourne, Australia. Available at www.covidence.org.
Stanick CF, Halko HM, Nolen EA, Powell BJ, Dorsey CN, Mettert KD, et al. Pragmatic measures for implementation research: development of the Psychometric and Pragmatic Evidence Rating Scale (PAPERS). Transl Behav Med. 2021;11(1):11–20.
Moullin JC, Sklar M, Ehrhart MG, Green A, Aarons GA. Provider REport of sustainment Scale (PRESS): development and validation of a brief measure of inner context sustainment. Implement Sci. 2021;16(1):86.
Chamberlain P, Hendricks Brown C, Saldana L. Observational measure of implementation progress in community based settings: the Stages of Implementation Completion (SIC). Implement Sci. 2011;6:116.
Saldana L, Bennett I, Powers D, Vredevoogd M, Grover T, Schaper H, et al. Scaling implementation of collaborative care for depression: adaptation of the stages of implementation completion (SIC). Admin Pol Ment Health. 2020;47(2):188–96.
Saldana L, Chamberlain P, Wang W, Hendricks Brown C. Predicting program start-up using the stages of implementation measure. Admin Pol Ment Health. 2012;39(6):419–25.
Turri MG, Mercer SH, McIntosh K, Nese RNT, Strickland-Cohen MK, Hoselton R. Examining barriers to sustained implementation of school-wide prevention practices. Assess Eff Interv. 2016;42(1):6–17.
Kittelman A, Mercer SH, McIntosh K, Nese RNT. Development and validation of a measure assessing sustainability of tier 2 and 3 behavior support systems. J Sch Psychol. 2021;85:140–54.
Slaghuis SS, Strating MMH, Bal RA, Nieboer AP. A measurement instrument for spread of quality improvement in healthcare. Int J Qual Health Care. 2013;25(2):125–31.
Solberg LI, Asche SE, Margolis KL, Whitebird RR. Measuring an organization's ability to manage change: the change process capability questionnaire and its use for improving depression care. Am J Med Qual. 2008;23(3):193–200.
Malone S, Prewitt K, Hackett R, Lin JC, McKay V, Walsh-Bailey C, et al. The clinical sustainability assessment tool: measuring organizational capacity to promote sustainability in healthcare. Implement Sci Commun. 2021;2(1):77.
Williams RM, Zhang J, Woodard N, Slade JL, Santos LZ, Knott CL. Development and validation of an instrument to assess institutionalization of health promotion in faith-based organizations. Eval Program Plann. 2020;79:101781.
Bond GR, Drake RE, Rapp CA, McHugo GJ, Xie H. Individualization and quality improvement: two new scales to complement measurement of program fidelity. Admin Pol Ment Health. 2009;36(5):349–57.
Heiervang KS, Egeland KM, Landers M, Ruud T, Joa I, Drake RE, et al. Psychometric properties of the General Organizational Index (GOI): a measure of individualization and quality improvement to complement program fidelity. Admin Pol Ment Health. 2020;47:920–6.
Barab SA, Redman BK, Froman RD. Measurement characteristics of the Levels of Institutionalization Scales: examining reliability and validity. J Nurs Meas. 1998;6(1):19–33.
Goodman RM, McLeroy KR, Steckler AB, Hoyle RH. Development of level of institutionalization scales for health promotion programs. Health Educ Q. 1993;20(2):161–78.
Goodman RM, Steckler A. A framework for assessing program institutionalization. Knowl Soc. 1989;2(1):57–71.
Maher L, Gustafson DH, Evans A. Sustainability model and guide; 2010.
Finch TL, Girling M, May CR, Mair FS, Murray E, Treweek S, et al. Improving the normalization of complex interventions: part 2 - validation of the NoMAD instrument for assessing implementation work based on normalization process theory (NPT). BMC Med Res Methodol. 2018;18(1):135.
Rapley T, Girling M, Mair FS, Murray E, Treweek S, McColl E, et al. Improving the normalization of complex interventions: part 1 - development of the NoMAD instrument for assessing implementation work based on normalization process theory (NPT). BMC Med Res Methodol. 2018;18(1):133.
Vis C, Ruwaard J, Finch T, Rapley T, de Beurs D, van Stel H, et al. Toward an objective assessment of implementation processes for innovations in health care: Psychometric evaluation of the Normalization Measure Development (NoMAD) Questionnaire among mental health care professionals. J Med Internet Res. 2019;21(2):e12376.
Davis S. Ready for prime time? Using normalization process theory to evaluate implementation success of personal health records designed for decision making. Front Digit Health. 2020;2:575951.
Loch AP, Finch T, Fonsi M, Soarez PC. Cross-cultural adaptation of the NoMAD questionnaire to Brazilian Portuguese. Rev Assoc Med Bras (1992). 2020;66(10):1383–90.
Elf M, Nordmark S, Lyhagen J, Lindberg I, Finch T, Aberg AC. The Swedish version of the normalization process theory measure S-NoMAD: translation, adaptation, and pilot testing. Implement Sci. 2018;13(146). https://doi.org/10.1186/s13012-018-0835-5.
May CR, Finch T, Ballini L, MacFarlane A, Mair F, Murray E, et al. Evaluating complex interventions and health technologies using normalization process theory: development of a simplified approach and web-enabled toolkit. BMC Health Serv Res. 2011;11(1):1–11.
Hawe P, King L, Noort M, Jordens C, Lloyd B. Indicators to help with capacity building in health promotion; 2000.
Office of Adolescent Health. Building sustainable programs: the resource guide. 2014.
Office of Population Affairs. Resource guide for building sustainable programs. 2019.
Stamatakis KA, McQueen A, Filler C, Boland E, Dreisinger M, Brownson RC, et al. Measurement properties of a novel survey to assess stages of organizational readiness for evidence-based interventions in community chronic disease prevention settings. Implement Sci. 2012;7:65.
Hall A, Shoesmith A, Shelton RC, Lane C, Wolfenden L, Nathan N. Adaptation and Validation of the Program Sustainability Assessment Tool (PSAT) for use in the elementary school setting. Int J Environ Res Public Health. 2021;18(21):11414.
Mancini JA, Marek LI. Sustaining community-based programs for families: conceptualisation and measurement. Fam Relat. 2004;53(4):339–47.
McIntosh K, MacKay LD, Hume AE, Doolittle J, Vincent CG, Horner RH, et al. Development and initial validation of a measure to assess factors related to sustainability of school-wide positive behavior support. J Posit Behav Interv. 2010;13(4):208–18.
Hume A, McIntosh K. Construct validation of a measure to assess sustainability of school-wide behavior interventions. Psychol Sch. 2013;50(10):1003–14.
Kittelman A, Bromley KW, Mercer SH, McIntosh K. Validation of a measure of sustainability of school-wide behavior interventions and supports. Remedial Spec Educ. 2019;40(2):67–73.
McIntosh K, Mercer SH, Hume AE, Frank JL, Turri MG, Mathews S. Factors related to sustained implementation of schoolwide positive behavior support. Except Child. 2013;79(3):293–311.
Mercer SH, McIntosh K, Strickland-Cohen MK, Horner RH. Measurement invariance of an instrument assessing sustainability of school-based universal behavior practices. Sch Psychol Q. 2014;29(2):125.
The Board of Regents of the University System of Georgia by and on behalf of Georgia State University and the Georgia Health Policy Center. Positioning for sustainability: a formative assessment tool – quick course. 2011.
Markstrom U, Svensson B, Bergmark M, Hansson L, Bejerholm U. What influences a sustainable implementation of evidence-based interventions in community mental health services? Development and pilot testing of a tool for mapping core components. J Ment Health. 2018;27(5):395–401.
Hodge LM, Turner KMT, Sanders MR, Filus A. Sustained Implementation Support Scale: validation of a measure of program characteristics and workplace functioning for sustained program implementation. J Behav Health Serv Res. 2017;44(3):442–64.
Askell-Williams H, Koh GA. Enhancing the sustainability of school improvement initiatives. Sch Eff Sch Improv. 2020;31(4):660–78.
Ehrhart MG, Torres EM, Green AE, Trott E, Willging CE, Moullin JC, et al. Leading for the long haul: a mixed-method evaluation of the Sustainment Leadership Scale (SLS). Implement Sci. 2018;13(17). https://doi.org/10.1186/s13012-018-0710-4.
Aarons GA, Hurlburt M, Horwitz SM. Advancing a conceptual model of evidence-based practice implementation in public service sectors. Admin Pol Ment Health. 2011;38(1):4–23.
Palinkas LA, Chou CP, Spear SE, Mendon SJ, Villamar J, Brown CH. Measurement of sustainment of prevention programs and initiatives: the sustainment measurement system scale. Implement Sci. 2020;15(1):71.
Damschroder LJ, Aron DC, Keith RE, Kirsh SR, Alexander JA, Lowery JC. Fostering implementation of health services research findings into practice: a consolidated framework for advancing implementation science. Implement Sci. 2009;4(1):1–15.
Harris P, Taylor R, Minor BL, Elliott V, Fernandez M, O'Neal L, et al. The REDCap consortium: building an international community of software partners. J Biomed Inform. 2019;95:103208.
Harris P, Taylor R, Thielke R, Payne J, Gonzalez N, Conde JG. Research electronic data capture (REDCap) – a metadata-driven methodology and workflow process for providing translational research informatics support. J Biomed Inform. 2009;42(2):377–81.
Chaudoir SR, Dugan AG, Barr CH. Measuring factors affecting implementation of health innovations: a systematic review of structural, organizational, provider, patient, and innovation level measures. Implement Sci. 2013;8(1):1–20.
Slaghuis SS, Strating MMH, Bal RA, Nieboer AP. A framework and measurement instrument for sustainability of work practices in long-term care. BMC Health Serv Res. 2011;11(314). https://doi.org/10.1186/1472-6963-11-314.
Finnerty MT, Rapp CA, Bond GR, Lynde DW, Ganju V, Goldman HH. The State health authority yardstick (SHAY). Community Ment Health J. 2009;45:228–36.
Saldana L. The stages of implementation completion for evidence-based practice: protocol for a mixed methods study. Implement Sci. 2014;9(1):43.
Beaton DE, Bombardier C, Guillemin F, Ferraz MB. Guidelines for the process of cross-cultural adaptation of self-report measures. Spine. 2000;25(24):3186–91.
Saldana L. Operationalizing sustainment activities for two evidence-based practices using the stages of implementation completion (SIC). In: 11th Annual Conference on the Science of Dissemination and Implementation; 2018; AcademyHealth.
Shelton RC, Chambers DA, Glasgow RE. An extension of RE-AIM to enhance sustainability: addressing dynamic context and promoting health equity over time. Front Public Health. 2020;8:134.
Measures and Checklists. 2019. Available from: http://www.re-aim.org/resources-and-tools/measures-and-checklists/.
Lennox L, Doyle C, Reed J, Bell D. What makes a sustainability tool valuable, practical and useful in realworld healthcare practice? A mixed methods study on the development of the long term success tool in Northwest London. BMJ Open. 2017;7:e014417.
The author team would like to thank everyone who contributed to this extensive review. Specifically, we would like to thank:
- Hannah Brown, for assisting with drafting the database search;
- Debbie Booth, for reviewing, advising on, and executing the database search;
- Nicole McCarthy and Karly Austin, for assisting with article screening; and
- Sophie Hamilton and Carly Gardner, for assisting with aspects of the grey literature search and calculating readability scores for relevant measures.
This project is funded through the National Health and Medical Research Council (NHMRC) as part of NN’s Medical Research Future Fund (MRFF) Investigator Grant (APP1194785). It was also supported by work undertaken as part of an NHMRC Centre for Research Excellence grant (APP1153479). The authors are supported as follows:
- NN, by an MRFF Investigator Grant (APP1194785);
- LW, by an NHMRC Investigator Grant (APP1197022);
- RCS, by an American Cancer Society Research Scholar Grant (RSG-17-156-01-CPPB);
- SY, by an Australian Research Council Discovery Early Career Researcher Award (DE170100382);
- RS, by an NHMRC MRFF Investigator Grant (APP1194768);
- NI, by a support grant from the Faculty of Health, Arts and Design, Swinburne University of Technology; and
- AS, by a University of Newcastle PhD scholarship (ref. 315402).
The funders had no role in the study design, conduct of the study, analysis, or dissemination of findings.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
About this article
Cite this article
Hall, A., Shoesmith, A., Doherty, E. et al. Evaluation of measures of sustainability and sustainability determinants for use in community, public health, and clinical settings: a systematic review. Implementation Sci 17, 81 (2022). https://doi.org/10.1186/s13012-022-01252-1