Spread tools: a systematic review of components, uptake, and effectiveness of quality improvement toolkits

Background The objective was to conduct a systematic review of toolkit evaluations intended to spread interventions to improve healthcare quality. We aimed to determine the components, uptake, and effectiveness of publicly available toolkits. Methods We searched PubMed, CINAHL, and the Web of Science from 2005 to May 2018 for evaluations of publicly available toolkits, used a forward search of known toolkits, screened references, and contacted topic experts. Two independent reviewers screened publications for inclusion. One reviewer abstracted data and appraised the studies, checked by a second reviewer; reviewers resolved disagreements through discussion. Findings, summarized in comprehensive evidence tables and a narrative synthesis, addressed uptake and utility, procedural and organizational outcomes, provider outcomes, and patient outcomes. Results In total, 77 studies evaluating 72 toolkits met inclusion criteria. Toolkits addressed a variety of quality improvement approaches and focused on clinical topics such as weight management, fall prevention, vaccination, hospital-acquired infections, pain management, and patient safety. Most toolkits included introductory and implementation material (e.g., research summaries) and healthcare provider tools (e.g., care plans), and two-thirds included material for patients (e.g., information leaflets). Pre-post studies were most common (55%); 10% were single hospital evaluations and the number of participating staff ranged from 17 to 704. Uptake data were limited and toolkit uptake was highly variable. Studies generally indicated high satisfaction with toolkits, but the perceived usefulness of individual tools varied. Across studies, 57% reported on adherence to clinical procedures and toolkit effects were positive. Provider data were reported in 40% of studies but were primarily self-reported changes. Only 29% reported patient data and, overall, results from robust study designs are missing from the evidence base.
Conclusions The review documents publicly available toolkits and their components. Available uptake data are limited but indicate variability. High satisfaction with toolkits can be achieved but the usefulness of individual tools may vary. The existing evidence base on the effectiveness of toolkits remains limited. While emerging evidence indicates positive effects on clinical processes, more research on toolkit value and what affects it is needed, including linking toolkits to objective provider behavior measures and patient outcomes. Trial Registration PROSPERO registration number: PROSPERO 2014:CRD42014013930. Electronic supplementary material The online version of this article (10.1186/s13012-019-0929-8) contains supplementary material, which is available to authorized users.


Background
Diffusion of innovations is a complex process. While research studies continue to show successful interventions to improve healthcare, their dissemination is slow [1][2][3]. Implementation of proof-of-concept studies and adoption of interventions shown to be effective in research studies into routine clinical practice are delayed or not achieved at all.
In recent years, a number of organizations have developed "toolkits" for healthcare quality improvement [4]. Toolkits are resource and tool collections designed to facilitate spread across settings and organizations and to ease the uptake and implementation of interventions or intervention bundles and practices. They are a resource for documentation of interventions, for implementation of successful interventions, and for scaling up initiatives developed in pilot or demonstration sites into large-scale rollouts. Toolkits may include a variety of materials useful to organizations to help introduce an intervention, practical tools to help incorporate best practices into routine care such as pocket cards for healthcare providers, or patient education materials. There is currently no definition of nor standard approach to toolkit contents or formats.
A variety of healthcare research agencies publish toolkits. The US Agency for Healthcare Research and Quality (AHRQ) alone has published a large number, on topics ranging from allergy and immunologic care to urologic care. The AHRQ Healthcare Innovations Exchange website has tracked the development of tools or toolkits to improve quality and reduce disparities (website maintenance ended in 2017). Users may browse the resources online or download them free of charge. Little is known, however, about uptake of published toolkits. While exact copying of the intervention is possible, a process of re-invention in the new context is also likely to occur. Re-invention may change the intervention to some extent during the diffusion process as it transitions from the developer to the adopter, with or without the help of a toolkit [5], potentially resulting in decreased but still significant effort for toolkit adaptation [6]. To date, we know very little about successful components that may be useful across toolkits, about the toolkit adoption process, or about what makes toolkits easier or harder to adopt.
Furthermore, little is known about the effectiveness of published toolkits. A scoping review describing toolkits assembled for individual research projects concluded that the toolkits often did not specify the evidence base from which they draw and their effectiveness as a knowledge translation strategy was rarely assessed [1,7]. The effectiveness of a toolkit is likely to depend on its quality, the effectiveness of the intervention, and the setting characteristics. However, for published toolkits, an additional consideration is apparent. Toolkits applied in new settings may not be as effective as seen in the original implementation of the intervention bundle that led to the development of the toolkit. Potential reasons include diminished healthcare provider motivation, reduced staff buy-in, or other aspects of low readiness (e.g., healthcare providers were not instrumental in initiating and shaping the interventions).
Our objective was to conduct a systematic review on the spread of interventions intended to improve healthcare quality through toolkits. The review addresses three key questions: it explores the types of tools included in toolkits, the measures and results that describe uptake and utility, and the effectiveness of published toolkits, to inform users and developers of toolkits.

Methods
We registered the review in PROSPERO, registration number PROSPERO 2014:CRD42014013930. The reporting follows the PRISMA guidelines (see Additional file 1).

Searches
We searched the databases PubMed, CINAHL, and Web of Science for evaluations of toolkits in May 2018. The PubMed search strategy is given in full in Additional file 2. The strategy searched for the term "toolkit" in the title, abstract, keywords, or full text of the publication (Web of Science only). We did not limit the search to publications using the MeSH term "diffusion of innovation" because the pilot search strategy showed that known toolkit evaluations were not systematically tagged with this term. We limited to English-language citations published since 2005 to identify current toolkits readily applicable to US settings.
In addition, we searched resources from nine organizations dedicated to healthcare improvement to find published toolkits. We also screened the category "QualityTool" in AHRQ's database of innovations. A "forward search" identified any publication that had cited the titles of the toolkits we located. We screened included studies and relevant reviews and contacted content experts to identify additional relevant publications.

Study inclusion and exclusion criteria
Two independent reviewers screened titles and abstracts to avoid errors and bias. We obtained publications deemed as potentially relevant by at least one reviewer as full text. Full text publications had to meet the outlined criteria to be eligible for inclusion in the review. Discrepancies were resolved through discussion in the review team. In the absence of a universally agreed definition of a toolkit, the project team developed the outlined working definition.
Participants and condition being studied: Publications evaluating toolkits in healthcare delivery organizations were eligible. The review was not limited to toolkits targeting specific clinical conditions, but toolkits had to be aimed at healthcare. Toolkits aimed primarily at professions other than healthcare providers (e.g., policy makers in non-healthcare delivery settings) or at students not yet involved in healthcare delivery (e.g., nursing students) were excluded. Toolkits aimed only at patients, such as patient education material or patient self-management programs, were excluded.
Intervention and toolkit definition: Studies evaluating the use of toolkits designed to aid healthcare delivery organizations were eligible. A "toolkit" was defined as an intervention package, or set of tools. Toolkits had to be aimed at quality improvement (an effort to change/improve the clinical structure, process, and/or outcomes of care by means of an organizational or structural change) [8] of healthcare; toolkits aimed at increasing research capacity or addressing workforce issues were excluded. Test batteries, image processing protocols, or computer software termed "toolkit" were not eligible. Toolkits had to be either publicly or commercially available.
Comparator/study design: Studies evaluating the use of existing toolkits were eligible. Studies supporting the development of toolkits and reporting on earlier versions rather than the currently available toolkits were excluded. Controlled and uncontrolled studies with historic (e.g., pre-post studies) or concurrent comparators (e.g., randomized controlled trials, RCTs) were eligible. Comparators could include active controls (a different intervention) or passive controls (e.g., status before the introduction of the toolkit).
Outcome: Publications reporting on patient, provider, or organizational findings were eligible. Studies had to report on structured evaluations (e.g., surveys); informal or anecdotal evaluation statements were not sufficient.
Timing: To capture current and relevant toolkits developed in accordance with current standards and applicable material, evaluated toolkits must have been published in 2005 or more recently, or still be available.
Setting: Implementations of toolkits were included regardless of the setting, but the original toolkits had to be aimed at quality improvement in health care. Toolkits developed for organizations other than healthcare delivery organizations, such as school settings or laboratories, as well as toolkits primarily focusing on health system improvements in conflict zones or disrupted healthcare systems, were excluded.
We consolidated publications reporting on the same sample of participants. Evaluations published in academic journals as well as gray literature (conference abstracts, dissertations) were eligible. The literature flow diagram is shown in Fig. 1.
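The title and abstract screening rule described above (a publication is retrieved as full text when at least one of the two independent reviewers deems it potentially relevant) can be illustrated with a minimal sketch. The record type `ScreeningVote`, the function `needs_full_text`, and the citation identifiers are hypothetical and serve only as an illustration; they are not taken from the review's own tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningVote:
    """Title/abstract decisions of two independent reviewers (hypothetical record)."""
    citation_id: str
    reviewer_a_relevant: bool
    reviewer_b_relevant: bool


def needs_full_text(vote: ScreeningVote) -> bool:
    # One "potentially relevant" vote is enough to obtain the full text.
    return vote.reviewer_a_relevant or vote.reviewer_b_relevant


# Illustrative votes only; these are not data from the review.
votes = [
    ScreeningVote("c1", True, True),
    ScreeningVote("c2", True, False),
    ScreeningVote("c3", False, False),
    ScreeningVote("c4", False, True),
]

full_text_ids = [v.citation_id for v in votes if needs_full_text(v)]
print(full_text_ids)
```

Taking the union of both reviewers' votes favors sensitivity at the title and abstract stage; eligibility is then decided on the full text.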

Potential effect modifiers and reasons for heterogeneity
The review included a large number of study designs and study outcomes to allow a comprehensive overview of the available evidence on toolkits. In particular, the study design (e.g., comparative studies, post-only study) and the study outcomes (e.g., feasibility, patient health outcome) were sources of heterogeneity across studies.

Data extraction strategy
One reviewer abstracted and a second experienced systematic reviewer checked the data; disagreements were resolved by discussion. We determined categories based on the initial review of publications and used a pilot-tested data extraction form to ensure standardized data abstraction.
We extracted the toolkit name, the developing organization, the general area of application, the toolkit components, and type of availability (publicly or commercially). In addition, information on the evaluation, including study design, participants, setting, and additional non-toolkit components, was extracted.
We documented the uptake and adherence to toolkit components (e.g., number of downloaded toolkits); utility and feasibility; healthcare provider measures including knowledge, attitudes, and barriers; procedural, structural, and organizational changes (e.g., number of ordered tests); and patient outcomes including patient health outcomes and patient-reported satisfaction. We added effectiveness results from the development phase of the toolkit where available.

Study quality assessment
We used the Quality Improvement Minimum Quality Criteria Set (QI-MQCS) to assess studies [9]. The QI-MQCS is a 16-item scale designed for critical appraisal of quality improvement intervention publications; the domains are described in Additional file 2. The synthesis for the primary outcome integrates the appraisal findings; results for all included studies are documented in Additional file 2.

Data synthesis and presentation
We documented the included studies in an evidence table (with supporting tables in the appendix) and summarized evaluation results in a narrative synthesis. Given the diversity of the identified studies, the quality of evidence assessment was limited to assessing inconsistency in study results across studies and study limitations of identified studies. The synthesis followed the key questions. Key question 1 was organized by the developed framework of components. Key question 2 was organized by outcome category: uptake and utility. Key question 3 was organized by provider outcomes, procedure/organizational results, and patient outcomes. The primary outcome of the review was patient health outcomes. The synthesis differentiated evidence from studies with concurrent and with historic comparators. For each toolkit, the evaluation of the intervention spread (i.e., using an available toolkit to disseminate practices and tools included in the toolkit) was also contrasted with initial results obtained in the organization where the toolkit had been first developed (where information was available).

Review statistics
The electronic search for "toolkit" publications and a forward search for 156 specific toolkits (see Additional file 2) published by AHRQ, CMS, WHO, IHI, RWJF, AORN, ECRI, CDC, VA, or on the AHRQ Innovation Exchange identified 5209 citations. We obtained 661 citations as full text articles; of these, 77 studies were identified that met inclusion criteria (Fig. 1).

Study characteristics
Four studies included evaluations of groups randomized to an intervention or a control condition. Six studies provided a comparison to concurrent (non-randomized) control groups that did not participate in toolkit implementation. Forty-two studies presented pre- and post-intervention data for at least one outcome but did not include a concurrent comparator to account for secular trends independent of the intervention. Twenty-five studies reported only post-intervention data and provided no comparison to the status before or without the toolkit. Assessment methods and reported details varied widely and included online and written staff surveys, administrative data, medical chart review data, and web statistics. The number of healthcare organizations involved in the evaluation varied widely, from single hospital evaluations (10%) to studies with data on 325 institutions; 22% of studies, often those that reported on web download statistics, did not report the number of institutions. The number of participating staff members, often healthcare providers asked to use tools contained in the toolkit in clinical practice, ranged from 17 to 704, but the number of participants was only reported in 47% of studies. Of those studies reporting patient data, 59% reported the number of patients the data were based on; the number varied and ranged from 43 to 337,630.
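The study design counts reported above can be converted into shares of the 77 included studies with a short calculation. This is a sketch for illustration only: the dictionary labels are our shorthand for the four design groups described in the text, and the rounding to whole percentages is an assumption.

```python
# Counts as stated in the text; the labels are shorthand, not the review's terms.
design_counts = {
    "randomized": 4,
    "concurrent non-randomized control": 6,
    "pre-post without concurrent control": 42,
    "post-only": 25,
}

total_studies = sum(design_counts.values())

# Share of each design among all included studies, rounded to whole percent.
design_shares = {
    design: round(100 * count / total_studies)
    for design, count in design_counts.items()
}

print(total_studies)
for design, share in design_shares.items():
    print(f"{design}: {share}%")
```

The four groups sum to the 77 included studies, and the pre-post share rounds to the 55% given in the abstract.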
Sixty-nine percent of included evaluations described elements in addition to the toolkit such as workshops and presentations to introduce the toolkit or the intervention promoted in the toolkit. The developer of the toolkit was part of the evaluation of the toolkit in more than half of the included studies (59%); toolkits were evaluated by independent study groups in 27% of studies (14% unclear).
Most evaluations were conducted in the USA (75%); other countries contributing to the study pool were Canada, the UK, Australia, Mongolia, and an international evaluation with multiple countries. In 34% of studies, the evaluation setting was a hospital; in 32%, toolkits were evaluated in primary care facilities; other organizations included community health centers, ambulatory care clinics, long-term care facilities, specialty clinics (e.g., multiple sclerosis clinic), and a hospice; in some cases, the setting characteristics were not reported.
The details of the included studies are shown in the evidence table (Table 1).

Quality assessment
As a critical appraisal tool, the QI-MQCS targets the informational quality of QI studies and informs decisions about applicability of results to other settings. The number of criteria met per study ranged from 3 to 14 (mean 9.78, SD 3.04). Since the objective of this systematic review was to assess the spread of QI interventions through the use of toolkits, 100% of included publications/studies addressed Spread and described the ability of the intervention to be replicated in other settings.
In addition, for ten of the 16 domains, more than 50% of the included publications met the minimum QI-MQCS criteria. The top five described aspects related to study initiation and included Organization motivation (description of the organization reason, problem, or motivation for the intervention, 93%); Intervention rationale (description of the rationale linking the intervention to the effects, 88%); Intervention (description of the processes, strategies, content, and means of achieving the effects associated with the intervention and considered to be permanent as opposed to activities considered to be temporary for the purpose of introducing the intervention, 70%); Implementation (description of the approach to designing and/or introducing the intervention, 81%); and Data sources (documentation of how data were obtained and whether the primary outcome was defined, 82%). The other five domains, for which more than 50% of studies met minimum QI-MQCS criteria, included Organizational characteristics (description of setting demographics and basic characteristics, 68%); Timing (clear outline of the timeline for intervention implementation and evaluation so that follow-up time can be assessed, 60%); Adherence/fidelity (level of compliance with the intervention, 57%); Organizational readiness (description of QI culture and resources available for the intervention, 64%); and Limitations (outline of limitations and the quality of the interpretation of findings, 68%).
The five domains, for which less than 50% of studies met minimum QI-MQCS criteria, addressed evaluation of results and included Study design (documentation of the evaluation approach with respect to study design, 36%); Comparator (description of the control condition against which the intervention was evaluated, 26%); Health outcomes (inclusion of patient health outcomes in the evaluation, 17%); Penetration/reach (reporting of the proportion of eligible units that participated in the intervention, 29%); and Sustainability (information on the potential for maintaining or sustaining the intervention with or without additional resources, 40%).
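The domain-level percentages reported in this section can be tallied as a quick consistency check against the 16-item structure of the QI-MQCS. The sketch below simply transcribes the percentages from the text; the variable names are illustrative, and the 50% threshold mirrors the grouping used in the narrative.

```python
# Percentage of included studies meeting the minimum criterion, per domain,
# transcribed from the text above.
domain_pct = {
    "Spread": 100,
    "Organizational motivation": 93,
    "Intervention rationale": 88,
    "Intervention": 70,
    "Implementation": 81,
    "Data sources": 82,
    "Organizational characteristics": 68,
    "Timing": 60,
    "Adherence/fidelity": 57,
    "Organizational readiness": 64,
    "Limitations": 68,
    "Study design": 36,
    "Comparator": 26,
    "Health outcomes": 17,
    "Penetration/reach": 29,
    "Sustainability": 40,
}

met_by_majority = [d for d, pct in domain_pct.items() if pct > 50]
met_by_minority = [d for d, pct in domain_pct.items() if pct < 50]

print(len(domain_pct), len(met_by_majority), len(met_by_minority))
print(sorted(met_by_minority, key=domain_pct.get))
```

Spread plus the ten further domains above 50% and the five domains below 50% account for all 16 QI-MQCS domains; the least frequently met criterion is Health outcomes.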

Key question 1: what are common elements of quality improvement toolkits?
The evaluated toolkits addressed a variety of quality improvement approaches. Most focused on a specific clinical topic rather than general healthcare provider behaviors. Seven toolkits addressed weight management; four toolkits evaluated in five studies addressed fall prevention; three each addressed emergency preparedness, patient safety, and perinatal care; and two (evaluated in three studies) were aimed at vaccination. We identified two toolkits each addressing the topics asthma management, cancer screening, elective delivery, health literacy, hospital-acquired infections, hospital readmission, medical errors, mental health, pain management, screening, smoking cessation, and substance use. The other toolkits addressed antimicrobial stewardship, autism communication, brain injury symptom management, cancer care, cardiac care, care quality, clinical decision making for critical care, depression care, diabetes care, end of life care, geriatric care, heart failure, hepatitis C care, kidney disease care, medication management, multiple sclerosis symptom management, newborn screening, nursing best practices, obstetric care, parental education, pediatric preventive care, psychotherapy decision support, staff trauma support, and wrong site surgery. The toolkits varied in length and complexity and included a large variety of elements. Most toolkits were downloadable online and free of charge. The toolkit format was often a consolidated text document with written material. Some toolkits used a website with downloadable individual tools and links to additional online resources. Some toolkits included other material such as alcohol hand rubs or peak flow meters, in branded packages, and eight toolkits included a software program. Table 1 includes the toolkit components; further details, including the link to a downloadable copy of the toolkit, can be found in Additional file 2.

Table 1 Evidence table (excerpt)

Procedures: Prior to having attended the workshop, 58% included physical-activity content in more than half of their sessions, while 29% addressed physical activity in < 25% of patient sessions. In the sessions in which physical activity was discussed, 73% spent < 25% of the session on physical activity content, and 30% discussed physical activity for < 10% of each session. At the 8-12-month follow-up, 66% included physical-activity content in more than half of their sessions, while only 18% addressed physical activity in < 25% of sessions. However, in the sessions in which physical activity was discussed, 78% spent < 25% of the session on this topic. Eight themes emerged: more frequently and confidently discussing physical activity in sessions (27%); increasing focus on resistance training (26%); providing patients with physical-activity procedures and written information (14%); feeling better equipped to assess current physical-activity levels (7%); assisting patients in working around barriers to being involved in physical activity (5%); recommending specific activities (4%); encouraging other health professionals to integrate physical activity into practice (4%); and other (12%)

Uptake: NR
Feasibility: Of those respondents who found the toolkit very helpful (for clarity of design, comprehensiveness of information, and overall impression of the toolkit), approximately 60% had been part of the HBPC program for 5 years or less. The percentage of respondents who reported the toolkit to be helpful decreased as length of time in the HBPC program increased (22-25% for 6-10 years and 15-18% for ≥ 11 years). These results indicate that helpfulness of the toolkit was associated with fewer years with the HBPC program (p < 0.05). Length of time in the HBPC program manager role was not found to be associated with perceived helpfulness of the toolkit. On a 4-point Likert scale, respondents were asked if they agreed or disagreed that the topics covered in the toolkit were relevant to their preparedness protocol. Of those who implemented their disaster preparedness protocol more frequently (3-5 times/year or 1-2 times/year), two-thirds (66-67%) strongly agreed that the topics covered in the toolkit were relevant. Conversely, of those who implemented their protocol very infrequently or never, only 23% strongly agreed that the topics covered in the toolkit were relevant to their work (p < 0.05). When asked, "How often do you see yourself using this toolkit?", 8% indicated that they will never use the toolkit. The rest indicated that they would use the toolkit moderately or extensively (data not shown). HBPC program representatives were asked to describe the types of support they would need to implement the toolkit. They suggested speaking with others who have implemented the toolkit, sharing it with leadership and hospital-wide committees, collaborating with [...]

[...] recommend the toolkit to others. Time was the most frequently indicated barrier (45%) followed by: "the quality of some of the evidence in the toolkit is questionable" (23%); "the toolkit provides 'recipes' that do not allow for enough decision making by the therapist" (15%) and "I don't know where to find or access the toolkit" (15%). 31% indicated no barriers. When asked if anything was missing from the ATT, participants suggested some additional treatment strategies, as well as requesting patient handouts or pictures and a more concise summary of the research.
Providers: 49% indicated that they had changed their clinical practice based on the knowledge gained from the ATT; 87% agreed that they feel more justified in applying physiotherapy treatments that they were already using. 9% indicated they were very aware of the evidence before exploring the ATT; this increased to 45 [...]

Other intervention: Trackers for chosen; staff education: (a) accuracy with anthropometric measures to facilitate correct diagnosis of overweight and obesity, (b) assessment and evaluation of the child's lifestyle behaviors through use of a questionnaire, (c) consistent health messaging related to nutrition and physical activity, and (d) use of motivational interviewing to guide a mutually established action plan.
Uptake: Each clinic implemented the "5210" program with all child office encounters and not just for wellness visits.
Feasibility: An unexpected finding was the importance of establishing incentives and a reward system for "5210" participants.
Providers: NR
Procedures: Profound changes occurred with large shifts in documentation of BMI percentile (from 27 to 98%; p < 0.05), education and counseling (from 9 to 87%; p < 0.05), and accurate diagnosis of overweight or obesity (from 0 to 32%; p < 0.05). There was a statistically significant decrease in documentation of blood pressure readings (from 72 to 60%; p < 0.05). Use of the screening questionnaire increased from 0 (was not utilized before the project) to 88%.
Patients: The education foci that were prioritized and selected by 89% included eat more fruits and vegetables (35%), spend less time watching television and playing video/computer games (25%), and drink more water (21%) and less sugar-based beverages (8%). Parents, especially of younger children, commented that the questionnaire heightened the awareness of the lifestyle habits of the family and motivated the parent to make changes in their diet and physical activity.

Uptake: Universal assessment and screening tools and patient handouts were used on a daily basis by most respondents; 1/4 reported using the staged treatment, motivational interviewing guide, and quick reference guides weekly; 33% indicated that the ICD-9 codes, [...]
Utility: 29% of respondents cited ICD-9 codes and reference articles as the most useful tools; 64% rated ICD-9 codes as very useful and 57% found the reference articles very [...]

Result categories: uptake: uptake of toolkit or toolkit components and adherence to toolkit/components; utility: information on the feasibility of using the toolkit, acceptability of the toolkit and its components, reported barriers and facilitators, and staff satisfaction with the toolkit; provider: effects on learning, self-reported confidence, or attitudes, self-reported behavior changes, and intentions; procedures: changes in procedures, organizational results (e.g., tests ordered, costs); patients: patient health outcomes, patient satisfaction, and other patient-reported outcomes
a The evidence table is organized by toolkit topic

Implementation toolkit elements
As the summary Table 2 documents, the majority of the 72 toolkits evaluated in 77 studies included material designed to help with the introduction and implementation of the specific intervention promoted in the toolkit. This typically included educational material such as research summaries, supporting evidence for healthcare interventions, and further reading lists. Some toolkits included downloadable slide decks for presentations to staff, links to online videos to introduce the clinical issue or the intervention, information on achieving change in organizations such as action plan templates, institutional self-assessment tools, templates to collect performance data to facilitate audits and research, templates or actual material to raise awareness such as posters, and many included practical "implementation tips." As the evidence table shows, many toolkits included unique additional practical tools such as letters to management staff to raise awareness; briefing notes; detailed material for training courses (e.g., daily timetable or teach-back technique) to facilitate staff education; and other tools useful for staff such as a list of frequently asked questions, cost calculators, worksheets, or example forms.

Provider toolkit elements
Tools that targeted healthcare providers specifically were also included in most toolkits. Tools encompassed care plans, treatment and management algorithms, decision support, or clinical practice guidelines. In addition, many toolkits included assessment scales that providers could apply in clinical practice. Some toolkits also included pocket cards for clinicians, checklists to be used in clinical consultations, written scripts for healthcare providers, practice demonstration videos for providers to perform the intervention, and ready-to-use forms for patient care. A few toolkits included additional tools such as body mass index (BMI) calculators, spirometers, alcohol hand rubs, or prescription pads (see Table 1).

Patient toolkit elements
As the evidence and summary tables show, about two-thirds of toolkits included material for direct dissemination to patients. In the large majority, these were informational handouts or more comprehensive educational materials such as treatment brochures. Some toolkits included bilingual material and several contained posters and ward notices directed at patients. Other, less common resources directly targeting patients or caregivers included patient self-assessment tools, checklists (such as for appointments), activity journals and diaries, links to online resources for patients, educational videos, or peak flow meters for patients.

Key question 2: what is the uptake and utility of published quality improvement toolkits?
A majority of included studies reported on the uptake and/or utility of the evaluated toolkit.

Uptake
Fifty-five percent of studies reported information on the uptake of, use in practice of, and adherence to the toolkit or its components, but the type and informational value of reported data varied widely.
Several reported download statistics for online tools or requests for the toolkit [11, 15, 29-31, 67, 88, 90], but most studies reported no denominator and reported the total number of downloads at the time of the publication with no further detail. Three studies that reported a point of reference stated that 2000 toolkit copies were downloaded in 7 months [11], that 725 copies had been downloaded in 1 year [15], or that the toolkit had been accessed by 8163 practitioners over 255 days [67]. Some studies tracked which or how many individual tools included in the toolkit had been adopted by the end users [21,24,25,29,34,35,40,46,51,56,61,64,69,75,76,78,81,88]. The evidence table shows variable uptake with no studies reporting full uptake of the toolkit. Uptake of components ranged from 10% (fitness prescription pads) [21] to 87% (recall/reminder system installed) [24].
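The three reports with a point of reference use different observation periods, which makes the raw totals hard to compare. A minimal sketch of putting them on a common monthly scale is shown below; the average month length of 365.25/12 days is our assumption, and the figure for [67] counts practitioners who accessed the toolkit rather than downloaded copies, so the rates are not strictly like-for-like.

```python
# Assumed average month length in days; not specified in the source studies.
DAYS_PER_MONTH = 365.25 / 12

# (reference, reported count, observation period in days), taken from the text.
uptake_reports = [
    ("[11]", 2000, 7 * DAYS_PER_MONTH),  # copies downloaded in 7 months
    ("[15]", 725, 365.25),               # copies downloaded in 1 year
    ("[67]", 8163, 255),                 # practitioners accessing over 255 days
]


def per_month(count: float, days: float) -> float:
    """Convert a total over an observation period into an average monthly rate."""
    return count / days * DAYS_PER_MONTH


monthly_rates = {
    ref: round(per_month(count, days)) for ref, count, days in uptake_reports
}
print(monthly_rates)
```

Even on a common time scale, these rates remain numerators without a denominator: none of the reports states how many eligible users or organizations could have obtained the toolkit.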
Five studies documented staff awareness of the toolkit and whether the distributed toolkit had been reviewed by eligible users; the studies with numerical results reported high, but not perfect review rates (81-86%) [13,29,56,62,68]. Two studies reported on the proportion of eligible participating sites that adopted the toolkit; results ranged from 53 to 98% [14,19]. Several studies reported on adoption of the intervention promoted in the toolkit: 98.7% of VA facilities have MOVE! programs in place [37], 10 to 15% of teams were unable to get beyond the planning stage and 50 to 65% implemented the medical error prevention practices partially or fully [27], 67% of provinces and 53% of hospitals implemented an emergency preparedness program [14], 7/10 sites successfully implemented a discharge program as planned [78], one indicated that all components of a protocol to prevent hospital-acquired infections had been implemented (but some had already been in place before the project) [40], one study reported that 54% of hospitals completed 14 of 17 intervention bundle elements [77], all teams had implemented best practices in all [25], one reported varying results across intervention components (e.g., 80% identification of children with special health care needs) [24], all sites reported using at least 5/14 strategies to increase vaccination rates [51], and one study indicated that each participating clinic implemented a specific weight management program strategy in all child office encounters and not just for wellness visits [64]. Individual studies reported the proportion of adopting hospitals out of those approached [19,27,30,76], tracked the number of sites completing the toolkit evaluation project [38,76,85], surveyed how clinicians used the tools [22], or recorded which sites continued to use the toolkit after the implementation period, with or without substantial changes [10,50].

Utility
Half of included studies reported on the utility, feasibility, or acceptability of, the satisfaction with, or the barriers to using the toolkit, its components, or the intervention promoted in the toolkit. Reported satisfaction with the toolkit was generally high. One study reported that 50% of respondents found the toolkit information "some or very much helpful" [32], another reported 75% of respondents found the toolkit "extremely or very helpful" [15], one study reported ratings of "being helpful to staff" that ranged between 73 and 92% [33], one study documented that clinicians were "extremely satisfied or satisfied" in 11/11 discussions [70], in one study 86% of respondents agreed that the toolkit was helpful in clinical decision-making [62], and another study reported that 85% of staff who had read the toolkit found it helpful [29]. One study reported that most staff at three out of four sites believed the toolkit improved efficiency for adult vaccinations [51], one study found that all participants were "very satisfied or satisfied" with the overall usefulness of the toolkit [17], and one highlighted that the toolkit enabled comprehensive disease management and improved overall patient care [43]. In another study, most staff and stakeholders had described the toolkit as a useful resource [69], and three studies indicated that feedback was "positive" [22,23,63]. Two studies reported mixed feedback [67,79]: while most providers found the toolkit moderately or very useful, several noted that they already were doing what was recommended [79]. One study found that the perceived helpfulness of the toolkit decreased over time after implementation of the intervention [89].
For feasibility, ten studies indicated that the interventions or best practices included in the toolkit were not feasible [13,21,25,27,34,59,73,84-86]. For example, a quarter of participants in one study reported that systematic screening for obesity was not feasible in clinical practice [21]. Up to 91% of teams found implementing the recommended practices difficult in another study [27], and one study highlighted that 54% of users reported that incorporating health literacy techniques added time to the patient's visit, although all thought the time was worthwhile [34].
Several studies ranked or rated individual toolkit components and found variation in the utility of different components [17,26,31,35,49,63,65,85,89]. For example, one study reported that 29% of respondents found ICD codes and reference articles the most useful tools in a pediatric obesity toolkit [35]. One study reported a wide range of perceived usefulness across components (cost calculator 10%, patient health questionnaire 68%) [31], one study reported that all participants were satisfied with the algorithms while only 83% were satisfied with the included office strategies to improve screening [17], one indicated that the provided frameworks for implementation were helpful and that the major success element was alcohol hand rubs [26], and one study reported on videos as the most positively rated component among individual tools [49]. Four studies assessed how to improve the toolkit or which components were missing [31,39,62,67].
Individual studies reported ratings across dimensions such as ease of use [41], estimated time spent using the toolkit [48], or which intervention components (e.g., patient partnering) were most difficult to implement [25].

Key question 3: what is the effectiveness of published quality improvement toolkits?
We systematically extracted any information reported on process, provider, and patient effects.
The randomized controlled trials (RCTs) reported positive results for process outcomes. A Fall TIPS toolkit study reported patients on the intervention units were more likely to have fall risk documented (p < .0001) [16]. An evaluation of the America-on-the-Move toolkit reported control providers provided nutrition counseling to overweight patients in 40 to 49% of visits compared to 30 to 39% in intervention providers but the statistical significance of the difference was not reported [39]. Intervention practices increased vaccination rates more than controls (p = 0.34) in a study that used the 4 Pillars Toolkit for Increasing Childhood Influenza Immunization [46]. One RCT and five controlled trials did not report procedure outcomes [18,28,32,44,77,87]. One controlled trial indicated that the control group missed or weakly addressed on average 3.3 of nine key aspects of intensive care unit care but no significance test was reported [81].
Pre-post studies that compared baseline and follow-up performances and that reported a statistical significance test for the difference were generally positive but there was variation across different procedures. The median percent of patients with asthma using inhaled corticosteroids, patients with an action plan, and patients using spirometry increased statistically significantly after introducing the Colorado Asthma Toolkit [19]. In another study, performance on quality measures for antenatal steroid administration increased from 77 to 100% (p < .01) [36]. The Fall TIPS toolkit was associated with an increase from 1.7 to 2.0 in the mean number of fall risk assessments completed per day 1 month after implementation (p < .003) [61] [23]. An evaluation of an Acute Postoperative Pain Management Toolkit reported statistically significant improvement in two pain management indicators (patients who had a pain score used to assess pain at rest and movement, patients with documented pain management plan) [12]. Compared to baseline, nurses were almost twice as likely to advise smokers to quit (p < .005), and more likely to assess willingness to quit, assist with a quit plan, and to recommend the smoking helpline (p < .0001) 6 months after the implementation of a smoking cessation toolkit [83]. One study showed a significant increase (p = .03) in the number of patients reporting a dialogue about weight management [82].
Five pre-post studies with numerical data reported mixed results. The Bright Futures Training Intervention Project toolkit was associated with statistically significant increases in the use of a preventive service prompting system and the proportion of families asked about special health care needs, but not the proportion of children who received a structured developmental assessment [24]. A toolkit to support multiple sclerosis management was associated with some improvements in documented assessments and care plan documentation [43]. A pre-post study evaluating the 4 Pillars Toolkit found different results for the different vaccines and different sites [51]. Medication list but not allergy list accuracy improved after introducing the Ambulatory Patient Safety Toolkit [25]. Another study showed improvements in documentation of BMI percentile (p < .05), education and counseling (p < .05), accurate diagnosis of overweight or obesity (p < .05) but a decrease in documentation of blood pressure readings (p < .05) [64].
Studies also reported on healthcare provider attitudes [21,26,32,43,44,49,52,60,62,63,68,69,76,78,86]. For example, one study reported 76 to 84% of providers indicated that posters made staff think about their hand hygiene [26], one indicated that positive perceptions of the importance and usefulness of body mass index increased [21], one reported increased awareness of multiple sclerosis symptoms [43], one indicated that the impact on patients varied by site [52], and one found no difference in safety perception, culture of safety awareness, sensitivity, and competence behaviors between the toolkit exposed and control groups [32].
Some studies reported on self-reported provider knowledge, confidence and perceived competence, and results were positive throughout [30,34,44,60,62,65,67-70,76]. Examples included that 77% of users agreed that their knowledge of health literacy was improved [34], participants' ratings of knowledge gain and confidence in geriatric competencies improved [30], and provider confidence in the ability to provide physical activity and exercise counseling and greater knowledge about physical activity improved [44].
Three studies tested provider knowledge; one found no difference in general concussion knowledge between intervention and control groups but intervention physicians were less likely to recommend next day return to play after concussion [18]. A congenital heart disease toolkit improved knowledge (pretest average score 71% improved to 93%, p < .0001) [66], and one study documented that only three of the ten knowledge-based questions were answered correctly by more than 85% of participants on the pre-test but all ten questions were answered correctly by at least 95% of participants on the post-test after implementing a patient safety toolkit [88]. One study reported that adherence to targeted provider behaviors increased significantly for 62% of behaviors but not for counselor competence [50].
None of the RCTs reported on patient outcomes. The studies with concurrent control groups reported mixed results within and across studies. A controlled trial (12/16 QI-MQCS domain criteria met) evaluating the impact of shared decision making supported by a toolkit reported higher asthma quality of life (MD 0.9; CI 0.4, 1.4) and fewer asthma control problems (MD −0.9; CI −1.6, −0.2) in the intervention group [87]. Another controlled trial (13/16 QI-MQCS) found a single counseling appointment using the Diabetes Physical Activity and Exercise Toolkit was not associated with significant changes in physical activity or clinical outcomes compared to standard care [44]. The Guidelines Applied in Practice-Heart Failure Tool Kit was associated with a reduction in the baseline-adjusted 30-day readmission rate but not 30-day mortality comparing the toolkit and a control cohort (7/16 QI-MQCS) [28]. A state perinatal quality collaborative reported that women in hospitals engaged in the initiative experienced a 21% reduction in severe maternal morbidity among hemorrhage patients compared to baseline while the non-participating California hospitals showed no changes (1.2% reduction, n.s.); the collaborative used a toolkit to disseminate the intervention bundle (13/16 QI-MQCS) [77].
Two pre-post studies reported a statistically significant reduction in the incidence rate of hospital-acquired infections. One study (14/16 QI-MQCS) reported a reduction in carbapenemase-producing Enterobacteriaceae outbreaks and no further occurrence of extensively drug-resistant Acinetobacter baumannii after introducing a CDC toolkit and additional safety procedures such as limiting access to rooms and common areas [40]. A study (13/16 QI-MQCS) evaluating the AORN toolkit accompanying the Universal Protocol for Correct Site Surgery reported that after the introduction of the protocol, the rate of wrong site surgery increased initially [33]. A study (3/16 QI-MQCS) evaluating a toolkit on elimination of non-medically indicated (elective) deliveries before 39 weeks gestational age indicated that there were no transfers to the neonatal intensive care unit compared to five transfers pre-intervention (p < .022) for non-medically indicated deliveries between 37/0 and 38/6 pregnancy weeks [55]. A study (13/16 QI-MQCS) evaluating a toolkit-based intervention to reduce central line-associated bloodstream infections reported that the rate of infections decreased by 24% (p = .001) [84]. The remaining pre-post studies reported improved patient outcomes for some or all outcomes but the statistical significance was not reported (QI-MQCS assessments ranged from four to 14 domain criteria met) [10,45,52,57,61,78].

Comparison of original intervention and toolkit supported effects
For six toolkits, results of the initial intervention that led to the development of the toolkit had been published. However, no definitive comparison between initial intervention and success of spreading the intervention via the toolkit could be achieved due to the paucity of data and differences in study designs and metrics.
A toolkit intervention to reduce central line-associated bloodstream infections referred to a published RCT that had established the effectiveness of the interventions for intensive care unit patients. The toolkit intervention achieved a 24% infection rate reduction, and the authors highlighted that the evaluation in routine practice achieved results comparable to the original trial results (modeled hazard ratio 0.63, 2.1 vs 3.4 isolates per 1000 days, p = .01) [84,91]. A toolkit for postoperative pain management was based on an initiative that had achieved a 13% increase in preoperative patient education and 19% increase in patients with at least one documented postoperative pain score [92]. Corresponding results associated with toolkit-based spread showed a 28% increase of patients with pain assessments [12]. An electronic fall prevention toolkit was tested in two studies [16,23] and results were also available from the development of the toolkit. The intervention was associated with a reduced rate of falls [93] but the RCT testing the toolkit-assisted spread did not report on patient outcomes and it is unclear whether the toolkit can replicate the results in different organizations. An antenatal corticosteroid therapy toolkit was developed as part of a quality care collaborative that reported that the antenatal steroid administration rate increased from 76 to 86% [94]. The result associated with implementation of the later developed toolkit was 100% performance on state quality measures for antenatal steroid administration compared to 77% at baseline [36]. The Project Re-Engineered Discharge toolkit was associated with a readmission rate reduction of 32% compared to baseline but the 30-day readmission rate was not reported [45]. The original hospital discharge program reported reduced hospital utilization within 30 days of discharge in an RCT compared to usual care (30-day readmission rate 0.149 vs 0.207) [95].
The 4 Pillars Toolkit for influenza and pneumococcal vaccinations has been evaluated in multiple publications [46,51]. The development phase of the toolkit has also been documented, but reported information was limited to areas of improvement that resulted in the final tool [96]. A relapse prevention group counseling toolkit was associated with counselor adherence to toolkit content in 13 out of 21 targeted behaviors [50]. Data from the development phase of the toolkit were available but not directly comparable; one study reported significant improvements in content adherence after 3 h of training [97], the other study reported on acceptability and sustainability of toolkit use [98].
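The difficulty of comparing original and toolkit-supported effects can be made concrete with the reported figures. The sketch below is illustrative only and is not an analysis performed in the review: the function names `relative_reduction` and `absolute_change_points` are hypothetical, and the calculation assumes that the reported values can be treated as simple proportions and percentages.

```python
def relative_reduction(comparator_rate: float, intervention_rate: float) -> float:
    """Relative reduction of the intervention rate versus the comparator rate."""
    return (comparator_rate - intervention_rate) / comparator_rate


def absolute_change_points(before_pct: float, after_pct: float) -> float:
    """Absolute change in percentage points between two percentages."""
    return after_pct - before_pct


# Original discharge program RCT [95]: 30-day readmission rate 0.149 vs 0.207.
rct_reduction = relative_reduction(0.207, 0.149)
print(f"Original RCT, relative reduction vs usual care: {rct_reduction:.1%}")

# Antenatal steroid administration: collaborative [94] vs later toolkit study [36].
collaborative_gain = absolute_change_points(76, 86)
toolkit_gain = absolute_change_points(77, 100)
print(f"Collaborative: +{collaborative_gain} points; toolkit: +{toolkit_gain} points")
```

Under these assumptions the original trial corresponds to a relative reduction of roughly 28% against a concurrent usual-care group, whereas the 32% reduction reported for the toolkit [45] is measured against a historical baseline and over an unreported time window, so the two percentages should not be read as equivalent measures.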

Discussion
There are few methods other than toolkits to document complex healthcare interventions or to support their use outside of initial intervention sites, yet little theoretical or empirical literature addresses toolkit use. We reviewed over a decade of published evaluations of toolkits used as a method for spreading quality improvement interventions for healthcare delivery organizations. This review documents the frequency of key toolkit elements and the effects of using publicly available toolkits. We hope this review will stimulate further thought on use of toolkits, on toolkit evaluation, and on toolkit reporting.
The toolkits and their evaluations included highly variable sets of information. Among toolkit elements, the toolkits we identified most commonly included introductory and implementation information (e.g., educational material for staff) and healthcare provider tools for clinical practice (e.g., care plans); and two-thirds included material for patients (e.g., information leaflets). Among evaluation elements, studies most often rated satisfaction with the toolkit and/or the utility of individual tools; while satisfaction was usually high, usefulness ratings varied. Rates of toolkit uptake across eligible users could provide invaluable information on issues such as ease of adoption, needed toolkit improvements, or equity in terms of making toolkit benefits accessible to all eligible subjects. However, only half of studies reported on toolkit uptake; these studies typically showed varied uptake between providers and/or settings. The reported information on toolkit uptake also often lacked a denominator or point of reference, such as the time period of tracked downloads, how many providers or sites were eligible, or how the uptake compared to other toolkits. A qualitative study of clinic and community members' perspectives on intervention toolkits highlighted that information on the use of the toolkit is critical; simply disseminating toolkits does not guarantee their use [99].
We found the existing evidence base on toolkit effectiveness to be very limited despite the substantial number of publications on toolkits. We looked for effectiveness information not only in the searched toolkit publication, but in any related studies of the toolkit. While more than half of the included studies reported on adherence to clinical procedures, only some assessed effects on healthcare providers. In addition, the existing evidence base for healthcare provider effects associated with toolkits focuses on self-reported behavior changes or intentions. While reported results were positive and often indicated substantial improvement, objective tests for behavior changes are largely absent from the literature.
Quality improvement theory emphasizes the importance of completing the intervention and evaluation cycle through an assessment of impacts on patient care and outcomes, but we found few such assessments. Few studies reported on patient outcomes and there is a lack of evaluations showing improved health outcomes to be associated with toolkits. Toolkits are commonly aimed at intervention spread; however, the evidence base for their effectiveness for this purpose is limited. Identified RCTs reported positive results for spread sites; however, the number of high-level evidence studies that allow strong effectiveness conclusions is small. While pre-post assessments tended to be positive, studies with concurrent control groups reported mixed results within and across studies. More evaluations of toolkit effects on patient care and outcomes are needed to determine whether the use of toolkits translates into improvements for patients.
Throughout, study results were often insufficiently reported and the assessed outcomes were very diverse. Furthermore, the identified studies were often not designed to assess the effect of the toolkit per se because the intervention included other components in addition to the toolkit. Use of stronger study designs for assessing toolkit effectiveness as a method of spread, such as presenting comparisons to the status prior to their implementation or to a control group, would increase the value of toolkit spread studies.
An optimistic review interpretation is that studies of toolkit effectiveness showed no deterioration when the toolkit was applied in new settings. Very few published studies are available that directly address this comparison, however. While some studies described the development of the toolkit as following a successful intervention implementation, very few studies reported numerical results that allowed a direct comparison between the original intervention and the results of facilitating the spread of the intervention through a toolkit.
The reported detail in the included studies varied widely and no study met all of the criteria of the QI-MQCS, a critical appraisal tool for quality improvement evaluation publications [9]. We included studies reported in abbreviated form such as conference abstracts; hence, some information important to practitioners was not available. However, a large majority of studies reported a rationale for implementing the toolkit in their organization and provided information on the intended change in organizational or provider behavior that they were aiming to achieve with the toolkit. We anticipate that future evaluations of toolkits can increase their impacts by focusing on the information most likely to be useful to potential users or to fellow developers of toolkits. This includes, for example, uptake rates, resources required for toolkit adoption, and resources required for toolkit maintenance. Information on toolkit adaptations required for adoption in different organizational contexts would also be helpful. Furthermore, while satisfaction with the toolkits was generally reported to be positive, there were often large variations in ratings of the utility of specific components or tools. Further evaluations should consider the merits of assessing individual toolkit components in addition to evaluating the toolkit as a whole.
There is no standard definition of a toolkit and guidance for toolkit developers and users is only beginning to emerge [100]. A strength of this review is our focus on quality improvement interventions in healthcare, using a definition based on our prior experience with quality improvement and implementation research [8,9,101-106]. A limitation is that we used a self-applied definition of what constitutes a toolkit and we only searched for studies using the term "toolkit." A broader review of tools and of similar resources not referenced as "toolkits" would be an important addition to the literature.
The included studies and evaluated toolkits were very heterogeneous, limiting generalizable conclusions that can be drawn across studies, and the diversity is reflected in the evidence and summary tables. Nonetheless, the review was limited to publications and toolkits that used the term "toolkit" and we included only toolkits reported in published literature. Our review included gray literature in that we purposefully included conference abstracts and dissertations; we know, however, that we missed information on unpublished use of toolkits, especially in large organizations. Furthermore, the number of studies contributing to the effectiveness key question was limited, in particular studies reporting on the primary outcome, patient health. Limitations in the quality of evidence hindered more detailed analyses and conclusions, including answers to the question whether toolkits developed in another context can achieve the same results in a new context.
Finally, our review concentrated on the large number of toolkits that are currently publicly available, free of charge or for purchase. Toolkits not explicitly designed for ongoing spread (e.g., toolkit distributions for one-time interventions) were beyond the scope of the review. A prior systematic review on toolkits reported limited evidence for toolkits as a general intervention component or implementation strategy. Of eight methodologically acceptable evaluations identified by the review, six showed at least partial effectiveness in changing clinical outcomes; however, the review concluded that more rigorous study designs were needed to explain the factors underlying toolkit effectiveness and successful implementation [107].

Conclusions
This review documents over a decade of evaluations of publicly available quality improvement toolkits and provides insight into the components, the uptake, and the current evidence base of the effectiveness of this tool for spread. Available uptake data are limited but indicate variability. High satisfaction with toolkits can be achieved but the usefulness of individual tools may vary. The existing evidence base on the effectiveness of toolkits remains limited. While emerging evidence indicates positive effects on clinical processes, more research on toolkit value and what affects it is needed, including linking toolkits to objective provider behavior measures and patient outcomes. Considering the potential importance of toolkits as a method for maximizing the impacts of healthcare improvement interventions, a stronger research focus on the conduct and reporting of toolkit intervention and evaluation components is critical.