We have compared a variety of search strategies designed to identify quality improvement intervention publications in electronic databases. Overall, these strategies achieved only moderate success in simultaneously delivering a manageable total yield, acceptable recall, favorable recall-to-yield ratios, and adequate precision.
Although the total retrieval rate varied widely, only one strategy resulted in a yield of fewer than 7,000 publications. Our investigation was restricted to MEDLINE; adding further pertinent databases to the search would likely double the retrieval rate. However, we searched without restricting by clinical field, setting, patient characteristics, outcome, or publication year, which represents an uncommon scenario [19–22].
The recall rates ranged from 5% to 53% of identified publications across the three reference sets, suggesting only moderate sensitivity. This rate does not reach the standards of methodological search filters. Dickersin et al. summarized the proportion of correctly identified references in gold-standard reference sets for 18 topics, and reported weighted mean results of 51% of all publications, 77% within journals indexed in MEDLINE, and 63% for selected MEDLINE journals. Search strategies to capture certain study designs, particularly RCTs, are readily available, but their level of usage is limited [8, 25]. The reported recall rates approach those of other clinical topic filters; for example, a strategy to identify palliative care literature reported a sensitivity of 65% after modifying an existing search strategy that had achieved a 45% rate [26, 27]. A study investigating the recall for RCTs of selected interventions, such as physician reminders, reported recall rates of 58% for MeSH terms and 11% for text words. The 'QI hedges' achieved sensitivities of 100% while maintaining a specificity of 89% for identifying 'methodologically sound' evaluations of provider interventions. However, by comparison these strategies produce yields of between 933,460 (search strategy: random:.ti,ab. OR educat:.tw. OR exp patient care management) and 15,691,611 (search strategy: control: trial:.mp. OR journal.mp. OR MEDLINE.tw. OR random: trial:.tw.) MEDLINE publications, considerably more than the search strategies presented here.
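The relationship between these performance measures can be made concrete with a short sketch. The function below (the names and the illustrative counts are ours, not figures from any of the cited studies) computes recall, precision, and a simple recall-to-yield ratio from a strategy's total yield and its overlap with a gold-standard reference set:

```python
def filter_performance(yield_total, retrieved_relevant, reference_set_size):
    """Performance measures for a bibliographic search filter.

    yield_total        -- total number of publications the strategy returns
    retrieved_relevant -- reference-set publications found within that yield
    reference_set_size -- size of the gold-standard reference set
    """
    recall = retrieved_relevant / reference_set_size  # sensitivity
    precision = retrieved_relevant / yield_total      # fraction of yield that is relevant
    recall_to_yield = recall / yield_total            # recall gained per retrieved record
    return recall, precision, recall_to_yield

# Hypothetical illustration: a strategy retrieving 7,000 records that
# captures 15 of a 29-publication reference set.
recall, precision, r2y = filter_performance(7000, 15, 29)
print(f"recall={recall:.2f}, precision={precision:.6f}")
```

The sketch makes the trade-off explicit: a strategy can raise recall by broadening its terms, but the denominator of the precision and recall-to-yield measures grows with every added record.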
A further potential explanation for the limited recall rates may lie in the nature of the reference sets. The publication selections of the two expert-selected sets were based on each member's understanding of quality improvement rather than an agreed exact, and presumably narrower, definition. The filter performance was consistently better for the more homogeneous EPOC reference set (with the exception of the CQI methods filter); however, the expert-selected sets represent the kind of quality improvement publications that a variety of stakeholders are interested in retrieving, which can be diverse in nature. Furthermore, the reference sets included between 25 and 29 publications, with a total of 78 unique publications. A study investigating the optimal sample size for bibliographic retrieval studies determined that at least 99 high-quality publications are needed to keep the 95% confidence intervals within a 10% width when developing or validating search strategies.
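The cited sample-size requirement can be checked against the standard normal approximation for a proportion's confidence interval. The sketch below assumes that 'width' refers to the half-width of the 95% interval and uses the worst-case proportion p = 0.5; the reported figure of 99 presumably comes from a more exact calculation than this approximation:

```python
import math

def ci_halfwidth(p, n, z=1.96):
    """Half-width of the normal-approximation 95% CI for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

def min_n(halfwidth, p=0.5, z=1.96):
    """Smallest n that keeps the CI half-width at or below the target,
    assuming the worst case p = 0.5."""
    return math.ceil((z / halfwidth) ** 2 * p * (1 - p))

print(min_n(0.10))  # -> 97 under these assumptions, close to the reported 99
```

With reference sets of only 25 to 29 publications, the corresponding intervals around the observed recall rates are roughly twice as wide, which supports the authors' caution about the sets' size.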
The selected quality improvement publications covered diverse individual interventions with great variation across approaches, research fields, general topics, settings, participants, and methods of delivery. Scrutinizing the individual publications in the reference sets revealed no unifying theme shared by all articles that could be used as a key word in an electronic search. Some publications were so specific that they had no electronically usable identifiers in common with other publications, although expert screeners identified them as relevant to quality improvement. A limitation of our study is that the search terms were not selected through a computerized method, and this subjective component may have contributed to the relatively low recall rates in comparison to computer-based methods [9, 11]. The individual terms were combined through the Boolean operators 'OR' and 'AND' as well as proximity operators, rather than being tested individually and simply combined cumulatively in the final search strategy (e.g., term one OR term two OR term three). This added complexity, but allowed yield and the potential for filter failure to be considered simultaneously. In addition, our aim in developing the search strategies was generalizability for use in quality improvement literature reviews, rather than maximizing the retrieval of selected reference publications. We explicitly considered the recall-to-yield ratio. Every filter increases the risk of missing pertinent studies. Comprehensive search strategies may identify a large number of relevant studies, but the retrieval volume may be beyond what is conceivably practical.
We identified a simple text word strategy ('quality' AND 'improv*' AND 'intervention*') as the 'best-case' scenario. Although adding synonyms to the chosen terms would have increased the recall rate and presumably the sensitivity, the expected increase in noise caused us to work only with the truncation function of PubMed and MEDLINE (Ovid). However, this feature is limited; some publications were not identified because the authors used the term 'program' instead of 'intervention,' and could be found only by using the known-synonym approach. Similarly, intervention components evolve, and new approaches can be identified only if their features are known at the time of searching. Given the vast number of ways of describing an intervention and the continuous development of new approaches, the attempt to solve this problem by 'brainstorming' synonyms appears problematic. The CQI term approach did not prove fruitful for identifying quality improvement intervention publications. While particular methods may frequently be used in the development of the interventions, these methods do not generally appear in the title or abstract of the publication.
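The limits of right truncation described above can be illustrated with a minimal sketch. The regular-expression matcher below approximates PubMed/Ovid-style truncation ('improv*' matches any word beginning with 'improv'); the abstracts are invented examples showing why a record using 'program' escapes the 'intervention*' term:

```python
import re

def truncation_match(term, text):
    """Approximate right truncation: 'improv*' matches any word
    beginning with 'improv'; a term without '*' must match whole."""
    stem = re.escape(term.rstrip('*'))
    pattern = r'\b' + stem + (r'\w*' if term.endswith('*') else r'\b')
    return re.search(pattern, text, re.IGNORECASE) is not None

# Invented abstracts for illustration only.
abstract_a = "A quality improvement intervention in primary care."
abstract_b = "A quality improvement program in primary care."

terms = ['quality', 'improv*', 'intervention*']
print(all(truncation_match(t, abstract_a) for t in terms))  # True
print(all(truncation_match(t, abstract_b) for t in terms))  # False: 'program' is a synonym, not a variant
```

Truncation only captures morphological variants of a known stem; a genuine synonym such as 'program' shares no stem with 'intervention' and can only be caught by adding it as a separate term.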
Most of the search terms and strategies we have presented may be of use in facilitating literature syntheses for specific needs. Identifying quality improvement interventions for particular conditions, clinical fields, contexts, or outcomes will limit search volumes, and the key terms, individual strategies, or combinations of strategies may be adopted for more targeted searches. However, the performance of the presented filters is limited, and further research into optimal strategies is required. Validated search strategies are needed in order to evaluate literature reviews and their likely success in covering the universe of pertinent studies, although the need for search validation is not specific to quality improvement intervention literature reviews.
It is disturbing that, despite our best efforts, we were only moderately successful in identifying pertinent quality improvement interventions. Users of PubMed and MEDLINE depend heavily on the MeSH terms assigned by the NLM. The introduction of a specific MeSH term would significantly facilitate access to the growing evidence base on quality improvement. Better labeling of publications to ensure identification is also a responsibility of authors. Indeed, the first item of the SQUIRE guidelines suggests including the term 'quality improvement' in the title of the publication. Without a concerted effort by authors, journals, and medical databases to label quality improvement publications so that they can be identified in literature searches, access to evidence and knowledge accumulation in the field is likely to remain limited.