Can computerized clinical decision support systems improve practitioners' diagnostic test ordering behavior? A decision-maker-researcher partnership systematic review

Background: Underuse and overuse of diagnostic tests have important implications for health outcomes and costs. Decision support technology purports to optimize the use of diagnostic tests in clinical practice. The objective of this review was to assess whether computerized clinical decision support systems (CCDSSs) are effective at improving ordering of tests for diagnosis, monitoring of disease, or monitoring of treatment. The outcome of interest was the effect on the diagnostic test-ordering behavior of practitioners.

Methods: We conducted a decision-maker-researcher partnership systematic review. We searched MEDLINE, EMBASE, Ovid's EBM Reviews database, Inspec, and reference lists for eligible articles published up to January 2010. We included randomized controlled trials comparing the use of CCDSSs to usual practice or non-CCDSS controls in clinical care settings. Trials were eligible if at least one component of the CCDSS gave suggestions for ordering or performing a diagnostic procedure. We considered studies 'positive' if they showed a statistically significant improvement in at least 50% of test ordering outcomes.

Results: Thirty-five studies were identified, with significantly higher methodological quality in those published after the year 2000 (p = 0.002). Thirty-three trials reported evaluable data on diagnostic test ordering, and 55% (18/33) of CCDSSs improved testing behavior overall, including 83% (5/6) for diagnosis, 63% (5/8) for treatment monitoring, 35% (6/17) for disease monitoring, and 100% (3/3) for other purposes. Four of the systems explicitly attempted to reduce test ordering rates, and all succeeded. Factors of particular interest to decision makers, such as costs, user satisfaction, and impact on workflow, were rarely investigated or reported.

Conclusions: Some CCDSSs can modify practitioner test-ordering behavior. To better inform development and implementation efforts, studies should describe in more detail potentially important factors such as system design, user interface, local context, and implementation strategy, and should evaluate impact on user satisfaction and workflow, costs, and unintended consequences.


Background
Much of medical care hinges on performing the right test, on the right patient, at the right time. Apart from their financial cost, diagnostic tests have downstream implications for care and, ultimately, for patient outcomes.
Yet, studies suggest wide variation in diagnostic test ordering behavior for seemingly similar patients [1][2][3][4]. This variation may be due to overuse or underuse of tests and may reflect inaccurate interpretation of results, rapid advances in diagnostic technology, and challenges in estimating tests' performance characteristics [5][6][7][8][9][10]. Thus, developing effective strategies to optimize healthcare practitioners' diagnostic test ordering behavior has become a major concern [11].
A variety of methods have been considered, including educational messages, reminders, and computerized clinical decision support systems (CCDSSs) [2,12-14]. For example, Thomas et al. [15] programmed a laboratory information system to automatically produce reminder messages discouraging future inappropriate use of each of nine diagnostic tests. A systematic review of strategies to change test-ordering behavior concluded that most interventions assessed were effective [2]. However, this review was limited by the low quality of the primary studies. More recently, Shojania et al. [16] quantified the magnitude of improvements in processes of care from computer reminders delivered to physicians for any clinical purpose. Pooling data across randomized trials, they found a modest median improvement of 3.8% (interquartile range [IQR], 15.9%) in adherence to test ordering reminders.
CCDSSs match characteristics of individual patients to a computerized knowledge base and provide patient-specific recommendations. The Health Information Research Unit (HIRU) at McMaster University previously conducted a systematic review assessing the effects of CCDSSs on practitioner performance and patient outcomes in 1994 [17], and updated it in 1998 [18] and most recently in 2005 [19]. However, these reviews did not focus specifically on the use of diagnostic tests.
In this current update, we had the opportunity to partner with local hospital administration, clinical staff, and representatives of our regional health authority, in anticipation of major institutional investments in health information technology. Many new studies have been published in this field since our previous review in 2005 [19], allowing us to restrict inclusion to randomized controlled trials (RCTs), which carry a lower risk of bias. To better address the information needs of our decision-making partners, we focused on six separate topics for review: diagnostic test ordering, primary preventive care, drug prescribing, acute medical care, chronic disease management, and therapeutic drug monitoring and dosing. In this paper, we assess whether CCDSSs improve practitioners' diagnostic test ordering behavior.

Methods
Detailed methods for this systematic review have been published previously [20] and are available at http://www.implementationscience.com/content/5/1/12. These methods are briefly summarized here, along with details specific to this review of CCDSSs for diagnostic test ordering.

Research question
Do CCDSSs improve practitioners' diagnostic test ordering behavior?

Partnering with decision makers
The research team engaged key decision makers early in the project to guide its design and endorse its funding application. Direction for the overall review was provided by senior administrators at Hamilton Health Sciences (one of Canada's largest teaching hospitals) and our regional health authority. JY (Department of Medicine) and DK (Chair, Department of Radiology) provided specific guidance for the area of diagnostic test ordering by selecting from each study the outcomes relevant to diagnostic testing. HIRU research staff searched for and selected trials for inclusion, as well as extracted and synthesised pertinent data. All partners worked together through the review process to facilitate knowledge translation, that is, to define whether and how to transfer findings into clinical practice.

Search strategy
We previously published the details of our search strategy [20]. Briefly, we examined citations retrieved from MEDLINE, EMBASE, Ovid's Evidence-Based Medicine Reviews, and Inspec bibliographic databases up to 6 January 2010, and hand-searched the reference lists of included articles and relevant systematic reviews.

Study selection
In pairs, our reviewers independently evaluated each study's eligibility for inclusion, and a third observer resolved disagreements. We first included all RCTs that assessed a CCDSS's effect on healthcare processes in which the system was used by healthcare professionals and provided patient-specific assessments or recommendations. We then selected trials of systems that gave direct recommendations to order or not to order a diagnostic test, or presented testing options, and measured impact on diagnostic processes. Trials of systems that simply gave advice for interpreting test results were excluded (such as Poels et al. [21]), as were trials of diagnostic systems that only reasoned through patient characteristics to suggest a diagnosis without making test recommendations (such as Bogusevicius et al. [22]). Systems that provided only information, such as the cost of testing [23] or past test results [24], without actionable recommendations or options were also excluded.
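The inclusion logic above can be summarized as a single predicate. This is a minimal sketch only: the `Trial` fields are hypothetical labels for yes/no judgments that the reviewers made by hand, not variables taken from the review's extraction forms.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    # Hypothetical yes/no judgments a reviewer would record for each candidate trial.
    is_rct: bool
    used_by_health_professionals: bool
    patient_specific_advice: bool
    advises_to_order_or_not_order: bool
    presents_testing_options: bool
    measures_diagnostic_process: bool


def eligible(t: Trial) -> bool:
    """True if the trial meets the inclusion criteria described in the text."""
    ccdss_rct = t.is_rct and t.used_by_health_professionals and t.patient_specific_advice
    # Systems that only interpret results, only suggest a diagnosis, or only display
    # information (e.g., test costs or past results) give no testing advice and fail here.
    gives_testing_advice = t.advises_to_order_or_not_order or t.presents_testing_options
    return ccdss_rct and gives_testing_advice and t.measures_diagnostic_process


# An interpretation-only system: a patient-specific RCT, but no ordering advice.
interpretation_only = Trial(True, True, True, False, False, True)
print(eligible(interpretation_only))  # False
```

Encoding the exclusions as a failure of the "gives testing advice" condition, rather than as separate rules, mirrors how the text groups interpretation-only, diagnosis-only, and information-only systems.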

Data extraction
Pairs of reviewers independently extracted data from all eligible trials, including a wide range of system design and implementation characteristics, study methods, setting, funding sources, patient/provider characteristics, and effects on care process and clinical outcomes, adverse effects, effects on workflow, costs, and practitioner satisfaction. Disagreements were resolved by a third reviewer or by consensus. We attempted to contact primary authors of all included trials to confirm extracted data and to provide missing data, receiving a response from 69% (24/35).

Assessment of study quality
We assessed the methodological quality of eligible trials with a 10-point scale consisting of five potential sources of bias, including concealment of allocation, appropriate unit of allocation, appropriate adjustment for baseline differences, appropriate blinding of assessment, and adequate follow-up [20]. For each source of bias, a score of 0 indicated the highest potential for bias, whereas a score of 2 indicated the lowest, generating a range of scores from 0 (lowest study quality) to 10 (highest study quality). We used a 2-tailed Mann-Whitney U test to assess whether the quality of trials has improved with time, comparing methodologic scores between trials published before the year 2000 and those published later.
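The scoring scheme and the before/after-2000 comparison can be sketched as follows. This is an illustration under stated assumptions: the domain field names and the example scores are hypothetical (they are not the included trials' scores), and the test uses the large-sample normal approximation with a tie correction, which may differ slightly from the exact or asymptotic options a statistics package such as SPSS reports for small samples.

```python
import math
from typing import Dict, Sequence, Tuple

# The five bias domains listed in the text. Each is scored 0 (highest potential
# for bias), 1, or 2 (lowest potential for bias); the field names are hypothetical.
DOMAINS = (
    "allocation_concealment",
    "unit_of_allocation",
    "baseline_adjustment",
    "blinded_assessment",
    "follow_up",
)


def quality_score(item_scores: Dict[str, int]) -> int:
    """Sum the five domain scores into a 0 (lowest) to 10 (highest) quality score."""
    total = 0
    for domain in DOMAINS:
        score = item_scores[domain]
        if score not in (0, 1, 2):
            raise ValueError(f"{domain} must be 0, 1, or 2, got {score}")
        total += score
    return total


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-tailed Mann-Whitney U test (normal approximation with tie correction,
    no continuity correction). Returns (U statistic for x, p-value)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0  # sum of (t^3 - t) over groups of tied values
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # tied values share the mean of 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all values identical: no evidence of a difference
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u_x, p_two_tailed


# Hypothetical scores for illustration only (not the included trials' scores).
before_2000 = [4, 5, 6, 5, 3]
after_2000 = [8, 9, 7, 10, 8]
u, p = mann_whitney_u(before_2000, after_2000)
print(f"U = {u}, two-tailed p = {p:.4f}")
```

A rank-based test suits these data because the 0 to 10 quality score is an ordinal sum of item ratings, so comparing medians of ranks is safer than assuming normally distributed scores.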

Assessment of CCDSS intervention effects
In determining effectiveness, we focused exclusively on diagnostic testing measures and defined these broadly to include performing physical examinations (e.g., eye and foot exams), blood pressure measurements, as well as ordering laboratory, imaging, and functional tests. Patient outcomes were excluded from this study because, in general, they are most directly affected by treatment action and could not be attributed solely to diagnostic testing advice, especially in systems that also recommended therapy. Impact on patient outcomes and other process outcomes was assessed in our other current reviews on primary preventive care, drug prescribing, acute medical care, chronic disease management, and therapeutic drug monitoring and dosing.
Whenever possible, we classified systems as serving at least one of three purposes: disease monitoring (e.g., measuring HbA1c in diabetes), treatment monitoring (e.g., measuring liver enzymes at the time of statin prescription), and diagnosis (e.g., laboratory tests to detect the source of a fever). We assigned a trial to a given area if it gave recommendations for that purpose and measured the outcome of those recommendations. Trials of systems for monitoring of medications with narrow therapeutic indexes, such as insulin or warfarin, are the focus of a separate report on CCDSSs for therapeutic drug monitoring and dosing and are not discussed here.
We looked for the intended direction of impact: to increase or to decrease testing. We considered a system effective if it changed, in the intended direction, a prespecified primary outcome measuring use of diagnostic tests (2-tailed p < 0.05). If multiple prespecified primary outcomes were reported, we considered a change in ≥50% of outcomes to represent effectiveness. We treated as primary any outcome that the authors described as 'primary' or 'main'; if no such statement could be found, we treated the outcome used for the sample size calculation as primary. In the absence of a relevant primary outcome, we looked for a change in ≥50% of multiple prespecified secondary outcomes. If there were no relevant prespecified outcomes, systems that changed ≥50% of reported diagnostic process outcomes were considered effective. We counted studies with multiple CCDSS arms as 'positive' if any of the CCDSS arms showed a benefit over the control arm. These criteria are more specific than those used in our previous review [19]; therefore, some studies included in that review were re-categorised with respect to their effectiveness in this review.
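The effectiveness hierarchy above can be expressed as a small decision function. This is a simplified sketch: the `Outcome` fields and tier labels are hypothetical, and the reviewers' judgments about which outcomes were relevant to diagnostic testing, and whether a change was in the intended direction, are assumed to have been made already and stored as booleans.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Outcome:
    name: str
    # "primary", "secondary" (prespecified but not primary), or "other" (reported, not prespecified)
    tier: str
    # True if the outcome changed in the intended direction with 2-tailed p < 0.05
    improved: bool


def is_effective(outcomes: List[Outcome]) -> Optional[bool]:
    """Apply the >=50% rule to the highest tier of outcomes that the trial reports.
    Returns None when no diagnostic testing outcome is available to evaluate."""
    for tier in ("primary", "secondary", "other"):
        relevant = [o for o in outcomes if o.tier == tier]
        if relevant:
            improved = sum(o.improved for o in relevant)
            return 2 * improved >= len(relevant)  # at least half of the outcomes
    return None


def trial_is_positive(arms: List[List[Outcome]]) -> bool:
    """A multi-arm trial counts as positive if any CCDSS arm beats the control arm."""
    return any(is_effective(arm) is True for arm in arms)


arm = [
    Outcome("HbA1c testing", "primary", True),
    Outcome("lipid testing", "primary", False),
    Outcome("foot examination", "secondary", False),
]
print(is_effective(arm))  # True: 1 of 2 primary outcomes improved; secondary ignored
```

With a single primary outcome the rule reduces to "that outcome must improve", since one of one is the only way to reach 50%. Lower tiers are consulted only when no higher tier exists, matching the order described in the text.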

Data synthesis and analysis
We summarized data using descriptive measures, including proportions, medians, and ranges. Denominators vary in some proportions because not all trials reported relevant information. We conducted our analyses using SPSS, version 15.0. Given study-level differences in participants, clinical settings, disease conditions, interventions, and outcomes measured, we did not attempt a meta-analysis.
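The descriptive summaries used throughout the Results (proportions reported as "P% (k/n)", medians, and ranges) can be reproduced with a few lines of code. This is a sketch, not the authors' SPSS procedure; the counts in the usage example come from the Abstract, while the quality scores passed to `median_and_range` in the test are made-up values.

```python
import statistics
from typing import Sequence, Tuple


def pct(k: int, n: int) -> str:
    """Format a proportion as 'P% (k/n)', rounding half up to a whole percent."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("expected n > 0 and 0 <= k <= n")
    percent = (200 * k + n) // (2 * n)  # integer form of floor(100*k/n + 0.5)
    return f"{percent}% ({k}/{n})"


def median_and_range(values: Sequence[float]) -> Tuple[float, float, float]:
    """Return (median, minimum, maximum) of a list of values."""
    return statistics.median(values), min(values), max(values)


# Counts taken from the Abstract's results.
for k, n in [(18, 33), (5, 6), (5, 8), (6, 17), (3, 3)]:
    print(pct(k, n))
```

Integer arithmetic is used deliberately: Python's built-in `round()` rounds halves to the nearest even number, so `round(62.5)` gives 62, whereas conventional reporting of 5/8 is 63%.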

Results
Our assessment of trial quality is summarized in Additional file 1, Table S1; system characteristics in Additional file 2, Table S2; study characteristics in Additional file 3, Table S3; outcome data in Table 1 and Additional file 4, Table S4; and other CCDSS-related outcomes in Additional file 5, Table S5.

Disease monitoring
Systems in 49% (17/35) of trials (median quality score, 7; range, 4 to 10) gave recommendations for monitoring active conditions, all of them chronic diseases. Their effectiveness for improving all processes of care and patient outcomes was assessed in our review on chronic disease management. Here we looked specifically for their impact on monitoring activity and found that 35% (6/17) increased appropriate monitoring [25-31,41].
In the context of diabetes, four of eight trials successfully increased timely monitoring of common targets such as HbA1c, blood lipids, blood pressure, urine albumin, and foot and eye health [26-30,41]. One of two systems that focused primarily on monitoring of hypertension was effective at increasing the frequency of appropriate blood pressure measurement [31]. One of three trials that focused on dyslipidemia improved monitoring of blood lipids [25]. Another three systems gave suggestions for monitoring of asthma [35,37,39,40], angina [39,40], and chronic obstructive pulmonary disease (COPD) [37], and one gave suggestions for a combination of renal disease, obesity, and hypertension [47-49]; none of these four changed testing behavior.

Treatment monitoring
Systems in 23% of trials (8/35) [34,50-57] provided suggestions for laboratory monitoring of drug therapy. Trials in this area were generally recent and of high quality (median score, 8.5; range, 2 to 10; 75% (6/8) published since 2005). They targeted a wide range of medications (described in Additional file 4, Table S4) and are discussed in detail in our review of CCDSSs for drug prescribing, which looked for effects on prescribing behavior and patient outcomes.
Focusing on their effectiveness for improving laboratory monitoring, we found that 63% (5/8) improved practices such as timely monitoring for adverse effects of medical therapy [34,52,53,55-57]. However, two of the trials demonstrating an impact were older and had low methodologic scores [56,57].

Notes to Table 1:
b Outcomes are evaluated for effect as positive (+) or negative (-) for CCDSS, or no effect (0), based on the following hierarchy. An effect is defined as ≥50% of relevant outcomes showing a statistically significant difference (2p < 0.05):
1. If a single primary outcome is reported, in which all components are applicable, this is the only outcome evaluated.
2. If >1 primary outcome is reported, the ≥50% rule applies and only the primary outcomes are evaluated.
3. If no primary outcomes are reported (or only some of the primary outcome components are relevant) but overall analyses are provided, the overall analyses are evaluated as primary outcomes. Subgroup analyses are not considered.
4. If no primary outcomes or overall analyses are reported, or only some components of the primary outcome are relevant for the clinical care area, any reported prespecified outcomes are evaluated.
5. If no clearly prespecified outcomes are reported, any available outcomes are considered.
6. If statistical comparisons are not reported, 'effect' is designated as not evaluated (...).
c Gives suggestions for monitoring of disease and treatment and is included in both categories. Outcomes were analyzed separately in each category, but overall effectiveness (reported in the text) was assessed across all diagnostic testing outcomes.

Other
Finally, five trials either did not specify the clinical purpose of recommended tests or suggested tests for several purposes without providing the data needed to isolate the effects on testing for any one purpose [15,65-68]. Three of the five focused on reducing ordering rates, and all three were successful [15,66,67]. Javitt et al. intended to increase test ordering and measured compliance with suggestions, but did not evaluate the outcome because of technical problems [65]. Overhage et al. aimed to increase 'corollary orders' (tests to monitor the effects of other tests or treatments) but did not present statistical comparisons of their diagnostic process outcomes [68].

Costs and practical process-related outcomes
Potentially important factors such as user satisfaction, adverse events, and impact on cost and workflow were rarely studied (see Additional file 5, Table S5). Because most systems also gave recommendations for therapy, we were usually unable to isolate the effects of test-ordering suggestions on these factors; we therefore discuss here only systems that gave testing advice alone.
Two trials estimated statistically significant reductions in the cost of care, but the estimates were small in one study [37] and imprecise (wide confidence interval) in the other [28,29]. A third study estimated a relatively small reduction in annual laboratory costs ($35,000) but presented no statistical comparisons [66].
Three trials formally evaluated user satisfaction. One study found mixed satisfaction with a system for monitoring of diabetes and postulated that this was due to technical difficulties [26,27]. Another found that 78% of users felt that the CCDSS suggestions for HIV test ordering had affected their test-ordering practices, even though the study showed no effect of the CCDSS [58]. The third study found that, despite high satisfaction with the local CPOE system, satisfaction with reminders about potentially redundant laboratory tests was lower (3.5 on a scale of 1 to 7) [66].
Only one study formally looked for adverse events caused by the CCDSS [66]. The system was designed to reduce potentially redundant clinical laboratory tests by giving reminders. Researchers assessed the potential for adverse events by checking for new abnormal test results for the same test performed after initial cancellation. Fifty-three percent of accepted reminders for a redundant test were followed by the same type of test within 72 hours, and 24% were abnormal, although only 4% provided new information and 1% led to changes in clinical management.
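The adverse-event check described above (looking for the same test on the same patient within 72 hours of a cancelled order) can be sketched as a simple event-matching routine. The record types and field names are hypothetical, this is not the original study's code, and it only produces counts: judging whether a repeat result provided new information or changed management required clinical review and is not modelled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional

WINDOW = timedelta(hours=72)


@dataclass(frozen=True)
class AcceptedReminder:
    patient_id: str
    test_code: str
    cancelled_at: datetime  # when the redundant order was cancelled after the reminder


@dataclass(frozen=True)
class LabResult:
    patient_id: str
    test_code: str
    performed_at: datetime
    abnormal: bool


def first_repeat(rem: AcceptedReminder, results: List[LabResult]) -> Optional[LabResult]:
    """Earliest same-type test for the same patient performed within 72 h after cancellation."""
    matches = [
        r for r in results
        if r.patient_id == rem.patient_id
        and r.test_code == rem.test_code
        and rem.cancelled_at < r.performed_at <= rem.cancelled_at + WINDOW
    ]
    return min(matches, key=lambda r: r.performed_at, default=None)


def summarize(reminders: List[AcceptedReminder], results: List[LabResult]) -> Dict[str, int]:
    """Count accepted reminders, repeats within the window, and abnormal repeats."""
    repeats = [first_repeat(rem, results) for rem in reminders]
    repeated = [r for r in repeats if r is not None]
    return {
        "accepted_reminders": len(reminders),
        "repeated_within_72h": len(repeated),
        "repeat_abnormal": sum(r.abnormal for r in repeated),
    }
```

Only the earliest repeat per reminder is counted, so a patient who had the test redone several times in the window contributes once; whether the original analysis handled multiple repeats this way is an assumption.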
One study made a formal attempt to measure impact on user workflow and found that use of the CCDSS did not increase length of clinical encounters [45]. However, this outcome was not prespecified and the study may not have had adequate statistical power to detect an effect.

Discussion
Our systematic review of RCTs of CCDSSs for diagnostic test ordering found that overall testing behavior was improved in just over one-half of trials. We considered studies 'positive' if they showed a statistically significant improvement in at least 50% of diagnostic process outcomes.
While the earliest RCT of a system for this purpose was published in 1976, most examples have appeared in the past five years, and evaluation methods have improved with time. Systems' diagnostic test ordering advice was most often intended to increase the ordering of certain tests in specific situations. Most systems suggested tests to diagnose new conditions, to monitor existing ones, or to monitor recently initiated drug treatments. Trials often demonstrated benefits in the areas of diagnosis and treatment monitoring, but were seldom effective for disease monitoring. All four systems that were explicitly meant to decrease unnecessary testing were successful [15,62,63,66,67]. CCDSSs may be better suited for some purposes than for others, but we need more trials and more detailed reporting of potential confounders, such as system design and implementation characteristics, to reliably assess the relationship between purpose and effectiveness.
Previous reviews have separately synthesized the literature on ways of improving diagnostic testing practice and on the effectiveness of CCDSSs [2,12-14,17-19,69]. Our current systematic review combines these areas and isolates the impact of CCDSS on diagnostic test ordering. However, several factors limited our analysis. Importantly, we chose not to evaluate effects on patient outcomes because many systems also gave treatment suggestions that affect these outcomes more directly than does test ordering advice. Some systems gave recommendations for testing but their respective studies did not measure the impact on test ordering practice and were, therefore, excluded from this review [70-72]. Only 37% of trials assessed impact on test ordering activity as a primary outcome, and others may not have had adequate statistical power to detect testing effects.
We did not determine the magnitude of effect in each study, there being no common metric for this, but simply considered studies 'positive' if they showed a statistically significant improvement in at least 50% of diagnostic process outcomes. As a result, some of the systems considered ineffective by our criteria reported statistically significant findings, but only for a minority of secondary or non-prespecified outcomes. Indeed, the limitations of this 'vote counting' [73] are well established and include increased risk of underestimating effect. However, our results remain essentially unchanged from our 2005 review [19] and are comparable to another major review [74], and a recent 'umbrella' review of high-quality systematic reviews of CCDSSs in hospital settings [75].
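To illustrate why vote counting can understate benefit, the sketch below computes the probability that a truly effective system is classed 'positive' under the ≥50% rule. It rests on simplifying assumptions that real trials rarely meet: every outcome reflects a real effect, each outcome's test has the same statistical power, and the tests are independent.

```python
from math import comb


def prob_counted_positive(k: int, power: float) -> float:
    """Probability that at least half of k outcomes reach significance, assuming
    every outcome reflects a real effect, each test has the same power, and the
    tests are independent (simplifying assumptions)."""
    need = (k + 1) // 2  # smallest count that is >= 50% of k
    return sum(
        comb(k, j) * power**j * (1 - power) ** (k - j) for j in range(need, k + 1)
    )


for k in (1, 2, 4):
    print(k, round(prob_counted_positive(k, 0.5), 4))
```

Under these assumptions, a genuinely effective system evaluated on a single outcome with 50% power would be counted as 'positive' only half the time, which is one concrete way in which vote counting can underestimate effect.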
Vote counting prevented us from assessing publication bias, but we believe that, along with selective outcome reporting, publication bias is a real issue in this literature because most systems were tested by their own developers.
We observed an improvement in trial quality over time, but this may simply reflect better reporting after standards such as the Consolidated Standards of Reporting Trials (CONSORT) were widely adopted. Thirty-one percent of the authors we attempted to contact did not respond, and this may have particularly affected the quality of our extraction from older, less standardised reports.
While the number of RCTs has increased, the majority of these studies did not investigate or describe potentially important factors, including details of system design and implementation, costs and effects on user satisfaction, and workflow. Reporting such information is difficult under the space constraints of a trial publication, but supplementary reports may be an effective way to communicate these important details. One example comes from Flottorp et al. [62,63] who reported a process evaluation exploring the factors that affected the success of their CCDSS for management of sore throat and urinary tract infections. Feedback from practices showed that they were generally satisfied with installing and using the software, its technical performance, and with entering data. It also showed where they faced implementation challenges and which components of the intervention they used.
Our systematic review uncovered only three studies evaluating CCDSSs that give advice for the use of diagnostic imaging tests [35,61,64]. Effective decision support for ordering of imaging tests may be particularly relevant for the delivery of high quality, sustainable, modern healthcare, given the high cost and rapidly increasing use of such tests, and emerging concerns about cancer risk associated with exposure to medical radiation [11,76,77].

Conclusions
Some CCDSSs improve practitioners' diagnostic test ordering behavior, but the determinants of success and failure remain unclear. CCDSSs may be better suited to improve testing for some purposes than others, but more trials and more detailed descriptions of system features and implementation are needed to evaluate this relationship reliably. Factors of interest to innovators who develop CCDSSs and decision makers considering local deployment are under-investigated or under-reported. To support the efforts of system developers, researchers should rigorously measure and report adverse effects of their system and impacts on user workflow and satisfaction, as well as details of their systems' design (e.g., user interface characteristics and integration with other systems). To inform decision makers, researchers should report costs of design, development, and implementation.

Additional material
Additional file 1: Study methods scores for trials of diagnostic test ordering. Methods scores for the included studies.
Additional file 2: CCDSS characteristics for trials of diagnostic test ordering. CCDSS characteristics of the included studies.
Additional file 3: Study characteristics for trials of diagnostic test ordering. Study characteristics of the included studies.
Additional file 4: Results for CCDSS trials of diagnostic test ordering. Detailed results of the included studies.
Additional file 5: Costs and CCDSS process-related outcomes for trials of diagnostic test ordering. Cost and CCDSS process-related outcomes for the included studies.
Computerized Clinical Decision Support System (CCDSS) Systematic Review

Authors' contributions
RBH was responsible for study conception and design; acquisition, analysis and interpretation of data; critical revision of the manuscript; obtaining funding; and study supervision. He is the guarantor. PSR acquired, analyzed, and interpreted data; drafted and critically revised the manuscript; and provided statistical analysis. JJY acquired, analyzed, and interpreted data; and critically revised the manuscript. JD acquired data and drafted the manuscript. DK analyzed and interpreted data, and critically revised the manuscript. JAM acquired, analyzed, and interpreted data; drafted the manuscript; and provided statistical analysis as well as administrative, technical, or material support. LWK and TN acquired data and drafted the manuscript. NLW acquired, analyzed, and interpreted data; drafted the manuscript; and provided administrative, technical, or material support, as well as study supervision. All authors read and approved the final manuscript.
Competing interests
RBH, NLW, PSR, JJY, DK, JD, JAM, LWK, TN received support through the Canadian Institutes of Health Research Synthesis Grant: Knowledge Translation KRS 91791 for the submitted work. PSR was also supported by an Ontario Graduate Scholarship, a Canadian Institutes of Health Research Strategic Training Fellowship, and a Canadian Institutes of Health Research 'Banting and Best' Master's Scholarship. Additionally, PSR is a co-applicant for a patent concerning computerized decision support for anticoagulation, which was not discussed in this review, and has recently received awards from organizations that may benefit from the notion that information technology improves healthcare, including COACH (Canadian Organization for Advancement of Computers in Healthcare), the National Institutes of Health Informatics, and Agfa HealthCare Corp. JJY received funding to his institution through an Ontario Ministry of Health and Long-Term Care Career Scientist award, as well as funds paid to him for travel and accommodation for participation in a workshop sponsored by the Institute for Health Economics in Alberta, regarding optimal use of diagnostic imaging for low back pain. RBH is acquainted with several CCDSS developers and researchers, including authors of papers included in this review.