Effect of a web-based audit and feedback intervention with outreach visits on the clinical performance of multidisciplinary teams: a cluster-randomized trial in cardiac rehabilitation

Background The objective of this study was to assess the effect of a web-based audit and feedback (A&F) intervention with outreach visits to support decision-making by multidisciplinary teams. Methods We performed a multicentre cluster-randomized trial within the field of comprehensive cardiac rehabilitation (CR) in the Netherlands. Our participants were multidisciplinary teams in Dutch CR centres who were enrolled in the study between July 2012 and December 2013 and received the intervention for at least 1 year. The intervention included web-based A&F with feedback on clinical performance, facilities for goal setting and action planning, and educational outreach visits. Teams were randomized either to receive feedback that was limited to psychosocial rehabilitation (study group A) or to physical rehabilitation (study group B). The main outcome measure was the difference in performance between study groups in 11 care processes and six patient outcomes, measured at patient level. Secondary outcomes included effects on guideline concordance for the four main CR therapies. Results Data from 18 centres (14,847 patients) were analysed, of which 12 centres (9353 patients) were assigned to group A and six (5494 patients) to group B. During the intervention, a total of 233 quality improvement goals was identified by participating teams, of which 49 (21%) were achieved during the study period. Except for a modest improvement in data completeness (4.5% improvement per year; 95% CI 0.65 to 8.36), we found no effect of our intervention on any of our primary or secondary outcome measures. Conclusions Within a multidisciplinary setting, our web-based A&F intervention engaged teams to define local performance improvement goals but failed to support them in actually completing the improvement actions that were needed to achieve those goals. Future research should focus on improving the actionability of feedback on clinical performance and on addressing the socio-technical perspective of the implementation process. Trial registration NTR3251


Background
The number of chronically ill patients is increasing, requiring hospitals to reconsider their role and responsibility in chronic disease management [1,2]. At the same time, health organizations are under public pressure to increase their accountability and to deliver optimally efficient and effective care [3]. The field of cardiac rehabilitation (CR) typically faces these challenges. CR offers cardiovascular disease patients a need-based, cost-effective, multidisciplinary approach to regain physical capacity, improve psychosocial condition, achieve lifestyle changes, and reduce future cardiovascular risk [4][5][6][7]. The efficacy of CR has been studied extensively [6,8] and was recently shown to be associated with a substantial survival benefit [9]. However, lack of guideline concordance limits the ability of CR to reach its full potential [10][11][12]; computerized clinical decision support (CDS) has previously been shown to have the potential to improve this [13]. However, considerable non-concordance remained due to organizational and procedural barriers not being addressed because individual CDS users considered them beyond their own influence and responsibility [14]. This finding stressed the need for an intervention specifically directed at decision-making processes at the team rather than at an individual level. This coincides with the approach advised by the American Heart Association (AHA) [11], advocating that entire multidisciplinary CR teams should implement coordinated, joint efforts to reinforce the importance of outpatient CR among healthcare systems, providers, and the public [11].
The AHA also promotes the use of quality indicators to monitor and improve clinical performance, for example using audit and feedback (A&F) strategies. A&F involves providing professionals with periodic objective summaries of their clinical performance [15] and is considered to be effective because it can support professionals in assessing their own clinical performance [15]. Previous studies suggested A&F to be the most effective if feedback is provided by a supervisor or colleague, more than once, both verbally and in writing; if baseline performance is low; if it includes explicit goals and an action plan; and if combined with educational meetings [15][16][17][18]. Other suggested effect modifiers are the perceived quality of the data underlying the feedback, motivation, and interest of the recipient, organizational support for quality improvement (QI), and the way in which performance targets or benchmarks are derived [19].
We used these successful characteristics described in the literature [15][16][17][18][19] to guide the development of a multifaceted A&F intervention to improve clinical performance in the field of CR in the Netherlands [20]. To further maximize its effect, our intervention specifically focused on engaging multidisciplinary teams and their managers rather than individual professionals [20]. The objective of this study was to assess the effectiveness of the multifaceted A&F intervention in a cluster-randomized trial among CR centres in the Netherlands. We measured effects on 11 care processes and six patient outcomes for CR (primary outcomes). Our secondary outcomes included overall performance, data completeness, and difference in guideline concordance with respect to prescribing CR therapies.

Study design
Centres participating in the trial were randomized to receive feedback limited to either psychosocial rehabilitation (disease-specific education and lifestyle modification; study group A) or physical rehabilitation (exercise training and relaxation and stress management training; study group B). In this way, both groups received an intervention, whilst serving as each other's control. We refer to the study protocol for further details of the experimental design [20].

Eligibility of participants
Dutch CR centres working with an electronic patient record (EPR) system for CR were eligible to participate. Multidisciplinary CR teams included cardiologists, physical therapists, nurses, psychologists, dieticians, social workers, and/or rehabilitation physicians. Teams were required to allocate dedicated time for study activities from at least the local CR coordinator (usually a specialized nurse), a cardiologist, one professional from another discipline, and the centre's manager. Recruitment took place from July 2012 until December 2013. All CR patients who started rehabilitation in one of the participating centres during the study period were eligible for inclusion in our analyses. CR is recommended for all patients who have been hospitalized for an acute coronary syndrome (ACS) and for those who have undergone a cardiac intervention [5,21]. Patients entering outpatient CR in the Netherlands are offered a comprehensive, individualized rehabilitation programme with a typical duration of 6-12 weeks, consisting of one or more of the four group-based therapies supplemented by individual counselling when indicated. Consistent with international guidelines, the Dutch guidelines for CR [22,23] state that the individualized programme should be based on a need assessment procedure where data items concerning the patient's physical and psychosocial condition are gathered.

Intervention
Our intervention comprised three main components: (i) periodic performance feedback reports, (ii) goal setting and action planning, and (iii) educational outreach visits. To facilitate the first two components, we developed a web-based system called 'CARDSS Online' [24]. Participants were requested to upload their anonymized patient data quarterly, after which the system created new feedback reports. Within days after new reports were released, educational outreach visits were held with the local multidisciplinary team to reflect on the feedback, to set goals and plan actions, and to update existing action plans following a continuous A&F improvement cycle [25]. Participants were offered four iterations of this cycle facilitated by a researcher through outreach visits, as well as up to two additional iterations facilitated via telephone.

Feedback reports
Feedback reports available through CARDSS Online consisted of performance scores on a set of indicators; each indicator represented a care process or patient outcome for CR. The performance scores were accompanied by benchmark information represented by 'traffic light' coloured icons (Fig. 1). Red, yellow, or green colours were assigned based on the centre's performance score relative to peer performance using the concept of achievable benchmarks [20]. A grey colour was assigned if there were insufficient data (<10 patients) available to compute a score. The processes and outcomes in the indicator set were defined in close collaboration with a panel of CR professionals [26]. The eight indicators related to psychosocial rehabilitation were only shown to centres in group A, whereas group B only saw the nine indicators related to physical rehabilitation (Appendix 1). All these processes and outcomes were measured as dichotomous variables at patient level. We also fed back nine indicators related to general CR processes (four patients and five centre levels) to centres in both groups (see Appendix 2).

Goal setting and action planning
After receiving a feedback report, participants could use CARDSS Online to develop a QI plan by selecting indicator areas for improvement (related to quality indicators in the feedback report). For each area they targeted, they could specify the problem and its presumed causes, set an improvement goal, and plan concrete actions on how to reach that goal. Actions were assigned to specific team members and were set with a due date. At each A&F iteration, the QI plan was updated by marking actions as 'completed' , 'cancelled' , or 'in progress' , by planning new actions, or by adding new areas for improvement to the plan. If all actions for a specific improvement area were completed, that area was removed from the QI plan.

Educational outreach visits
Educational outreach visits were conducted by one investigator (MvE or WG; both with a non-clinical, QI background) and typically lasted 2.5 h. All members of the local multidisciplinary team were invited to attend this session, and the visits always had the same structure. First, the investigator gave a short presentation to explain the purpose of the visit and the intervention. Next, the team discussed and reflected upon their most recent feedback report and created or updated their QI plan. The role of the investigator was to answer questions about the feedback (e.g. patient inclusion or exclusion criteria for specific indicators), help teams to plan actions that were achievable within the study period, and upon request provide lists of patients who had not received the recommended clinical practice or experienced outcome of interest for a specific indicator.

Outcome measures
Our primary outcome was the difference in improvement between the two study groups with respect to each of the 17 indicators (11 care processes and six patient outcomes) for which exactly one study group received feedback. First, we evaluated improvement per indicator at patient level; additionally, we compared, at centre level, overall performance (number of indicators at or above benchmark level) and data completeness (number of indicators for which centres recorded complete data) at baseline and 1 year of follow-up.
Secondary outcome measure was the difference in change in guideline concordance with respect to prescribing the four main CR therapies. Concordant prescribing was defined as prescribing a therapy for patients who were indicated to receive it and not prescribing a therapy for patients who were not indicated to receive it according to the Dutch clinical CR guidelines [22,23]. Additionally, we measured change in concordance with respect to actual attendance of these four therapies by patients.

Patient involvement
To ensure that patients' perspectives were reflected in the intervention, patients were involved in the development of the quality indicators that were used to give feedback to CR professionals and also served as primary outcome measures of the study [26].

Data collection and validation
We used routinely collected patient data from centres' EPRs. At the time we conducted our study, two commercial vendors of EPR systems for CR were available in the Netherlands. Both systems incorporated the Dutch CR guidelines [22,23] and followed the same data model. Data collection was structured as part of the needs assessment procedure and fed into the CDS module providing prescription recommendations for each of the four CR therapies [27].
Centres participated for a minimum of 1 year, with data collection ending in December 2014. At the end of the trial, we performed an audit to assess data quality and completeness by comparing our study database to an independent data source (typically the centres' local patient clinic schedules). From the analyses for each of the four CR therapies, we omitted centres with more than 25% discrepancies between the study database and the independent data source for prescribing that therapy. For further details, we refer to the study protocol [20].

Sample size
To calculate the minimally required number of centres participating in the trial, we used data from a previous trial [7]. Calculations were based on the normal approximation to the binomial distribution, using a type I error risk (alpha) of 5%, and 80% power. Based on the results, we aimed to include at least 19 centres that would treat 350 CR patients, on average, during the study period of 1 year. Further details can be found in [20].

Cluster randomization and allocation
Randomization of centres was stratified by size (more versus less than 30 patients starting treatment per month) (Fig. 2). Per stratum, we generated a randomization scheme with randomly assigned block sizes of either two or four centres using dedicated software. This scheme was concealed to those enrolling and allocating centres [20]. Due to the nature of the intervention, it was not possible to blind participants, or those involved in providing the intervention, to allocation.

Statistical analysis
To assess the effect of the intervention, we performed separate mixed effects logistic regression analyses [13,28] for each of the care processes and patient outcomes (primary outcome) and four therapies (secondary outcome) for which exactly one study group received feedback. To this end, we included covariates 'study group' , 'time' , and 'study group × time'. We focused on the interaction term to assess the difference in change over 1-year study follow-up between the two groups-that is, the effect of the intervention-because we expected clinical performance to improve gradually as a result of our intervention. We used random effects to adjust for the variation in baseline performance between centres (random intercept for each centre) and the variation in effect over time (random slope for time). To adjust for differences in case mix, we included age, gender, and indication for CR at patient level and size (average weekly patient volume) and type (specialized rehabilitation centre or part of a university or teaching hospital versus part of a non-teaching hospital) at centre level as covariates.
To assess the effects on the overall performance (number of indicators at or above benchmark level) and data completeness (number of indicators for which centres recorded complete data), we used mixed effects linear regression. Per centre, we assessed for both the change in percentage between baseline and at 1-year study follow-up. Additionally, we explored secular trends in the four patient-level general processes, that were shown to both groups, by performing mixed effects logistic analyses while withholding 'study group' and 'study group × time' as covariates. Changes in the five centre-level processes were assessed by counting the number of such processes that were in place at baseline and follow-up. Finally, we performed separate mixed effects logistic regression analyses to assess in concordance with guideline recommendations for attendance of each of the four CR therapies as measured at the end of the programme.
We used Multiple Imputation by Chained Equations (MICE) to handle missing data on outcomes and confounders [29]. To verify the robustness of our findings, we performed a sensitivity analysis with complete cases only. All analyses were performed using R version 3.1.2 (R Foundation for Statistical Computing; Vienna, Austria).

Participants
Eighteen of 22 eligible CR centres accepted our invitation to participate in the trial. Our randomization scheme assigned ten centres to group A (receiving feedback with respect to psychosocial rehabilitation) and eight to group B (receiving feedback with respect to physical rehabilitation). However, due to an algorithmic error in our software, two centres in group B received the intervention associated with group A, leading to an eventual distribution of 12 centres in group A and six in group B (see Fig. 1). Table 1 shows the baseline characteristics of centres and patients. The distribution of all characteristics, except for centre type, was equal between the groups. During the study period, a total of 14,847 patients started CR in the participating centres. Table 2 shows detailed information on how, and to what extent, the main components of the A&F intervention were implemented in the participating centres. There were no differences between the study groups in their mean study period, size of the local multidisciplinary teams, and attendance to the educational outreach visits. Local multidisciplinary teams consisted of 7.1 members on average. isits were attended by 5.1 (74%) members on average, but we observed a decrease in attendance over time from 5.83 (84%) members in the first visit to 5.1 (72%) in the fourth. Among the attendants, there were typically a nurse, physiotherapist, cardiologist or manager, and a psychologist or social worker; sometimes, also, a dietician, sports physician, or medical secretary attended. Cardiologists and/or managers could typically only attend for 30 to 60 min. As it turned out challenging to plan visits at a time during which sufficient team members were available, the average duration for A&F iterations was 4.0 months (SD 1.4) instead of the intended 3 months. For the same reason, one centre in group A completed only three A&F iterations instead of the per protocol minimum of four iterations. There were no differences between study groups in the number of indicators that teams selected into their QI plan nor in the number of actions they planned for each of those indicators. The mean number of indicators in a QI plan decreased from 8.0 (SD 2.4) during the first A&F iteration to 5.0 (SD 3.2) in the final iteration. Teams reported to have achieved their improvement goals for 1.8 indicators per A&F iteration on average. The complete study population achieved  Table 3 shows the effect on clinical performance as measured by the 11 care process and six patient outcome indicators. For none of the care processes nor patient outcomes in our study, the intervention led to significant differences in performance between study groups. We observed a positive secular trend for indicator 8 'Patients receive a discharge letter with remaining lifestyle goals' in both the control (OR 5.39; 95% CI 2.14 to 13.56) and intervention group (OR 4.61; 95% CI 2.29 to 9.30). In the control group, we observed positive secular trends for indicators 12 'Completion of stress management and relaxation therapy' (OR 2.47; 95% CI 1.25 to 4.88 per year), 14 'Improvement in exercise capacity' (OR 1.28; 95% CI 1.11 to 1.47), and 17 'Vigorously active lifestyle norm met at discharge' (OR 1.29; 95% CI 1.15 to 1.45). We found negative trends for indicator 11 'Exercise training completed' in the control (OR 0.44; 95% CI 0.27 to 0.74) and for indicator 4 'Disease specific education completed' in the intervention (OR 0.44; 95% CI 0.29 to 0.67) group. Our sensitivity analysis for clinical performance showed similar results (see Appendix 1); we found a positive secular trend in the control group for indicator 17 and a negative trend for indicator 4 in both the control and intervention group and no significant differences in performance between groups.

Effects on clinical performance
Overall clinical performance did not significantly improve in centres (effect 4.1% per year; 95% CI −1.13 to 8.53). Data completeness improved by 4.5% per year (95% CI 0.65 to 8.36). Appendix 2 shows the secular trends in the five general processes that were shown to both groups. We found a positive effect for indicator 21 'Cardiologist and GP receive a report after CR' (OR 3.42; 95% CI 2.24 to 5.24) and a negative effect for indicators 18 'Median time between hospital discharge and needs assessment procedure' (OR 0.7; 95% CI 0.54 to 0.91) and 20 'Rehabilitation evaluated at discharge' (OR: 0.43; 95% CI 0.28 to 0.64). In the complete case analysis (Appendix 4), we did not find any significant effect. Table 4 shows the effects on guideline concordance with respect to each of the four therapies. In the control group, we observed a positive concordance trend for prescribing exercise therapy (OR 2.52; 95% CI 1.03 to 6.16). We found negative trends for prescribing disease-specific education (OR 0.62; 95% CI 0.43 to 0.89) and lifestyle modification (OR 0.37; 95% CI 0.15 to 0.92) in the intervention group. Concerning concordance with respect to therapy attendance, we found a negative trend in the control group for relaxation and stress management (OR 0.44; 95% CI 0.23 to 0.83) and in the intervention group for disease-specific education (OR 0.51; 95% CI 0.32 to 0.81). For none of the therapies, the intervention led to significant differences in concordance trends, for neither prescription nor attendance. Overall, concordance rates for prescription of all four therapies were higher compared to attendance rates. Concordance rates were highest for prescribing relaxation and stress management (85.1%) followed by education (77.8%). The lifestyle modification showed the lowest concordance rates, for both prescription (44.4%) and attendance (37.4%). Our

Discussion
We evaluated an A&F intervention in a large clusterrandomized trial among 18 CR centres and 14,847 patients. Our intervention modestly improved data completeness and engaged teams to set improvement goals, but it yielded no improvement of clinical performance by multidisciplinary CR teams. A Cochrane review of 140 randomized A&F trials showed a median effect of 4.3% improvement in quality of care, with a minority of studies showing a strong positive effect [15]. In the review, the authors identified characteristics that may enhance A&F effectiveness, such as the use of educational outreach visits, providing feedback multiple times, and involving the entire team in action planning and goal setting activities [15][16][17][18][19]. We incorporated all of these characteristics in our intervention. Additionally, we built on the findings of an extensive barrier analysis which identified the need to target decision-making by multidisciplinary teams in order to increase guideline concordance in the field of CR [14]. The resulting intervention encouraged multidisciplinary teams to develop and revise (up to five times) improvement plans based on indicatorbased performance that was provided in quarterly feedback reports. Less than 20% of similar studies use iterative cycles of change, and only 14% of them repeatedly use data over time [30]. However, despite our efforts to design an effective intervention, the intervention did not improve clinical performance. Apparently, there are other, unidentified factors that are equally or more important to achieve change in clinical practice.
The authors of the Cochrane review recommended that the development and evaluation of A&F interventions should be informed by the explicit use of theory [15]. Although we designed our study before this recommendation was published, our intervention is well-founded in the Model for Improvement [25]. The model encourages teams improving their practice following plan-do-study-act (PDSA) cycles. This also fits within Control Theory [31], which poses that A&F effects are achieved through a mechanism of three steps: (i) performance feedback convinces health professionals that change is necessary and to set improvement intentions, (ii) intentions are translated into action, and (iii) action impacts the outcome of interest. We contributed to a more in-depth understanding of Control Theory by performing a quantitative process evaluation alongside our trial [32]. While performance scores and benchmark comparisons clearly influenced health professionals' improvement intentions, a substantial amount of feedback information was lost in the translation to improvement intentions because professionals disagreed with the benchmarks, deemed improvement unfeasible, or did not consider the indicator an essential aspect of care quality [32]. This consequently impeded intentions to improve practice, i.e. the first step in the mechanism posed by Control Theory, and thus explains part of the ineffectiveness of our intervention. The current study further revealed that another part of the ineffectiveness can be explained by the fact that professionals were not able to

0.147
We used Multiple Imputation by Chained Equations to handle missing data. Baseline period: first 3 months of study period; follow-up period: complete study period minus the baseline period. Performance trends: odds ratios associated with a 1-year study follow-up, adjusted for patients' age, gender, indication for CR, and centres' type and size. translate their intentions into completed actions, i.e. the second step of the mechanism, before the study end. The electronic nature of our intervention enabled us to monitor and measure improvement processes at the centre level: teams selected areas for improvement and planned and managed their QI activities within the same web-based system through which they were provided with performance feedback. By doing so, teams were able to develop a QI plan entirely tailored to their local organization. Nevertheless, we found that our intervention successfully encouraged teams to define local performance improvement goals, but it largely failed to support them with actually completing the actions needed to achieve those goals: 79% of intended actions remained uncompleted until the end of the study. Previous research in the field of intensive care [33,34] and general practice [35,36] suggested that failures to complete improvement actions may be due to organizational barriers such as competing priorities or a lack of leadership or professional barriers such as a lack of individual skills or knowledge to take effective improvement actions. Our A&F intervention did not completely solve these barriers. The professional barriers may be reduced by extending the intervention with ready-to-use improvement tools [16,33]. The few A&F studies that incorporated such support did so in different ways: through facilitated group discussions to reflect upon the feedback and identify improvement strategies [37] and by including suggestions in the feedback reports for how to address deficiencies in practice [38]. As the surplus value of adding supportive improvement tools to A&F interventions has not yet been investigated, we suggest that this be a focus of future research. Examples of persisting organizational barriers within our study context were related to lacks of resources (e.g. budget ceilings imposed by insurers), competing interests between managers from different clinical disciplines, and poor attendance of clinical leadership (cardiologists and managers) at outreach visits. This was supported by a qualitative study in the context of our intervention, which revealed that participants considered team commitment and organizational readiness important yet difficult factors to operationalize [39]. Recently, socio-technical frameworks have been proposed to design and evaluate A&F interventions, such as the Triangle Model [40], Team Strategies and Tools to Enhance Performance and Patient Safety (TeamSTEPPS) framework [41], the Systems Engineering Initiative for Patient Safety (SEIPS) model [42] and the 8-dimension socio-technical model [43]. Such socio-technical models typically approach the implementation process as consisting of multiple components that continuously interact with and change each other, including people, teams, tasks, tools and technologies, underlying organizational conditions, and the surrounding context. To address this complexity, future studies could consider using sociotechnical models as the underlying theoretical framework to guide the development, implementation, and evaluation of A&F interventions.
A limitation of our study is that we implemented our A&F intervention shortly after centres had started working with a new EPR. Although the reuse of routinely collected EPR data has minimized data collection burden for participating clinicians, the EPR implementation may have conflicted with the time and resources available for working on actual performance improvement. Second, some outcome measures showed little room for improvement (i.e. ceiling effects), making them less likely to change significantly over the course of the study, and as such less suitable for assessing the effectiveness of our intervention. Other outcome measures might have been difficult to improve because they relied on patients' compliance with prescribed therapies, which is a well-known barrier to guideline concordance [44]. Third, there may have contamination between groups due to an overall increase of awareness of clinical performance and quality improvement. This may have resulted in professionals working on other aspects of CR care, even though they had been randomized to target only psychosocial (group A) or physical rehabilitation (group B). Fourth, we included one centre less in our study sample than estimated in our sample calculations. Although we exceeded the estimated required number of patients per centre, we cannot rule out lack of statistical power as a potential explanation for finding no significant effects. Finally, two centres were incorrectly assigned to group A due to an algorithmic error in our software. However, since there were no differences in baseline characteristics between study groups, we believe the unequal distribution of centres did not influence our final results.

Conclusions
We designed a web-based A&F intervention in the field of CR guided by an extensive analysis of barriers in the field and by incorporating characteristics proven successful in the A&F literature. The intervention had no effect on the measured care processes, patient outcomes, or guideline concordance. Our intervention did modestly increase data completeness and engaged teams to define local performance improvement goals but failed to support them in actually completing the improvement actions that were needed to achieve those goals. Future studies should focus on improving A&F interventions and their evaluation, for instance by improving the actionability of feedback on clinical performance and by addressing the socio-technical perspective of implementation processes more extensively.