Audit and feedback and clinical practice guideline adherence: Making feedback actionable

Background: As a strategy for improving clinical practice guideline (CPG) adherence, audit and feedback (A&F) has been found to be variably effective, yet A&F research has not investigated the impact of feedback characteristics on its effectiveness. This paper explores how high performing facilities (HPF) and low performing facilities (LPF) differ in the way they use clinical audit data for feedback purposes.

Method: Descriptive, qualitative, cross-sectional study of a purposeful sample of six Veterans Affairs Medical Centers (VAMCs) with high and low adherence to six CPGs, as measured by external chart review audits. One hundred and two employees involved with outpatient CPG implementation across the six facilities participated in one-hour semi-structured interviews in which they discussed strategies, facilitators and barriers to implementing CPGs. Interviews were analyzed using techniques from the grounded theory method.

Results: High performers provided timely, individualized, non-punitive feedback to providers, whereas low performers were more variable in their timeliness and non-punitiveness and relied on more standardized, facility-level reports. The concept of actionable feedback emerged as the core category from the data, around which timeliness, individualization, non-punitiveness, and customizability can be hierarchically ordered.

Conclusion: Facilities with a successful record of guideline adherence tend to deliver more timely, individualized and non-punitive feedback to providers about their adherence than facilities with a poor record of guideline adherence. Consistent with findings from organizational research, feedback intervention characteristics may influence the feedback's effectiveness at changing desired behaviors.


Background
Audit and feedback (A&F) has been used for decades as a strategy for changing the clinical practice behaviors of health care personnel. In clinical practice guideline (CPG) implementation, A&F has been used to attempt to increase guideline adherence across a wide variety of settings and conditions, such as inpatient management of chronic obstructive pulmonary disease (COPD) [1], test ordering in primary care [2,3], and angiotensin-converting enzyme (ACE) inhibitor and beta-blocker usage in cardiac patients [4]. Recent reviews, however, indicate that the effectiveness of A&F as a strategy for behavior change is quite variable. Grimshaw and colleagues [5] reported a median effect size of A&F of +7% compared to no intervention using dichotomous process measures, with effect sizes ranging from 1.3% to 16%; however, that same review reported non-significant effects of A&F when continuous process measures were used. Along similar lines, Jamtvedt and colleagues [6] reported a median adjusted relative risk of non-compliance of .84 (interquartile range (IQR): .76-1.0), suggesting a performance increase of 16% (IQR: no increase to 24% increase). Such studies attribute much of the variability in effect of the interventions to (often unrecognized) differences in the characteristics of the feedback used in the intervention and/or to the conditions under which A&F is more likely to be effective [6][7][8][9].
Earlier A&F research has suggested that the timing of feedback delivery can influence the resulting behavior change [10], as can the credibility of the feedback source [11][12][13]. Research from the organizational literature suggests a host of other phenomena that may affect the effectiveness of feedback, such as its format (e.g., verbal vs. written), its valence (i.e., whether it is positive or negative) [14], and its content (e.g., whether it is task-focused or person-focused, individual or group based, normative or ipsative) [15]. Our own research has noted that facilities with higher CPG adherence (i.e., high performing facilities, or HPF) relied more heavily on chart data as a source of feedback and placed greater value on educational feedback approaches than facilities with lower guideline adherence (low performing facilities, or LPF) [16]. Taken together, these research findings indicate a need to further explore the characteristics of A&F and their impact on the desired behavioral change. Building on our previous work on barriers and facilitators of clinical practice guideline implementation, the purpose of the analyses reported here is to address this need in the A&F literature by exploring how HPF and LPF differ in the way they use clinical audit data for feedback purposes.

Measurement of clinical practice guideline adherence
Guideline adherence was measured via External Peer Review Program (EPRP) rankings. EPRP is a random chart abstraction process conducted by an external contractor to audit performance at all VA facilities on numerous quality of care indicators, including those related to compliance with clinical practice guidelines. We obtained data for fiscal year 2001 reflecting facility-specific adherence to guideline recommendations for six chronic conditions usually treated in outpatient settings: diabetes, depression, tobacco use cessation, ischemic heart disease, cardiopulmonary disease, and hypertension. Each condition is monitored via multiple performance indicators; in total, 20 performance indicators were used to describe compliance across the six conditions. Facilities were rank ordered from 1-15 (15 being the highest performer) on each performance indicator. HPF tended to rank consistently high across most disease conditions, and LPF tended to consistently rank low across most disease conditions; consequently, all 20 performance indicator ranks were summed to create an indicator rank sum (IRSUM) score (higher IRSUM scores indicate higher performance). Facilities then were rank-ordered according to their IRSUM score to identify the three highest and the three lowest performing facilities, which were used for sample selection.
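The ranking procedure above reduces to a simple rank-sum computation. As a minimal sketch, using hypothetical facility names and (for brevity) fewer than 20 indicators, the selection of extreme performers could look like this:

```python
# Sketch of the IRSUM ranking described above, using hypothetical data.
# Each facility holds a rank (1-15, 15 = best) on each indicator.

def irsum_scores(indicator_ranks):
    """Sum each facility's indicator ranks into one IRSUM score."""
    return {fac: sum(ranks) for fac, ranks in indicator_ranks.items()}

def select_extremes(scores, n=3):
    """Return the n highest- and n lowest-scoring facilities."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered[:n], ordered[-n:]

# Hypothetical example with five facilities and four indicators:
ranks = {
    "A": [15, 14, 13, 15],
    "B": [3, 2, 4, 1],
    "C": [8, 9, 7, 8],
    "D": [12, 11, 14, 13],
    "E": [5, 6, 5, 4],
}
scores = irsum_scores(ranks)
high, low = select_extremes(scores, n=2)
print(high, low)  # highest- and lowest-performing facilities by IRSUM
```

With the full data this would sum 20 ranks per facility and take the top and bottom three of the fifteen facilities.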

Site selection
The data herein were part of a larger data collection effort at 15 VA facilities designed to examine barriers and facilitators to CPG implementation [17]. These facilities were selected from four geographically diverse regional networks using stratified purposive sampling. To be invited to participate, facilities had to be sufficiently large to accommodate at least two primary care teams, each containing at least three MD providers. In order to address the present paper's specific research question, only the highest and lowest performing facilities (based on their IRSUM score described above) were included in the sample. Thus, the final sample for this paper consisted of employees at three HPF and three LPF.

Participants
One-hundred and two employees across six facilities were interviewed. Within each facility, personnel at three different organizational levels participated: Facility leadership (e.g., facility director, chief of staff), middle management and support management (e.g., quality assurance manager, primary care chief, information technology manager), and outpatient clinic personnel (e.g., physicians, nurses, and physicians' assistants). All three levels were adequately represented in the sample (see Table 1). No significant differences in the distribution of participants were found by facility or organizational level (χ²(10) = 17.4, n.s.). Local contacts at each facility assisted in identifying clinical and managerial personnel with the requisite knowledge, experience, and involvement in guideline implementation to serve as potential participants. The study was locally approved by each facility's institutional review board (IRB), and participation at each facility was voluntary. An average of nine interviews occurred at each facility, for a total of 54 interviews at the six facilities (Table 1).

Procedure
Three pairs of interviewers were deployed into the participating sites during the spring of 2001. The interviewers were research investigators of various backgrounds (e.g., medicine, nursing, organizational psychology, clinical psychology, and sociology), with in-depth knowledge of the project, and most had been involved with the project since its inception. None of the interviewers was affiliated with any of the participating facilities.
Each pair travelled to a given site for two days, where together the interviewers conducted one-hour, semi-structured interviews either individually or in small groups, depending on the participants' schedule and availability (see appendix for interview guide and protocol). Interviewers took turns leading the interview, while the secondary interviewer concentrated on active listening, notetaking, and asking clarifying questions. Interviewers discussed their own observations after each interview, and compiled field notes for each facility based on these observations and discussions. To minimize interviewer bias, interviewer pairs were (a) blinded to the facility's performance category, and (b) split and paired with different partners for their following site visit. All interviewers were trained a priori on interviewing and field note protocol.
Participants were asked how CPGs were currently implemented at their facility, including strategies, barriers and facilitators. Although interviewers used prepared questions to guide the interview process, participants were invited to (and often did) offer additional relevant information not explicitly solicited by the interview questions. The interviews were audio recorded with the participants' consent for transcription and analysis.

Data analysis
Interview transcripts were analyzed using a grounded theory approach [18,19]. Grounded theory consists of three analytic coding phases: open, axial, and selective coding; each is discussed below. Transcripts were analyzed using Atlas.ti 4.2, a commonly used qualitative data analysis software program [20].

Open coding
Automated searches were conducted on the interview transcripts for instances of the following terms: "feedback," "fed back," "feeding back," "report" and its variations (e.g., reporting, reports, reported), "perform" and its variations (e.g., performing, performed, performance), "audit" and its variations (e.g., auditing, audited, audits), and "EPRP". All word variations were captured via a truncated word search. The results were then manually reviewed for relevance, and only passages that specifically discussed feedback on individuals' adherence to clinical practice guidelines were included. Examples of excluded feedback references included feedback about the computer interface to information technology personnel, or anecdotal comments received from patients about provider adherence. This review resulted in 122 coded passages across the 54 interviews in the six facilities, for an average of 20 coded passages per facility.
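A truncated word search of this kind can be approximated with a stem-anchored regular expression. The sketch below uses the search terms listed above, but the pattern construction and the review step are our own illustration; the paper does not describe the actual search tooling beyond Atlas.ti:

```python
import re

# Stems taken from the search terms described above; a word boundary
# anchors each stem, and \w* captures truncated variations
# (e.g., "perform" -> "performance", "audit" -> "audited").
STEMS = ["feedback", "fed back", "feeding back", "report", "perform",
         "audit", "EPRP"]
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(s) for s in STEMS) + r")\w*",
    re.IGNORECASE,
)

def candidate_passages(transcript_lines):
    """Return lines containing any search term, for manual relevance review."""
    return [line for line in transcript_lines if PATTERN.search(line)]

lines = [
    "We get a performance report every month.",
    "The clinic was audited last quarter.",
    "Patients like the new parking lot.",
]
print(candidate_passages(lines))  # only the first two lines match
```

As in the study, the automated hits would then be reviewed manually, keeping only passages that specifically discuss feedback on guideline adherence.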

Axial coding
In this phase of analysis, the passages identified during open coding are compared and thematically organized and related. This process resulted in identification of four characteristics of feedback from the data: timeliness, individualization, customizability and punitiveness. Each is discussed in more detail in the results section. Passages identified during open coding were categorized among these four properties and were organized by facility according to each of these properties. To ensure coding quality and rigor, code definitions were explicitly documented as soon as they emerged, and were continuously referred to throughout the coding process. Code assignment justifications were written for each passage as it was categorized, and coded passages were re-examined to ensure that code assignments were consistent with code definitions. Patterns in the high-performing facilities were compared with one another in search of commonalities, as were patterns in the low-performing facilities. Once patterns were identified, we relied on the corpus of field notes and informal observations from interviewers to provide interpretive context.

Selective coding
This phase of analysis involves integrating and refining the ordered categories from the axial coding phase into a coherent model or theory, usually based on a core or central category from the data. Based on the pattern of passages examined during axial coding, the "customizability" category emerged as the critical phenomenon around which a model grounded in the data was constructed, centering on the concept of actionable feedback. This is discussed in more detail in the results section.

Feedback characteristic patterns in high and low performing facilities
Four characteristics emerged from the data that described the nature of feedback received by clinicians at VA outpatient facilities. Table 2 summarizes the patterns of feedback use across the six facilities. Each characteristic is discussed in more detail below.

Timeliness
This refers to the frequency with which providers receive feedback. Monthly or more frequent feedback reports were considered timely; quarterly or less frequent reports were considered untimely. We chose monthly feedback as the timeliness threshold because, given usual time intervals between appointments within VA, quarterly or less frequent feedback may not give the provider sufficient time to change his/her behavior in time for a patient's next appointment.
All facilities reported delivering feedback in a timely manner. However, as seen in Table 2, the evidence for timeliness of feedback is more mixed in the low-performing facilities than in the high-performing facilities. Conflicting reports of timely and untimely feedback delivery were observed in the low-performing facilities, whereas timely feedback delivery was clearly the dominant practice in high-performing facilities (all names and initials in quotations are fictitious, to protect participant confidentiality): --R.R., a support management employee in a HPF.

Individualization
This refers to the degree to which providers receive feedback about their own individual performance, as opposed to aggregated data at the team, clinic or facility level. As can be seen from Table 2, none of the low-performing facilities provided individualized feedback to their providers. In most cases, individual providers received facility level data from the EPRP report.
To be honest, most of the monitoring has really been done through the EPRP data collection. If one looks at some of the other guidelines, such as our COPD guideline, there we really don't have a formal system set up for monitoring that. So if one really looks at performance and outcomes, EPRP remains probably our primary source of those types of data.
--B.F., An executive level employee in a LPF.
In contrast, all three high-performing facilities reported providing individual level data to their providers:

Punitiveness
This concerns the tone with which the feedback is delivered. Two out of the three HPF explicitly reported that they approached underperforming providers in a nonpunitive way to help them achieve better adherence rates. --M.B., a support management employee in a HPF.
In contrast, employees at one LPF made explicit mention of the punitive atmosphere associated with low guideline adherence rates.

Sometimes I almost thought that it was in the overall presentation. If it wasn't so threatening and if it was interactive, and if it was, you can show me and we're going to work with you ... then you can get a better buy-in than you can if just saying, this is it. Do it! Heads will roll! We'll chop off one finger and then we'll go for a hand and a foot, kind of thing! --C.C., a clinician in a LPF.
We're down here in the trenches and if something goes wrong, somebody pounds on our head. Otherwise, they leave us alone.

--A.B., a clinician in a HPF.
For the rest of the facilities, however, there were insufficient reports in either direction to indicate the presence of a punitive or non-punitive approach to delivering feedback.

Customizability
This refers to the ability to view performance data in a way that is meaningful to the individual provider. No facilities reported having customizable reports or tools that allowed individual providers to customize their performance information to their needs. Some facilities, however, did report having some capability to customize (even though that capability was not being employed), as expressed by this respondent: --S.M., a clinician in a HPF.
These reports came exclusively from high-performing facilities; however, there were several reports, both from HPF and LPF, about the utility and desirability of having such information.

A model of actionable feedback
From the pattern of the feedback properties, a hierarchical ordering can be postulated to arrive at a model of actionable feedback (see Figure 1). At a minimum, feedback must be timely in order to be useful or actionable -one can easily imagine situations where the most thoughtful, personalized information would be useless if it were delivered too late. Next, feedback information must be about the right target. In this case, since clinical practice guideline adherence is measured at an individual level (i.e., the data from which adherence measures are constructed concern individual level behaviors such as ordering a test or performing an exam), clinician feedback should be about their individual performance rather than aggregated at a clinic or facility level to maximize its effectiveness [21,22]. Third is non-punitiveness -feedback delivered in a non-punitive way is less likely to be resisted by the recipient regardless of content [15,23,24], thus making it more actionable. Finally, customizability engages the individual with the data, making him/her an active participant in the sense-making process, rather than a passive recipient of information.

The proposed hierarchical ordering is reflected in the data. As seen in Table 2, four out of six facilities reported using EPRP data to deliver timely feedback to their providers. The HPF provided individualized feedback to their providers, whereas the LPF indicated that they used facility level, rather than provider-specific reports as a feedback source. Only the top two performing facilities specifically indicated that they approached feedback delivery non-punitively, whereas no evidence of this existed either way in the other facilities (save for one LPF which reported explicit instances of punitive feedback delivery). No facilities reported providing their clinicians with the ability to customize their own individual performance data, although all facilities expressed a desire for this capability.
Thus, as we move up the facility rankings from the lowest to the highest performer, more of the properties appear to be present. This hierarchical ordering thus leads us to postulate the underlying dimension of "actionable feedback."

Discussion
We employed a qualitative approach to study differences in how high-and low-performing facilities used clinical audit data as a source of feedback. HPF delivered feedback in a timely, individualized, and non-punitive manner, whereas LPF were more variable in their timeliness, and relied on more standardized facility-level reports as a source of feedback, with one facility reporting a punitive atmosphere. The concept of actionable feedback emerged as the core category in these data, around which timeliness, individualization, non-punitiveness, and customizability can be hierarchically ordered.
The emergent model described above is consistent with existing individual feedback theories and research. Feedback intervention theory (FIT) [15] posits that in order to have a positive impact on performance, feedback should be timely, focused on the details of the task, particularly on information that helps the recipient see how his/her behavior should change to improve performance (correct solution information), and delivered in a goal-setting context. These propositions are consistent with empirical research. Timely feedback has long been positively associated with feedback effectiveness in the organizational literature [13], as has the need for individualized feedback [21,22]. Feedback delivered in a non-punitive way has been empirically linked to increased likelihood of feedback acceptance [25], a critical moderator of the relationship between feedback and performance [26]. Finally, although the effect of customizable feedback on feedback acceptance and subsequent performance has not been directly examined in the literature, this relationship can be inferred from related research and theory. Research indicates that clinicians want to access and interact with computerized clinical data more naturally and intuitively than is currently offered by EMR systems [27]. FIT proposes that feedback interventions that direct attention toward task details tend to improve performance. The ability of the provider to customize his or her specific performance data into something that is meaningful to him/her is likely to direct attention to the details of the performance measure in question, thereby increasing the likelihood of subsequent performance improvements.

Figure 1. A Model of Actionable Feedback. *The use of the term optimal to describe the effect on performance is relative -by this we mean optimal, given the variables in the emergent model. There are certainly other factors which could affect performance, although they are not exhibited here.

This research has implications for both research and practice. First, it suggests that A&F is not an all-or-nothing intervention: how feedback is delivered plays an important role in its effectiveness. Thus, some of the mixed findings in the A&F literature [5,6] could be partially explained by differences in feedback characteristics. Future research should consider such characteristics when designing A&F interventions.
Second, from a practice perspective, this research reminds administrators that A&F, whether for administrative or developmental purposes, is more than simple reporting of performance data. Feedback needs to be meaningful in order for recipients to act on it appropriately. Electronic tools such as VA's Computerized Patient Records System (CPRS) can help provide clinicians timely, individualized and customizable feedback -if used correctly. For example, CPRS is capable of generating individualized, customized reports; however, this capacity is not widely known, and thus remains underused. VA is already taking steps to make this capability better understood, with a re-engineering of CPRS to make template creation and report generation a simpler task for the user, and by offering training on the use of these tools system-wide [28]. However, whether feedback is punitively delivered is strictly a human matter; administrators should take care to adopt an educational, developmental perspective to feedback delivery. All of this, of course, assumes that the data fed back to the clinician are valid and reliable. Issues of sample size (whether sufficient cases of a given indicator exist to calculate a stable estimate for an individual provider), reliability, and appropriateness of behaviours and outcomes as indicators of quality (e.g., does the clinician really have the power to control a patient's blood pressure level if the patient consistently refuses to follow his/her plan of care?) should be carefully considered when developing and selecting behaviours and outcomes as indicators of clinician performance for feedback purposes.

Limitations
First, the study's relatively small sample size of six facilities, three in each performance condition, potentially limits the transferability of our results. VA facilities tend to be highly variable across multiple dimensions, and thus this study's findings might not apply to other VA facilities, or to outpatient settings outside the VA. However, two features of this research make us guardedly optimistic about the transferability of the findings. The six sites varied significantly by size, geography, facility type (i.e., tertiary vs. general medicine and surgery), and primary care capabilities; this variation did not significantly differ between HPF and LPF. The presence of a pattern of feedback characteristics, despite the variability in site characteristics, supports the idea that this pattern may be transferable to other facilities. Additionally, the feedback characteristics emergent from the data are consistent with existing research and theory on feedback characteristics, which suggests that our model could be transferable not only to other VA clinics, but potentially to other outpatient settings as well.
Second, the density of reports (20 passages per facility) is somewhat low, which potentially limits the credibility of the findings. However, participants were not explicitly interviewed on the subject of performance feedback, but rather on more general strategies and facilitators of clinical practice guideline implementation. Given the large domain of other available strategies and facilitators that participants mentioned [29], the consistency with which the feedback theme repeats itself across the six facilities strengthens the credibility of these findings, despite the low report density.
Finally, although the emergent feedback characteristics were consistent with previous research, we did not review or validate our findings with the study participants, as data collection and analysis did not occur concurrently. This is an inherent limitation of secondary data analysis and of our reliance on data collected to gain insight into the facilities' CPG implementation strategies and barriers rather than feedback characteristics. Future research should consider both qualitative and quantitative replication of the model.

Conclusion and future directions
We conclude that facilities with a record of successful guideline adherence tend to deliver more timely, individualized, and non-punitive feedback to providers about their individual guideline adherence than facilities with a poor record of guideline adherence. Consistent with organizational research, feedback characteristics may influence the feedback's effectiveness at changing desired behaviors. Future research should more fully explore the nature and effects of feedback characteristics on their effectiveness in clinical settings, the utility of customizing clinical audit data so that it is meaningful to individual providers, and the effects of meaningful feedback on subsequent performance, especially in comparison to or conjunction with a financial incentive or similar pay-for-performance arrangement. Meanwhile, administrators should take steps to improve the timeliness of individual provider feedback, and deliver feedback from a perspective of improvement and professional development rather than one of accountability and punishment for failure.

Interviews were scheduled to be one hour in length, with one half-hour between interviews for interviewers to compile notes on the completed interview and conduct administrative tasks (e.g., labeling the interviews on the memory card, recording interviewee information in a participant record). In some cases, the interviews went somewhat over the one-hour mark, but never more than approximately 10 minutes. In a very few instances, the participants' comments were concise enough that the interview ended before the one-hour mark. However, most interviews lasted approximately one hour.