Descriptive statistics will be calculated for all variables of interest. Continuous measures will be summarized using means and standard deviations whereas categorical measures will be summarized using counts and percentages.
We hypothesize that the enhanced feedback intervention will lead to greater improvements in quality of care for patients with diabetes and/or IHD. The analysis to test this hypothesis will be carried out using multilevel hierarchical modeling (using the generalized estimating equation approach) to control for the effects of clustering as well as adjusting for multiple covariates, including the variables used in the minimization (baseline values for BP, LDL, the composite process score, and the number of patients with diabetes and/or IHD). Analysis will be performed on an intention-to-treat basis. No interim analyses are planned. Prior to analysis, other covariates will be assessed for the presence of multicollinearity; when the tolerance statistic is below 0.4, only one member of a correlated set will be retained in the model. Primary analyses will be conducted on patient-level variables, combining patients with diabetes and/or IHD. Sub-group analyses will be performed on patients with only IHD, only diabetes, or both, to assess the same outcome variables.
The efficacy of the worksheet intervention will be assessed as a planned secondary analysis in two ways. First, we will test whether full completion of the worksheet resulted in improved outcomes. Full completion of the worksheet will be evaluated according to whether physicians declared specific and measurable goals, completed all sections of the action plan, and confirmed their commitment with their signature. Second, we will examine whether physicians achieved greater improvements in the specific clinical topics that they chose to target using the worksheet.
All analyses will be carried out using the SAS Version 9.2 statistical program (SAS Institute, Cary, NC, USA).
Based on pilot data, systolic BP is expected to have a standard deviation of 20 mmHg. A clinically important difference in systolic BP is estimated to be 7 mmHg; this is a difference often seen with initiation of treatment and is associated with reduction in cardiovascular risk. To have 80% power to find a difference in systolic BP of 7 mmHg using a two-sided unpaired t-test with α = 0.05 would require 258 total patients. To account for clustering, this sample size must be multiplied by a variance inflation factor (VIF) = [1 + (n - 1) × ICC], where n is the mean cluster sample size and ICC is the intra-class correlation coefficient, a measure of the degree of correlation within clusters. From baseline data, the mean cluster size (number of patients with diabetes and/or IHD in each practice) is approximately 328. Using a presumed ICC of 0.05 (based on ICCs seen in the literature), the VIF equals 17.4. Thus, 4,489 patients with diabetes and/or IHD are required to find a difference of 7 mmHg in BP, which equates to 13.7 clusters.
For LDL values, pilot data show a standard deviation of 0.90 mmol/L. Therefore, using the same calculations, the trial will have power to show an absolute difference in LDL of 0.32 mmol/L; this difference has been shown to be associated with reduction in cardiovascular risk. This type of small improvement in the management of these very common chronic diseases could translate into a large impact on the population scale.
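The calculation above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' SAS code: it assumes equal allocation between arms, equal cluster sizes, and a normal approximation to the unpaired t-test, and the function names are hypothetical. Under these assumptions the unadjusted total is 258 patients and the unrounded VIF is 17.35; the text rounds the VIF to 17.4, which is why it reports 4,489 patients where the unrounded value gives roughly 4,476.

```python
import math
from statistics import NormalDist


def unadjusted_total_n(sd, delta, alpha=0.05, power=0.80):
    """Total patients (both arms) needed to detect a mean difference `delta`
    with a two-sided test, ignoring clustering.
    Assumption: normal approximation to the unpaired t-test, equal arms."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n_per_arm = 2 * (z_alpha + z_beta) ** 2 * (sd / delta) ** 2
    return 2 * math.ceil(n_per_arm)


def variance_inflation_factor(mean_cluster_size, icc):
    """VIF = 1 + (n - 1) * ICC, as defined in the text."""
    return 1 + (mean_cluster_size - 1) * icc


# Inputs taken from the text: SD 20 mmHg, difference 7 mmHg,
# mean cluster size 328, presumed ICC 0.05.
n_bp = unadjusted_total_n(sd=20, delta=7)
vif_bp = variance_inflation_factor(mean_cluster_size=328, icc=0.05)
patients_bp = n_bp * vif_bp          # clustering-adjusted sample size
clusters_bp = patients_bp / 328      # number of practices this corresponds to

print(f"unadjusted n = {n_bp}, VIF = {vif_bp:.2f}")
print(f"adjusted n = {patients_bp:.0f}, clusters = {clusters_bp:.1f}")
```

Rounding the cluster count up to a whole practice gives 14 clusters.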
Based on pilot data, the standard deviation for the composite process primary outcome is expected to be 1.61. For this outcome, pilot data were also used to find that the ICC was 0.0059, but to be conservative this can be rounded up to 0.01, giving a VIF of 4.28. Therefore, to show an absolute difference in the final composite process score of 0.3 (effect size 0.19), a sample size of 3,878 patients would be needed, which equates to 11.82 clusters.
Most of the power in cluster trials comes from the number of clusters, rather than the number of patients. Therefore, dropout of a few participating physicians (or many of their patients) would only minimally decrease power. We do not expect dropout of entire clinics; clinic managers are committed to this project and have facilitated the recruitment of individual physicians at each clinic. However, even with a loss of two of the fourteen clinics, the same calculations indicate that we would have 80% power to find differences of 8 mmHg in BP or 0.36 mmol/L in LDL.
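The sensitivity to losing clinics can be checked by inverting the sample size formula to obtain the smallest detectable difference for a given number of clusters. The sketch below is a hedged illustration rather than the authors' calculation: it assumes equal cluster sizes of 328, the presumed ICC of 0.05, equal allocation, and a normal approximation, and `detectable_difference` is a hypothetical helper name. With twelve clinics it yields about 7.4 mmHg for systolic BP and about 0.33 mmol/L for LDL, both within the more conservative 8 mmHg and 0.36 mmol/L quoted above.

```python
import math
from statistics import NormalDist


def detectable_difference(sd, n_clusters, mean_cluster_size, icc,
                          alpha=0.05, power=0.80):
    """Smallest mean difference detectable with the given power.
    Assumptions: equal cluster sizes, equal arms, normal approximation."""
    vif = 1 + (mean_cluster_size - 1) * icc
    # Effective (independent-equivalent) sample size after clustering
    effective_total = n_clusters * mean_cluster_size / vif
    n_per_arm = effective_total / 2
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return z * sd * math.sqrt(2 / n_per_arm)


# Twelve clinics remaining after a loss of two of the fourteen
bp_12 = detectable_difference(sd=20, n_clusters=12,
                              mean_cluster_size=328, icc=0.05)
ldl_12 = detectable_difference(sd=0.90, n_clusters=12,
                               mean_cluster_size=328, icc=0.05)
# All fourteen clinics, for comparison
bp_14 = detectable_difference(sd=20, n_clusters=14,
                              mean_cluster_size=328, icc=0.05)

print(f"12 clinics: BP {bp_12:.2f} mmHg, LDL {ldl_12:.3f} mmol/L")
print(f"14 clinics: BP {bp_14:.2f} mmHg")
```

Because the effective sample size scales with the number of clusters, the detectable difference grows only with the square root of the proportion of clinics lost.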
Qualitative analysis and process evaluation
Previous qualitative studies have isolated timeliness, customizability, and a non-punitive tone as key criteria for 'actionable' feedback. Evidence from the organizational literature suggests that the recipient must be satisfied with the feedback for it to be accepted and acted upon. Unfortunately, the literature has not provided clear direction regarding how to design feedback interventions targeted at family physicians to accomplish this goal.
One previous study assessed eight Ontarian physicians' reactions to a 20-minute one-on-one performance assessment presentation based on chart audits and patient questionnaires, and found that physicians welcomed it. Even though the data were garnered directly from charts, the participants expressed concerns about government involvement in the performance improvement process. Another study revealed a general scepticism amongst physicians regarding quality improvement interventions based on secondary databases. Nevertheless, the MOHLTC is now using administrative data to send all family physicians 'Diabetes Testing Reports' regarding their patients with diabetes. These reports from the government will provide far less data to the physicians (and nothing regarding patients with IHD) compared to the intervention described in this protocol. This context provides an opportunity to work with physicians receiving two types of diabetes feedback to explore the barriers and facilitators to Ontario family physicians' acceptance and utilization of performance feedback, and to examine the perceived actionability of various approaches to the design and delivery of feedback. While the ongoing government feedback will likely enrich the qualitative component of the study, we do not believe that it will impact the inferences made from the trial. All participants will receive the government feedback, but the government feedback does not explicitly encourage goal setting or action plans.
Semi-structured, individual interviews will be conducted using an interview guide, developed based on a review of the literature and consideration of the twelve domains described by Michie et al. to explain behaviour change in response to an intervention. We will use 'stratified purposeful sampling'; we will select participants with those features believed to be relevant, not with the goal of probabilistic representativeness, but for informational representativeness. For instance, guideline adherence and quality of care may be inversely related to years in practice and physician gender, so variety will be sought in these factors. Additionally, the participants will be chosen to represent varying levels of baseline performance, because this was found to be an important predictor in the Cochrane review. It is expected that saturation may be accomplished with approximately 12 interviews. The sample will be weighted with about two-thirds of participants having received the enhanced feedback. To account for time away from patient care, an honorarium will be offered for participation.
Interviews will be recorded and a transcription service used to produce verbatim electronic transcripts. These will be stored with encryption software on a password-protected computer drive. Identifying factors will be omitted. NVivo™ software will be used to assist with the data analysis. The framework analysis approach, as described by Ritchie and Spencer (and more succinctly by Pope et al.), aims to accurately reflect the original accounts of the participants through the use of inductive techniques, yet starts out deductively with preset goals. As such, it represents an ideal foundation for analyzing qualitative data within a pragmatic, mixed methods study such as this one. For example, it has been successfully used in the past as part of a mixed methods study investigating barriers and facilitators to guideline uptake in the ICU.
The identification of themes will be tracked along with dates of interpretations to provide an audit trail documenting the analysis. This is one way that trustworthiness in the results can be increased. Next, an index of themes will be developed by combining a priori objectives and issues identified in the literature with those raised by the participants and recognized through the readings. This process will occur after seven interviews have been completed and will be repeated in part by a second researcher (JB). It is thought that multiple coding provides a system of checks and balances to ensure that all possible themes are given consideration. Disagreements will be settled through consensus and this process may lead to changes in the interview guide. At this point, disconfirming evidence to ensure saturation of themes will be sought from further participants through the use of snowball sampling (by asking participants to suggest colleagues who may have unique perspectives on feedback). In this way, elements of multiple coding and the constant comparative method will be incorporated. Therefore, the qualitative protocol will meet the criteria described by Kuper for judging qualitative research.