Protocol for the process evaluation of a complex intervention designed to increase the use of research in health policy and program organisations (the SPIRIT study)

Background: Process evaluation is vital for understanding how interventions function in different settings, including if and why they have different effects or do not work at all. This is particularly important in trials of complex interventions in ‘real world’ organisational settings where causality is difficult to determine. Complexity presents challenges for process evaluation, and process evaluations that tackle complexity are rarely reported. This paper presents the detailed protocol for a process evaluation embedded in a randomised trial of a complex intervention known as SPIRIT (Supporting Policy In health with Research: an Intervention Trial). SPIRIT aims to build capacity for using research in health policy and program agencies.
Methods: We describe the flexible and pragmatic methods used for capturing, managing and analysing data across three domains: (a) the intervention as it was implemented; (b) how people participated in and responded to the intervention; and (c) the contextual characteristics that mediated this relationship and may influence outcomes. Qualitative and quantitative data collection methods include purposively sampled semi-structured interviews at two time points, direct observation and coding of intervention activities, and participant feedback forms. We provide examples of the data collection and data management tools developed.
Discussion: This protocol provides a worked example of how to embed process evaluation in the design and evaluation of a complex intervention trial. It tackles complexity in the intervention and its implementation settings. To our knowledge, it is the only detailed example of the methods for a process evaluation of an intervention conducted as part of a randomised trial in policy organisations. We identify strengths and weaknesses, and discuss how the methods are functioning during early implementation. Using ‘insider’ consultation to develop methods is enabling us to optimise data collection while minimising discomfort and burden for participants. Embedding the process evaluation within the trial design is facilitating access to data, but may impair participants’ willingness to talk openly in interviews. While it is challenging to evaluate the process of conducting a randomised trial of a complex intervention, our experience so far suggests that it is feasible and can add considerably to the knowledge generated.
Electronic supplementary material: The online version of this article (doi:10.1186/s13012-014-0113-0) contains supplementary material, which is available to authorized users.


Background
There is global interest in ensuring that health policy and program development is informed by reliable research [1]. Previous studies have improved our understanding of the constraints that policymakers and program developers face in their efforts to use research [2][3][4], and tools have been developed to support these efforts [5], but there is still little evidence about what strategies are most effective in building individual or organisational capacity to use research more effectively [6,7]. Even less is known about how and why such strategies work and what makes them effective in one context but not another [8]. SPIRIT (Supporting Policy In health with Research: an Intervention Trial) was developed to address this pressing need (see Additional file 1 for a glossary of terms used in this article) [9].

Supporting Policy In health with Research: an Intervention Trial (SPIRIT)
SPIRIT is testing the effects of a year-long multicomponent intervention designed to increase the capacity of health policy agencies to use research. SPIRIT uses a stepped wedge cluster randomised trial design. Six government agencies located in Sydney, Australia, that develop and implement state-wide or national health policies and programs will receive the intervention. Between 15 and 60 staff are expected to participate at each site. All agencies receive six intervention components: (i) audit, feedback and goal setting; (ii) a leadership program; (iii) organisational support for research; (iv) the opportunity to test systems for accessing research and reviews; (v) research exchanges; and (vi) educational symposia for staff. The development of these components was informed by change principles, as shown in Table 1.
The components include content that is tailored to suit the interests and needs of each agency but have standardised essential elements (i.e., the hypothesised 'active ingredients' of the intervention). We assume that, for the intervention to be optimally effective, the essential elements of each component should be delivered in each agency. The design of the SPIRIT intervention is based on a program logic model which outlines how the intervention is hypothesised to bring about change. Proximal and distal outcomes include: 1. Organisational capacity to use research (individual knowledge and skills; staff perceptions of the value of research; and organisational support for the use of research as demonstrated through leadership support, policies, tools and systems); 2. Research engagement actions (accessing and appraising research; generating new analyses and research including evaluation of current programs and policies; and interacting with researchers); 3. Research use (the different ways research informs policy or program work).
The outcome measures comprise an online survey and two structured interviews. A detailed description of SPIRIT, including the program logic model, is available online at http://bmjopen.bmj.com/content/4/7/e005293.full#F1 [9].
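The stepped wedge design mentioned above can be illustrated with a minimal sketch. In a stepped wedge trial every cluster begins in the control condition and crosses over to the intervention at a randomly allocated step, so that all clusters have received the intervention by the end. The number of steps (three, with two agencies per step), the agency labels and the function name `stepped_wedge_schedule` are assumptions made purely for illustration; this section does not specify SPIRIT's actual allocation.

```python
import random


def stepped_wedge_schedule(clusters, n_steps, seed=0):
    """Return {cluster: [0/1 per period]}; 1 means the intervention has started.

    Hypothetical helper: equal numbers of clusters cross over at each step.
    """
    if len(clusters) % n_steps != 0:
        raise ValueError("clusters must divide evenly across steps")
    rng = random.Random(seed)
    order = list(clusters)
    rng.shuffle(order)  # random allocation of clusters to steps
    per_step = len(order) // n_steps
    n_periods = n_steps + 1  # one baseline period before the first crossover
    schedule = {}
    for i, cluster in enumerate(order):
        step = i // per_step + 1  # period in which this cluster crosses over
        schedule[cluster] = [1 if period >= step else 0 for period in range(n_periods)]
    return schedule


# Assumed labels and step count, for illustration only.
agencies = [f"Agency {c}" for c in "ABCDEF"]
schedule = stepped_wedge_schedule(agencies, n_steps=3, seed=42)
for agency in sorted(schedule):
    print(agency, schedule[agency])
```

Each printed row starts at 0 (baseline) and ends at 1, which reflects the property stated in the text that all six agencies receive the intervention.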
Process evaluation of interventions to increase the use of research in health policy and program agencies
A detailed process evaluation is being conducted as part of the evaluation of SPIRIT. High quality process evaluations are critical for interpreting the outcomes of trials of complex interventions [10,11] where there is seldom a clear causal chain [12,13]. Process evaluations are increasingly used in trials of complex interventions [12,14,15,16], including those that seek to change professional behaviours in complex settings [17][18][19].

Aims and objectives
The primary aim of the process evaluation is to describe how the SPIRIT intervention works in different settings, including if and why it has different effects or does not work at all. This will help us interpret the outcomes of the SPIRIT trial and optimise the design of future interventions. We conceptualise this work as focusing on the interaction between three Domains:
1. the intervention as it was implemented;
2. how people participated in and responded to the intervention; and
3. the contextual characteristics that mediated this relationship.
Our specific objectives address each of these Domains as follows:
1. To document how the intervention was implemented, and the extent to which it was implemented as intended over time and across different intervention settings, including the degree to which essential elements were delivered (implementation fidelity). This allows us to see implementation successes or failures that may affect outcomes. (Domain 1: Implementation).
2. To describe how people participated in and responded to the intervention, including any variations across the settings [17,20]. This enables us to critique the program design and delivery, and helps with the interpretation of study outcomes. (Domain 2: Participation and response).
3. To describe the contexts in which the intervention was delivered and explore contextual factors that may influence the delivery or impact of the intervention, and the outcomes. This provides [21][22][23]. It may also explain intentional and unintentional differences in delivery. Reflecting on the relationship between organisational context and how each agency used the program to address local needs may have implications for future program design and delivery. (Domain 3: Context).
In addition, we address a fourth objective required for new interventions in which the program theory is untested and the process evaluation is designed from scratch rather than employing piloted methods: 4. To explore how well the theory underpinning the intervention was (a) realised in the design and (b) delivered in each participating agency: (a) We will collect data that confirm, refute or add nuance to the constructs and relationships proposed in the SPIRIT Action Framework. This model was used as the basis for designing and testing SPIRIT intervention strategies (Redman S, Turner T, Davies HTO, Williamson A, Haynes A, Brennan S, Milat A, O'Connor D, Blyth F, Jorm L: The SPIRIT Action Framework: A structured approach to selecting and testing strategies to increase the use of research in policy, submitted). (b) We will explore whether the essential elements captured and delivered the change principles which informed them (theoretical fidelity). This includes assessing whether the elements thought to be essential appeared to be essential in real world contexts, describing how well they delivered the program's change principles, and developing hypotheses about how amended or different essential elements might have better delivered the change principles. This information can inform future intervention development. Details of the development and testing of essential elements during this trial will be reported separately.

Methods/Design
The process evaluation is being conducted as an integral part of the trial in each of the six participating agencies. An evaluation officer leads the work, including data collection and management. She works with a small multidisciplinary sub-team of investigators who designed the process evaluation, and who continue to monitor its implementation and contribute to the ongoing data analysis. This team is not involved in the design or implementation of the intervention.

Process evaluation design
We have designed a mixed methods process evaluation that combines quantitative measures of intervention activities (such as numbers of participants and delivered components) [26] with qualitative exploration of the interaction between the intervention, how people experience it, and the contextual characteristics of the six organisations in which it is being delivered [44,45]. Table 2 provides an overview of these activities, which are discussed in more detail below.
Our approach incorporates aspects of developmental evaluation [46,47]. Traditional process evaluation tends to align with program logic models and focus largely on documenting key aspects of these linear and predictive pathways. Developmental evaluation takes a more emergent perspective, assuming that implementation within complex organisational systems will be unpredictable and will result in local adaptation, which may be more appropriate for achieving the intended program goals in that context. This approach focuses on reflective learning at every stage of the evaluation, adapting evaluation questions and data collection methods as the program is implemented, and feeding them back into an evolving trial design. This is appropriate when trialling complex strategies that are untested, producing uncertainty about what will work, where, and with whom; and when new questions, challenges, and opportunities are likely to surface [46,47]. Core aspects of SPIRIT are intended to be standardised, so process evaluation data will not be fed back to the intervention implementation team during the trial. Nevertheless, we hypothesise SPIRIT staff and providers are likely to adapt (intentionally and unintentionally) as they interact with participants and respond to contextual opportunities and constraints. The developmental evaluation perspective helps us see this variation as more than non-adherence to the implementation plan: it is expected emergence. Thus we are exploring why expert providers make particular in-situ changes, and striving to learn from how different strategies play out. These data are collated and analysed to support post-trial critical reflection and recommendations for optimising future interventions.
Ethical approval for the trial and the process evaluation was granted by the University of Western Sydney Human Research Ethics Committee, approval number H8970. Written informed consent is obtained from all study participants. Participant interviews are transcribed and de-identified. All data are kept confidential. The methods for monitoring and documenting intervention session delivery and obtaining participant feedback were piloted in a non-participant health centre within a government department before the trial began.

Data collection

Domain 1: Implementation
Documentation of the intervention delivery is informed by the work of Bellg, Borrelli and colleagues who have developed frameworks for measuring intervention fidelity of individual health behaviour change treatments [22,25]. These frameworks were reviewed pragmatically for (a) applicability to the SPIRIT intervention change principles and (b) utility for our aims. Given the intervention's complexity, the use of external content experts to deliver the program, and the likelihood of local tailoring, some items were adapted for improved fit, some were discarded (e.g., those that assessed participants' comprehension and ability to perform skills), and a few were added (e.g., details of how interaction and reflective learning should be facilitated). Table 3 shows the final items (phrased as guiding questions) and the data collection strategies used for each. As shown in Table 2 Table 1). These comprise standardised quality assurance interviews with the person at the agency commissioning the product, and the lead of the research team that develops the product. Interviews take place six months after the completion of the work and ask for reflections on the brokering process (satisfaction, efficiency), level of contact between the agency and the research team, and the utility of the final product [48].

Domain 2: Participation and response
Six types of data are collected to meet the Domain 2 objectives: 1. Pre-session sign-in and consent: Participants are asked to sign in at the beginning of sessions, state their job title/position and give or decline consent for process evaluation data collection (digital recording and note-taking). Information about professional roles allows us to document different types of reach and participation in each agency. 2. Semi-structured and structured observation: The evaluation officer observes and digitally records intervention sessions. The delivery checklist (described above) is used to collate structured information about participation and responses to SPIRIT. Descriptive field notes are taken to supply supporting data (e.g., examples of body language or interactions that illustrate the quality of participation) and to record information the checklist does not cover, such as how participants appear to interact with session contents, providers, and with each other; plus any contributions that might help answer our research questions. Notes are marked to indicate potentially valuable comments that should be verified using the audio recording. Immediately after each session, these notes are entered into a semi-structured session memo template in order to synthesise key aspects of the data and link it to other sources, and to explore hypotheses that will inform further data collection. 3. Self-reported evaluation feedback: Participants are asked to complete anonymous feedback forms immediately after intervention sessions. Session- The first round of interviews focuses on agency culture and context (see section below). The second round focuses on how the interviewee, their team and the wider organisation perceived and responded to the intervention and other aspects of SPIRIT such as the outcome measures and the process evaluation. 
A flow chart of open-ended questions and prompts derived from the SPIRIT program logic model is used to explore interviewees' views and accounts of how the intervention may have influenced their, and their organisation's, capacity to use research. A copy of the interview schedule for general participants (i.e., not the Liaison Person or CEO) is available in Additional file 4. All participant interviews are digitally recorded and professionally transcribed. An unstructured memo is written directly after each interview to capture initial analytic thinking and hypotheses. Memos are further developed when the transcriptions are read, corrected and de-identified.

Interviews, meetings and informal conversations:
Throughout the trial, information is collected from the people implementing SPIRIT about participation and responses to the intervention, the outcome measures, and the process evaluation. Ad hoc conversations address issues such as how participants are responding to requests to complete outcome measures, and feedback from Liaison People about the administrative tasks they are engaged in. SPIRIT staff were interviewed after agency visits during the pre-intervention agency engagement phase, and continue to be interviewed after they provide mid-intervention feedback (these sessions are not attended by the evaluation officer). Questions focus on agency attitudes to SPIRIT and any factors that might affect engagement and outcomes.

Domain 3: Context
We conceptualise context as incorporating the social, structural and political environment of each participating organisation, but focus on six dimensions that we identified from the bodies of literature described earlier: (i) work practices and culture; (ii) agenda-setting and work prioritisation; (iii) leadership styles and how leaders are perceived; (iv) how different kinds of information, including research, are accessed, used and valued by individuals and the broader organisation; (v) barriers and enablers to using research; and (vi) any other contextual factors that might affect outcomes.
Four types of data are collected under this Domain: 1. Structured observation: The delivery checklist and supplementary field notes are used to collate core information about the context of sessions (site, facilities, etc.). 2. Semi-structured observation: During intervention sessions, the evaluation officer takes extensive field notes in relation to the dimensions described above. This information is collated using the same methods as for the participation information (described above in Domain 2, data type 2: Semi-structured and structured observation). 3. Interviews: These take place with purposively sampled participants in the early phase of the intervention and focus on capturing information within the six dimensions outlined above. A copy of the interview schedule for general participants is available in Additional file 5. The CEO, or equivalent, for each agency will be invited to participate in an interview after the final round of outcome measures is complete. This interview will explore why they participated in the trial, what else was going on in and around the organisation that might have affected how staff engage with research, and how change does or does not occur in that organisation.

Interviews, meetings and informal conversations:
The interviews, study management meetings, and informal conversations with SPIRIT staff described under Domain 2 also address contextual issues. This includes feedback from agencies about changes in funding, staff and governance; and how they are being affected by developments in external agencies, politics and the media.
Semi-structured running memos are maintained for each of the six organisations that capture information about participation, responses and contextual changes. They include additional information that is collected opportunistically from a variety of sources including ad hoc conversations with agency staff at non-SPIRIT forums (e.g., conferences) and electronic media such as Twitter and government websites. A cross-agency memo is maintained to capture overarching issues and themes, including emerging themes that require more investigation in the field.

Program improvement
Interview and observational data are collected across each Domain to inform program improvement recommendations for future studies. For Domain 1 (implementation), we focus on how delivery might be improved. This includes fidelity considerations (the congruence between intended and actual delivery) and factors that are not specified in the implementation plan, such as day-to-day communication strategies and creative variations introduced by providers. In Domain 2 (participation and response), we ask how the intervention content, structure and techniques might have better met the needs of the targeted personnel in each agency and effected change more successfully. In Domain 3 (context), we focus on how each intervention setting may have influenced proximal outcomes (including participation) and distal outcomes as measured in the trial, and how the design and delivery of the intervention could have been more appropriate for and responsive to agency culture and context.

Data management and analysis

Domain 1 (implementation)
Data from the delivery checklists and participant feedback forms are entered into a database that contains fields for each item by session and by agency. This provides a comparative overview of delivery fidelity, why intentional and unintentional changes are made, participants' evaluative feedback, and other critical information about intervention sites, program delivery and participation. See Additional file 6 for an example of the spreadsheet used to collate data about a Leadership Program session. Analysis will focus on variation between agencies in how the intervention was implemented, particularly differences in the proportion of essential elements delivered at each site and differences in participant feedback, and any association between the two.
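As a rough illustration of the Domain 1 analysis described above, the following sketch computes the proportion of essential elements delivered at each agency and sets it beside mean participant feedback. The record layout, element codes, rating scale and all values are invented placeholders, not SPIRIT data, and the function names are hypothetical; the actual database also records items by session, which is omitted here for brevity.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical checklist rows: (agency, essential element, delivered?).
checklist = [
    ("A", "E1", True), ("A", "E2", True), ("A", "E3", False), ("A", "E4", True),
    ("B", "E1", True), ("B", "E2", False), ("B", "E3", False), ("B", "E4", True),
]
# Hypothetical feedback rows: (agency, rating on an assumed 1-5 scale).
feedback = [("A", 4), ("A", 5), ("A", 4), ("B", 3), ("B", 4)]


def fidelity_by_agency(rows):
    """Proportion of essential elements delivered, per agency."""
    delivered = defaultdict(int)
    total = defaultdict(int)
    for agency, _element, was_delivered in rows:
        total[agency] += 1
        delivered[agency] += int(was_delivered)
    return {agency: delivered[agency] / total[agency] for agency in total}


def mean_feedback_by_agency(rows):
    """Mean participant rating, per agency."""
    ratings = defaultdict(list)
    for agency, rating in rows:
        ratings[agency].append(rating)
    return {agency: mean(values) for agency, values in ratings.items()}


fidelity = fidelity_by_agency(checklist)
satisfaction = mean_feedback_by_agency(feedback)
for agency in sorted(fidelity):
    print(agency, f"fidelity={fidelity[agency]:.2f}", f"mean rating={satisfaction[agency]:.2f}")
```

With only six agencies, any association between the two summaries would probably be examined descriptively rather than through formal statistical testing; that is an assumption about the analysis, not something stated in the protocol.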

Domains 2 (participation and response) and 3 (context)
All other data (early- and post-intervention interview transcripts, and session, agency and interview memos) are managed using Framework Analysis [49,50]. Framework allows large amounts of diverse data to be analysed systematically; it is more transparent than most qualitative data analysis methods; it simplifies and supports comparative case analysis; and it enables us to review in-progress analysis as a team [49,50]. All transcripts are uploaded to NVivo 10 [51] and synthesised in matrices. We use three matrices: one for Domain 2 participation and response data; one for Domain 3 context data; and a third that collates data about participants' research and information utilisation. Data are organised both by case (individuals clustered by agency) and by category. Categories were developed by the process evaluation team through reviewing preliminary interview data and memos in relation to the process evaluation questions and the SPIRIT program logic model, and through multiple coding of interview transcripts to test and revise the categories. Categories include the range of intervention implementation strategies and the research engagement actions identified in the SPIRIT program logic model. Additional file 7 lists the categories used for Domain 2 (participation and response) and Domain 3 (context) framework matrices. Limited qualitative information from two outcome measures is included; the person conducting and coding the outcome measures interviews collates de-identified data from transcripts in relation to the six dimensions used to guide observations (described above) and provides it in a form that allows it to be integrated into the process evaluation framework matrices for analysis. Completed matrices thereby synthesise our varied data within broad categories, in preparation for more interpretive analysis [49].
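The structure of a framework matrix, as described above, can be sketched as a table whose rows are cases (individuals clustered by agency) and whose columns are categories, with each cell holding summarised data. This is only a conceptual illustration: the study manages its matrices in NVivo, and the function names, participant codes, category labels and cell text below are all hypothetical placeholders.

```python
from collections import defaultdict


def build_framework_matrix(summaries):
    """Build {(agency, participant): {category: [summary, ...]}}.

    `summaries` is an iterable of (agency, participant, category, text).
    """
    matrix = defaultdict(lambda: defaultdict(list))
    for agency, participant, category, text in summaries:
        matrix[(agency, participant)][category].append(text)
    return matrix


def read_category(matrix, category, agency=None):
    """Read down one column, optionally restricted to one agency's cases."""
    return {
        case: cells[category]
        for case, cells in sorted(matrix.items())
        if category in cells and (agency is None or case[0] == agency)
    }


# Placeholder entries; real cells would hold summaries of transcripts and memos.
matrix = build_framework_matrix([
    ("Agency A", "P01", "leadership", "placeholder summary 1"),
    ("Agency A", "P02", "barriers and enablers", "placeholder summary 2"),
    ("Agency B", "P03", "leadership", "placeholder summary 3"),
])
column = read_category(matrix, "leadership")
for case, texts in column.items():
    print(case, texts)
```

Keying rows by (agency, participant) keeps individuals clustered by agency, so the same structure supports reading across a case, down a category, or within a single agency.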
Memos from interviews and intervention sessions, agency memos and the overarching cross-agency memo are re-read and coded in NVivo using Domain 2 and 3 framework categories.
All data related to each category are clustered and reviewed inductively, drawing out key themes from close reading of the data (across all sessions and participants) in order to identify and distil the variation of views, experiences and behaviours within each agency. This work includes the development of schematic case studies for each agency. Interpretive memos are written that refine each theme, linking it to corroborating data sources and, when appropriate, to other categories or themes in NVivo or in the implementation fidelity database, so that each theme is supported by the broadest range of evidence. These data are reviewed in relation to the interactions between delivery, participation and context, and will also be reviewed in relation to outcomes when they are known. Analysis is on-going and guides continuing data collection. Early themes are revisited in the light of subsequent analytical changes [49,50]. Themes are explored within and across cases. The small process evaluation team, who co-designed the process evaluation and monitor its implementation, review distilled data and discuss interpretations.

Trial status
SPIRIT is currently being implemented. The trial will conclude in April 2015.

Discussion
In this paper, we describe the design of a process evaluation embedded within a trial of a complex intervention designed to build individual and organisational capacity to use research in policy and program development. We report the methods of the process evaluation and discuss how they are functioning during implementation. In doing so, we contribute to the literature in three ways: (i) we provide a worked example of how to embed process evaluation in the design and evaluation of a complex intervention (these are rare in the literature [15]); (ii) we illustrate an approach to tackling the challenges of complexity in the intervention and its implementation settings; and (iii) we provide, to our knowledge, the only detailed example of the methods for a process evaluation of an intervention conducted as part of a randomised trial in policy organisations.

Strengths and weaknesses
As an integral part of the SPIRIT trial, the process evaluation is well-resourced, detailed and has good access to a range of rich data sources. Early trialling and consultation with policy and program colleagues helped us identify methods that would be appropriate, feasible and effective. For example, they advised us that focus groups would have low attendance and that interviews would obtain franker responses. They also advised that our original plan to use ethnographic methods to study day-to-day work practices would be regarded as unacceptably intrusive, particularly in the context of an evaluation. To date, our methods appear to be appropriate for this trial. They are sufficiently flexible to gather data responsively and, with minor exceptions, there is no indication that the process evaluation has impacted participants' comfort or willingness to express themselves frankly in intervention sessions. Participants have given consent for intervention sessions to be observed and recorded, and the majority have completed anonymous feedback forms with no negative comments about the process evaluation. External presenters have understood the purpose of fidelity monitoring and have not appeared to be affected by it. Observations of these sessions (particularly those that are highly interactive) provide access to nuanced information about norms, values, processes, priorities and constraints that is helping us develop rich case studies of each agency's organisational context. The two phases of face-to-face interviews (early interviews focus on organisational culture and work processes, while post-intervention interviews focus on impact) are providing valuable insights about the relationships between the intervention, participation and context. Discussion with the multidisciplinary process evaluation team about emerging themes and interpretations strengthens the trustworthiness of findings.
However, we note some weaknesses. The role of the evaluation officer may inhibit full and frank feedback. Participants are aware that she works within the study team that includes the researchers responsible for designing and implementing the intervention, and some seem to assume that she is involved in decisions about design and implementation. This may affect the openness with which they talk about the trial. Also, the evaluation officer is a researcher asking participants about an intervention designed to increase how they value and use research. It is likely that she is not perceived as disinterested, and this may result in social desirability bias in interviews. The process evaluation itself may add to the burden of participation, and people may find it hard to raise this. To date, these factors do not appear to have had significant effects. As in previous interview-based studies with policymakers [52], respondents have been generous with their time in interviews and our impression is that most have spoken openly, but people with concerns may have been deterred from participating in interviews in the first place.
Lastly, comprehensive evaluation of the SPIRIT Action Framework is outside the scope of process evaluation, so our contribution to the evolution of the Framework will be limited. We should be able to comment on the applicability of the Framework within the parameters of this trial, and to flesh out some of the nuances in the relationships between its component parts. But more targeted and responsive data collection and analysis are needed to generate hypotheses that can inform further iterations of the Framework. As in other investigations of policy processes to date, our methods do not fully access the central phenomenon of policy decision-making and the role (current and potential) that research plays in it.

Conclusion
This paper presents a detailed protocol for the process evaluation of a unique complex intervention in health policy and program agencies. A key feature of the design