Developing measures to assess constructs from the Inner Setting domain of the Consolidated Framework for Implementation Research

Background Scientists and practitioners alike need reliable, valid measures of contextual factors that influence implementation. Yet few existing measures demonstrate reliability or validity. To meet this need, we developed and assessed the psychometric properties of measures of several constructs within the Inner Setting domain of the Consolidated Framework for Implementation Research (CFIR).

Methods We searched the literature for existing measures for the 7 Inner Setting domain constructs (Culture Overall, Culture Stress, Culture Effort, Implementation Climate, Learning Climate, Leadership Engagement, and Available Resources). We adapted items for the healthcare context, pilot-tested the adapted measures in 4 Federally Qualified Health Centers (FQHCs), and implemented the revised measures in 78 FQHCs in 7 states (N = 327 respondents) with a focus on colorectal cancer (CRC) screening practices. To psychometrically assess our measures, we conducted confirmatory factor analysis models (CFA; structural validity), assessed inter-item consistency (reliability), computed scale correlations (discriminant validity), and calculated inter-rater reliability and agreement (organization-level construct reliability and validity).

Results CFAs for most constructs exhibited good model fit (CFI > 0.90, TLI > 0.90, SRMR < 0.08, RMSEA < 0.08), with almost all factor loadings exceeding 0.40. Scale reliabilities ranged from good (0.7 ≤ α < 0.9) to excellent (α ≥ 0.9). Scale correlations fell below 0.90, indicating discriminant validity. Inter-rater reliability and agreement were sufficiently high to justify measuring constructs at the clinic-level.

Conclusions Our findings provide psychometric evidence in support of the CFIR Inner Setting measures. Our findings also suggest the Inner Setting measures from individuals can be aggregated to represent the clinic-level. Measurement of the Inner Setting constructs can be useful in better understanding and predicting implementation in FQHCs and can be used to identify targets of strategies to accelerate and enhance implementation efforts in FQHCs.


Background
Translating the most recent evidence of what works in disease prevention, diagnosis, and treatment into routine practice in a timely fashion has been a significant challenge for both researchers and practitioners [1][2][3][4]. This challenge can be even greater for community clinics such as Federally Qualified Health Centers (FQHCs) that struggle to meet the evolving needs of their patients and the demands of their organizations and funders. Despite these challenges, improving the quality and effectiveness of primary care requires accelerating and improving the implementation of evidence-based approaches (EBAs). There are many models and frameworks, such as the Consolidated Framework for Implementation Research (CFIR), that describe contextual factors associated with implementation, yet scientists' ability to accurately measure and intervene upon those factors has been limited.
To advance the field of implementation science and to enable better understanding of factors influencing implementation, accurate and valid measurement is crucial. Nevertheless, systematic reviews reveal that many available measures of implementation context, process, and outcomes lack reliability or validity [5][6][7][8]. An urgent need exists for psychometrically strong measures in implementation science. Without them, the field cannot produce cumulative knowledge about implementation barriers, facilitators, or processes, or generate sound evidence about which implementation strategies work best, when, and for whom. The purpose of this study was to develop and test measures of constructs of the Inner Setting domain of the CFIR [9].
The "Inner Setting" of organizations has been identified as an important set of constructs that can influence the implementation of new research findings into practice [9]. There have been a number of useful definitions of the Inner Setting that help clarify its meaning and potential measurement. For example, Greenhalgh et al. developed a model to explain how innovations in health service delivery can diffuse through organizations; the authors described the organizational (inner) context which included both antecedents for innovation and readiness for innovation [10]. They also highlighted that organizations provide widely differing inner contexts for innovation implementation, and some characteristics of organizations (e.g., structure, culture) influence the likelihood that an innovation will be successfully adopted and incorporated into their usual practice. Lash et al. (2011) described the Inner Setting as the clinic or organizational context in which the intervention will exist [11]. Although we have seen an advance in the literature regarding conceptualization of the Inner Setting contexts and their influence on innovation adoption and implementation, empirical work to quantitatively measure the Inner Setting constructs is limited.
The CFIR was developed by reviewing and synthesizing constructs across 19 implementation and dissemination theories and frameworks into a unified typology [9]. The CFIR includes 37 constructs within 5 major domains: Inner Setting, Outer Setting, Intervention Characteristics, Characteristics of Individuals, and the Process of Implementation. The Inner Setting domain includes 5 constructs: Structural Characteristics, Network and Communications, Culture, Implementation Climate, and Readiness for Implementation [9], and another 9 sub-constructs (e.g., Learning Climate and Available Resources). While the framework describes these domains and constructs within them, it does not articulate relations between constructs or how they may interact to influence implementation. Accurate measurement is needed to begin to understand these relationships and to test whether individual or multiple constructs influence implementation.
This paper describes the work of the Cancer Prevention and Control Research Network (CPCRN) to develop measures for the Inner Setting domain of CFIR and assess the psychometric properties of those measures using data from a multi-state sample of FQHCs. The CPCRN is a group of collaborating centers funded by the Centers for Disease Control and Prevention (CDC) and the National Cancer Institute (NCI), through the Prevention Research Centers Program since 2002 [12,13]. Each CPCRN center has regional networks of academic, public health, and community organizations that work together to further the dissemination and implementation of EBAs for cancer prevention and control [14]. This article is based on research carried out by the CPCRN FQHC Workgroup. The goal of the FQHC Workgroup was to advance the dissemination and implementation of evidence-based cancer prevention and control programs in FQHCs that provide primary care to underserved populations. Aligned with this goal was the aim to identify factors that influence the implementation of cancer control EBAs beginning with the development of validated measures of CFIR constructs. This study focuses on the development and testing of measures for 7 constructs related to the Inner Setting domain. Work to develop measures of other CFIR constructs is described elsewhere [15].

Development of measures for the Inner Setting constructs
The development of measures occurred in 4 phases: first, we identified constructs of interest and compiled existing measures for those constructs; second, we generated items for each construct of interest by adapting items from existing measures and developing new items to create a set of preliminary measures; third, we pilot-tested and refined the preliminary measures; and fourth, we conducted a validation study with the refined measures. Since our goal was to develop measures of constructs that could potentially be targets for implementation interventions and could be implemented feasibly within the FQHCs, we chose CFIR constructs that were relevant for FQHCs, modifiable, and hypothesized to be measurable with few items. For all the steps described above, we used a consensus development process. We made decisions about what constructs to include at a CPCRN meeting that included CPCRN investigators and other implementation science experts. We discussed each Inner Setting construct and sub-construct and chose a preliminary set of constructs based on expert opinion about importance, changeability, and feasibility for measurement. Following the in-person meeting, the CPCRN FQHC Workgroup held two more in-person meetings and a series of teleconference discussions to make final decisions on the constructs and other development steps described above. We ultimately selected 15 out of 37 CFIR constructs to create measures for. Among these were 5 constructs that fall within the Inner Setting domain: Culture, Implementation Climate, Learning Climate, Leadership Engagement, and Available Resources. CPCRN sites then each took the lead on searching for items for one or more constructs, and the team held weekly meetings for several months and made decisions collectively about the items chosen as described below.

Identification and selection of items
We began our identification of the Inner Setting measures by drawing on existing surveys that had been administered in FQHCs. Specifically, we reviewed a survey created by the Association of Asian Pacific Community Health Organizations (AAPCHO) to study capacity for implementation of evidence-based interventions for cancer screening [16]. We chose the AAPCHO survey because it was highly relevant and allowed us to build on previous work. This survey included the Practice Adaptive Reserve (PAR) scale, which had previously been used in the evaluation of the national Patient-Centered Medical Home Demonstration Project [17][18][19]. First, we identified items from the AAPCHO survey that matched CFIR constructs based on the construct definitions [9] and the face validity of items. We held multiple group discussions to reach consensus on the "match". For constructs that did not have matching items from the AAPCHO survey or had items that did not fully reflect their definitions, we conducted a literature search for other existing measures. We started with models and frameworks included in the CFIR to see if they referred to measures of specific constructs. We also searched the following electronic databases: PubMed, CINAHL, ISI Web of Science, and PsycINFO for peer-reviewed articles published in the past 15 years to identify relevant measures. We used search terms such as CFIR, inner setting, implementation culture, and other construct names to identify measures and constructs. In addition to the search, we also reviewed measures listed on the Grid Enabled Measures (GEM) and Society of Implementation Research Collaboration (SIRC) websites. We then compiled all the potential measures for those constructs and had extensive discussions to select items from each.
We used the following criteria for item selection: (1) items fit the CFIR definition of the constructs, (2) items had been used in health related settings (e.g., public health, healthcare, mental health, and school) and were relevant for FQHCs or could be adapted to the FQHC setting, and (3) items fit the goals of the survey and were from published studies with measures that demonstrated some evidence of reliability (e.g., internal consistency) and validity (e.g., construct validity) in previous research.
In searching for Culture measures, we identified two sub-constructs not explicitly listed in the CFIR, Stress [20] and Effort [21], which were assessed separately. We decided to include these sub-constructs in addition to a more general measure of culture because the workgroup members believed that, while related, these constructs were likely distinct. Therefore, our final list of the Inner Setting measures included 38 items to measure 7 constructs and sub-constructs: Culture Overall (CFIR construct; 9 items), Culture Stress (sub-construct based on the work of Patterson [21]; 4 items), Culture Effort (sub-construct based on the work of Lehman [20]; 5 items), Implementation Climate (CFIR construct; 4 items), Learning Climate (CFIR sub-construct; 4 items), Leadership Engagement (CFIR construct; 4 items), and Available Resources (CFIR sub-construct; 7 items). Definitions for each Inner Setting construct and sub-construct are provided in Table 1.

Item adaptation and survey development
The identification of measures made it clear that some constructs could be measured generally, that is, they did not necessarily need to be tied to a particular implementation effort or EBA, while others required specific anchoring to the EBA the item referred to. Selected items were adapted for the context of improving colorectal cancer (CRC) screening in FQHCs. For intervention-specific constructs, such as Implementation Climate, items were also adapted to the specific EBA for CRC screening that the FQHC was implementing (captured in another section of the survey). EBA options were selected from those recommended by the Guide to Community Preventive Services (Community Guide) for increasing CRC screening (www.thecommunityguide.org).
Additionally, since we were interested in understanding factors influencing implementation of several EBAs for increasing CRC screening, participants were first asked about the level of implementation of each Community Guide recommended EBA and then asked questions related to CFIR constructs that were specific to the EBA being implemented. Because of constraints on the length of the survey, when a respondent indicated that the FQHC was implementing more than one EBA, subsequent questions on CFIR constructs referred to only one of the EBAs mentioned. The survey automatically inserted only one of the EBAs using the following prioritization: provider reminders first, followed by patient reminders, one-on-one education, and provider assessment and feedback. For example, if the clinic responded that they were implementing both provider reminders and one-on-one education, the follow-up questions would insert provider reminders. As an example of a follow-up question, "the program is a top priority in the company" was an item used by Klein et al. to measure implementation climate. It was adapted as "Using <EBA> to increase CRC screening rates is a top priority in the clinic" in our measure. Depending on which EBAs were used by the clinic, as indicated by previous answers, the question appeared online with a specific EBA. Table 1 indicates whether an item was general or specific to an EBA.
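The prioritization rule described above can be sketched in a few lines. This is only an illustration of the stated logic, not the survey software actually used; the function names and the use of a Python string template in place of the <EBA> placeholder are assumptions.

```python
# Priority order as stated in the text; identifiers are hypothetical.
EBA_PRIORITY = [
    "provider reminders",
    "patient reminders",
    "one-on-one education",
    "provider assessment and feedback",
]

# Adapted Implementation Climate item quoted in the text, with {eba}
# standing in for the <EBA> placeholder.
ITEM_TEMPLATE = (
    "Using {eba} to increase CRC screening rates is a top priority in the clinic"
)


def select_eba(implemented):
    """Return the highest-priority EBA among those reported as implemented."""
    implemented = set(implemented)
    for eba in EBA_PRIORITY:
        if eba in implemented:
            return eba
    return None  # none of the listed EBAs is being implemented


def render_item(implemented):
    """Fill the item template with the selected EBA, or return None."""
    eba = select_eba(implemented)
    return None if eba is None else ITEM_TEMPLATE.format(eba=eba)


print(render_item(["one-on-one education", "provider reminders"]))
```

With both provider reminders and one-on-one education reported, the item is rendered for provider reminders, matching the example in the text.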

Pilot testing and refinement
We programmed a web-based survey and then pilot-tested the survey in 4 FQHCs in 2 states (WA and TX). We also sought input from leaders at individual FQHCs and states' Primary Care Associations (PCAs) to ensure the appropriateness of the measures for FQHC clinic staff. More specifically, we asked leaders to review constructs for their importance and changeability, and to review items for clarity and for how well they represented the constructs. We then held telephone meetings with leaders to discuss feedback. Feedback from leaders confirmed our selection of constructs and led to minor changes in the wording of some items.

Table 1 (excerpt) Construct definitions
Readiness for implementation: Tangible and immediate indicators of organizational commitment to its decision to implement an intervention, consisting of 3 sub-constructs. Implementation readiness is differentiated from implementation climate in the literature by its inclusion of specific tangible and immediate indicators of organizational commitment to its decision to implement an intervention.
Available resources: The level of resources dedicated for implementation and on-going operations, including money, training, education, physical space, and time.

Recruitment and survey administration
We used a variety of strategies to recruit FQHCs to participate in the study [22]. While survey administration was standardized, recruitment protocols were tailored based on the CPCRN's existing partnerships with FQHCs in each participating state. Five CPCRN sites (WA, SC, TX, GA, CO) partnered with their state's PCA. In 4 of these states (WA, TX, SC, CO), the PCA emailed their member FQHCs encouraging them to participate in the survey. Five CPCRN sites that had existing relationships with FQHCs (TX, GA, CA, CO, MO) invited them to participate in the survey by contacting them directly through email, telephone calls, or in-person meetings. One state PCA (SC) also directly recruited participants at a meeting of FQHC staff members.
In most cases, one individual from each participating FQHC was designated as the main contact, usually the clinic's medical or administrative director. This individual was asked to complete questions about their clinic characteristics as well as send an introductory email with a link to the online FQHC CFIR survey to eligible staff members encouraging their participation. The online FQHC CFIR survey was programmed to allow a maximum of 10 staff from each clinic to complete the survey with a maximum of 3 providers (physicians, nurse practitioners, and physician assistants), 3 nurses or quality improvement staff, and 4 medical assistants (non-medical administrative staff were excluded). Between January 2013 and May 2013, providers and staff at FQHC clinics located in CA, CO, GA, MO, SC, TX, and WA completed the survey. Reminder emails were sent to potential participants at 2, 4, 6, and 8 weeks post-invitation. Incentives were offered to either individuals completing the survey or to FQHCs, whichever was preferred by the FQHC. If the clinic chose the individual incentive, participants received $25 gift cards. FQHCs that chose the clinic incentive received $250. One FQHC declined any incentives. All study procedures were approved by the Institutional Review Boards of each CPCRN Collaborating Center as well as the Coordinating Center at the University of North Carolina at Chapel Hill and the CDC.

Data analyses
We assessed descriptive statistics for clinics which responded to the clinic characteristics survey (n = 52) and demographic information from FQHC CFIR survey respondents (n = 327). We also assessed descriptive statistics for FQHC CFIR survey measurement items. Since we collected data from individuals nested within clinics to measure clinic-level constructs, we used a series of confirmatory factor analysis (CFA) models to test factor structure. We first conducted single-level CFA models adjusting for the nested structure of the data for each of the following constructs: Culture Overall, Culture Stress, Culture Effort, Implementation Climate, Learning Climate, Leadership Engagement, and Available Resources. We used full information maximum likelihood estimation with robust standard errors to account for missing data and non-normality of survey items. We adjusted for the nested structure of the data by using the TYPE = COMPLEX command in Mplus. We used multiple indices to evaluate model fit as recommended by [23]: Chi square (non-significant value = good fit), comparative fit index (CFI, > 0.90 = adequate fit and > 0.95 = good fit), Tucker-Lewis Index (TLI, > 0.90 = adequate fit and > 0.95 = good fit), standardized root mean square residual (SRMR, < 0.08 = adequate fit and < 0.05 = good fit), and root mean square error of approximation (RMSEA, < 0.08 = adequate fit and < 0.05 = good fit) [23][24][25][26]. We considered model adjustments if modification indices revealed substantial model improvements that were theoretically meaningful (e.g., reverse-coded items or items that referred to a specific EBA versus a general EBA).
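The cutoffs listed above can be expressed as a small lookup. The sketch below simply encodes the thresholds stated in the text (it does not fit any model); the function name and the label "poor" for values that miss the adequate cutoff are assumptions made for illustration.

```python
# Cutoffs from the text: (adequate, good). For CFI and TLI higher is better;
# for SRMR and RMSEA lower is better.
HIGHER_IS_BETTER = {"CFI": (0.90, 0.95), "TLI": (0.90, 0.95)}
LOWER_IS_BETTER = {"SRMR": (0.08, 0.05), "RMSEA": (0.08, 0.05)}


def classify_fit(index, value):
    """Label a fit index value as 'good', 'adequate', or 'poor'."""
    if index in HIGHER_IS_BETTER:
        adequate, good = HIGHER_IS_BETTER[index]
        if value > good:
            return "good"
        return "adequate" if value > adequate else "poor"
    if index in LOWER_IS_BETTER:
        adequate, good = LOWER_IS_BETTER[index]
        if value < good:
            return "good"
        return "adequate" if value < adequate else "poor"
    raise ValueError(f"unknown fit index: {index}")


# Hypothetical values, not results from this study.
for name, value in [("CFI", 0.96), ("TLI", 0.92), ("SRMR", 0.04), ("RMSEA", 0.09)]:
    print(name, value, classify_fit(name, value))
```

Reading several indices side by side in this way mirrors the multi-index evaluation described above, where no single index decides model fit.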
We then conducted two sets of multilevel CFA models for each respective construct. Multilevel models allow for modeling the factor structure at the within-group or individual-level (level 1) and the between-group or the clinic-level (level 2), as illustrated in Fig. 1 [27,28]. This approach allowed for testing whether the factor structure was similar at the individual-level and the clinic-level, which is assumed when only modeling individual data to represent a higher level. In the first set of multilevel models, we allowed factor loadings for both levels to freely estimate to test unrestricted models. We then tested a set of models where we constrained factor loadings to be equal across levels to determine if items were loading similarly for the individual (within-group) and clinic-levels (between-group). We compared model fit of constrained and unconstrained models for each respective factor using Satorra-Bentler's scaled chi square difference tests [29]. To assess fit for multilevel models, we used the same fit indices as previously listed, including the SRMR, which is presented separately for the individual and clinic-levels for each model.
Fig. 1 Example of multilevel confirmatory factor model for the Leadership Engagement Scale. The item number with B represents clinic-level items
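For readers unfamiliar with the scaled difference test, the following sketch shows the commonly used computation for comparing a nested (constrained) model with a comparison (unconstrained) model estimated with robust maximum likelihood. It assumes the inputs are the scaled chi-square values, scaling correction factors, and degrees of freedom reported by the software; the function name and the example numbers are hypothetical and are not results from this study.

```python
def sb_scaled_difference(t0, c0, d0, t1, c1, d1):
    """Satorra-Bentler scaled chi-square difference.

    t0, c0, d0: scaled chi-square, scaling correction factor, and df of the
                nested (more constrained) model.
    t1, c1, d1: the same quantities for the comparison (less constrained) model.
    Returns (scaled difference statistic, difference in df).
    """
    if d0 <= d1:
        raise ValueError("the nested model must have more degrees of freedom")
    # Scaling correction for the difference test.
    cd = (d0 * c0 - d1 * c1) / (d0 - d1)
    if cd <= 0:
        raise ValueError("non-positive difference scaling correction")
    # Multiplying each scaled chi-square by its correction factor recovers the
    # unscaled statistic; the difference is then rescaled by cd.
    trd = (t0 * c0 - t1 * c1) / cd
    return trd, d0 - d1


# Hypothetical inputs for illustration only.
stat, df = sb_scaled_difference(t0=50.0, c0=1.2, d0=10, t1=40.0, c1=1.1, d1=8)
print(round(stat, 3), df)
```

The resulting statistic is referred to a chi-square distribution with the returned degrees of freedom; a non-significant value indicates that freeing the loadings across levels does not significantly improve fit.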
To evaluate internal consistency, we computed Cronbach's alpha for each of the scales. We also examined discriminant validity by calculating correlation coefficients for each pair of scales using individual-level data and data aggregated by clinic (to represent the clinic-level). To further assess the reliability of mean scale scores aggregated at the clinic-level, we computed two intraclass correlation coefficients, ICC(1) and ICC(2), using one-way random effects ANOVA [30]. ICC(1) provides an estimate of the proportion of variance in a specific measure that is explained by group membership (FQHC clinic). The larger the value of ICC(1), the greater the agreement or shared perception among raters within a group (FQHC clinic). ICC(2) indicates the reliability of the group-level mean scores. It varies as a function of ICC(1) and group size: the larger the value of ICC(1) and the larger the group size, the greater the value of ICC(2) and thus the more reliable the group mean score. As recommended in the literature [30,31], we used a threshold of 0.70 to indicate a reliable group score.
Finally, we computed an index of inter-rater agreement, rWG(J), to further assess the validity of clinic-level means as measures of clinic-level constructs. The rWG(J) index indicates the degree of agreement among raters by comparing within-group variances to an expected variance under the null hypothesis of a distribution representing no agreement [32]. An rWG(J) score above 0.70 indicates sufficient inter-rater agreement to compute FQHC clinic-level means for clinic-level constructs [33]. ICC(1), ICC(2), and rWG(J) statistics at the clinic-level were computed for clinics with two or more respondents, so clinics with only one respondent were dropped from these analyses. We used Mplus version 7.31 [34] for testing all CFA models. To compute Cronbach's alpha, correlation coefficients, ICC(1), ICC(2), and rWG(J), we used SPSS version 23.
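The agreement index can be illustrated with the usual multi-item formula. The sketch below assumes a uniform (rectangular) null distribution, for which the expected variance with A response options is (A² − 1)/12; the text does not state which null distribution was used, so that choice, the function name, and the example ratings are all assumptions.

```python
from statistics import mean, variance


def rwg_j(ratings, n_options):
    """Within-group agreement for a J-item scale within one clinic.

    ratings: one list of J item scores per rater (at least 2 raters).
    n_options: number of response options on the item scale.
    """
    n_items = len(ratings[0])
    # Mean of the observed item variances across raters.
    mean_var = mean(variance([r[j] for r in ratings]) for j in range(n_items))
    # Expected variance if raters responded uniformly at random (assumed null).
    expected_var = (n_options ** 2 - 1) / 12
    ratio = mean_var / expected_var
    # Values can fall outside 0 to 1 when observed variance exceeds the
    # expected variance; this sketch does not truncate them.
    return (n_items * (1 - ratio)) / (n_items * (1 - ratio) + ratio)


# Hypothetical ratings from three raters on a 2-item, 5-point scale.
print(rwg_j([[3, 3], [4, 4], [5, 5]], n_options=5))
```

A clinic would meet the 0.70 criterion described above only if its raters' item variances are small relative to the expected variance under the null.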

Sample characteristics
A total of 327 individuals from 78 FQHCs responded to the survey. However, there were missing data across some survey questions and demographic variables (Tables 2 and 3). The majority of respondents were female (79%) and non-Hispanic (64%) (Table 2). Thirty-seven percent were medical assistants, 36% were nurses, and 19% were physicians. Most participants had associate degrees or technical school diplomas (46%) or graduate or medical degrees (37%). Around 40% had worked at the clinic for 2 years or less, and 74% worked 40 hours or more per week. Sixty percent of participants reported that they provided services in language(s) other than English.
There was an average of about 4 respondents per clinic. Thirty-nine clinics had 1-3 respondents, 22 clinics had 4-6 respondents, and 17 clinics had 7-10 respondents. Of the 78 clinics, 19 were from WA, 15 from TX, 22 from CO, 10 from SC, 5 from GA, 6 from CA, and 1 from MO. A total of 52 clinics completed a separate clinic characteristics survey. Based on survey results from this subsample, the majority of the clinics (64%) served 5000 patients or more in 2012. Under half the clinics had ≥ 50% of patients uninsured and ≥ 40% of patients with limited English proficiency.

Factorial validity
Item means ranged from 2.84 (± 1.08) to 4.09 (± 0.88) while item sample sizes ranged from 258 to 327 (Table 3). The majority of item response distributions were negatively skewed. With the exception of the Culture Stress model, fit for the Inner Setting constructs was good to excellent (RMSEA ≤ 0.08, CFI ≥ 0.95, TLI > 0.93, SRMR ≤ 0.04) (Table 4). The RMSEA value for the Culture Stress model indicated poor fit (> 0.08); however, the other indicators suggested good model fit. Almost all item factor loadings adjusted for the nested data structure were greater than 0.40, with the exception of item A35a in the Available Resources model (Table 3). Three models contained correlated residual variances: Culture Stress, Learning Climate, and Available Resources. Reasons for correlating residuals included reverse scored items and questions that were focused on a specific EBA versus more general resources within the same construct.
Table 3 includes the variance explained by the clinic for each respective item. Results indicated the average ICC across all items was 0.13, with a range from 0.04 to 0.28. These results suggest that on average 13% of the variance for items was explained by the clinic, supporting the use of multilevel models [23]. Model results for the unconstrained multilevel CFA models were relatively consistent with results adjusted for clustering. The level 1 factor loadings were similar to the adjusted factor loadings while the level 2 factor loadings were consistently higher. Unconstrained models for Culture Effort and Available Resources demonstrated good model fit across all indices whereas Implementation Climate and Learning Climate had good model fit for most indices (Table 5). Unconstrained models for Culture Overall, Culture Stress, and Leadership Engagement had inconsistent fit results suggesting weaker (yet still good) fitting models relative to the other constructs.
When evaluating constrained models, the relative model fit appeared to improve for Culture Overall, Implementation Climate, Learning Climate, Leadership Engagement, and Available Resources (Table 5). Comparing constrained to unconstrained models using Satorra-Bentler's scaled chi square difference tests revealed no significant differences in model fit. These results suggest factor loadings were similar for the within- and between-group portions of the model, since allowing parameters to freely estimate did not significantly improve fit. Notably, the SRMR values were higher for the between-group portion of the model than for the within-group portion, suggesting the models fit the individual data better than the group-level data. Culture Stress had very high SRMR values for the between portion of both constrained and unconstrained models, leading to insufficient support for use of this measure at the clinic-level (Table 5). Furthermore, the level 2 factor loadings of the Culture Stress model suggested an unexpected weak relation with item A37 and an inverse relation with item A39 (Table 3). The factor loadings for both items were inconsistent with the level 1 and adjusted factor loadings, which likely contributed to model misfit for the between portion of the model.

Discriminant validity
We assessed discriminant validity by examining the correlations among constructs using the average score of each scale at the individual- and clinic-levels (Table 6). Three of the correlations (Culture Overall and Learning Climate, Culture Overall and Leadership Engagement, and Learning Climate and Leadership Engagement) had values above 0.80 at both the individual and clinic-levels, suggesting there may be some measurement overlap between constructs. The other correlations were well below the threshold, so good discriminant validity was shown across most of the Inner Setting dimensions.
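The two levels of correlation described above differ only in whether scale scores are averaged within clinic first. The sketch below illustrates that step with a hand-written Pearson correlation; the record layout, function names, and data are hypothetical, and the study itself used SPSS for these computations.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def clinic_means(records):
    """records: (clinic_id, scale_a, scale_b) per respondent.

    Returns two lists of clinic-level mean scores, one per scale.
    """
    by_clinic = defaultdict(list)
    for clinic, a, b in records:
        by_clinic[clinic].append((a, b))
    means_a = [sum(a for a, _ in v) / len(v) for v in by_clinic.values()]
    means_b = [sum(b for _, b in v) / len(v) for v in by_clinic.values()]
    return means_a, means_b


# Hypothetical respondent-level scale scores for two clinics.
records = [("c1", 1, 2), ("c1", 3, 1), ("c2", 5, 6), ("c2", 7, 5)]
individual_r = pearson([a for _, a, _ in records], [b for _, _, b in records])
clinic_r = pearson(*clinic_means(records))
print(round(individual_r, 3), round(clinic_r, 3))
```

In this toy example the clinic-level correlation is higher than the individual-level one because averaging removes within-clinic disagreement, which is one reason the two levels are reported separately.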

Inter-rater reliability and agreement statistics
Inter-rater reliability and inter-rater agreement statistics were computed to assess the reliability and validity of computing clinic-level means from the individual-level data. The results are presented in Table 7. With the exception of Culture Effort, the ICC(1) values of the scales were statistically significant and indicated that 10 to 22% of the variance in scale scores occurred between clinics. The ICC values for Culture Effort were negative suggesting there was a greater amount of variance within clinics versus between clinics for scale scores.

Discussion
This study sought to identify, develop, and test measures that assess multiple dimensions of the CFIR Inner Setting domain. Our findings suggest that these measures exhibit adequate or good psychometric properties. More specifically, CFAs, inter-item consistencies, and correlation analyses indicated our Inner Setting measures have structural validity, reliability, and discriminant validity. Additionally, multilevel CFA results and inter-rater reliability and agreement analyses support using clinic-level means computed from individual data for most constructs. Based on CFA results, scales with the strongest evidence for structural validity were Culture Effort and Available Resources. There was also moderate to strong evidence supporting the structural validity of Culture Overall, Implementation Climate, Learning Climate, and Leadership Engagement where the majority (but not all) of the fit indices suggested good or excellent fit. Culture Stress had the weakest evidence for structural validity, which could in part be due to the limited number of items (4) with one item focused on the individual (A36) whereas the other items were about the clinic (A37-A39).
When evaluating discriminant validity, constructs were distinct from one another with the exception of Culture Overall, Learning Climate, and Leadership Engagement. We would expect some overlap given that all these constructs are part of the Inner Setting. However, the stronger relation observed between these constructs is likely due to the fact that they can influence each other. For example, in this study, we included items that assessed the level of support the leader of an organization provides to create a productive and enjoyable environment where communication is valued [34]. Evidence shows that the culture and climate of an organization are highly influenced by its leadership [17]. Likewise, the organization's learning climate, which in our study was measured with items that assessed communication, observation, reflection, and the desire to make things better, can be seen as an important element that would make a clinic more "ready" for an implementation effort [19]. While these constructs were correlated, they can be assessed and targeted independently with implementation interventions.
Table 3 Means (standard deviations)
The following are available to make <EBA> work in our clinic: patient awareness/need
The following are available to make <EBA> work in our clinic: intervention team
In implementation science studies, the level of measurement is often a challenge because, while we may be interested in understanding how contextual factors influence adoption, implementation, and sustainment of EBAs, we typically measure these constructs by obtaining data from individuals within that organization [35]. In many cases, these contextual factors constitute subjective perceptions of organizational norms, culture, and readiness that must be assessed at the individual-level and could potentially vary from one person to another particularly among individuals with different types of roles (e.g., provider vs clinic manager). Nevertheless, it is likely that assessments from multiple individuals could provide a more accurate reflection of these organizational characteristics than by obtaining this information from one organizational representative alone.
In our study, we used two approaches from previous studies to assess whether data collected from individuals can be used to represent the clinic: (1) multilevel models with equality constraints on corresponding factor loadings for the between and within portions of the models [36] and (2) reliability and agreement statistics for the individual-level data [37]. The results of the two methods were relatively consistent in supporting the use of clinic-level constructs, with a few exceptions. Culture Stress had poor fit for the clinic-level portion of the multilevel model in addition to having weaker levels of agreement. Multilevel Culture Effort models demonstrated strong indicators of fit; however, the ICC(1) for the individual data indicated there was more variance in the scale scores within clinics than between clinics. Overall, there is good evidence to support the   [15,38]; some studies have used qualitative approaches [9] while other studies have attempted to measure the CFIR Inner Setting constructs quantitatively [38]. Some studies have quantitatively assessed the extent to which providers perceive certain CFIR constructs as important in implementing a particular behavior [9,39] but do not measure the construct explicitly. Other studies have used existing measures or subscales to assess some but not all of the CFIR Inner Setting domains. For example, Ditty et al. examined constructs from the Inner Setting domain of CFIR and explored their association with the implementation of an evidence-based behavioral therapy [40]. Using a sequential mixed methods approach that included a survey followed by qualitative interviews, the authors explored the relation of selected Inner Setting variables with implementation of dialectical behavior therapy among trained clinicians.
While this study assessed cohesion and communication, team climate for innovation, and on-going supervision using existing scales, constructs from other domains such as Leadership Engagement and Available Resources were not assessed [40]. Acosta et al. (2013) included measures of coalition functioning, leadership, and incorporation of new practices as covariates in evaluating the Assets Getting to Outcomes intervention, an implementation intervention for implementing programs that employed a positive youth development approach to prevention [41].
None of the published work, however, provides a way to measure the multiple dimensions of the Inner Setting domain. Emmons et al. expressed the need for developing and evaluating measures to assess the multidimensionality of organizational-level (inner setting) constructs [5]. Weiner et al. also highlighted that a robust measure could be a valuable diagnostic tool to guide implementation efforts in practice settings [7]. For example, stakeholders in clinical settings (and potentially other organizations) could use such a tool to assess the level of culture, implementation climate, or other constructs from the Inner Setting. If assessments reveal deficits in any area, this information could inform the development or selection of implementation strategies to address them. Addressing these factors could lead to more efficient and effective implementation efforts in practice settings. Additionally, the measures could be used to assess change in these constructs over time. This study addressed both calls in the literature and requests from the practice communities by developing a psychometrically robust instrument useful for both research and practice.
This study has several strengths. To the best of our knowledge, this work is the first to develop quantitative measures of the Inner Setting, based on the CFIR, for use in FQHCs. In addition, because of the focus on developing pragmatic measures that could be used in implementation research in FQHCs as well as by FQHCs themselves, we chose Inner Setting constructs that were relevant to FQHCs, amenable to change through intervention, and assessable with few items. Another strength of this study is that we used a rigorous scale development approach to assess the psychometric properties of our measures. This approach tested different forms of reliability and validity and used multilevel CFA models to account for the individual- and clinic-level aspects of the data. Lastly, this study was conducted in 78 clinics across 7 states, which represents a geographically diverse sample and strengthens the generalizability of the results. However, more research is needed to test whether the measures are valid in other settings and topic areas.
This study also has some limitations. The stage of implementation of EBAs could have influenced the measurement of some of the variables assessed. Another limitation was the varying number of respondents per clinic, with some clinics having as few as 3 respondents. Another potential limitation is that individual respondents played a variety of roles in patient care. These roles may influence their perception of certain clinic Inner Setting characteristics and could also influence their perception of the extent to which an EBA is being implemented. Nevertheless, one would expect that even if particular clinic providers or staff were not directly involved in the implementation, they would be able to assess (from their perspective) to what extent the program was being implemented. If they were not even aware of the program, they would likely indicate the program was not yet "fully implemented". Lastly, while the CFIR builds on literature from studies conducted in many countries [10], many of the measures we drew from and the data we collected for the validation

Conclusions
This study provides evidence that the Inner Setting measures described here have structural validity, reliability, and discriminant validity. Our findings also suggest that the Inner Setting measures collected from individuals can be aggregated to represent the clinic level. Measurement is crucial for any field, and our understanding of how contextual factors influence implementation, as well as our ability to intervene upon these factors, depends on our ability to measure them. This study provides information and measurement tools that can contribute to research aimed at better understanding the implementation of evidence-based programs and practices in FQHC settings. It can also inform the development of implementation interventions to accelerate and improve the use of healthcare innovations, practices, and programs that will lead to increases in health and quality of life and decreased health disparities.