Organizational readiness for implementing change: a psychometric assessment of a new measure

Background Organizational readiness for change in healthcare settings is an important factor in successful implementation of new policies, programs, and practices. However, research on the topic is hindered by the absence of a brief, reliable, and valid measure. Until such a measure is developed, we cannot advance scientific knowledge about readiness or provide evidence-based guidance to organizational leaders about how to increase readiness. This article presents results of a psychometric assessment of a new measure called Organizational Readiness for Implementing Change (ORIC), which we developed based on Weiner’s theory of organizational readiness for change. Methods We conducted four studies to assess the psychometric properties of ORIC. In study one, we assessed the content adequacy of the new measure using quantitative methods. In study two, we examined the measure’s factor structure and reliability in a laboratory simulation. In study three, we assessed the reliability and validity of an organization-level measure of readiness based on aggregated individual-level data from study two. In study four, we conducted a small field study using the same analytic methods as in study three. Results Content adequacy assessment indicated that the items developed to measure change commitment and change efficacy reflected the theoretical content of these two facets of organizational readiness and distinguished the facets from hypothesized determinants of readiness. Exploratory and confirmatory factor analysis in the lab and field studies revealed two correlated factors, as expected, with good model fit and high item loadings. Reliability analysis in the lab and field studies showed high inter-item consistency for the resulting individual-level scales for change commitment and change efficacy. Inter-rater reliability and inter-rater agreement statistics supported the aggregation of individual-level readiness perceptions to the organizational level of analysis.
Conclusions This article provides evidence in support of the ORIC measure. We believe this measure will enable testing of theories about determinants and consequences of organizational readiness and, ultimately, assist healthcare leaders to reduce the number of health organization change efforts that do not achieve desired benefits. Although ORIC shows promise, further assessment is needed to test for convergent, discriminant, and predictive validity.


Introduction
Attempts to implement new programs, practices, or policies in organizations often fail because leaders do not establish sufficient organizational readiness for change [1]. Organizational readiness refers to 'the extent to which organizational members are psychologically and behaviorally prepared to implement organizational change' [2]. When organizational readiness is high, members are more likely to initiate change, exert greater effort, exhibit greater persistence, and display more cooperative behavior, which together result in more effective implementation of the proposed change [3]. Conversely, when organizational readiness is low, members are more likely to view the change as undesirable and subsequently avoid, or even resist, planning for the effort and participating in the change process.
Although organizational readiness for change in healthcare settings has been identified as an important issue [4], research on the topic is hindered by the absence of a brief, reliable, and valid measure of the construct. Until recently, the primary focus in the literature has been on individual readiness for change, not on organizational readiness for change [2]. Although there have been several attempts at measuring organizational readiness (e.g., [5][6][7]), most available instruments are not theory-based and exhibit limited reliability and validity [2,6]. Furthermore, those with desirable psychometric properties have too many items to be practical for use in busy healthcare settings [4]. Until a brief, reliable, and valid measure is developed, we cannot advance scientific knowledge of the determinants or outcomes of readiness or provide evidence-based guidance to organizational leaders about how to increase readiness.
In this article, we report the results of a psychometric assessment of a new, theory-based measure we call Organizational Readiness for Implementing Change (ORIC). We developed ORIC by drawing on Weiner's theory of organizational readiness for change [8] and assessed its content adequacy, structural validity, reliability, and construct validity in a series of studies. These studies provide psychometric evidence for a brief yet robust measure that could be used to advance implementation science and practice.

Conceptual framework
Organizational readiness for change is a multilevel construct that can be assessed at the individual or supra-individual levels (e.g., team, department, or organization). In this analysis, we focus on the supra-individual level because the implementation of many promising innovations in healthcare, such as patient-centered medical homes, Accountable Care Organizations, and electronic health records, requires collective, coordinated action by many organizational members [8]. Researchers should keep three considerations in mind when measuring readiness for change at supra-individual levels. First, items should be group-referenced (e.g., 'We are ready to…') rather than self-referenced (e.g., 'I am ready to…') so that they focus respondents' attention on collective readiness rather than personal readiness [8]. Second, assessment should involve multiple respondents from the same unit; proxy reporting of collective readiness by single respondents (e.g., the CEO) is unlikely to generate valid data [2]. Third, inter-rater agreement should be checked before aggregating individuals' readiness perceptions to supra-individual levels of analysis. If, for example, one-half of an organization's members perceive readiness to be high and one-half perceive it to be low, then the average of organizational members' perceptions of readiness describes none of their views [8].
Organizational readiness for change is not only a multilevel construct but also a multifaceted one. We used Weiner's theory of organizational readiness for change [8] to identify and define the two facets we aimed to measure as well as the immediate determinants of each (Figure 1). The first facet of readiness, change commitment, reflects organizational members' shared resolve to implement a change. A hypothesized determinant of change commitment is change valence. Organizational members may value an organizational change for any number of reasons; why they value it may be less important than how much they value it. The second facet of readiness, change efficacy, reflects organizational members' shared belief in their collective capability to implement a change [3,8]. Hypothesized determinants of change efficacy include task knowledge, resource availability, and situational factors. Change efficacy is expected to be high when organizational members know what to do and how to do it, when they perceive they have the resources they need to implement the change, and when they perceive situational factors such as timing to be favorable. An immediate outcome of readiness is organizational members' change-related effort. For example, when readiness is high, organizational members are more likely to initiate the change, put forth greater effort in support of the change, and exhibit greater persistence in the face of obstacles.
Given the potential impact of organizational readiness for change on implementation outcomes, a valid, theory-based measure would be useful for research and practice. The measure should reflect both facets of readiness (change commitment and change efficacy) and differentiate the facets from their determinants to ensure direct measurement of the readiness construct [9]. In addition to being valid and reliable, the assessment must be brief in order to be practical in busy healthcare settings.

Methods
We conducted four studies to assess the psychometric properties of a newly developed, theory-based measure of organizational readiness for implementing change (ORIC). In study one, we assessed the content adequacy of the new measure using quantitative methods [10]. In study two, we examined the measure's factor structure and reliability in a laboratory simulation. In study three, we assessed the reliability and validity of an organization-level measure of readiness based on aggregated individual-level data from study two. In study four, we conducted a small field study to validate the results of study three.

Study one: content adequacy assessment
Method
Content adequacy refers to 'the degree to which a measure's items are a proper sample of the theoretical content domain of a construct' [10]. Typically, researchers assess content adequacy by asking a small group of experts to assess whether a measure's items seem representative of the construct's theoretical content. Often, the assessment is informal and qualitative. In this study, we took a formal, quantitative approach by asking a large group of judges to rate the extent to which items reflect the constructs they were intended to measure.

Design, participants, and procedures
Study participants were a convenience sample of 98 students enrolled in undergraduate, master's, and doctoral programs in health policy and management at a university located in the southeastern United States. Convenience samples are acceptable for content adequacy studies because such studies require no particular expertise or experience, merely sufficient reading skill and intellectual ability to perform the rating task [10]. Twenty-eight percent of study participants were men and 72% were women. Fourteen percent were 18-21 years old, 19% were 22-25 years old, 18% were 26-28 years old, 7% were 29-31 years old, and 33% were 32 years old or older.
Study participants were randomly assigned to complete one of two web-based surveys. One survey consisted of 15 items (plus a distracter item) that reflected two related constructs: change commitment and change valence. The other survey consisted of 15 items (plus a distracter item) that reflected three related constructs: change efficacy, task knowledge, and resource availability. Each survey provided detailed instructions and examples of how to perform the rating task. Using the definitions that we provided, participants rated the extent to which they thought each item reflected each construct in the survey. For example, participants receiving the first survey rated each item twice: once to indicate the extent to which they thought the item reflected change commitment, and again to indicate the extent to which they thought the item reflected change valence. Ratings were made on a five-point ordinal scale on which '1' indicated that an item 'does not reflect the concept at all' and '5' indicated that an item 'reflects the concept well'.
To minimize ordering effects, one-half of the participants receiving the first survey were randomly assigned to perform the change commitment rating task first; the other one-half performed the change valence rating task first. To minimize learning effects, items were presented in random order within each rating task. The same procedure was used to minimize ordering effects and learning effects in the second survey.

Measures
Five items in the first survey were written to measure change commitment (e.g., 'we are committed to implementing this change'). Ten items were written to measure some aspect of change valence: perceived need (e.g., 'we need to implement this change'), perceived benefit (e.g., 'we believe this change will make things better'), perceived timeliness (e.g., 'we see this change as timely'), and perceived compatibility (e.g., 'we feel this change is compatible with our values'). We also included a 'distracter' item to check on participants' attention and diligence in performing the rating task (e.g., 'we know what it takes to implement this change').
Six items in the second survey were written to measure change efficacy (e.g., 'we can coordinate tasks so that implementation goes smoothly'). Four items were written to measure task knowledge (e.g., 'we know what we need to do to implement this change'). Five items were written to measure resource availability (e.g., 'we have the resources we need to implement this change'). We also included a 'distracter' item to check on participants' attention and diligence in performing the rating task (e.g., 'the timing is good for implementing this change').

Analysis
Content adequacy involves judgments of item content in relation to theoretically defined constructs [9]. We considered an item adequate if it met three conditions: (1) the item's highest mean rating corresponded to the intended aspect of organizational readiness for implementation (e.g., change commitment and not change valence) [11]; (2) the item's mean rating on the intended aspect was at least 0.20 units higher than its mean ratings on the hypothesized determinants [11]; and (3) the item's mean rating on the intended aspect was greater than or equal to 4 [11]. In addition, we performed an analysis of variance (ANOVA) to compare each item's mean rating on the hypothesized theoretical construct (e.g., change commitment) with its ratings on alternative constructs (e.g., change valence) [12].
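The three decision rules can be expressed as a short function. This is a minimal sketch, not the authors' code: the function name `content_adequate`, its parameters, and the example ratings are hypothetical, and the accompanying ANOVA is not reproduced.

```python
from statistics import mean


def content_adequate(ratings, intended, min_gap=0.20, min_mean=4.0):
    """Apply the three content-adequacy conditions to a single item.

    ratings: dict mapping each rated construct to the judges' 1-5 ratings.
    intended: the readiness facet the item was written to measure.
    (Hypothetical helper for illustration only.)
    """
    means = {construct: mean(scores) for construct, scores in ratings.items()}
    target = means[intended]
    alternatives = [m for construct, m in means.items() if construct != intended]
    # Condition 1: the highest mean rating is on the intended facet.
    highest = all(target > m for m in alternatives)
    # Condition 2: the intended facet beats every alternative by at least min_gap.
    separated = all(target - m >= min_gap for m in alternatives)
    # Condition 3: the mean rating on the intended facet is at least min_mean.
    high_enough = target >= min_mean
    return highest and separated and high_enough


# Hypothetical ratings from five judges.
clear_item = {"change commitment": [5, 4, 5, 4, 5], "change valence": [3, 3, 4, 2, 3]}
ambiguous_item = {"change commitment": [4, 4, 4, 4, 4], "change valence": [4, 4, 4, 4, 4]}
print(content_adequate(clear_item, "change commitment"))      # True
print(content_adequate(ambiguous_item, "change commitment"))  # False
```

The second hypothetical item mimics the pattern reported for an item rated as equally reflecting two constructs: it fails the first two conditions even though its mean reaches 4.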

Results
In Table 1, the first five items (C1-C5) were intended to measure change commitment. The first four of these items (C1-C4) satisfied all three conditions of the content adequacy test. The fifth item (C5, 'We want to implement this change') did not satisfy any condition of the content adequacy test: study participants rated it as equally reflecting the concepts of change commitment and change valence. In the absence of more semantic context, the verb 'want' could connote either motivation (commitment) or desire (value) to implement change. Although this item did not pass the content adequacy test, we retained it for further testing in circumstances where the semantic context was more clearly defined. None of the items intended to measure change valence (V1-V10) passed the content adequacy test as measures of change commitment. This suggests that study participants were able to distinguish items intended to measure change commitment from items intended to measure its hypothesized determinant. Also, the distracter item (D1) did not satisfy the three conditions for fit with either change commitment or change valence.
In Table 2, the first six items (E1-E6) were intended to measure change efficacy. All six items satisfied all three conditions of the content adequacy test. None of the items intended to measure task knowledge (K1-K4) or resource availability (A1-A5) passed the content adequacy test as measures of change efficacy. This finding suggests that study participants were able to distinguish items intended to measure change efficacy from items intended to measure its hypothesized determinants. Finally, the distracter item (D2) did not satisfy the three conditions for fit with change efficacy, task knowledge, or resource availability. Table 1 and Table 2 note statistically significant differences identified by the ANOVA procedures comparing each item's mean rating on its hypothesized theoretical construct with its ratings on alternative constructs.

Study two: factor structure and reliability
Method
In study two, we ascertained the factor structure and reliability of our new measure in a laboratory study wherein we manipulated the two aspects of organizational readiness in vignettes that described a hospital about to implement a major organizational change.

Design, participants, and procedures
The study used a 2 (high- versus low-change commitment) × 2 (high- versus low-change efficacy) between-subjects design. Study participants read one of four randomly assigned vignettes that described a hospital's readiness to implement meaningful use of electronic health records. They then rated the hospital's readiness for implementation as they believed a hospital employee would rate it. Hospital readiness was manipulated in the vignettes by describing various levels of the following: employee knowledge of meaningful use, employee morale and enthusiasm, resources available for implementation, and situational factors such as leadership stability and timing of the implementation. Study participants were a convenience sample of 140 students enrolled in undergraduate, master's, and doctoral programs in health policy and management or health behavior at a university located in the southeastern United States. These programs familiarize students with the structures, workforce, and services of healthcare organizations. Twenty-six percent of study participants were men and 74% were women. Eleven percent were 18-21 years old, 25% were 22-25 years old, 28% were 26-28 years old, 16% were 29-31 years old, and 20% were 32 years old or older. A convenience sample was acceptable because participants needed only sufficient intellectual capability to comprehend the vignette and perform the rating task. This approach has been used in previous studies [13].

Measures
Participants rated the hospital's readiness for implementation on 12 items using a 5-point ordinal scale that ranged from 'disagree' to 'agree' (Additional file 1 contains the items). All five items used to assess change commitment in study one were included. Although one of the change commitment items ('We want to implement this change') did not pass the content adequacy test, we retained it in order to see how it performed in a more clearly defined semantic context (i.e., the vignettes). All six items used to assess change efficacy in study one were also included. A seventh change efficacy item ('We can handle the challenges that might arise in implementing this change') was added to capture the organization's ability to respond to problems that emerge during implementation. Although not included in study one, this item passed a content adequacy test involving the same methods but with only 10 study participants. Item content did not change from study one to study two, although item wording did: in study two, efficacy items were phrased 'People who work here are confident that…'. This wording was not used in study one because the word 'confident' would have made it obvious to raters that they were efficacy items, which would have defeated the purpose of the content adequacy test.

Analysis
To determine whether change commitment and change efficacy are distinct yet related facets, we conducted an exploratory principal-axis factor analysis (EFA) with oblique rotation, followed by a confirmatory factor analysis (CFA) specifying two factors. We chose oblique rotation because we expected, based on theory [9], that change commitment and change efficacy represented interrelated facets of organizational readiness and, therefore, would be correlated. Oblique rotation allows for identification of unique contributions to variance and excludes contributions to variance from overlap between the correlated factors [14]. To determine the number of factors to retain in the EFA, we used parallel analysis (PA), which Zwick and Velicer [15] found outperforms other methods such as the Kaiser criterion, the scree plot, and Bartlett's chi-square test. In PA, a set of random-data correlation matrices is created and their eigenvalues are computed. These eigenvalues are compared with those computed from the study data, and only those factors from the study data with eigenvalues greater than those from the random data are retained. To interpret the retained factors and assess item performance, we used the following conventions: factor loadings greater than 0.6, factor cross-loadings less than 0.3, communalities greater than 0.5, and face-valid factor membership [16]. To assess CFA model fit, we used the following rules of thumb: comparative fit index (CFI) and Tucker-Lewis index (TLI) equal to or greater than 0.95, standardized root mean square residual (SRMR) less than 0.05, and root mean square error of approximation (RMSEA) equal to or less than 0.06 [17]. We then formed scales for change commitment and change efficacy based on the CFA results and computed alpha coefficients for each scale to assess inter-item consistency. Finally, we conducted a 2 × 2 analysis of variance (ANOVA) to determine whether change commitment and change efficacy varied as expected with the manipulation of information in the vignettes.
Table 1 notes: The first five items were intended to measure change commitment (C). The last item was a distracter (D). The remaining items were intended to measure aspects of change valence (V). The following definitions were provided to participants: Change commitment refers to organizational members' shared resolve to pursue courses of action that will lead to the successful implementation of the change effort. Change valence refers to the value that organizational members assign to a specific, impending organizational change. For example, do they think the change is needed, important, beneficial, or worthwhile?
Table 2 notes: The first six items were intended to measure change efficacy (E). The next four items were intended to measure task knowledge (K). The next five items were intended to measure resource availability (A). The last item was a distracter (D). The following definitions were provided to participants: Change efficacy refers to organizational members' shared beliefs (or confidence) in their collective capabilities to organize and execute the courses of action required to implement the change successfully. Task knowledge refers to organizational members' knowledge of the tasks that have to be performed, the resources that will be needed, the amount of effort that will be required, and the amount of time that it will take to implement a change. Resource availability refers to organizational members' perceptions of the availability of money, people, equipment, and other resources needed to implement a change.
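Parallel analysis as described above can be sketched in pure Python. This is an illustration under stated assumptions, not the procedure the authors ran: eigenvalues are taken from the unreduced correlation matrix (a common variant; principal-axis versions use a reduced matrix), the random data sets are standard normal, observed eigenvalues are compared with the mean random eigenvalue at each position, and all function names are hypothetical.

```python
import math
import random


def correlation_matrix(data):
    """Pearson correlation matrix; data is a list of cases, each a list of item scores."""
    n, p = len(data), len(data[0])
    cols = [[row[j] for row in data] for j in range(p)]
    means = [sum(col) / n for col in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in col) / (n - 1)) for col, m in zip(cols, means)]
    return [
        [
            sum((cols[i][k] - means[i]) * (cols[j][k] - means[j]) for k in range(n))
            / ((n - 1) * sds[i] * sds[j])
            for j in range(p)
        ]
        for i in range(p)
    ]


def eigenvalues(sym, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in sym]
    p = len(a)
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j) < tol:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if abs(a[i][j]) < 1e-15:
                    continue
                theta = (a[j][j] - a[i][i]) / (2 * a[i][j])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(p):  # rotate columns i and j
                    a[k][i], a[k][j] = c * a[k][i] - s * a[k][j], s * a[k][i] + c * a[k][j]
                for k in range(p):  # rotate rows i and j
                    a[i][k], a[j][k] = c * a[i][k] - s * a[j][k], s * a[i][k] + c * a[j][k]
    return sorted((a[i][i] for i in range(p)), reverse=True)


def parallel_analysis(data, n_random=10, seed=0):
    """Retain factors whose observed eigenvalues exceed the mean eigenvalues
    of random normal data sets with the same number of cases and items."""
    rng = random.Random(seed)
    n, p = len(data), len(data[0])
    observed = eigenvalues(correlation_matrix(data))
    random_mean = [0.0] * p
    for _ in range(n_random):
        noise = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
        for idx, ev in enumerate(eigenvalues(correlation_matrix(noise))):
            random_mean[idx] += ev / n_random
    n_factors = 0
    for obs, rnd in zip(observed, random_mean):
        if obs <= rnd:  # stop at the first factor that does not beat random data
            break
        n_factors += 1
    return n_factors, observed, random_mean


# Hypothetical data: 140 respondents, 10 items driven by two correlated factors.
rng = random.Random(42)
rows = []
for _ in range(140):
    f1 = rng.gauss(0, 1)
    f2 = 0.3 * f1 + math.sqrt(0.91) * rng.gauss(0, 1)
    rows.append([0.8 * f1 + 0.4 * rng.gauss(0, 1) for _ in range(5)]
                + [0.8 * f2 + 0.4 * rng.gauss(0, 1) for _ in range(5)])
n_factors, observed, random_mean = parallel_analysis(rows)
print(n_factors)
```

The default of 10 random matrices mirrors the number reported in the results; some implementations use the 95th percentile of the random eigenvalues instead of their mean, which is a stricter criterion.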

Results
EFA yielded two factors with eigenvalues greater than 1 and greater than those generated from the PA of 10 random-data correlation matrices. All five items intended to measure change commitment (C1-C5) exhibited factor loadings greater than 0.6 on the first retained factor, cross-loadings less than 0.25 on the second factor, and communalities greater than 0.5 (see Table 3). Five of the seven items intended to measure change efficacy exhibited factor loadings greater than 0.6 on the second retained factor, cross-loadings less than 0.25 on the first factor, and communalities greater than 0.5. Based on these results, we interpreted the factors as Change Commitment (factor one) and Change Efficacy (factor two). The other two items intended to measure change efficacy exhibited loadings between 0.5 and 0.6 on the Change Commitment factor and cross-loadings on the Change Efficacy factor less than 0.25. These items were (1) 'People who work here feel confident that they can keep the momentum going in implementing this change' and (2) 'People who work here feel confident that the organization can get people invested in implementing this change'. Apparently, these items had a motivational connotation that study participants associated with items intended to measure change commitment. Because these two items did not load on the expected factor (and lacked face validity as measures of change commitment), we dropped them and re-ran the EFA. Re-analysis did not alter the number of factors or the pattern of factor loadings for the remaining items. We then ran a two-factor CFA using the ten items retained from the re-analysis. The two-factor CFA converged and demonstrated a strong fit when change commitment and change efficacy were allowed to correlate: CFI = 0.98, TLI = 0.97, SRMR = 0.04, and RMSEA = 0.06.
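The rules of thumb for CFA fit can be checked mechanically against the reported indices. A minimal sketch; `fit_checks` is a hypothetical helper, and it only compares numbers, it does not estimate a CFA.

```python
def fit_checks(cfi, tli, srmr, rmsea):
    """Compare CFA fit indices with the rules of thumb adopted in the text [17].
    Hypothetical helper; returns a dict of rule -> bool."""
    return {
        "CFI >= 0.95": cfi >= 0.95,
        "TLI >= 0.95": tli >= 0.95,
        "SRMR < 0.05": srmr < 0.05,
        "RMSEA <= 0.06": rmsea <= 0.06,
    }


# Study two values as reported above.
checks = fit_checks(cfi=0.98, tli=0.97, srmr=0.04, rmsea=0.06)
print(all(checks.values()))  # True
```

Because the reported indices are rounded to two decimals, a value such as RMSEA = 0.06 sits exactly at its cutoff.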
Standardized parameter estimates are provided in Table 4. No post hoc modifications were necessary given the good fit indicated by the fit indices; residual analysis did not indicate any problems. Alpha coefficients for the five-item Change Commitment Scale and the five-item Change Efficacy Scale were 0.92 and 0.88, respectively. The correlation between the unit-weighted scale scores was similar to that between the factors (r = 0.56, p < 0.001).
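The alpha coefficients reported above follow the standard Cronbach's alpha formula, alpha = k/(k - 1) × (1 - sum of item variances / variance of the total score). A minimal sketch with hypothetical data; the function name and input layout are illustrative assumptions.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha.

    items: one list of scores per item, with respondents in the same order in every list.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's scale total
    sum_item_var = sum(variance(scores) for scores in items)
    return (k / (k - 1)) * (1 - sum_item_var / variance(totals))


# Hypothetical responses: two items, three respondents.
print(round(cronbach_alpha([[1, 2, 3], [1, 3, 2]]), 3))  # 0.667
```

Sample variances (n - 1 denominator) are used for both items and totals, so the choice of denominator cancels in the ratio.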
The 2 × 2 ANOVAs revealed small- to medium-size main effects for each manipulation in the vignettes and small-size interaction effects (see Table 4) [18]. The manipulation of information about change commitment and change efficacy explained 36% of the variation in Change Commitment Scale scores and 19% of the variation in Change Efficacy Scale scores. The variance accounted for seemed reasonable given that study participants had to infer the levels of change commitment and change efficacy from the vignettes. Plots of the marginal means (not shown) indicated that participants had no difficulty distinguishing change commitment and change efficacy when these factors, individually or jointly, were low. They had more difficulty distinguishing them when both factors were high.
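The "variation explained" figures come from partitioning sums of squares in the 2 × 2 design. The sketch below shows that partition for a balanced design, assuming classical eta squared (effect sum of squares over total sum of squares); whether the authors used eta squared or partial eta squared is not stated here, and the function name and data are hypothetical.

```python
from statistics import mean


def two_by_two_variance_explained(cells):
    """Share of total variation (eta squared) for each effect in a balanced 2 x 2 design.

    cells: dict mapping (commitment_level, efficacy_level) to a list of scale scores;
    every cell must contain the same number of scores (hypothetical layout).
    """
    scores = [y for ys in cells.values() for y in ys]
    grand = mean(scores)
    ss_total = sum((y - grand) ** 2 for y in scores)
    n = len(next(iter(cells.values())))  # scores per cell
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    a_means = [mean([y for (a, _), ys in cells.items() if a == lvl for y in ys]) for lvl in a_levels]
    b_means = [mean([y for (_, b), ys in cells.items() if b == lvl for y in ys]) for lvl in b_levels]
    ss_a = n * len(b_levels) * sum((m - grand) ** 2 for m in a_means)
    ss_b = n * len(a_levels) * sum((m - grand) ** 2 for m in b_means)
    ss_cells = n * sum((mean(ys) - grand) ** 2 for ys in cells.values())
    ss_ab = ss_cells - ss_a - ss_b  # interaction = between-cell variation not due to main effects
    return {
        "commitment": ss_a / ss_total,
        "efficacy": ss_b / ss_total,
        "interaction": ss_ab / ss_total,
        "model R2": ss_cells / ss_total,
    }


# Hypothetical scale scores, two respondents per vignette.
cells = {
    ("low", "low"): [1, 2], ("low", "high"): [2, 3],
    ("high", "low"): [3, 4], ("high", "high"): [4, 5],
}
result = two_by_two_variance_explained(cells)
print(round(result["model R2"], 3))  # 0.833
```

The "model R2" entry corresponds to the combined share of variation explained by both manipulations and their interaction.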

Study three: organization-level reliability and validity
Method
Having established, at least provisionally, the reliability of our new measure at the individual level of analysis, we next sought to ascertain its reliability and validity at the organization level of analysis. Organizational readiness for implementing change is one of many constructs potentially relevant to implementation science that can be conceptualized at the organizational level of analysis even though the data for the construct come from individuals. Although it is tempting simply to compute an organization-level mean from the individual-level data, it is important first to check the reliability and validity of that mean to determine whether it adequately represents the organization-level construct [19,20]. Organizational readiness is conceived as a 'shared team property', that is, a psychological state that organizational members hold in common [8]. Whether this shared team property exists in any given organization is an empirical issue that requires examination of inter-rater reliability and inter-rater agreement. If sufficient inter-rater reliability and inter-rater agreement exist (i.e., organizational members agree in their readiness perceptions), then analysis of organizational readiness as a shared team property can proceed; that is, an organization-level mean can be computed that reliably and validly reflects the perceptions of organizational members as a group. If inter-rater reliability and inter-rater agreement are insufficient (i.e., organizational members disagree in their readiness perceptions), organizational readiness does not exist as a shared team property, and an organization-level mean does not reliably and validly reflect the perceptions of organizational members as a group. In study three, we assessed whether change commitment and change efficacy can be measured reliably and validly at the organizational level of analysis using individual-level data from study two.

Design, participants, and procedure
Study three involved the same design, participants, and procedures as study two. Thus, the data were the same, but they were analyzed differently. In study three, we treated the 140 study participants as if they were employees of the hospitals depicted in the vignettes. Thus, for each of the four hospitals, there were 35 'employees' rating the hospital's readiness to implement meaningful use of electronic health records. One of the advantages of using data from a laboratory study is that we can test whether our measures reliably and validly differentiate organizations that systematically differ by design in levels of change commitment and change efficacy.

Measures
Study three used the five-item Change Commitment Scale and the five-item Change Efficacy Scale developed in study two.

Analysis
To assess the reliability of the organization-level means for change commitment and change efficacy, we computed two intraclass correlation coefficients, ICC(1) and ICC(2), from a one-way random-effects ANOVA. ICC(1) provides an estimate of the extent to which individual-level variability on a given measure is explained by higher-level units [20,21]. ICC(1) can also be interpreted as an estimate of the extent to which raters are interchangeable, that is, the extent to which one rater in a group could represent all raters within that group [20]. The larger the value of ICC(1), the more alike the raters are. ICC(2) is a mathematical function of ICC(1), adjusted for group size, and indicates the reliability or stability of group-level means in a sample [22]. The larger the value of ICC(1) and the larger the number of respondents per group, the more reliable the group means and, hence, the greater the value of ICC(2).
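A minimal sketch of ICC(1) and ICC(2) from a one-way ANOVA, using the commonly applied mean-square formulas ICC(1) = (MSB - MSW) / (MSB + (k - 1)MSW) and ICC(2) = (MSB - MSW) / MSB. The function name and data are hypothetical, and taking k as the average group size is an assumption that only matters when groups are unequal. Algebraically, ICC(2) equals the Spearman-Brown step-up of ICC(1) to k raters, which is the sense in which it is ICC(1) "adjusted for group size".

```python
from statistics import mean


def icc1_icc2(groups):
    """ICC(1) and ICC(2) from a one-way random-effects ANOVA.

    groups: one list of individual scale scores per organization (or vignette hospital).
    k is taken as the average group size (an assumption for unequal groups).
    """
    scores = [y for g in groups for y in g]
    n_total, n_groups = len(scores), len(groups)
    grand = mean(scores)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)
    k = n_total / n_groups
    icc1 = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    icc2 = (ms_between - ms_within) / ms_between
    return icc1, icc2


# Hypothetical scores for three organizations with three raters each.
icc1, icc2 = icc1_icc2([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(icc1, 3), round(icc2, 3))  # 0.897 0.963
```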
To assess the validity of the organization-level means as measures of organization-level constructs, we computed and tested the statistical significance of two indices of inter-rater agreement: rWG(J) and ADM(J). The rWG(J) index indicates the extent of consensus, agreement, or within-unit variability in a multi-item scale by comparing within-group variances to the variance expected under the null hypothesis of no agreement [20]. We assumed that the null followed a uniform (rectangular) distribution. The ADM(J) index, also known as the mean absolute deviation, is used less often than rWG(J) but allows inter-rater agreement to be conceptualized more directly in the units of the original measure. Both indices and their respective critical values, which were obtained from an empirical distribution based on 100,000 simulations and corresponded to a 0.05 level of statistical significance, were calculated using the Multilevel Modeling in R package [23,24]. The rWG(J) critical value is the 95th percentile of its null distribution, and the ADM(J) critical value is the 5th percentile [23]. Formulae for rWG(J), ADM(J), ICC(1), and ICC(2) can be found in the Appendix.
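The two agreement indices and their simulated critical values can be sketched in pure Python. This is not the R package's code: the function names are hypothetical, the uniform null variance (A² - 1)/12 for A response options follows the rectangular null described above, truncating rWG(J) to 0 when observed variance reaches the null variance is an assumed convention, and ADM(J) is computed as deviations around each item mean.

```python
import random
from statistics import mean, variance


def rwg_j(group_items, n_options=5):
    """rWG(J) for one group.

    group_items: one list per item, holding the group members' ratings (1..n_options).
    The null is a uniform (rectangular) distribution with variance (A^2 - 1) / 12.
    """
    j = len(group_items)
    null_var = (n_options ** 2 - 1) / 12
    ratio = mean(variance(item) for item in group_items) / null_var
    if ratio >= 1:
        # Assumed convention: no agreement when observed variance reaches the null variance.
        return 0.0
    return j * (1 - ratio) / (j * (1 - ratio) + ratio)


def ad_m_j(group_items):
    """ADM(J): mean over items of the average absolute deviation from the item mean."""
    deviations = []
    for item in group_items:
        m = mean(item)
        deviations.append(mean(abs(x - m) for x in item))
    return mean(deviations)


def null_critical_values(n_raters, n_items, n_options=5, n_sims=10_000, seed=0):
    """95th percentile of rWG(J) and 5th percentile of ADM(J) when raters answer at random."""
    rng = random.Random(seed)
    rwgs, adms = [], []
    for _ in range(n_sims):
        group = [[rng.randint(1, n_options) for _ in range(n_raters)] for _ in range(n_items)]
        rwgs.append(rwg_j(group, n_options))
        adms.append(ad_m_j(group))
    rwgs.sort()
    adms.sort()
    return rwgs[int(0.95 * n_sims)], adms[int(0.05 * n_sims)]


# Hypothetical group of four raters answering two items.
hospital = [[4, 4, 5, 5], [4, 5, 4, 5]]
print(round(rwg_j(hospital), 3), round(ad_m_j(hospital), 3))  # 0.909 0.5
crit_rwg, crit_adm = null_critical_values(n_raters=35, n_items=5, n_sims=2_000)
```

An observed rWG(J) above `crit_rwg`, or an observed ADM(J) below `crit_adm`, rejects the null of no agreement at roughly the 0.05 level. The text reports 100,000 simulations; the smaller `n_sims` here only keeps the example fast.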

Results
The one-way ANOVA for the Change Commitment Scale yielded an ICC(1) of 0.72 and an ICC(2) of 0.98 (p < 0.001). The one-way ANOVA for the Change Efficacy Scale yielded an ICC(1) of 0.51 and an ICC(2) of 0.97 (p < 0.001). The magnitude and statistical significance of these correlations indicate that the organization-level means for the Change Commitment Scale and the Change Efficacy Scale were reliable. More individual-level variance was explained by hospital (vignette) assignment for change commitment than for change efficacy (72% versus 51%).
For the Change Commitment Scale, the sample values of rWG(J) and ADM(J) for the ensemble of four hospitals were 0.87 and 0.73, respectively. For the Change Efficacy Scale, the sample values of rWG(J) and ADM(J) for the ensemble of four hospitals were 0.82 and 0.80, respectively. Table 5 summarizes the empirical null distributions in terms of their means, medians, standard deviations, 5th percentiles, and 95th percentiles. For both scales, the sample values of rWG(J) for the ensemble of hospitals exceeded the 95th percentile of their corresponding null distributions, and the corresponding p-values were almost zero. Likewise, the sample values of ADM(J) for the ensemble of hospitals were smaller than the 5th percentile of their corresponding null distributions, and the corresponding p-values were almost zero. Therefore, for both the rWG(J) and ADM(J) indices, the null hypothesis of no agreement in the ensemble of hospitals in the sample was rejected. Table 6 reports, for each scale, the sample values of rWG(J) and ADM(J) for the four hospitals and the 95th and 5th percentiles of the null distributions for rWG(J) and ADM(J), respectively. For both scales, the sample values of rWG(J) for all four hospitals exceeded the 95th percentile of the null distribution for rWG(J). Likewise, for both scales, the sample values of ADM(J) for all four hospitals were smaller than the 5th percentile of the null distribution for ADM(J). These results indicate that sufficient inter-rater agreement exists in each of the four hospitals to justify constructing an organization-level measure of readiness for implementing change from individual-level perceptual data. The organization-level means for the Change Commitment Scale and the Change Efficacy Scale for each hospital are shown in the second and third columns of Table 6.

Design, participants, and procedure
For study four, we used a convenience sample of staff (n = 311) of international non-governmental organizations (INGOs), working around the world, who responded to an online survey about their organization's readiness to implement mobile technology for monitoring and evaluation systems in health programs. All INGOs were based in the United States. Of the study participants, 54.5% were men and 45.5% were women. Three percent of respondents were under 25 years old, 35% were 25-35 years old, 35% were 36-45 years old, 20% were 45-55 years old, and 7% were older than 55. A total of 44% of respondents were based in the United States and 56% were based in other countries (primarily in Africa).

Measures
The online survey in this study included four items to assess change commitment and five items to assess change efficacy. One of the five change commitment items used in study two and study three ('We will do whatever it takes to implement this change') was inadvertently dropped in survey construction.

Analysis
We conducted a two-factor CFA to assess factor structure, computed alpha coefficients for the resulting scales to assess inter-item consistency, computed ICC(1) and ICC(2) from a one-way random-effects ANOVA to assess the reliability of organization-level means, and tested the statistical significance of r WG(J) and AD M(J) to assess the validity of organization-level means. For these assessments, we included only organizations (n = 10) represented by more than five survey respondents.

Results
The two-factor CFA model converged and demonstrated good fit when change commitment and change efficacy were allowed to correlate: the comparative fit index (CFI) was 0.97, the Tucker-Lewis index (TLI) was 0.96, the standardized root mean square residual (SRMR) was 0.05, and the root mean square error of approximation (RMSEA) was 0.08. Given the good fit indicated by these indices, no post hoc modifications were necessary, and residual analysis did not indicate problems. Standardized parameter estimates are provided in Table 7.
Alpha coefficients for the four-item Change Commitment Scale and the five-item Change Efficacy Scale were 0.91 and 0.89 respectively. The correlation between the unit-weighted scale scores was similar to the correlation between the factors (r = 0.60, p <0.001).
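Coefficient alpha summarizes inter-item consistency as alpha = J/(J - 1) × (1 - Σ item variances / variance of the summed score), for J items. A minimal sketch using sample variances; the function name and the respondents-by-items input layout are assumptions, and the code does not handle missing responses.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Coefficient alpha; responses[i][j] is respondent i's answer to item j."""
    n_items = len(responses[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in responses]) for j in range(n_items)]
    # Sample variance of each respondent's summed (unit-weighted) score.
    total_var = variance([sum(row) for row in responses])
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)
```

For example, `cronbach_alpha([[1, 1, 1], [2, 2, 2], [3, 3, 3]])` returns 1.0, since every respondent answers all items identically.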
The one-way ANOVA for the Change Commitment Scale yielded an ICC(1) of 0.09 (p < 0.02) and an ICC(2) of 0.56. By common interpretive conventions, the ICC(1) value approximates a 'medium' effect size and the ICC(2) value suggests a moderate level of reliability [15,25]. The one-way ANOVA for the Change Efficacy Scale yielded an ICC(1) of 0.02 and an ICC(2) of 0.16 (p < 0.30). By the same conventions, the ICC(1) value indicates a 'small' effect size and the ICC(2) value suggests a low level of reliability. These correlations are lower than those obtained in study three, where we could manipulate and standardize the information that respondents received about the readiness of the four hypothetical hospitals depicted in the vignettes. However, they are close to the values reported for subjective measures in implementation studies in healthcare settings (median = 0.04; interquartile range = 0.01-0.06) [26]. These results would support aggregating individual-level data into an organization-level mean for change commitment, but not for change efficacy.
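One way to see why a small ICC(1) tends to produce a low ICC(2) is the Spearman-Brown relation, which holds exactly for balanced groups of k members: ICC(2) = k·ICC(1) / [1 + (k - 1)·ICC(1)]. The sketch below applies it in both directions. The function names and the target reliability of 0.70 in the usage note are illustrative assumptions, and with unequal group sizes the relation is only approximate.

```python
def icc2_from_icc1(icc1, k):
    """Spearman-Brown step-up: reliability of a mean of k members' ratings."""
    return k * icc1 / (1 + (k - 1) * icc1)


def members_needed(icc1, target_icc2):
    """Group size k at which the step-up formula reaches target_icc2.

    Obtained by solving target = k*icc1 / (1 + (k - 1)*icc1) for k.
    """
    return target_icc2 * (1 - icc1) / (icc1 * (1 - target_icc2))
```

Under this relation, reaching an ICC(2) of 0.70 would take about 24 respondents per organization when ICC(1) = 0.09 (`members_needed(0.09, 0.70)`), but about 114 when ICC(1) = 0.02 (`members_needed(0.02, 0.70)`).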
Table 6 Significance tests for the means of rWG(J) and ADM(J) for each vignette hospital (study three). Note: Organizational readiness was manipulated in the vignettes as follows: Hospital 1, high commitment-high efficacy; Hospital 2, high commitment-low efficacy; Hospital 3, low commitment-high efficacy; and Hospital 4, low commitment-low efficacy. The hospital-level means for change commitment (CC) and change efficacy (CE) were consistent with the experimental manipulation. For all four hospitals, the sample values for rWG(J) exceeded the 95th percentile values of the empirical distributions derived from 100,000 simulated random samples. Likewise, for all four hospitals, the sample values for ADM(J) were smaller than the 5th percentile values of the empirical distributions derived from 100,000 simulated random samples. Therefore, for each hospital, we reject the null hypothesis of no agreement based on the uniform (rectangular) distribution.
A different picture emerges from an analysis of the inter-rater agreement statistics. For the Change Commitment Scale, the sample values for rWG(J) and ADM(J) for … [15,27]. In contrast to the results for ICC(1) and ICC(2), these results would support aggregating individual-level data into organization-level means for both change efficacy and change commitment. Table 9 reports, for each scale and each of the 10 INGOs, the sample values for rWG(J) and ADM(J) and the 95th and 5th percentiles of their respective null distributions. In eight of the INGOs, for both scales, the sample values for rWG(J) exceeded the 95th percentile of the null distribution for rWG(J), and the sample values for ADM(J) were smaller than the 5th percentile of the null distribution for ADM(J). These results indicate sufficient inter-rater agreement in these eight INGOs to justify constructing organization-level means for change commitment and change efficacy from individual-level data.
For INGO 3 and INGO 8, the sample values for rWG(J) for the two scales did not exceed the 95th percentile of the null distribution for rWG(J), nor were the sample values for ADM(J) smaller than the 5th percentile of the null distribution for ADM(J). These results indicate insufficient inter-rater agreement in these two INGOs to justify constructing organization-level means for change commitment and change efficacy from individual-level data.

Discussion
In this article, we provide psychometric evidence in support of a new, brief, theory-based measure of organizational readiness for change, which we call Organizational Readiness for Implementing Change (ORIC). Content adequacy assessment indicated that the items we developed to measure change commitment and change efficacy reflected the theoretical content of these two facets of organizational readiness and distinguished these facets from hypothesized determinants of readiness. Exploratory and confirmatory factor analysis in the lab and field studies revealed two correlated factors, as expected, with good model fit and high item loadings. Reliability analysis in the lab and field studies showed high inter-item consistency for the resulting individual-level scales for change commitment and change efficacy. Inter-rater reliability and inter-rater agreement statistics supported the aggregation of individual-level readiness perceptions to the organizational level of analysis.
As expected, the lab study provided stronger evidence than the field study for the reliability and validity of organization-level means as measures of organizational readiness. In the lab study, we manipulated and standardized the information that participants received about the readiness of the hospitals depicted in the vignettes to implement meaningful use of electronic health records. In the field study, we made no effort to present participants with consistent information about their organization's readiness to use mobile phone technology to monitor and evaluate international health and development programs. Likewise, we made no effort to select INGOs that might be expected a priori to vary widely in organizational readiness. To our knowledge, organizational leaders made no effort to shape organizational members' perceptions of readiness. Even under these conditions, organizational members exhibited 'strong agreement' in their perceptions of organizational readiness, and the overall level of readiness among the participating organizations was high.
The discrepant results for change efficacy between the inter-rater reliability statistics [i.e., ICC(1) and ICC(2)] and the inter-rater agreement statistics [rWG(J) and ADM(J)] highlight an important difference in how these two types of statistics measure 'similarity' in organizational members' ratings. Inter-rater reliability indicates the relative consistency (or rank-order consistency) of ratings, whereas inter-rater agreement indicates their absolute consensus (or interchangeability). LeBreton and colleagues [15] observe that strong inter-rater agreement can be masked by subtle inconsistencies in the rank order of ratings, especially when between-unit variance is restricted (e.g., all organizations are rated high or all are rated low). For example, two sets of ratings on a seven-point scale (rater one = 7, 6, 6, 7, 7, 6, 6, 7, 7; rater two = 6, 7, 7, 7, 7, 6, 6, 6, 7, 7) would generate a mean rWG(J) of 0.94 but an ICC(1) of only 0.04. LeBreton et al. encourage investigators to examine multiple indicators of inter-rater reliability and inter-rater agreement, but caution that one type of statistic may be more relevant than another depending on the research question. In this case, we were interested in the psychometric question of whether individual perceptual data on readiness could be aggregated to the organizational level of analysis. Our field study results suggested that sufficient consensus existed within the INGOs to measure readiness at the organizational level; however, they also suggested that, for the participating INGOs, between-group variation in change efficacy scores might be insufficient to warrant an organization-level analysis of the determinants or outcomes of this facet of readiness.
Although ORIC shows promise, further psychometric assessment is warranted. Specifically, the measure should be tested for convergent, discriminant, and predictive validity. Convergent validity could be assessed by comparing ORIC to other reliable and valid, but much longer, measures, such as the Organizational Readiness for Change Assessment [5] or the Texas Christian University (TCU) Organizational Readiness for Change instrument [28]. Discriminant validity could be assessed by comparing ORIC to measures of constructs that are related to, yet distinct from, organizational readiness for change (e.g., organizational culture). Finally, predictive validity could be assessed by examining the association of ORIC with hypothesized outcomes of readiness, such as championing change and implementation effectiveness [8]. Assessing predictive validity is particularly important for determining whether organization-level readiness should be a key priority for leaders of organizational change efforts.
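The seven-point example given earlier in this discussion, in which strong agreement coexists with a very low ICC(1), can be illustrated in a few lines. The sketch below uses hypothetical ratings of ten targets by two raters, all 6 or 7 on a seven-point scale. It computes rWG for each target against a uniform null (σ²EU = (7² - 1)/12 = 4) and computes ICC(1) by one-way ANOVA with targets as groups. The rating vectors and function names are illustrative assumptions.

```python
from statistics import mean, variance

# Hypothetical ratings: two raters, ten targets, seven-point scale.
rater_one = [7, 6, 6, 7, 7, 6, 6, 7, 7, 6]
rater_two = [6, 7, 7, 7, 7, 6, 6, 6, 7, 7]


def rwg(ratings, scale_points=7):
    """Single-item rWG for one target against a uniform null, truncated at 0."""
    sigma2_eu = (scale_points ** 2 - 1) / 12
    return max(0.0, 1 - variance(ratings) / sigma2_eu)


def icc1(groups):
    """ICC(1) from a one-way ANOVA; assumes equal group sizes."""
    scores = [x for g in groups for x in g]
    n_total, n_groups = len(scores), len(groups)
    k = n_total / n_groups
    grand = mean(scores)
    means = [mean(g) for g in groups]
    ms_b = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means)) / (n_groups - 1)
    ms_w = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (n_total - n_groups)
    return (ms_b - ms_w) / (ms_b + (k - 1) * ms_w)


# Each target is a "group" holding its two ratings.
targets = [list(pair) for pair in zip(rater_one, rater_two)]
mean_rwg = mean(rwg(t) for t in targets)
icc = icc1(targets)
print(f"mean rWG = {mean_rwg:.2f}, ICC(1) = {icc:.2f}")
```

For these vectors, the mean rWG is about 0.94 while ICC(1) is about 0.04: the raters are nearly interchangeable in absolute terms, but because every target sits at the top of the scale, there is almost no between-target variance for ICC(1) to detect.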

Limitations
This study had a few limitations. First, one item in the Change Commitment Scale was inadvertently dropped in the field test in study four. Results from our previous three studies suggest that including the item would not have adversely affected the reliability and validity of the scale in study four. Nevertheless, future field studies should confirm this.
Second, we carried forward to study two one item (i.e., 'We want to perform this change') that did not satisfy the conditions for content adequacy in study one. We did so because we believed participants may not have had sufficient semantic context to determine the meaning of 'want.' We obtained encouraging results for this item from the exploratory and confirmatory factor analysis in the laboratory study (studies two and three) and the confirmatory factor analysis in the field study (study four). Nevertheless, further testing of this item is warranted.
Third, in study two and study three, we asked graduate and undergraduate students to assess organizational readiness as if they were employees of the hospital described in the vignette. This approach may raise concerns about the validity of these data because the students are not in fact hospital employees. However, we believe this approach is appropriate for our study because it has been used in several previous studies [29,30], the students were enrolled in programs (health policy and management or health behavior) that familiarize students with healthcare settings, and the results of our field test with actual employees (study four) support the findings from study two and study three.
Fourth, we could not test for a higher-order organizational readiness for change construct because the structural component of the model would be under-identified with only two factors. Our field study results suggest, however, that constructing a higher-order factor might not be advisable given the moderate correlation of the Change Commitment and Change Efficacy scales and the differences they exhibited in inter-rater reliability. Researchers might wish to retain the scales rather than combine them because they capture related, yet distinct facets of organizational readiness to implement change.
Fifth, in study four we did not collect information about the efforts organizational leaders undertook to increase employees' readiness. Such information could have helped explain why inter-rater reliability for the Change Efficacy Scale did not support aggregating the individual-level data into an organization-level mean. For example, some individuals may have received more information about the impending change than others, resulting in different views of the organization's readiness. Finally, because each organization in study four exhibited a high level of readiness, it would be useful to test ORIC in a sample with more variation in readiness between organizations.

Conclusion
A brief, reliable, and valid measure of organizational readiness for change, such as ORIC, could advance implementation research and practice. For research, such a measure would enable the testing of theories about the determinants and consequences of organizational readiness. Such advances could help answer a number of important questions: Is organizational readiness for change important for all types of changes and in all types of organizations? Is readiness a necessary, but not sufficient, condition for effective implementation of a change? Is there a readiness threshold that should be met before implementation of the change begins? Do all organizational representatives need to be ready for the change, or is readiness important only for specific groups of individuals? Healthcare leaders could use answers to such questions, combined with ORIC, to assess organizational readiness for change in their own settings. Doing so would be useful for developing implementation strategies and allocating resources for a specific change. Ultimately, valid measurement of organizational readiness for change could reduce the number of change efforts in health organizations that either fail to deliver the desired benefits or fail altogether.

Additional file
Additional file 1: Organizational Readiness for Implementing Change (ORIC).