Table 3 Psychometric and Pragmatic Evidence Rating Scale (PAPERS) domains and definitions

From: Quantitative measures of health policy implementation determinants and outcomes: a systematic review

Pragmatic criteria
  Brevity: Number of items; excellent = fewer than 10 items
  Language simplicity: Readability of items, ranging from accessible only to experts (poor) to readable at or below an 8th grade level (excellent)
  Cost to use instrument: Monetary amount researchers pay to use the instrument; excellent = freely available in the public domain
  Training ease: Extent of assessor burden due to required trainings versus manualized self-training; excellent = no training required by instrument developer
  Analysis ease: Extent of assessor burden due to complexity of scoring interpretation; excellent = cutoff scores with value labels and automated calculations

Psychometric properties
  Norms: A measure of generalizability based on sample size and means and standard deviations of item values
  Internal consistency: Reliability
  Convergent construct validity: Observed association in data of two theoretically related constructs, assessed through effect sizes and correlations
  Discriminant construct validity: Observed differentiation (lack of association) of two theoretically distinct constructs, assessed through effect sizes and correlations
  Known-groups validity: Extent to which groups known to have different characteristics can be differentiated by the measure
  Predictive criterion validity: Extent to which a measure can predict or be associated with an outcome measured at a future time
  Concurrent criterion validity: Correlation of a measure’s observed scores with scores from a previously established measure of the construct
  Responsiveness: Extent to which a measure can detect changes over time, i.e., clinically important and not just statistically significant changes
  Structural validity: Structure of test covariance, i.e., extent to which groups of items increase or decrease together versus a different pattern, assessed by goodness of fit of factor analyses or principal component analyses

  1. Lewis et al. [11], Stanick et al. [42]
  2. Each domain is scored as poor (−1), none/not reported (0), minimal/emerging (1), adequate (2), good (3), or excellent (4). Specific rating scales for each domain are provided in Supplemental Tables 4 and 5
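
For readers who want to record ratings programmatically, the six-level scale in the footnote above could be encoded roughly as follows. This is a minimal sketch, not part of PAPERS itself: the names `PapersRating`, `PRAGMATIC_DOMAINS`, and `total_score` are hypothetical, the example ratings are made up for illustration, and summing domain scores into a single total is an assumption, since the table does not state how domain scores are combined.

```python
from enum import IntEnum


class PapersRating(IntEnum):
    """Six-level rating scale taken from the table footnote."""
    POOR = -1
    NONE_OR_NOT_REPORTED = 0
    MINIMAL_OR_EMERGING = 1
    ADEQUATE = 2
    GOOD = 3
    EXCELLENT = 4


# Pragmatic domains as listed in the table (lower-cased for use as keys).
PRAGMATIC_DOMAINS = (
    "brevity",
    "language simplicity",
    "cost to use instrument",
    "training ease",
    "analysis ease",
)


def total_score(ratings):
    """Sum the numeric ratings across domains.

    Assumption: a simple sum is used purely for illustration; the table
    itself does not say how domain scores should be aggregated.
    """
    unknown = set(ratings) - set(PRAGMATIC_DOMAINS)
    if unknown:
        raise ValueError(f"Unknown domain(s): {sorted(unknown)}")
    return sum(int(rating) for rating in ratings.values())


# Made-up ratings for a hypothetical instrument.
example_ratings = {
    "brevity": PapersRating.EXCELLENT,
    "language simplicity": PapersRating.GOOD,
    "cost to use instrument": PapersRating.EXCELLENT,
    "training ease": PapersRating.NONE_OR_NOT_REPORTED,
    "analysis ease": PapersRating.POOR,
}
example_total = total_score(example_ratings)
```

Using an `IntEnum` keeps the value labels and the numeric scores together, so a rating can be stored by name and still be used in arithmetic. The criteria that decide which rating a domain receives are in Supplemental Tables 4 and 5 and are not reproduced here.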