
Table 3 Psychometric and Pragmatic Evidence Rating Scale (PAPERS) domains and definitions

From: Quantitative measures of health policy implementation determinants and outcomes: a systematic review




Pragmatic criteria


Brevity

Number of items; excellent = fewer than 10 items

Language simplicity

Readability of items, ranging from accessible only to experts (poor) to readable at or below an 8th grade level (excellent)

Cost to use instrument

Monetary amount researchers pay to use the instrument; excellent = freely available in the public domain

Training ease

Extent of assessor burden due to required trainings versus manualized self-training; excellent = no training required by instrument developer

Analysis ease

Extent of assessor burden due to complexity of scoring interpretation; excellent = cutoff scores with value labels and automated calculations

Psychometric properties


Norms

A measure of generalizability based on sample size and on the means and standard deviations of item values

Internal consistency


Convergent construct validity

Observed association in data of two theoretically related constructs, assessed through effect sizes and correlations

Discriminant construct validity

Observed differentiation (lack of association) of two theoretically distinct constructs, assessed through effect sizes and correlations

Known-groups validity

Extent to which groups known to have different characteristics can be differentiated by the measure

Predictive criterion validity

Extent to which a measure can predict or be associated with an outcome measured at a future time

Concurrent criterion validity

Correlation of a measure’s observed scores with scores from a previously established measure of the construct


Responsiveness

Extent to which a measure can detect changes over time, i.e., changes that are clinically important, not just statistically significant

Structural validity

Structure of test covariance, i.e., extent to which groups of items increase or decrease together versus a different pattern, assessed by goodness of fit of factor analyses or principal component analyses

  1. Lewis et al. [11], Stanick et al. [42]
  2. Each domain is scored as poor (−1), none/not reported (0), minimal/emerging (1), adequate (2), good (3), or excellent (4). Specific rating scales for each domain are provided in Supplemental Tables 4 and 5
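
The six-level scoring described in the table note can be sketched as a simple lookup. This is a minimal illustration, not the authors' scoring tool: the names PAPERS_SCORES, score_domain, and score_measure are hypothetical, and the example ratings are invented for demonstration rather than taken from the review. The domain-specific criteria that determine each rating are in the supplemental tables and are not reproduced here.

```python
# Rating labels and numeric values as listed in the table note.
PAPERS_SCORES = {
    "poor": -1,
    "none/not reported": 0,
    "minimal/emerging": 1,
    "adequate": 2,
    "good": 3,
    "excellent": 4,
}


def score_domain(rating: str) -> int:
    """Convert a rating label to its numeric score (hypothetical helper)."""
    key = rating.strip().lower()
    if key not in PAPERS_SCORES:
        raise ValueError(f"Unknown rating label: {rating!r}")
    return PAPERS_SCORES[key]


def score_measure(ratings: dict[str, str]) -> dict[str, int]:
    """Score every rated domain of one measure (hypothetical helper)."""
    return {domain: score_domain(label) for domain, label in ratings.items()}


# Invented ratings for a single measure, for illustration only.
example_ratings = {
    "Language simplicity": "good",
    "Training ease": "excellent",
    "Internal consistency": "none/not reported",
}

scores = score_measure(example_ratings)
print(scores)
```

Keeping the per-domain scores separate, rather than collapsing them into one number, preserves the distinction between a domain rated poor (−1) and one with no reported evidence (0).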