
Understanding causal pathways within health systems policy evaluation through mediation analysis: an application to payment for performance (P4P) in Tanzania



The evaluation of payment for performance (P4P) programmes has focused mainly on understanding contributions to health service coverage, without unpacking causal mechanisms. The overall aim of the paper is to test the causal pathways through which P4P schemes may (or may not) influence maternal care outcomes.


We used data from an evaluation of a P4P programme in Tanzania. Data were collected from a sample of 3000 women who delivered in the 12 months prior to interview and 200 health workers at 150 health facilities from seven intervention and four comparison districts in Tanzania in January 2012 and in February 2013. We applied causal mediation analysis using a linear structural equation model to identify direct and indirect effects of P4P on institutional delivery rates and on the uptake of two doses of an antimalarial drug during pregnancy. We first ran a series of linear difference-in-difference regression models to test the effect of P4P on potential mediators, which we then included in a linear difference-in-difference model evaluating the impact of P4P on the outcome. We tested the robustness of our results to unmeasured confounding using semi-parametric methods.


P4P reduced the probability of women paying for delivery care (−4.5 percentage points) which mediates the total effect of P4P on institutional deliveries (by 48%) and on deliveries in a public health facility (by 78%). P4P reduced the stock-out rate for some essential drugs, specifically oxytocin (−36 percentage points), which mediated the total effect of P4P on institutional deliveries (by 22%) and deliveries in a public health facility (by 30%). P4P increased kindness at delivery (5 percentage points), which mediated the effect of P4P on institutional deliveries (by 48%) and on deliveries in a public health facility (by 49%). P4P increased the likelihood of supervision visits taking place within the last 90 days (18 percentage points), which mediated 15% of the total P4P effect on the uptake of two antimalarial doses during antenatal care (IPT2). Kindness during deliveries and the probability of paying out of pocket for delivery care were the mediators most robust to unmeasured confounding.


The effect of P4P on institutional deliveries is mediated by financing and human resources factors, while uptake of antimalarials in pregnancy is mediated by governance factors. Further research is required to explore additional and more complex causal pathways.




Much of the focus of programme evaluation has been on outcome measurement and finding out whether or not a programme works, with randomised trials being considered to be the gold standard for causal inference [1]. However, when dealing with complex interventions, it is not enough to know whether they work, we also need to understand how they work [2]. Process evaluation enables us to get at the how and why questions and unpack the “black box” surrounding complex interventions and is increasingly promoted within evaluation research [3, 4].

One of the core functions of process evaluation is to shed light on causal mechanisms or the process through which a programme influences an outcome [2, 5]. Examination of causal mechanisms is necessary in order to understand why a programme worked, or why it did not work, and whether the underlying theory was sound. It enables theory building and enhances intervention design [6] and can support the plausibility of outcome effects being associated with the intervention in a non-randomised study [7], increasing the internal validity of evaluation in social sciences [1, 5].

Practically, causal mechanisms can be identified by specifying intermediate outcomes or variables, referred to as mediators, that are on the causal pathway between the intervention and the outcome [6, 8]. The approach used to investigate causal mechanisms involves the estimation of causal mediation effects or the breakdown of total causal effects into indirect effects (the effect of the intervention on the outcome that passes through the mediator) and the direct effect (the effect of the intervention on the outcome through all other pathways) [9]. Causal mediation analysis has been employed to test change pathways within the evaluation of public health programmes, using individual-level psychological [9–12] or physical characteristics [13] that may affect behaviour change outcomes. A recent study also considered the effect of community along with individual level mediators [14]. To the best of our knowledge, to date, there has been only one study [15] considering mediators which are relevant to the evaluation of interventions aimed at strengthening health systems.

Payment for performance (P4P) is an example of a programme which operates at the health system level with the aim of improving the quality and use of health services to enhance population health outcomes. P4P involves the payment of financial rewards to health workers (and sometimes to health facilities) based on their achievement of pre-specified performance targets. P4P has been widely used in the UK and the USA [16] and increasingly in low- and middle-income countries [17].

There is a growing body of evidence evaluating the impact of P4P [18]. Findings show that overall P4P has a positive effect on targeted service outcomes [19], although the evidence base in low-income settings is limited to a small number of studies [17, 20–25]. There has been less attention to the processes by which these outcomes are achieved, particularly in low- and middle-income settings [17, 26]. Three studies examined the implementation process challenges facing a P4P programme [27–29] and evaluations are increasingly looking at intermediate outcomes that may have affected service delivery [15, 30]. However, existing studies do not conclusively shed light on the pathways through which P4P achieves outcomes. Either they do not formally test the pathways or they test them on a limited number of mediators [15].

The overall aim of the paper is to test the causal pathways through which payment for performance may (or may not) influence the utilisation of maternal health services. A previous study in Tanzania evaluated the impact of P4P on service use, quality, equity, and health worker motivation over a 13-month period from January 2012 to February 2013 using linear difference-in-difference analysis [31]. The evaluation found a significant and positive effect on two of the targeted indicators: an increase of 8.2 percentage points (CI 3.6 to 12.8) in institutional deliveries, of 6.5 percentage points (CI 1.3 to 11.7) in the rate of deliveries in public facilities, and of 10.3 percentage points (CI 4.3 to 16.2) in the proportion of women receiving two antimalarial doses during antenatal care [21]. In this paper, we extend this analysis to examine the mediators of programme effect and to test the causal pathway to improved outcomes.

Study setting

In 2011, the Ministry of Health and Social Welfare of the Republic of Tanzania introduced a P4P scheme in the Pwani region, with initial payments being made in mid-2012. The P4P scheme comprised four main components.

(1) P4P provided financial bonuses to health facilities and district and regional health managers based on achievement of maternal and child health (MCH) performance targets related to service coverage and quality of care. The targets were either for specific services (e.g., institutional delivery, postnatal care, family planning) or for care provided during a service (e.g., two doses of intermittent preventive treatment for malaria (IPT2) during antenatal care and HIV treatment for HIV-positive pregnant women). At the facility level, at least three quarters of the bonus were distributed among health workers. The health worker incentive represented about 10% of the average health worker monthly salary (about USD 30 per month). District and regional managers received bonus payments based on the performance of facilities in their district and region.

(2) The remaining 25% of the bonus went to the health facility and could be invested in drugs, supplies, or facility improvements. This represents roughly 4% of their average budget.

(3) Supervision was more frequent as facility performance data were verified every 6 months by national, regional, and district stakeholders, whereby achievements of targets, established by the Central Ministry of Health and Social Welfare, were measured and bonuses paid.

(4) Primary care facilities had to open bank accounts in order to receive bonus payments and could retain cost sharing revenue in these accounts, whereas before such funds were held at district level. Health Facility Governing Committees, comprising health workers and community members, are responsible for managing facility resources, including P4P bonus payments, and representatives were to be present to withdraw bonus funds from the bank. However, the community members on the committee were not eligible for bonus payments.

Conceptual framework

Our analysis was guided by a theory of change for how P4P would affect the health system to improve outcomes and a set of underlying assumptions about the change processes involved (Fig. 1).

Fig. 1 Theory of change of P4P pathways to impact via health system strengthening

The increase in facility revenue from performance payments, together with financial autonomy resulting from facility-level bank accounts, may generate the need for increased accountability of resource allocation and use at the facility level, potentially stimulating health facility governing committees that are otherwise inactive and improving relations between providers and communities [32]. Greater resources and more accountability over their use are expected to lead to improved availability of equipment, drugs, and medical supplies at the facility, especially in relation to targeted services. P4P is also expected to directly affect supervision linked to the process of performance verification done by health care managers, as this results in more frequent contact between providers and managers, who examine registers and work conduct at the facility.

The direct financial incentives to health workers that are tied to service delivery, coupled with the changes in the availability of resources and supervision practices, are expected to impact on health workers’ job satisfaction and increase motivation to adhere to clinical guidelines [33, 34] and treat patients respectfully. Health worker knowledge may also increase, through investment in training to improve skills linked to incentivised services or through reallocation of staff to under-resourced or poor-performing facilities. To stimulate service use and achieve targets, health workers may undertake more outreach activities and/or reduce user fees and/or be more likely to enforce exemptions for vulnerable groups [35, 36] or encourage enrolment in community health insurance, as this generates additional revenue for the facility.

We identified a set of indicators to measure each of the steps on the causal pathway (Table 1). The indicators were measured through the household, facility, and health worker surveys. A full discussion of the effects of P4P on the availability of medical supplies and drugs and on governance of facilities is presented elsewhere [37].

Table 1 Health financing, governance and human resources indicators tested as potential mediators linked to theory of change


Data sources

Surveys were undertaken in all seven districts in the Pwani region where P4P is being implemented and four neighbouring comparison districts with no P4P, with 75 facilities being sampled in each of the study arms, comprising 6 hospitals, 16 health centres, 11 non-public dispensaries, and 42 dispensaries. A health facility survey was conducted at all facilities and 1–2 health workers per facility were interviewed. Interviews were conducted with women who had delivered in the past 12 months sampled within the catchment area of the facilities—a total of 3000 women per round. Baseline data collection was conducted between January and March 2012 and endline data was collected 13 months later [31]. All data could be linked at the facility level [21].

Data analysis

We used causal mediation analysis to identify steps on the causal pathway to the two significant outcomes in the main evaluation (delivery in a health facility and uptake of two doses of antimalarial drugs during pregnancy). We also considered potential mediators of a third outcome, delivery in a public health facility, as we thought that mediators may differ within public compared to non-public facilities. We assessed mediation by applying the linear structural equation model (LSEM) of Baron and Kenny [6, 38]. We estimated a single-mediator model to identify the effect of P4P on mediators and the effect of the latter on institutional deliveries and coverage of antimalarials during pregnancy. We followed a four-step process to assess mediation.

Step 1: Estimating the impact of P4P on outcomes

First, we replicated the analysis previously carried out by Binyaruka et al. [21] to evaluate the effect of P4P on the selected outcomes using a linear difference-in-difference regression model:

$$ {Y}_{ijt}={\beta}_0^1+{\beta}_1^1\left( P4{P}_j \times {\delta}_t\right) + {\beta}_2^1{\delta}_t+{\beta}_3^1{X}_{ijt}+{\gamma}_j+{\varepsilon}_{ijt}^1 $$

where $i$ indexes women who gave birth in the 12 months prior to the interview in the catchment area of facility $j$ at time $t$. $Y_{ijt}$ is a dummy taking value 1 if the service was received by a woman and 0 otherwise. $P4P_j$ is an indicator of whether P4P was implemented in the area where the woman was sampled from. We included facility fixed effects ($\gamma_j$) to control for facility-level unobserved time-invariant characteristics and a dummy variable taking the value of 0 at baseline and 1 at endline ($\delta_t$) to account for year fixed effects. We also controlled for individual-level characteristics (education, religion, marital status, occupation, age, number of pregnancies) and household characteristics (insurance status, number of household members, household head education, and wealth based on ownership of household assets and housing particulars) that are known to affect outcomes ($X_{ijt}$). The effect of P4P on outcomes was estimated by $\beta_1^1$. Standard errors were clustered at the health facility level.
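To make the logic of Eq. 1 concrete, the sketch below computes a difference-in-difference estimate from four group means. It relies on a simplification: in the plain two-arm, two-period case with no covariates, the coefficient on the interaction term equals the change in the intervention arm minus the change in the comparison arm. The data are invented for illustration, and the fixed effects, covariates, and clustered standard errors of the actual model are deliberately left out.

```python
from statistics import mean

# Hypothetical toy data: 1 = delivered in a facility, 0 = did not.
outcomes = {
    ("p4p", "baseline"): [1, 1, 1, 1, 1, 1, 0, 0, 0, 0],         # 60%
    ("p4p", "endline"): [1, 1, 1, 1, 1, 1, 1, 1, 0, 0],          # 80%
    ("comparison", "baseline"): [1, 1, 1, 1, 1, 1, 1, 0, 0, 0],  # 70%
    ("comparison", "endline"): [1, 1, 1, 1, 1, 1, 1, 0, 0, 0],   # 70%
}


def did_estimate(data):
    """Change over time in the P4P arm minus change in the comparison arm."""
    change_p4p = mean(data[("p4p", "endline")]) - mean(data[("p4p", "baseline")])
    change_cmp = (mean(data[("comparison", "endline")])
                  - mean(data[("comparison", "baseline")]))
    return change_p4p - change_cmp


effect = did_estimate(outcomes)
print(f"DiD estimate: {effect * 100:.1f} percentage points")
```

With these invented numbers the intervention arm rises by 20 points while the comparison arm is flat, so the estimate is 20 percentage points.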

Step 2: Identifying mediators

Second, we tested for the effect of P4P on each of the potential mediators identified within the theory of change (Table 1).

As in Eq. 1, we used a linear difference-in-difference regression model:

$$ {M}_{ijt}={\beta}_0^2+{\beta}_1^2\left( P4{P}_j \times {\delta}_t\right) + {\beta}_2^2{\delta}_t+{\beta}_3^2{X}_{ijt}+{\gamma}_j+{\varepsilon}_{ijt}^2 $$

where $M_{ijt}$ is the potential mediator and $\beta_1^2$ indicates the effect of P4P on the mediator. All mediators were measured at the health facility level. Items collected through the health worker survey were either averaged across health workers in the same facility, when they concerned individual judgement (satisfaction and motivation), or the highest value was retained when they concerned health facility characteristics (time and content of last supervision visit). Indicators of price, satisfaction with the service received, and kindness during delivery that were measured at the individual level were averaged across women in the same facility catchment area. The woman herself was excluded from the calculation to avoid direct reverse causality and to test how the prevalent reported price and quality affected individual choice [39]. Although some mediators were measured at the individual level and some at the health facility level, Eq. 2 was estimated at the individual level for all mediators, for comparability with step 1 and step 3. Standard errors were clustered at the health facility level.
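The exclusion of the woman herself from the catchment-area average is a leave-one-out mean. A minimal sketch follows, assuming a list of individual responses for one catchment area; the function name and the toy values are hypothetical.

```python
def leave_one_out_means(values):
    """For each respondent, average the responses of all OTHER respondents.

    Returns None for every entry when fewer than two responses exist,
    since no other respondent is available to average over.
    """
    n = len(values)
    if n < 2:
        return [None] * n
    total = sum(values)
    return [(total - v) / (n - 1) for v in values]


# Hypothetical catchment area: 1 = woman reported paying for delivery.
paid_for_delivery = [1, 0, 1, 1]
loo = leave_one_out_means(paid_for_delivery)
print(loo)
```

Each woman's mediator value then reflects the price or quality reported by her neighbours rather than her own experience, which is the point of the exclusion described above.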

Step 3: Identifying direct and indirect causal effects

Third, we evaluated the effect of P4P on the outcomes of interest, by re-estimating Eq. 1, including the potential mediators M ijt identified in step 2:

$$ {Y}_{ijt}={\beta}_0^3+{\beta}_1^3\left( P4{P}_j \times {\delta}_t\right) + {\beta}_2^3{\delta}_t+{\beta}_3^3{X}_{ijt}+{\beta}_4^3{M}_{ijt} + {\gamma}_j+{\varepsilon}_{ijt}^3 $$

We ran the analysis separately for each maternal care outcome $Y_{ijt}$ and for each potential mediator $M_{ijt}$ identified in step 2. If the estimated coefficient of $M_{ijt}$ ($\beta_4^3$) was significant and the effect of P4P was reduced compared to that estimated in Eq. 1 ($\beta_1^3$ was smaller than $\beta_1^1$), we inferred that the effect of P4P on $Y_{ijt}$ is mediated through $M_{ijt}$. For each pair of outcome and mediator, $\beta_1^3$ measures the direct effect of P4P on $Y_{ijt}$, while the mediated (or indirect) effect was calculated as the product of $\beta_1^2$ and $\beta_4^3$, and its significance was verified using bootstrapped standard errors [6]. These analyses were run at the individual level. As for Eqs. 1 and 2, Eq. 3 was estimated using a linear probability model and standard errors were clustered at the health facility level.
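The product-of-coefficients calculation and its bootstrapped standard error can be sketched as follows. This is an illustration under strong simplifying assumptions, not the estimation used in the study: a single binary treatment, no time dimension, no covariates or fixed effects, simulated data, and resampling of individuals rather than facilities. All names and parameter values are hypothetical. Here `a` plays the role of $\beta_1^2$, `b` of $\beta_4^3$, and `direct` of $\beta_1^3$.

```python
import random
from statistics import mean, stdev


def fit_single_mediator(t, m, y):
    """OLS for M ~ T and Y ~ T + M with binary T; returns (a, b, direct, total)."""
    m1 = [mi for ti, mi in zip(t, m) if ti == 1]
    m0 = [mi for ti, mi in zip(t, m) if ti == 0]
    y1 = [yi for ti, yi in zip(t, y) if ti == 1]
    y0 = [yi for ti, yi in zip(t, y) if ti == 0]
    a = mean(m1) - mean(m0)      # effect of treatment on mediator
    total = mean(y1) - mean(y0)  # effect of treatment on outcome, no mediator
    # Partial out T by centring M and Y within treatment groups (Frisch-Waugh).
    m_bar = {1: mean(m1), 0: mean(m0)}
    y_bar = {1: mean(y1), 0: mean(y0)}
    rm = [mi - m_bar[ti] for ti, mi in zip(t, m)]
    ry = [yi - y_bar[ti] for ti, yi in zip(t, y)]
    b = sum(p * q for p, q in zip(rm, ry)) / sum(p * p for p in rm)
    direct = total - a * b       # coefficient on T once M is controlled for
    return a, b, direct, total


def bootstrap_se_indirect(t, m, y, reps=500, seed=0):
    """Standard deviation of a*b across bootstrap resamples of individuals."""
    rng = random.Random(seed)
    n = len(t)
    draws = []
    while len(draws) < reps:
        idx = [rng.randrange(n) for _ in range(n)]
        tb = [t[i] for i in idx]
        if len(set(tb)) < 2:     # skip resamples that lost one treatment arm
            continue
        a, b, _, _ = fit_single_mediator(tb, [m[i] for i in idx], [y[i] for i in idx])
        draws.append(a * b)
    return stdev(draws)


# Simulated data with invented parameters, for illustration only.
sim = random.Random(1)
treat = [i % 2 for i in range(200)]
med = [0.5 * ti + sim.gauss(0, 1) for ti in treat]
out = [0.3 * ti + 0.4 * mi + sim.gauss(0, 1) for ti, mi in zip(treat, med)]

a, b, direct, total = fit_single_mediator(treat, med, out)
se = bootstrap_se_indirect(treat, med, out)
print(f"indirect={a * b:.3f} direct={direct:.3f} total={total:.3f} se={se:.3f}")
```

In a linear model fitted on the same sample, the total effect splits exactly into the direct effect plus the product `a * b`, which is why a fall in the treatment coefficient after adding the mediator is read as evidence of mediation. A version closer to the study design would resample whole facilities to respect the clustering.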

Step 4: Sensitivity analysis

The identified mediators can only be considered to be “on the causal pathway” (enabling the measurement of causal mediation effects) under a set of two assumptions, referred to as “sequential ignorability”: first, the intervention assignment is independent of outcomes and mediators and, second, the observed mediator is independent of outcomes given the actual treatment status and pre-treatment confounders (there are no unmeasured confounders that affect both the mediator and the outcome) [40].

The first part of the assumption is satisfied if the treatment is assigned randomly or assumed to be random given the pre-treatment covariates [8]. The use of difference-in-difference regression methods allows us to control for factors that may lead to the endogenous assignment of the intervention, subject to the assumption of parallel trends. We verified that the pre-intervention trends in a selection of mediators and outcomes were parallel between intervention and comparison areas [21].

The second assumption is still required to identify the causal effect of the mediator on the outcome and cannot be formally tested [8, 41]. To address this, Imai et al. [42] propose a measure of the sensitivity to unmeasured confounding. Since the level of correlation between $\varepsilon_{ijt}^2$ and $\varepsilon_{ijt}^3$ reflects the presence of unobservables affecting both the mediator and the outcome, the level at which the mediation effect would be zero provides an indication of how plausible the assumption is. The smaller the level of correlation, the less plausible the assumption. Imai et al. [42] develop their approach using a potential outcome framework and a semi-parametric approach for the identification of direct and mediated effects of the treatment. We set the prediction of potential outcomes to be based on Eqs. 1, 2, and 3 used in the LSEM, so that the sensitivity analysis would apply to the original results obtained. The sensitivity analysis provides the coefficient of correlation (rho) between $\varepsilon_{ijt}^2$ and $\varepsilon_{ijt}^3$ at which the average causal mediation effect (ACME) equals 0 [43].
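The idea behind the sensitivity parameter can be illustrated with the closed-form expression that holds in the purely linear case, as we understand the result in Imai et al. If $\tilde{\rho}$ is the correlation between the residuals of Eqs. 1 and 2, and $\sigma_1$, $\sigma_2$ are their standard deviations, then the ACME implied by a hypothesised correlation $\rho$ between the errors of Eqs. 2 and 3 is $\beta_1^2 (\sigma_1/\sigma_2)\{\tilde{\rho} - \rho\sqrt{(1-\tilde{\rho}^2)/(1-\rho^2)}\}$, which is zero when $\rho = \tilde{\rho}$. The sketch below evaluates this expression on a grid; every numeric input is hypothetical and not taken from the study.

```python
import math


def acme_given_rho(rho, beta_m, sigma_y, sigma_m, rho_tilde):
    """ACME implied by an assumed error correlation rho in the linear model.

    beta_m    : effect of treatment on the mediator (Eq. 2)
    sigma_y   : residual standard deviation of the outcome model without mediator (Eq. 1)
    sigma_m   : residual standard deviation of the mediator model (Eq. 2)
    rho_tilde : correlation between the residuals of Eqs. 1 and 2
    """
    scale = math.sqrt((1 - rho_tilde ** 2) / (1 - rho ** 2))
    return beta_m * (sigma_y / sigma_m) * (rho_tilde - rho * scale)


# Hypothetical inputs chosen only to show the shape of the calculation.
params = dict(beta_m=0.05, sigma_y=0.45, sigma_m=0.30, rho_tilde=0.20)

for rho in (0.0, 0.1, 0.2, 0.3):
    print(f"rho={rho:.1f}  ACME={acme_given_rho(rho, **params):+.4f}")
```

In this toy setting the ACME is positive when no confounding is assumed (`rho = 0`), shrinks as `rho` grows, reaches zero at `rho = rho_tilde`, and changes sign beyond it. A mediator whose zero-crossing lies close to zero is therefore fragile to even mild unmeasured confounding.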

The LSEM approach to mediation analysis requires no interaction between the intervention and the average causal mediation effect, in other words that the average causal mediation effect is equivalent in intervention and comparison areas. We tested this assumption by introducing an interaction term between treatment and mediator in Eq. 3 and testing its significance.

Since the outcomes are observed at the individual level, but the P4P scheme is implemented at the health facility level, we test the sensitivity of our results to the level at which the analysis is carried out by re-estimating Eqs. 1 to 3 on the outcomes measured at the health facility level, based on averages of individuals within the facility catchment area.
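Collapsing individual outcomes to facility averages changes the implicit weighting of the data, which is one reason the two levels of analysis can disagree. The sketch below, using invented records and hypothetical names, shows that the mean across individuals and the unweighted mean across facilities differ when catchment samples are of unequal size.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (facility id, 1 if the woman delivered in a facility).
records = [("A", 1), ("A", 1), ("A", 1), ("A", 1), ("B", 0), ("B", 1)]


def facility_means(rows):
    """Average the individual outcomes within each facility catchment area."""
    grouped = defaultdict(list)
    for facility, value in rows:
        grouped[facility].append(value)
    return {facility: mean(values) for facility, values in grouped.items()}


by_facility = facility_means(records)
individual_level = mean(value for _, value in records)  # each woman counts once
facility_level = mean(by_facility.values())             # each facility counts once

print(by_facility, individual_level, facility_level)
```

Here facility A contributes four of six women at the individual level but only half of the weight at the facility level, so the two averages differ (5/6 versus 0.75).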

We tested for clustering at the district level using a bootstrapping procedure which is recommended when the number of clusters is small [44, 45]. Since multiple hypothesis testing may lead to false rejection of the null hypothesis, we also applied a modified Bonferroni correction to adjust the significance threshold, accounting for the correlation between the tested outcomes [30]. All statistical analyses were conducted using STATA 14.
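The text does not specify which modification of the Bonferroni correction was applied. As an illustration only, the sketch below contrasts the plain Bonferroni threshold with one correlation-adjusted variant described in the multiple-endpoint literature (sometimes called the Dubey/Armitage-Parmar adjustment), in which the effective number of tests shrinks as the mean correlation between outcomes rises. Treating this as the correction used here is an assumption, and the input values are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Plain Bonferroni: divide the significance level by the number of tests."""
    return alpha / n_tests


def correlation_adjusted_threshold(alpha, n_tests, mean_corr):
    """Assumed variant: effective number of tests is n_tests ** (1 - mean_corr).

    mean_corr = 0 gives the Sidak threshold for independent tests;
    mean_corr = 1 leaves alpha unchanged (perfectly correlated outcomes).
    """
    effective_tests = n_tests ** (1 - mean_corr)
    return 1 - (1 - alpha) ** (1 / effective_tests)


# Hypothetical setting: three outcomes tested at the 5% level.
for corr in (0.0, 0.5, 1.0):
    adjusted = correlation_adjusted_threshold(0.05, 3, corr)
    print(f"mean correlation {corr:.1f}: threshold {adjusted:.4f}")
print(f"plain Bonferroni: {bonferroni_threshold(0.05, 3):.4f}")
```

The more strongly the outcomes are correlated, the less the threshold is tightened, which is the rationale for preferring an adjusted correction over the plain one when outcomes overlap.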


Descriptive statistics

The intervention and comparison groups are similar at baseline in relation to most of the outcomes and mediators considered (Table 2). However, in general, the comparison group performs slightly better than the intervention group in relation to the mediators.

Table 2 Summary statistics of maternal care outcomes and potential mediators at the baseline and endline and by intervention and comparison group

Mediation analysis

As previously reported, there was a positive and significant effect of P4P on the rate of institutional deliveries (an 8.2 percentage point increase, CI 3.6 to 12.8), on the rate of deliveries in public health facilities (a 6.5 percentage point increase, CI 1.3 to 11.7) and on the uptake of two doses of antimalarial drugs during antenatal care (a 10.3 percentage point increase, CI 4.4 to 16.1) [21] (Table 3). The effect of P4P was tested on all potential mediators in Table 2, but results are reported only for those significantly affected by P4P (Table 3).

Table 3 Effect of P4P on institutional delivery and on potential mediators

P4P led to an increased availability of resources at the facility, notably a reduction in the disruption of services due to broken equipment (−14.9 percentage points, CI −29.3 to −0.4); a reduction in the stock-out rate of essential medical supplies (−14.8 percentage points, CI −24.8 to −4.9) and drugs (−17.2 percentage points, CI −26.8 to −5.8), particularly those used during delivery, including Oxytocin (−36.2 percentage points, CI −55.9 to −16.4) and Ergometrin (−26.1 percentage points, CI −48.2 to −4.0). P4P resulted in more frequent supervision: there was an increase in the probability of having received the last district or regional supervision in the last 90 days (18 percentage points, CI 4.0 to 32.0). P4P resulted in a significant increase in health worker knowledge (18.8 percentage points, CI 10.4 to 27.2) and improved patient-provider interactions, measured by patient perceptions of provider kindness during deliveries (4.3 percentage points, CI −0.4 to 9.0). P4P led to a reduction in user costs (−4.5 percentage points, CI −9.5 to 0.6), measured as the probability of paying out-of-pocket for institutional delivery among women living within the catchment area of the facility (Table 3). No effect was found on the remaining indicators on the causal pathway, notably, health worker motivation, outreach activities, and insurance enrolment.

Among all the potential mediators identified, only a limited number significantly mediated the effect of P4P on the outcomes of interest (Table 4). The coefficient associated with P4P reported in Table 4 represents the direct programme effect when controlling for a given mediator; where this is less than that reported in the analysis without mediators, there is evidence of mediation. The indirect effect of P4P on the outcome, or the effect which passes through a given mediator, is calculated by multiplying the coefficient associated with the mediator of interest in Eq. 3 by the effect of P4P on the same mediator in Eq. 2. The estimates of the direct and indirect (through the selected mediators) effects of P4P on outcomes are reported in Table 5 along with the results of sensitivity to the sequential ignorability assumption (rho at which ACME equals 0).

Table 4 Effect of P4P and potential mediators on maternal care outcomes (results from Eq. 3)
Table 5 Indirect effect of potential mediators on maternal care outcomes

The probability of paying for delivery and the perceived kindness of health workers during delivery mediate the effect of P4P on institutional deliveries, and the stock out rate of Oxytocin mediates the effect of P4P on deliveries in public facilities. When these are included as mediators, P4P has no significant direct effect on the outcome (Table 4).

The reduction in the proportion of women who paid for delivery mediates 48% of the effect of P4P on institutional delivery and 78% of the effect of P4P on delivery in a public health facility (Table 5). The reduction in the stock-out rate of oxytocin mediates 22% of the total effect of P4P on institutional delivery and 30% of the total programme effect on delivery in a public health facility (Table 5, columns 1 and 2). The kindness of providers during delivery mediates 48% of the total effect of P4P on institutional deliveries and 49% on deliveries in public facilities. The increase in the timeliness of supervision mediates 15% of the effect of P4P on the uptake of two doses of anti-malarial drugs during antenatal care (Table 5, column 3), but did not mediate the effect of P4P on institutional deliveries. Uptake of two doses of anti-malarial drugs did not appear to be a significant mediator of the effect of P4P on institutional deliveries (Table 4, columns 1 and 2), but it was borderline significant for deliveries in a public health facility.
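As a rough reading aid, the shares above can be converted back into percentage points using the total effect on institutional deliveries reported earlier (8.2 percentage points). The values below are back-of-envelope arithmetic implied by the reported shares, not figures taken from Table 5:

$$ \text{indirect effect} \approx \text{share mediated} \times \text{total effect}: \qquad 0.48 \times 8.2 \approx 3.9, \qquad 0.22 \times 8.2 \approx 1.8 \ \text{percentage points} $$

Because each mediator was tested in a separate single-mediator model, the shares are not additive across mediators: those reported for institutional delivery (48%, 22%, and 48%) sum to more than 100%, which is consistent with the pathways overlapping.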

Sensitivity analysis

The sensitivity analysis (Table 5 and Table 9 in the Appendix) indicates that little correlation between the error terms of Eqs. 2 and 3 (correlation coefficients ranging from 0.02 to 0.04) would be sufficient to reduce the mediated effect to zero for most mediators. However, a higher correlation coefficient would be required to reduce to zero the indirect effect of P4P through a reduction of payment at delivery and increased health worker kindness, on institutional delivery (correlation coefficients 0.23 and 0.20, respectively) and on delivery in a public health facility (correlation coefficients 0.25 and 0.16, respectively).

When carrying out the analysis at the health facility level (Table 6, Table 7 and Table 8 in Appendix), the stock out rate of Oxytocin and the perceived kindness of health workers at delivery still mediated the effect of P4P on institutional deliveries, while the proportion of women who paid for delivery mediated the effect on deliveries in public facilities. However, the other mediators identified were no longer significant and no mediators for the uptake of two doses of anti-malarial drugs during antenatal care were identified. New mediators were also identified. For example, health worker satisfaction with local leaders became a mediator of delivery in a public health facility. None of the indirect effects were significant, however, as a consequence of the reduced statistical power due to the smaller number of observations.

A number of other sensitivity analyses were carried out. We tested for significance of the interaction between treatment and mediator in Eq. 3 and found no significant effect indicating that the average mediation effect is equivalent in treated and non-treated areas. We identified the same set of potential mediators when we tested for the effect of P4P correcting standard errors for clustering at the district level. When we adjusted the level of significance to account for multiple outcome testing, the reduction in the stock out rate of Oxytocin was the only mediator that remained significant.


Causal mediation analysis has been put forward as an approach to understand causal mechanisms within process evaluation [2]. However, to date, there is very little empirical evidence of its application within the evaluation of complex health interventions. Building on an existing impact evaluation, we set out to test the causal pathways through which P4P affected maternal care outcomes using causal mediation analysis. While our finding of P4P effects on core maternal outcomes is partly consistent with previous evaluation studies in Rwanda and Burundi [20, 22, 30, 46], ours is the first to formally test the pathways through which P4P affects outcomes.

As in a previous study [15], we found that P4P affects the level of inputs available in health facilities. However, we tested for a wider range of mediators, consistent with our theory of change, and found that they mediate a significant proportion of the effect of P4P on the use of maternal care services.

Reductions in the probability of paying out of pocket and increased provider kindness during delivery mediated the largest share of the P4P effect on institutional deliveries overall and in public facilities, and these mediation effects were more robust to unmeasured confounding. Oxytocin is a drug administered to induce or support labour and to manage the third stage of labour reducing the risk of postpartum haemorrhage [47]. The reduction in the rate of stock-out of Oxytocin mediated 22% of the effect on institutional delivery (up to 30% in public health facilities), but the correlation coefficient at which the ACME is zero was very low (0.04) suggesting that the results are highly sensitive to unmeasured confounding. The effect of P4P on the availability of Oxytocin is, however, consistent with our theory of change. The increased availability of Oxytocin may be due to additional resources made available through P4P to facilities and/or greater communication with district authorities resulting from more frequent supervision. The increased availability of Oxytocin may be appreciated by women as a marker for quality of obstetric care, and management of bleeding, thereby influencing demand [48], though there is no literature highlighting women’s preference for induction [49].

Although women are supposed to be exempt from payment for deliveries in public facilities, often such exemptions are incompletely enforced [50]. Also, when drugs are out of stock, women have to pay for them at private pharmacies. The mediation effect of the probability of paying for care is consistent with providers making a concerted effort to enforce exemptions to attract women to facilities for their delivery [35]. The probability of payment is also likely affected by the reduction in stock out of drugs related to delivery that no longer have to be paid for privately by patients.

Health worker kindness, measured as the mean rank reported by other women in the same health facility catchment area, was found to be a significant mediator, suggesting that increased institutional deliveries could be due to expectations of higher quality of the service provided. This is consistent with our theory of change, whereby health workers modify their interactions and behaviour with patients to make services more attractive, to increase demand so as to meet the performance targets. Literature from a range of settings has highlighted the importance of provider attitude and kindness for women’s demand for care at birth [51, 52]. Improved timeliness of supervision, which we believe may be associated with the verification activities carried out as part of the P4P programme, significantly mediated 15% of the effect of P4P on the uptake of two doses of antimalarials during pregnancy. This indicates that increased monitoring and coaching may lead health workers to improve service delivery.

Referring back to our initial theory of change, the mediators which explained the largest share of total programme effect, and were most robust to unmeasured confounding, rely primarily on health worker response to the direct financial incentive. However, we did not find evidence of P4P increasing motivation, which was identified as a necessary precursor to behaviour change within the theory of change. This could be due to the limited sample size for the health worker survey, or invalid measurement of the underlying motivation construct, which was proxied as job satisfaction. It is also possible that health workers respond to incentives by changing their behaviour without experiencing greater job satisfaction. Our results also suggest that other components of the P4P programme were relevant to outcome achievements, notably the additional availability of resources used to procure drugs and supplies, and more timely supervision, though these effects were less robust to unmeasured confounding. We found less evidence of the effect of the increased facility financial autonomy. Ultimately, such information is useful as it helps identify the programme’s most effective components and “levers” of change.

In addition to identifying likely mediators on the pathway to outcomes, our analysis also illustrates the application of causal mediation analysis to the evaluation of a health systems intervention, such as P4P, and specifically the consideration of health systems mediators, rather than individual-level mediators related to behaviour change. However, doing so raises practical challenges.

First, when mediators operate at the level of the provider or health facility and outcomes are measured at the household or individual level, it is unclear at which level the analysis should be carried out. We carried out the analysis at the individual level, as we were interested in the pathways to population outcomes, but we assessed the robustness of results to analysis at the facility level, and found that this did affect the results for some of the mediators. The difference in results is in part due to the weighting based on the relative size of the health facility catchment population, which varies from facility to facility, as well as the reduced sample size and resulting lower statistical power.
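The weighting point above can be made concrete with a small sketch. The facility names and outcome values below are invented for illustration and are not taken from the study data; the sketch only shows that a mean computed over individuals implicitly weights each facility by the number of women in its catchment, whereas a mean of facility averages weights every facility equally.

```python
# Hypothetical binary outcomes (1 = institutional delivery) for women
# in two facility catchment areas of very different size.
facility_outcomes = {
    "A": [1, 1, 1, 0, 1, 1, 1, 1],  # large catchment, 8 women
    "B": [0, 1],                    # small catchment, 2 women
}

# Individual-level mean: every woman counts once.
all_outcomes = [y for ys in facility_outcomes.values() for y in ys]
individual_mean = sum(all_outcomes) / len(all_outcomes)

# Facility-level mean: every facility counts once, regardless of size.
facility_means = {f: sum(ys) / len(ys) for f, ys in facility_outcomes.items()}
facility_level_mean = sum(facility_means.values()) / len(facility_means)

# Weighting facility means by catchment size recovers the individual-level mean.
n_total = sum(len(ys) for ys in facility_outcomes.values())
weighted_mean = sum(
    facility_means[f] * len(ys) for f, ys in facility_outcomes.items()
) / n_total

print(individual_mean, facility_level_mean, weighted_mean)
```

In this toy example the two levels of analysis give different averages purely because of the implicit weights, before any loss of statistical power from the smaller number of facility-level observations is considered.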

Second, randomised trials of health systems interventions are often difficult to implement, and quasi-experimental methods may be the only way to assess causal effects, as in this study. However, to date, causal mediation analysis has only been used alongside randomised controlled trials. We demonstrated its use within difference-in-difference analysis. This approach rests on the assumption of parallel trends between intervention and comparison groups in relation to outcomes as well as mediators. While we were able to assess pre-intervention trends in outcomes, we could do so for only some mediators [21, 37]. In the future, researchers should seek to gather pre-intervention time series data on outcomes as well as mediators. As in the main impact evaluation [21], we used a linear regression model to estimate P4P effects, which allowed us to use linear structural equation modelling to generate our estimates of mediation effects, although our outcomes and many of our mediators are binary. We had, however, previously demonstrated the robustness of our results to the use of non-linear models [21].
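As a minimal sketch of how a linear mediation estimate can be obtained within a difference-in-difference setup, the following simulates data and applies the product-of-coefficients approach. Everything here is an assumption for illustration: the data are simulated, the effect sizes are invented, and the variable names (`treat`, `post`, `did`, `mediator`, `outcome`) and the helper `ols` are hypothetical. It omits the covariates, facility fixed effects, and clustered standard errors that a real evaluation would need, and it is not the authors' estimation code.

```python
import numpy as np

def ols(y, X):
    """Return OLS coefficients of y on X (X must already contain a constant)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(0)
n = 4000
treat = rng.integers(0, 2, n)   # 1 = intervention district (simulated)
post = rng.integers(0, 2, n)    # 1 = endline survey round (simulated)
did = treat * post              # difference-in-difference interaction term

# Simulated mediator and outcome; coefficients are made up for illustration.
mediator = 0.2 * treat + 0.1 * post + 0.3 * did + rng.normal(0, 1, n)
outcome = (0.1 * treat + 0.05 * post + 0.2 * did
           + 0.5 * mediator + rng.normal(0, 1, n))

base = np.column_stack([np.ones(n), treat, post, did])

a = ols(mediator, base)[3]      # programme effect on the mediator
total = ols(outcome, base)[3]   # total programme effect on the outcome
full = ols(outcome, np.column_stack([base, mediator]))
direct, b = full[3], full[4]    # direct effect and mediator coefficient

indirect = a * b                # indirect (mediated) effect
share_mediated = indirect / total

print(f"total={total:.3f} direct={direct:.3f} "
      f"indirect={indirect:.3f} share={share_mediated:.2f}")
```

With linear models fitted on the same sample, the total effect equals the sum of the direct and indirect effects exactly, which is what makes the "share mediated" interpretation straightforward. That decomposition does not carry over automatically to non-linear models for binary outcomes.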

The selection of mediators for inclusion in the analysis was limited to those available within the surveys, so that the effect through potentially relevant mediators, such as the level of funding available at the facility, could not be tested. Our approach relies on the accurate measurement of potential mediators and, where possible, we used tools that had been tested and applied in previous research to minimise the risk of bias. Future studies should consider using qualitative methods to validate and help explain mediators identified as being significant through mediation analysis.

The application of causal mediation analysis to the evaluation of P4P generates an estimate of average causal pathways. The assumption is that all facilities experience the same pathway to impact; however, it is of course possible that facilities introduce different strategies to achieve outcomes and that there is some variation in pathways across facilities.

The assumption that interventions affect mediators, which in turn affect outcomes, presupposes a temporal ordering in which changes in mediators precede changes in outcomes. In our study, we measured outcomes and mediators at two points in time: at baseline and endline. Hence, changes in mediators were measured at the same time as changes in outcomes. In the case of mediators measured at the individual level, this was problematic, as we would not expect a woman’s report of kindness during her delivery to affect her delivery choice. Rather, we would expect her choice to be based on perceptions of kindness from the experience of other women. For this reason, we estimated the mediator excluding the woman herself. Further studies should seek to obtain measures of the mediator prior to that of outcomes, either through midline surveys or by framing questions appropriately (for example, did you perceive that kindness during delivery had improved at your nearby facility prior to your birth?).
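Estimating the mediator "excluding the woman herself" amounts to a leave-one-out mean within each facility catchment area. The sketch below illustrates the idea with invented facility labels and kindness scores; the function name `leave_one_out_means` is hypothetical and this is not the authors' code.

```python
from collections import defaultdict

def leave_one_out_means(facility_ids, scores):
    """For each respondent, return the mean score reported by the OTHER
    respondents in the same facility (None if she is the only respondent)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for fac, score in zip(facility_ids, scores):
        totals[fac] += score
        counts[fac] += 1

    result = []
    for fac, score in zip(facility_ids, scores):
        if counts[fac] > 1:
            # Remove the respondent's own report before averaging.
            result.append((totals[fac] - score) / (counts[fac] - 1))
        else:
            result.append(None)
    return result

# Invented example: six women across three facility catchment areas.
facilities = ["A", "A", "A", "B", "B", "C"]
kindness = [4, 2, 3, 5, 1, 4]
loo = leave_one_out_means(facilities, kindness)
print(loo)  # [2.5, 3.5, 3.0, 1.0, 5.0, None]
```

Because a woman's own report is removed, the resulting mediator value cannot be driven by her own delivery experience, although it is still measured in the same survey round as the outcome.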

While we were able to identify significant mediators and quantify how much of the overall effect of P4P each could explain, we were unable to determine the order of the causal chain. Some mediators may cause other mediators; hence, there is likely to be a hierarchy of outcomes (for example, increased availability of oxytocin may affect health worker kindness, as increased drug availability improves their ability to do their job, which in turn affects service uptake). Epidemiology offers methods for quantifying the effects of multiple mediators, and their interactions, and decomposing them, but these methods are still very recent and have had limited application [10, 11, 13, 41]. Most importantly, they rely on identifying assumptions, which are often unlikely to be satisfied or hard to prove within policy experiments. Further analysis should explore ways to examine more complex causal pathways, for example, interactions between financing and human resources or governance factors, and to assess the total mediated effect.


In this study, we found that the effect of P4P on institutional deliveries was mediated by a reduction in the probability of women paying for delivery care, an increase in provider kindness during deliveries, and greater availability of drugs. The increase in coverage of IPT during antenatal care was mediated by more frequent supervision visits.

This study illustrates that there is great potential to apply the method of causal mediation analysis to help unpack the causal mechanisms of complex health systems interventions such as P4P, shedding light on how they impact the health system to achieve population health goals. We encourage further research of this kind to strengthen the evidence base about how health system interventions work.



Average Causal Mediation Effect


Difference in difference


Linear structural equations model


Payment for performance


  1. Ludwig J, Kling JR, Mullainathan S. Mechanism experiments and policy evaluations. J Econ Perspect. 2011;25(3):17–38.

  2. Craig P, et al. Developing and evaluating complex interventions: new guidance. BMJ. 2008;337:a1655.

  3. Bonell C, et al. Realist randomised controlled trials: a new approach to evaluating complex public health interventions. Soc Sci Med. 2012;75(12):2299–306.

  4. Oakley A, et al. Process evaluation in randomised controlled trials of complex interventions. Br Med J. 2006;332:413–6.

  5. Moore GF, et al. Process evaluation of complex interventions: Medical Research Council guidance. Br Med J. 2015;350:h1258.

  6. MacKinnon DP, Fairchild AJ, Fritz MS. Mediation analysis. Annu Rev Psychol. 2007;58:593–614.

  7. Grant A, et al. Process evaluations for cluster-randomised trials of complex interventions: a proposed framework for design and reporting. Trials. 2013;14:15.

  8. Imai K, et al. Unpacking the black box of causality: learning about causal mechanisms from experimental and observational studies. Am Polit Sci Rev. 2011;105(4):765–89.

  9. Gunzler D, et al. Introduction to mediation analysis with structural equation modeling. Shanghai Arch Psychiatry. 2013;25(6):390–4.

  10. Heckman J, Pinto R, Savelyev PA. Understanding the mechanisms through which an influential early childhood program boosted adult outcomes. Am Econ Rev. 2013;103:2052–86.

  11. Heckman J, Pinto R. Econometric mediation analyses: identifying the sources of treatment effects from experimentally estimated production technologies with unmeasured and mismeasured inputs. Econ Rev. 2015;34(1-2):6–31.

  12. Conti G, Heckman J, Pinto R. The effects of two influential early childhood interventions on health and healthy behaviors. The Economic Journal. 2015;126(596):F28–F65.

  13. DeStavola BL, et al. Mediation analysis with intermediate confounding: structural equation modeling viewed through the causal inference lens. Am J Epidemiol. 2015;181(1):64–80.

  14. Abramsky T, et al. Ecological pathways to prevention: how does the SASA! community mobilisation model work to prevent physical intimate partner violence against women? BMC Public Health. 2016;16(339):1–21.

  15. Ngo D, Sherry TB, Bauhoff S. Health system changes under pay-for-performance: the effects of Rwanda’s national programme on facility inputs. Health Policy Plan. 2016. doi:10.1093/heapol/czw091. First published online July 19, 2016.

  16. Scott A, et al. The effect of financial incentives on the quality of health care provided by primary care physicians. Cochrane Database Syst Rev. 2011;9.

  17. Witter S, et al. Paying for performance to improve the delivery of health interventions in low- and middle-income countries. Cochrane Database Syst Rev. 2012;2:CD007899.

  18. Ogundeji Y, Bland J, Sheldon T. The effectiveness of payment for performance in health care: a meta-analysis and exploration of variation in outcomes. Health Policy. 2016;120(10):1141–50.

  19. Hasnain Z, Manning N, Pierskalla JH. Performance-related pay in the public sector: a review of theory and evidence. World Bank Policy Research Working Paper. 2012; 6043.

  20. Basinga P, et al. Effect on maternal and child health services in Rwanda of payment to primary health-care providers for performance: an impact evaluation. Lancet. 2011;377(9775):1421–8.

  21. Binyaruka P, et al. Effect of paying for performance on utilisation, quality, and user costs of health services in Tanzania: a controlled before and after study. PLoS ONE. 2015;10(8):1–16.

  22. Bonfrer I, Van de Poel E, Van Doorslaer E. The effects of performance incentives on the utilization and quality of maternal and child care in Burundi. Soc Sci Med. 2014;123:96–104.

  23. Peabody J, et al. Financial incentives and measurement improved physicians’ quality of care in the Philippines. Health Affairs (Millwood). 2011;30(4):773–81.

  24. Yip W, et al. Capitation combined with pay-for-performance improves antibiotic prescribing practices in rural China. Health Affairs (Millwood). 2014;33(3):502–10.

  25. Van de Poel E, et al. Impact of performance-based financing in a low-resource setting: a decade of experience in Cambodia. Health Economics. 2016;25(6):688–705.

  26. Witter S, et al. Performance-based financing as a health system reform: mapping the key dimensions for monitoring and evaluation. BMC Health Serv Res. 2013;13:367.

  27. Meessen P, et al. Reviewing institutions of rural health centres: the Performance Initiative in Butare, Rwanda. Trop Med Int Health. 2006;11(8):1303–17.

  28. Bertone MP, Meessen B. Studying the link between institutions and health system performance: a framework and an illustration with the analysis of two performance-based financing schemes in Burundi. Health Policy Plan. 2013;28(8):847–57.

  29. Ssengooba F, McPake B, Palmer N. Why performance-based contracting failed in Uganda—an “open-box” evaluation of a complex health system intervention. Soc Sci Med. 2012;75(2):377–83.

  30. Bonfrer I, et al. Introduction of performance-based financing in Burundi was associated with improvements in care and quality. Health Aff. 2014;33(12):2179–87.

  31. Borghi J, et al. Protocol for the evaluation of a pay for performance programme in Pwani region in Tanzania: a controlled before and after study. Implement Sci. 2013;8:80.

  32. Falisse JB, et al. Community participation and voice mechanisms under performance-based financing schemes in Burundi. Trop Med Int Health. 2012;17(5):674–82.

  33. Waddimba AC, et al. Provider attitudes associated with adherence to evidence-based clinical guidelines in a managed care setting. Med Care Res Rev. 2010;67(1):93–116.

  34. Gertler P, Vermeersch C. Using performance incentives to improve medical care productivity and health outcomes. 2013, NBER Working Paper No. 19046.

  35. Huillery E, Seban J. Performance-based financing for health: experimental evidence from the Democratic Republic of Congo. mimeo. Available at:

  36. Wang H, et al. An experiment in payment reform for doctors in rural China reduced some unnecessary care but did not lower total costs. Health Affairs (Millwood). 2011;30(12):2427–36.

  37. Binyaruka P, Mamdani M, Borghi J. Improving quality of care through payment for performance: examining effects on the availability and stock out of essential commodities in Tanzania. Trop Med Int Health. 2017;22(1):92–102.

  38. Baron RM, Kenny DA. The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. J Pers Soc Psychol. 1986;51:1173–82.

  39. Manski CF. Identification for prediction and decision. Boston: H.U. Press; 2009.

  40. Imai K, Keele L, Yamamoto T. Identification, inference, and sensitivity analysis for causal mediation effects. Stat Sci. 2010;25:51–71.

  41. Keele L, Tingley D, Yamamoto T. Identifying mechanisms behind policy interventions via causal mediation analysis. J Policy Anal Manage. 2015;34(4):937–63.

  42. Imai K, Keele L, Tingley D. A general approach to causal mediation analysis. Psychol Methods. 2010;15:309–34.

  43. Hicks R, Tingley D. Causal mediation analysis. Stata J. 2011;11(4):1–15.

  44. Cameron A, Gelbach J, Miller D. Bootstrap-based improvements for inference with clustered errors. Rev Econ Stat. 2008;90(3):414–27.

  45. Cameron AC, Miller DL. A practitioner’s guide to cluster-robust inference. J Hum Resour. 2015;50(2):317–72.

  46. Falisse JB, et al. Performance-based financing in the context of selective free health-care: an evaluation of its effects on the use of primary health-care services in Burundi using routine data. Health Policy Plan. 2014;30(10):1251–60.

  47. Westhoff G, Cotter AM, Tolosa JE. Prophylactic oxytocin for the third stage of labour to prevent postpartum haemorrhage. Cochrane Database Syst Rev. 2013;10:CD001808.

  48. Wilunda C, et al. A qualitative study on barriers to utilisation of institutional delivery services in Moroto and Napak districts, Uganda: implications for programming. BMC Pregnancy Childbirth. 2014;14:259.

  49. Moore J, et al. Moving toward patient-centered care: women’s decisions, perceptions, and experiences of the induction of labor process. Birth. 2014;41(2):138–46.

  50. Kruk ME, et al. User fee exemptions are not enough: out-of-pocket payments for ‘free’ delivery services in rural Tanzania. Trop Med Int Health. 2008;13(12):1442–51.

  51. Kruk M, et al. Women’s preferences for obstetric care in rural Ethiopia: a population-based discrete choice experiment in a region with low rates of facility delivery. J Epidemiol Community Health. 2010;64(11):984–8.

  52. Larson E, et al. Moving toward patient-centered care in Africa: a discrete choice experiment of preferences for delivery care among 3,003 Tanzanian women. PLoS One. 2015;10(8):e0135621.

We would like to thank Kara Hanson for reviewing the paper and for valuable comments and suggestions. We also thank the P4P evaluation research team, including process researchers, data collectors, and field coordinators.


The Government of Norway funded the data collection for the programme evaluation that was used in this paper. The UK Department for International Development, as part of the Consortium for Research on Resilient and Responsive Health Systems, funded the authors' time for data analysis and writing of the paper. The Research Council of Norway also supported JB's time.

Availability of data and materials

The data have been uploaded into a data repository. The DOI URL for the dataset is 10.5281/zenodo.21709.

Authors’ contributions

JB conceptualized the study and conceptual framework and contributed to the first draft of the paper. LA developed and carried out the analysis and contributed to the first draft of the paper. PB critically revised the paper. All authors edited the manuscript. All authors read and approved the final manuscript.

Competing interests

PB and JB have the following competing interests: they were funded by the Government of Norway to undertake the data collection associated with this research. The Government of Norway also funded the P4P programme in the Pwani region of Tanzania. The funder of the study had no role in data analysis, data interpretation, or writing of the manuscript.

Consent for publication

Not applicable.

Ethics approval and consent to participate

The evaluation study received ethical approval from the Ifakara Health Institute institutional review board (approval number: 1BI1IRB/38) and the ethics committee of the London School of Hygiene & Tropical Medicine. Study participants provided written consent to participate in this study, requiring them to sign a written consent form that was read out to them by the interviewers. This consent form was reviewed and approved by the ethics committees prior to the start of the research.

Author information

Corresponding author

Correspondence to Laura Anselmi.



Sensitivity analysis with data averaged at the health facility level

Table 6 Effect of P4P on institutional delivery and on potential mediators
Table 7 Effect of P4P and potential mediators on maternal care outcomes (results from Eq. 3)
Table 8 Indirect effect of potential mediators on maternal care outcomes
Table 9 P4P Average causal mediation effect (ACME) and direct effects, sensitivity analysis using Imai et al. (2010) approach

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Anselmi, L., Binyaruka, P. & Borghi, J. Understanding causal pathways within health systems policy evaluation through mediation analysis: an application to payment for performance (P4P) in Tanzania. Implementation Sci 12, 10 (2017).
