2008
August 2008
Bias formulas for external adjustment and sensitivity analysis of unmeasured confounders Annals of Epidemiology. Purpose: Uncontrolled confounders are an important source of bias in epidemiologic studies. The authors review and derive a set of parallel simple formulas for bias factors in the risk difference, risk ratio, and odds ratio from studies with an unmeasured polytomous confounder and a dichotomous exposure and outcome.
Methods: The authors show how the bias formulas are related to and are sometimes simpler than earlier formulas. The article contains three examples, including a Monte Carlo sensitivity analysis of a preadjusted or conditional estimate.
Results: All the bias expressions can be given parallel formulations as the difference or ratio of (i) the sum across confounder strata of each exposure-stratified confounder-outcome effect measure multiplied by the confounder prevalences among the exposed and (ii) the sum across confounder strata of the same effect measure multiplied by the confounder prevalences among the unexposed. The basic formulas can be applied to scenarios with a polytomous confounder, exposure, or outcome.
Conclusions: In addition to aiding design and analysis strategies for confounder control, the bias formulas provide a link between classical standardization decompositions of demography and classical bias formulas of epidemiology. They are also useful in constructing general programs for sensitivity analysis and more elaborate probabilistic risk analyses.
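For reference, the parallel structure described in the Results can be written on the risk-ratio scale, in notation introduced here rather than taken from the article, as

\mathrm{bias\ factor}_{RR} \;=\; \frac{\sum_{k} \mathrm{RR}_{k}\, p_{1k}}{\sum_{k} \mathrm{RR}_{k}\, p_{0k}},

where RR_k is the exposure-stratified confounder-outcome risk ratio in confounder stratum k, and p_{1k} and p_{0k} are the prevalences of stratum k among the exposed and unexposed, respectively; the risk-difference version replaces the ratio of the two sums with their difference.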
2011
January 2011
Bias formulas for sensitivity analysis of unmeasured confounding for general outcomes, treatments, and confounders Epidemiology. Uncontrolled confounding in observational studies gives rise to biased effect estimates. Sensitivity analysis techniques can be useful in assessing the magnitude of these biases. In this paper, we use the potential outcomes framework to derive a general class of sensitivity-analysis formulas for outcomes, treatments, and measured and unmeasured confounding variables that may be categorical or continuous. We give results for additive, risk-ratio and odds-ratio scales. We show that these results encompass a number of more specific sensitivity-analysis methods in the statistics and epidemiology literature. The applicability, usefulness, and limits of the bias-adjustment formulas are discussed. We illustrate the sensitivity-analysis techniques that follow from our results by applying them to 3 different studies. The bias formulas are particularly simple and easy to use in settings in which the unmeasured confounding variable is binary with constant effect on the outcome across treatment levels.
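In the simple setting highlighted in the final sentence (a binary unmeasured confounder U with a constant effect on the outcome across treatment levels), the bias on the additive scale reduces, in notation introduced here, to

\mathrm{bias} \;=\; \gamma \,\bigl[\,P(U = 1 \mid A = 1, c) - P(U = 1 \mid A = 0, c)\,\bigr],

where A is the treatment, γ is the constant effect of U on the outcome, and the probabilities are taken within levels c of the measured covariates.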
2015
October 2015
Kernel Balancing: A flexible non-parametric weighting procedure for estimating causal effects Statistica Sinica. Matching and weighting methods are widely used to estimate causal effects when adjusting for a set of observables is required. Matching is appealing for its non-parametric nature, but with continuous variables, is not guaranteed to remove bias. Weighting techniques choose weights on units to ensure pre-specified functions of the covariates have equal (weighted) means for the treated and control group. This assures unbiased effect estimation only when the potential outcomes are linear in those pre-specified functions of the observables. Kernel balancing begins by assuming the expectation of the non-treatment potential outcome conditional on the covariates falls in a large, flexible space of functions associated with a kernel. It then constructs linear bases for this function space and achieves approximate balance on these bases. A worst-case bound on the bias due to this approximation is given and is the target of minimization. Relative to current practice, kernel balancing offers one reasoned solution to the long-standing question of which functions of the covariates investigators should attempt to achieve (and check) balance on. Further, these weights are also those that would make the estimated multivariate density of covariates approximately the same for the treated and control groups, when the same choice of kernel is used to estimate those densities. The approach is fully automated up to the choice of a kernel and smoothing parameter, for which default options and guidelines are provided. An R package, KBAL, implements this approach.
Software: R package, Link: Supplement
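A minimal numerical sketch of the underlying idea follows. This is not the KBAL package's implementation; the Gaussian kernel, the number of retained bases, the entropy-style weight solver, and all function and variable names are our illustrative choices.

gauss_kernel <- function(X, b) {
  D2 <- as.matrix(dist(X))^2          # squared Euclidean distances between rows
  exp(-D2 / b)                        # Gaussian kernel matrix
}

kernel_balance_sketch <- function(X, D, b = 2 * ncol(X), n_bases = 20) {
  K  <- gauss_kernel(scale(X), b)
  ev <- eigen(K, symmetric = TRUE)    # linear bases for the kernel's function space
  B  <- ev$vectors[, 1:n_bases] %*% diag(sqrt(pmax(ev$values[1:n_bases], 0)))
  target <- colMeans(B[D == 1, , drop = FALSE])   # treated-group means of the bases
  B0 <- B[D == 0, , drop = FALSE]
  # entropy-style dual: control weights whose weighted basis means match 'target'
  obj <- function(lambda) log(sum(exp(B0 %*% lambda))) - sum(lambda * target)
  lam <- optim(rep(0, n_bases), obj, method = "BFGS")$par
  w <- as.numeric(exp(B0 %*% lam))
  w / sum(w)                          # approximate balancing weights for control units
}

# toy usage
set.seed(1)
X <- matrix(rnorm(200), 100, 2)
D <- rbinom(100, 1, plogis(X[, 1]))
w0 <- kernel_balance_sketch(X, D)

In practice one would use KBAL itself, which also handles the smoothing-parameter choice and the worst-case bias bound described above.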
October 2015
G-computation demonstration in causal mediation analysis European Journal of Epidemiology. Recent work has considerably advanced the definition, identification and estimation of controlled direct, and natural direct and indirect effects in causal mediation analysis. Despite the various estimation methods and statistical routines being developed, a unified approach for effect estimation under different effect decomposition scenarios is still needed for epidemiologic research. G-computation offers such unification and has been used for total effect and joint controlled direct effect estimation settings, involving different types of exposure and outcome variables. In this study, we demonstrate the utility of parametric g-computation in estimating various components of the total effect, including (1) natural direct and indirect effects, (2) standard and stochastic controlled direct effects, and (3) reference and mediated interaction effects, using Monte Carlo simulations in standard statistical software. For each study subject, we estimated their nested potential outcomes corresponding to the (mediated) effects of an intervention on the exposure wherein the mediator was allowed to attain the value it would have under a possible counterfactual exposure intervention, under a pre-specified distribution of the mediator independent of any causes, or under a fixed controlled value. A final regression of the potential outcome on the exposure intervention variable was used to compute point estimates and bootstrap was used to obtain confidence intervals. Through contrasting different potential outcomes, this analytical framework provides an intuitive way of estimating effects under the recently introduced 3- and 4-way effect decomposition. This framework can be extended to complex multivariable and longitudinal mediation settings.
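To make the Monte Carlo g-computation logic concrete, here is a stripped-down R sketch for natural direct and indirect effects on simulated data. The data-generating model, the variable names (A exposure, M mediator, Y outcome, C confounder), and the model forms are ours, and bootstrap confidence intervals are omitted for brevity.

set.seed(42)
n <- 5000
C <- rnorm(n)                                        # baseline confounder
A <- rbinom(n, 1, plogis(-0.3 + 0.5 * C))            # exposure
M <- rbinom(n, 1, plogis(-0.5 + 0.8 * A + 0.4 * C))  # mediator
Y <- rnorm(n, 1 + 0.6 * A + 0.9 * M + 0.3 * C)       # outcome

m_mod <- glm(M ~ A + C, family = binomial)           # mediator model
y_mod <- lm(Y ~ A + M + C)                           # outcome model

# mean potential outcome when exposure is set to a_y and the mediator is drawn
# from its distribution under exposure a_m
sim_po <- function(a_y, a_m) {
  pm   <- predict(m_mod, newdata = data.frame(A = a_m, C = C), type = "response")
  Msim <- rbinom(n, 1, pm)
  mean(predict(y_mod, newdata = data.frame(A = a_y, M = Msim, C = C)))
}

nde <- sim_po(1, 0) - sim_po(0, 0)   # natural direct effect
nie <- sim_po(1, 1) - sim_po(1, 0)   # natural indirect effect
c(NDE = nde, NIE = nie, TE = nde + nie)

Contrasting sim_po(1, 1) with sim_po(0, 0) recovers the total effect, matching the decomposition described above; fixing the simulated mediator to a constant or drawing it from a prespecified distribution would give the controlled and stochastic controlled direct effects instead.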
2017
March 2017
Bias Analysis for Uncontrolled Confounding in the Health Sciences Annual Review of Public Health. Uncontrolled confounding due to unmeasured confounders biases causal inference in health science studies using observational and imperfect experimental designs. The adoption of methods for analysis of bias due to uncontrolled confounding has been slow, despite the increasing availability of such methods. Bias analysis for such uncontrolled confounding is most useful in big data studies and systematic reviews to gauge the extent to which extraneous preexposure variables that affect the exposure and the outcome can explain some or all of the reported exposure-outcome associations. We review methods that can be applied during or after data analysis to adjust for uncontrolled confounding for different outcomes, confounders, and study settings. We discuss relevant bias formulas and how to obtain the required information for applying them. Finally, we develop a new intuitive generalized bias analysis framework for simulating and adjusting for the amount of uncontrolled confounding due to not measuring and adjusting for one or more confounders.
September 2017
G-computation of Average Treatment Effects on the Treated and the Untreated BMC Medical Research Methodology. Background: Average treatment effects on the treated (ATT) and the untreated (ATU) are useful when there is interest in: the evaluation of the effects of treatments or interventions on those who received them, the presence of treatment heterogeneity, or the projection of potential outcomes in a target (sub-) population. In this paper we illustrate the steps for estimating ATT and ATU using g-computation implemented via Monte Carlo simulation.
Methods: To obtain marginal effect estimates for ATT and ATU we used a three-step approach: fitting a model for the outcome, generating potential outcome variables for ATT and ATU separately, and regressing each potential outcome variable on treatment intervention.
Results: The estimates for ATT, ATU and average treatment effect (ATE) were of similar magnitude, with ATE being in between ATT and ATU as expected. In our illustrative example, the effect (risk difference [RD]) of a higher education on angina among the participants who indeed have at least a high school education (ATT) was -0.019 (95% CI: -0.040, -0.007) and that among those who have less than a high school education in India (ATU) was -0.012 (95% CI: -0.036, 0.010).
Conclusions: The g-computation algorithm is a powerful way of estimating standardized estimates like the ATT and ATU. Its use should be encouraged in modern epidemiologic teaching and practice.
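A compact R sketch of the three steps on simulated data follows. The simulation, variable names, and model form are ours; for brevity the final step averages predicted contrasts within each group rather than running the marginal regression described in the Methods, and bootstrap confidence intervals are omitted.

set.seed(7)
n <- 5000
C <- rnorm(n)                                          # confounder
A <- rbinom(n, 1, plogis(0.7 * C))                     # treatment
Y <- rbinom(n, 1, plogis(-1 + 0.5 * A + 0.6 * C))      # binary outcome
dat <- data.frame(Y, A, C)

fit <- glm(Y ~ A + C, family = binomial, data = dat)   # Step 1: outcome model
p1  <- predict(fit, newdata = transform(dat, A = 1), type = "response")  # Step 2: potential outcomes
p0  <- predict(fit, newdata = transform(dat, A = 0), type = "response")
att <- mean(p1[dat$A == 1] - p0[dat$A == 1])           # Step 3: average among the treated
atu <- mean(p1[dat$A == 0] - p0[dat$A == 0])           #         and among the untreated
ate <- mean(p1 - p0)
c(ATT = att, ATU = atu, ATE = ate)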
December 2017
Projecting the Impact of Hypothetical Early Life Interventions on Adiposity in Children Living in Low-Income Households Pediatric Obesity. It is difficult to evaluate the effectiveness of interventions aimed at reducing early childhood obesity using randomized trials. Objective: To illustrate how observational data can be analysed using causal inference methods to estimate the potential impact of behavioural 'interventions' on early childhood adiposity. Methods: We used longitudinal data from 1054 children 1-5 years old enrolled in the Special Supplemental Nutrition Program for Women, Infants and Children (WIC) and followed from 2008 to 2010 for a mean duration of 23 months. The data came from a random sample of WIC families living in Los Angeles County in 2008. We used the parametric g-formula to estimate the impact of various hypothetical behavioural interventions. Results: Adjusted mean weight-for-height Z score at the end of follow-up was 0.73 (95% CI 0.65, 0.81) under no intervention and 0.63 (95% CI 0.38, 0.87) for all interventions given jointly. Exclusive breastfeeding for 6 months or longer was the most effective intervention [population mean difference = -0.11 (95% CI -0.22, 0.01)]. Other interventions had little or no effect. Conclusions: Compared with interventions promoting healthy eating and physical activity behaviours, breastfeeding was more effective in reducing obesity risk in children aged 1-5 years. When carefully applied, causal inference methods may offer viable alternatives to randomized trials in etiologic and evaluation research.
2018
January 2018
Making Sense of Sensitivity: Extending Omitted Variable Bias Journal of the Royal Statistical Society, Series B (Statistical Methodology). In this paper we extend the familiar "omitted variable bias" framework, creating a suite of tools for sensitivity analysis of regression coefficients and their standard errors to unobserved confounders that: (i) do not require assumptions about the functional form of the treatment assignment mechanism nor the distribution of the unobserved confounder(s); (ii) can be used to assess the sensitivity to multiple confounders, whether they influence the treatment or the outcome linearly or not; (iii) facilitate the use of expert knowledge to judge the plausibility of sensitivity parameters; and, (iv) can be easily and intuitively displayed, either in concise regression tables or more elaborate graphs. More precisely, we introduce two novel measures for communicating the sensitivity of regression results that can be used for routine reporting. The "robustness value" describes the association unobserved confounding would need to have with both the treatment and the outcome to change the research conclusions. The partial R-squared of the treatment with the outcome shows how strongly confounders explaining all of the outcome would have to be associated with the treatment to eliminate the estimated effect. Next, we provide intuitive graphical tools that allow researchers to make more elaborate arguments about the sensitivity of not only point estimates but also t-values (or p-values and confidence intervals). We also provide graphical tools for exploring extreme sensitivity scenarios in which all or much of the residual variance is assumed to be due to confounders. Finally, we note that a widespread informal "benchmarking" practice can be widely misleading, and introduce a novel alternative that allows researchers to formally bound the strength of unobserved confounders "as strong as" certain covariate(s) in terms of the explained variance of the treatment and/or the outcome. We illustrate these methods with a running example that estimates the effect of exposure to violence in western Sudan on attitudes toward peace.
Software: R sensemakr, STATA sensemakr, Python sensemakr, Shinyapp
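As a point of reference for the robustness value described above, in the case of reducing the point estimate exactly to zero it can be written, as we understand the paper's notation, as

RV \;=\; \tfrac{1}{2}\left(\sqrt{f^{4} + 4 f^{2}} - f^{2}\right), \qquad f^{2} \;=\; \frac{R^{2}_{Y \sim D \mid X}}{1 - R^{2}_{Y \sim D \mid X}},

where R²_{Y∼D|X} is the partial R-squared of the treatment D with the outcome Y given the covariates X; confounders whose partial R² with both the treatment and the outcome fall below RV cannot bring the estimate to zero.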
March 2018
Covariate Balancing Propensity Score for a Continuous Treatment: Application to the efficacy of political advertisements Annals of Applied Statistics. Propensity score matching and weighting are popular methods when estimating causal effects in observational studies. Beyond the assumption of unconfoundedness, however, these methods also require the model for the propensity score to be correctly specified. The recently proposed covariate balancing propensity score (CBPS) methodology increases the robustness to model misspecification by directly optimizing sample covariate balance between the treatment and control groups. In this paper, we extend the CBPS to a continuous treatment. We propose the covariate balancing generalized propensity score (CBGPS) methodology, which minimizes the association between covariates and the treatment. We develop both parametric and nonparametric approaches and show their superior performance over the standard maximum likelihood estimation in a simulation study. The CBGPS methodology is applied to an observational study, whose goal is to estimate the causal effects of political advertisements on campaign contributions. We also provide open-source software that implements the proposed methods.
For R users, CBPS can be installed from CRAN: > install.packages("CBPS")
Software: R package
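A minimal usage sketch, shown here for a binary treatment (the CBGPS extension described above applies the same idea to continuous treatments). The toy data are ours, and the formula interface and weights component reflect our reading of the package documentation.

library(CBPS)   # assumes the CBPS package is installed from CRAN
set.seed(1)
n  <- 500
x1 <- rnorm(n); x2 <- rnorm(n)
treat <- rbinom(n, 1, plogis(0.5 * x1 - 0.3 * x2))
y  <- 1 + 0.4 * treat + 0.5 * x1 + rnorm(n)
dat <- data.frame(y, treat, x1, x2)
fit <- CBPS(treat ~ x1 + x2, data = dat)                    # covariate balancing propensity score
summary(lm(y ~ treat, data = dat, weights = fit$weights))   # weighted outcome regression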
September 2018
Estimating causal effects of new treatments despite self-selection: The case of experimental medical treatments Journal of Causal Inference. Providing terminally ill patients with access to experimental treatments, as allowed by recent “right to try” laws and “expanded access” programs, poses a variety of ethical questions. While practitioners and investigators may assume it is impossible to learn the effects of these treatments without randomized trials, this paper describes a simple tool to estimate the effects of these experimental treatments on those who take them, despite the problem of selection into treatment, and without assumptions about the selection process. The key assumption is that the average outcome, such as survival, would remain stable over time in the absence of the new treatment. Such an assumption is unprovable, but can often be credibly judged by reference to historical data and by experts familiar with the disease and its treatment. Further, where this assumption may be violated, the result can be adjusted to account for a hypothesized change in the non-treatment outcome, or to conduct a sensitivity analysis. The method is simple to understand and implement, requiring just four numbers to form a point estimate. Such an approach can be used not only to learn which experimental treatments are promising, but also to warn us when treatments are actually harmful – especially when they might otherwise appear to be beneficial, as illustrated by example here. While this note focuses on experimental medical treatments as a motivating case, more generally this approach can be employed where a new treatment becomes available or has a large increase in uptake, where selection bias is a concern, and where an assumption on the change in average non-treatment outcome over time can credibly be imposed.
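The "four numbers" can be made explicit. Writing Ȳ_pre and Ȳ_post for the average outcomes in the cohorts before and after the treatment became available, π for the proportion treated in the later cohort, and δ for the assumed change in the average non-treatment outcome (the stability assumption sets δ = 0), the implied estimate of the effect among the treated is, in notation introduced here,

\widehat{\mathrm{ATT}} \;=\; \frac{\bar{Y}_{\mathrm{post}} - \bar{Y}_{\mathrm{pre}} - \delta}{\pi}.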
2019
January 2019
A Persuasive Peace: Syrian refugees' attitudes towards compromise and civil war termination Journal of Peace Research. Civilians who have fled violent conflict and settled in neighboring countries are integral to processes of civil war termination. Contingent on their attitudes, they can either back peaceful settlements or support warring groups and continued fighting. Attitudes toward peaceful settlement are expected to be especially obdurate for civilians who have been exposed to violence. In a survey of 1,120 Syrian refugees in Turkey conducted in 2016, we use experiments to examine attitudes towards two critical phases of conflict termination -- a ceasefire and a peace agreement. We examine the rigidity/flexibility of refugees' attitudes to see if subtle changes in how wartime losses are framed or in who endorses a peace process can shift willingness to compromise with the incumbent Assad regime. Our results show, first, that refugees are far more likely to agree to a ceasefire proposed by a civilian as opposed to one proposed by armed actors from either the Syrian government or the opposition. Second, simply describing the refugee community's wartime experience as suffering rather than sacrifice substantially increases willingness to compromise with the regime to bring about peace. This effect remains strong among those who experienced greater violence. Together, these results show that even among a highly pro-opposition population that has experienced severe violence, willingness to settle and make peace are remarkably flexible and dependent upon these cues.
October 2019
The effect of personal violence on attitudes towards peace in Darfur Journal of Conflict Resolution. Does exposure to violence motivate individuals to support further violence, or to seek peace? Such questions are central to our understanding of how conflicts evolve, terminate, and recur. Yet, convincing empirical evidence as to which response dominates, even in a specific case, has been elusive, owing to the inability to rule out confounding biases. This paper employs a natural experiment based on the indiscriminacy of violence within villages in Darfur to examine how refugees' experiences of violence affect their attitudes toward peace. The results are consistent with a pro-peace or "weary" response: individuals directly harmed by violence were more likely to report that peace is possible, and less likely to demand execution of their enemies. This provides micro-level evidence supporting earlier country-level work on "war-weariness," and extends the growing literature on the effects of violence on individuals by including attitudes toward peace as an important outcome. These findings suggest that victims harmed by violence during war can play a positive role in settlement and reconciliation processes.
2020
January 2020
Analyzing Selection Bias for Credible Causal Inference: When in Doubt, DAG It Out. Epidemiology. Causal modeling and inference rely on strong assumptions, one of which is conditional exchangeability. Uncontrolled confounding is often seen as the most important threat to conditional exchangeability, although collider-stratification bias or selection bias can be just as important [1–4]. In this issue of the journal, Flanders and Ye [5] (henceforth, F&Y) and Smith and VanderWeele [6] (henceforth, S&VW) present their results on new bounds—limits that selection bias would not exceed in any specified context—and accompanying summary measures for the values of the selection bias bounding factors that will be enough to explain away any observed association between the exposure and the outcome on the risk ratio or relative risk scale, with risk difference results given in the appendix of S&VW’s article. These articles on M-bias or selection bias fit into a growing body of work that has renewed researchers’ interests in selection bias, including the recent overlapping literature on generalizability and transportability, and bounding factors and related summary measures for bias analysis.
March 2020
Invited Commentary: Making Causal Inference More Social and (Social) Epidemiology More Causal American Journal of Epidemiology. A society's social structure and the interactions of its members determine when key drivers of health occur, for how long they last, and how they operate. Yet, it has been unclear whether causal inference methods can help us find meaningful interventions on these fundamental social drivers of health. Galea and Hernán propose we place hypothetical interventions on a spectrum and estimate their effects by emulating trials, either through individual-level data analysis or systems science modeling (Am J Epidemiol. 2020;189(3):167-170). In this commentary, by way of example in health disparities research, we probe this "closer engagement of social epidemiology with formal causal inference approaches." The formidable, but not insurmountable, tensions call for causal reasoning and effect estimation in social epidemiology that should always be enveloped by a thorough understanding of how systems and the social exposome shape risk factor and health distributions. We argue that one way toward progress is a true partnership of social epidemiology and causal inference with bilateral feedback aimed at integrating social epidemiologic theory, causal identification and modeling methods, systems thinking, and improved study design and data. To produce consequential work, we must make social epidemiology more causal and causal inference more social.
July 2020
Inference without randomization or ignorability: A stability-controlled quasi-experiment on the prevention of tuberculosis. Statistics in Medicine. The stability-controlled quasi-experiment (SCQE) is an approach to study the effects of nonrandomized, newly adopted treatments. While covariate adjustment techniques rely on a “no unobserved confounding” assumption, SCQE imposes an assumption on the change in the average nontreatment outcome between successive cohorts (the “baseline trend”). We provide inferential tools for SCQE and its first application, examining whether isoniazid preventive therapy (IPT) reduced tuberculosis (TB) incidence among 26,715 HIV patients in Tanzania. After IPT became available, 16% of untreated patients developed TB within a year, compared with only 0.5% of patients under treatment. Thus, a simple difference in means suggests a 15.5 percentage point (pp) lower risk (p ≪ .001). Adjusting for covariates using numerous techniques leaves this effectively unchanged. Yet, due to confounding biases, such estimates can be misleading regardless of their statistical strength. By contrast, SCQE reveals valid causal effect estimates for any chosen assumption on the baseline trend. For example, assuming a baseline trend near 0 (no change in TB incidence over time, absent this treatment) implies a small and insignificant effect. To argue IPT was beneficial requires arguing that the nontreatment incidence would have risen by at least 0.7 pp per year, which is plausible but far from certain. SCQE may produce narrow estimates when the plausible range of baseline trends can be sufficiently constrained, while in every case it tells us what baseline trends must be believed in order to sustain a given conclusion, protecting against inferences that rely upon infeasible assumptions.
July 2020
Wildfire Exposure Increases Pro-Climate Political Behaviors American Political Science Review. One political barrier to climate reforms is the temporal mismatch between short-term policy costs and long-term policy benefits. Will public support for climate reforms increase as climate-related disasters make the short-term costs of inaction more salient? Leveraging variation in the timing of Californian wildfires, we evaluate how exposure to a climate-related hazard influences political behavior, rather than self-reported attitudes or behavioral intentions. We show that wildfires increased support for costly, climate-related ballot measures by 5 to 6 percentage points for those living within 5km of a recent wildfire, decaying to near zero beyond a distance of 15km. This effect is concentrated in Democratic-voting areas, and nearly zero in Republican-dominated areas. We conclude that experienced climate threats can enhance willingness-to-act but largely in places where voters are known to believe in climate change.
December 2020
Understanding, choosing, and unifying multilevel and fixed effect approaches. Political Analysis. When working with grouped data, investigators may choose between “fixed effects” models (FE) with specialized (e.g., cluster-robust) standard errors, or “multilevel models” (MLMs) employing “random effects”. We review the claims given in published works regarding this choice, then clarify how these approaches work and compare by showing that: (i) random effects employed in MLMs are simply “regularized” fixed effects; (ii) unmodified MLMs are consequently susceptible to bias—but there is a longstanding remedy; and (iii) the “default” MLM standard errors rely on narrow assumptions that can lead to undercoverage in many settings. Our review of over 100 papers using MLM in political science, education, and sociology shows that these “known” concerns have been widely ignored in practice. We describe how to debias MLM’s coefficient estimates, and provide an option to more flexibly estimate their standard errors. Most illuminating, once MLMs are adjusted in these two ways the point estimate and standard error for the target coefficient are exactly equal to those of the analogous FE model with cluster-robust standard errors. For investigators working with observational data and who are interested only in inference on the target coefficient, either approach is equally appropriate and preferable to uncorrected MLM.
2022
January 2022
Causal Effect of Chronic Pain on Mortality Through Opioid Prescriptions: Application of the Front-Door Formula Epidemiology. Background: Chronic pain is the leading cause of disability worldwide and is strongly associated with the epidemic of opioid overdosing events. However, the causal links between chronic pain, opioid prescriptions, and mortality remain unclear.
Methods: This study included 13,884 US adults aged ≥20 years who provided data on chronic pain in the National Health and Nutrition Examination Survey 1999-2004 with linkage to mortality databases through 2015. We employed the generalized form of the front-door formula within the structural causal model framework to investigate the causal effect of chronic pain on all-cause mortality mediated by opioid prescriptions.
Results: We identified a total of 718 participants at 3 years of follow-up and 1260 participants at 5 years as having died from all causes. Opioid prescriptions increased the risk of all-cause mortality with an estimated odds ratio (OR) (95% confidence interval) = 1.5 (1.1, 1.9) at 3 years and 1.3 (1.1, 1.6) at 5 years. The front-door formula revealed that chronic pain increased the risk of all-cause mortality through opioid prescriptions; OR = 1.06 (1.01, 1.11) at 3 years and 1.03 (1.01, 1.06) at 5 years. Our bias analysis showed that our findings based on the front-door formula were likely robust to plausible sources of bias from uncontrolled exposure-mediator or mediator-outcome confounding.
Conclusions: Chronic pain increased the risk of all-cause mortality through opioid prescriptions. Our findings highlight the importance of careful guideline-based chronic pain management to prevent death from possibly inappropriate opioid prescriptions driven by chronic pain.
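For reference, the classical front-door formula, of which the paper applies a generalized form, identifies the effect of exposure X (chronic pain) on outcome Y (mortality) through mediator M (opioid prescriptions), when the stated confounding conditions hold, as

P(y \mid do(x)) \;=\; \sum_{m} P(m \mid x) \sum_{x'} P(y \mid x', m)\, P(x').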
2023
July 2023
From “Is it unconfounded?” to “How much confounding would it take?”: Applying the sensitivity-based approach to assess causes of support for peace in Colombia The Journal of Politics. Attention to the credibility of causal claims has increased tremendously in recent years. When relying on observational data, debate often centers on whether investigators have ruled out any bias due to confounding. However, the relevant scientific question is generally not whether bias is precisely zero, but whether it is problematic enough to alter one’s research conclusion. We argue that sensitivity analyses would improve research practice by showing how results would change under plausible degrees of confounding, or equivalently, by revealing what one must argue about the strength of confounding to sustain a research conclusion. This would improve scrutiny of studies in which non-zero bias is expected, and of those where authors argue for zero bias but results may be fragile to confounding too weak to be ruled out. We illustrate this using off-the-shelf sensitivity tools to examine two potential influences on support for the FARC peace agreement in Colombia.
August 2023
Monotonicity: Detection, Refutation, and Ramification. The assumption of monotonicity, namely that outputs cannot decrease when inputs increase, is critical for many reasoning tasks, including unit selection, A/B testing, and quasi-experimental econometrics. It is also vital for identifying Probabilities of Causation, which, in turn, enable the estimation of individual-level behavior. This paper demonstrates how monotonicity can be detected (or refuted) using observational, experimental, or combined data. Using such data, we pinpoint regions where monotonicity is definitively violated, where it unequivocally holds, and where its status remains undetermined. We further explore the consequences of monotonicity violations, especially when a maximum percentage of possible violation is specified. Finally, we illustrate applications for personalized decision-making.
Software: Monotonicity Necessity and Sufficiency, Link: Interactive plot for necessary and sufficient regions of monotonicity
August 2023
Simultaneous adjustment of uncontrolled confounding, selection bias and misclassification in multiple-bias modelling International Journal of Epidemiology. Adjusting for multiple biases usually involves adjusting for one bias at a time, with careful attention to the order in which these biases are adjusted. A novel, alternative approach to multiple-bias adjustment involves the simultaneous adjustment of all biases via imputation and/or regression weighting. The imputed value or weight corresponds to the probability of the missing data and serves to 'reconstruct' the unbiased data that would be observed based on the provided assumptions of the degree of bias.
2024
March 2024
Demystifying and avoiding the OLS "weighting problem": Unmodeled heterogeneity and straightforward solutions arXiv. Researchers frequently estimate treatment effects by regressing outcomes (Y) on treatment (D) and covariates (X). Even without unobserved confounding, the coefficient on D yields a conditional-variance-weighted average of strata-wise effects, not the average treatment effect. Scholars have proposed characterizing the severity of these weights, evaluating resulting biases, or changing investigators' target estimand to the conditional-variance-weighted effect. We aim to demystify these weights, clarifying how they arise, what they represent, and how to avoid them. Specifically, these weights reflect misspecification bias from unmodeled treatment-effect heterogeneity. Rather than diagnosing or tolerating them, we recommend avoiding the issue altogether, by relaxing the standard regression assumption of "single linearity" to one of "separate linearity" (of each potential outcome in the covariates), accommodating heterogeneity. Numerous methods--including regression imputation (g-computation), interacted regression, and mean balancing weights--satisfy this assumption. In many settings, the efficiency cost to avoiding this weighting problem altogether will be modest and worthwhile.
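A small R illustration of the point and of two of the fixes named above, on simulated data of our own construction (the true average treatment effect is 1):

set.seed(3)
n <- 2000
x <- rnorm(n)
d <- rbinom(n, 1, plogis(1 + x))              # treatment probability depends on x
y <- 1 + (1 + x) * d + 0.5 * x + rnorm(n)     # treatment effect 1 + x varies with x

coef(lm(y ~ d + x))["d"]        # "single linearity": conditional-variance-weighted, not the ATE here

# Fix 1: interacted regression with centered covariates ("separate linearity")
xc <- x - mean(x)
coef(lm(y ~ d * xc))["d"]       # targets the average treatment effect

# Fix 2: regression imputation (g-computation) with separate arm-wise fits
f1 <- lm(y ~ x, subset = d == 1)
f0 <- lm(y ~ x, subset = d == 0)
mean(predict(f1, data.frame(x = x)) - predict(f0, data.frame(x = x)))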
July 2024
Inference at the data's edge: Gaussian processes for modeling and inference under model-dependency, poor overlap, and extrapolation. arXiv. Many inferential tasks involve fitting models to observed data and predicting outcomes at new covariate values, requiring interpolation or extrapolation. Conventional methods select a single best-fitting model, discarding fits that were similarly plausible in-sample but would yield sharply different predictions out-of-sample. Gaussian Processes (GPs) offer a principled alternative. Rather than committing to one conditional expectation function, GPs deliver a posterior distribution over outcomes at any covariate value. This posterior effectively retains the range of models consistent with the data, widening uncertainty intervals where extrapolation magnifies divergence. In this way, the GP's uncertainty estimates reflect the implications of extrapolation on our predictions, helping to tame the "dangers of extreme counterfactuals" (King & Zeng, 2006). The approach requires (i) specifying a covariance function linking outcome similarity to covariate similarity, and (ii) assuming Gaussian noise around the conditional expectation. We provide an accessible introduction to GPs with emphasis on this property, along with a simple, automated procedure for hyperparameter selection implemented in the R package gpss. We illustrate the value of GPs for capturing counterfactual uncertainty in three settings: (i) treatment effect estimation with poor overlap, (ii) interrupted time series requiring extrapolation beyond pre-intervention data, and (iii) regression discontinuity designs where estimates hinge on boundary behavior.
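For readers unfamiliar with GP regression, the posterior the abstract refers to has the standard closed form (notation ours): given training covariates X, outcomes y, covariance function k, and noise variance σ², the posterior mean and variance at a new point x_* are

m(x_*) \;=\; k(x_*, X)\,\bigl[K(X,X) + \sigma^2 I\bigr]^{-1} y, \qquad v(x_*) \;=\; k(x_*, x_*) - k(x_*, X)\,\bigl[K(X,X) + \sigma^2 I\bigr]^{-1} k(X, x_*),

and it is the growth of v(x_*) away from the observed covariate values that widens the uncertainty intervals under extrapolation.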
October 2024
Causal progress with imperfect placebo treatments and outcomes arXiv. In the quest to make defensible causal claims from observational data, it is sometimes possible to leverage information from "placebo treatments" and "placebo outcomes". Existing approaches employing such information focus largely on point identification and assume (i) "perfect placebos", meaning placebo treatments have precisely zero effect on the outcome and the real treatment has precisely zero effect on a placebo outcome; and (ii) "equiconfounding", meaning that the treatment-outcome relationship where one is a placebo suffers the same amount of confounding as does the real treatment-outcome relationship, on some scale. We instead consider an omitted variable bias framework, in which users can postulate ranges of values for the degree of unequal confounding and the degree of placebo imperfection. Once postulated, these assumptions identify or bound the linear estimates of treatment effects. Our approach also does not require using both a placebo treatment and placebo outcome, as some others do. While applicable in many settings, one ubiquitous use-case for this approach is to employ pre-treatment outcomes as (perfect) placebo outcomes, as in difference-in-difference. The parallel trends assumption in this setting is identical to the equiconfounding assumption, on a particular scale, which our framework allows the user to relax. Finally, we demonstrate the use of our framework with two applications and a simulation, employing an R package that implements these approaches.
November 2024
Sensemakr: Sensitivity Analysis Tools for OLS in R and Stata Observational Studies. This paper introduces the package sensemakr for R and Stata, which implements a suite of sensitivity analysis tools for regression models developed in Cinelli and Hazlett (2020a). Given a regression model, sensemakr can compute sensitivity statistics for routine reporting, such as the robustness value, which describes the minimum strength that unobserved confounders need to have to overturn a research conclusion. The package also provides plotting tools that visually demonstrate the sensitivity of point estimates and t-values to hypothetical confounders. Finally, sensemakr implements formal bounds on sensitivity parameters by means of comparison with the explanatory power of observed variables. All these tools are based on the familiar "omitted variable bias" framework, do not require assumptions regarding the functional form of the treatment assignment mechanism nor the distribution of the unobserved confounders, and naturally handle multiple, non-linear confounders. With sensemakr, users can transparently report the sensitivity of their causal inferences to unobserved confounding, thereby enabling a more precise, quantitative debate as to what can be concluded from imperfect observational studies.
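A minimal usage sketch, following the Darfur example that ships with the R package; the argument names and the bundled dataset reflect our reading of the package documentation:

library(sensemakr)   # install.packages("sensemakr")
data("darfur")
model <- lm(peacefactor ~ directlyharmed + age + farmer_dar + herder_dar +
              pastvoted + hhsize_darfur + female + village, data = darfur)
sens <- sensemakr(model, treatment = "directlyharmed",
                  benchmark_covariates = "female", kd = 1:3)
summary(sens)   # robustness values and bounds for routine reporting
plot(sens)      # sensitivity contour plot for the point estimate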
2025
January 2025
An omitted variable bias framework for sensitivity analysis of instrumental variables Biometrika. We develop an omitted variable bias framework for sensitivity analysis of instrumental variable estimates that naturally handles multiple side effects (violations of the exclusion restriction assumption) and confounders (violations of the ignorability of the instrument assumption) of the instrument, exploits expert knowledge to bound sensitivity parameters and can be easily implemented with standard software. Specifically, we introduce sensitivity statistics for routine reporting, such as (extreme) robustness values for instrumental variables, describing the minimum strength that omitted variables need to have to change the conclusions of a study. Next, we provide visual displays that fully characterize the sensitivity of point estimates and confidence intervals to violations of the standard instrumental variable assumptions. Finally, we offer formal bounds on the worst possible bias under the assumption that the maximum explanatory power of omitted variables is no stronger than a multiple of the explanatory power of observed variables. Conveniently, many pivotal conclusions regarding the sensitivity of the instrumental variable estimate (e.g., tests against the null hypothesis of a zero causal effect) can be reached simply through separate sensitivity analyses of the effect of the instrument on the treatment (the first stage) and the effect of the instrument on the outcome (the reduced form). We apply our methods in a running example that uses proximity to college as an instrumental variable to estimate the returns to schooling.
February 2025
Kpop: A kernel balancing approach for reducing specification assumptions in survey weighting. Journal of the Royal Statistical Society, Series B (Statistical Methodology). With the precipitous decline in response rates, researchers and pollsters have been left with highly nonrepresentative samples, relying on constructed weights to make these samples representative of the desired target population. Though practitioners employ valuable expert knowledge to choose what variables X must be adjusted for, they rarely defend particular functional forms relating these variables to the response process or the outcome. Unfortunately, commonly used calibration weights, which make the weighted mean of X in the sample equal that of the population, only ensure correct adjustment when the portion of the outcome and the response process left unexplained by linear functions of X are independent. To alleviate this functional form dependency, we describe kernel balancing for population weighting (kpop). This approach replaces the design matrix X with a kernel matrix, K, encoding high-order information about X. Weights are then found to make the weighted average row of K among sampled units approximately equal to that of the target population. This produces good calibration on a wide range of smooth functions of X, without relying on the user to decide which X or what functions of them to include. We describe the method and illustrate it by application to polling data from the 2016 US presidential election.
February 2025
Generalized framework for identifying meaningful heterogenous treatment effects in observational studies: A parametric data-adaptive G-computation approach Statistical Methods in Medical Research. There has been a renewed interest in identifying heterogenous treatment effects (HTEs) to guide personalized medicine. The objective was to illustrate the use of a step-by-step transparent parametric data-adaptive approach (the generalized HTE approach) based on the G-computation algorithm to detect heterogenous subgroups and estimate meaningful conditional average treatment effects (CATE). The following seven steps implement the generalized HTE approach: Step 1: Select variables that satisfy the backdoor criterion and potential effect modifiers; Step 2: Specify a flexible saturated model including potential confounders and effect modifiers; Step 3: Apply a selection method to reduce overfitting; Step 4: Predict potential outcomes under treatment and no treatment; Step 5: Contrast the potential outcomes for each individual; Step 6: Fit cluster modeling to identify potential effect modifiers; Step 7: Estimate subgroup CATEs. We illustrated the use of this approach using simulated and real data. Our generalized HTE approach successfully identified HTEs and subgroups defined by all effect modifiers using simulated and real data. Our study illustrates that it is feasible to use a step-by-step parametric and transparent data-adaptive approach to detect effect modifiers and identify meaningful HTEs in an observational setting. This approach should be more appealing to epidemiologists interested in explanation.
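An abbreviated R sketch of the seven steps on simulated data; the specific modeling choices (a logistic lasso via the glmnet package, k-means with two clusters) and all names are ours for illustration, not the paper's exact specification.

library(glmnet)
set.seed(11)
n  <- 3000
c1 <- rnorm(n); c2 <- rbinom(n, 1, 0.5)           # Step 1: confounders / candidate modifiers
a  <- rbinom(n, 1, plogis(0.4 * c1))
y  <- rbinom(n, 1, plogis(-1 + (0.5 + 1.0 * c2) * a + 0.3 * c1))

X  <- model.matrix(~ a * (c1 + c2))[, -1]          # Step 2: flexible interacted specification
cvfit <- cv.glmnet(X, y, family = "binomial")      # Step 3: penalization to limit overfitting

X1 <- model.matrix(~ a * (c1 + c2), data.frame(a = 1, c1, c2))[, -1]
X0 <- model.matrix(~ a * (c1 + c2), data.frame(a = 0, c1, c2))[, -1]
p1 <- predict(cvfit, newx = X1, s = "lambda.min", type = "response")   # Step 4: potential outcomes
p0 <- predict(cvfit, newx = X0, s = "lambda.min", type = "response")
ite <- as.numeric(p1 - p0)                         # Step 5: individual-level contrasts

grp <- kmeans(ite, centers = 2)$cluster            # Step 6: cluster to reveal subgroups
tapply(ite, grp, mean)                             # Step 7: subgroup CATEs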
April 2025
Sensitivity of weighted least squares estimators to omitted variables arXiv. This paper introduces tools for assessing the sensitivity, to unobserved confounding, of a common estimator of the causal effect of a treatment on an outcome that employs weights: the weighted linear regression of the outcome on the treatment and observed covariates. We demonstrate through the omitted variable bias framework that the bias of this estimator is a function of two intuitive sensitivity parameters: (i) the proportion of weighted variance in the treatment that unobserved confounding explains given the covariates and (ii) the proportion of weighted variance in the outcome that unobserved confounding explains given the covariates and the treatment, i.e., two weighted partial R² values. Following previous work, we define sensitivity statistics that lend themselves well to routine reporting, and derive formal bounds on the strength of the unobserved confounding with (a multiple of) the strength of select dimensions of the covariates, which help the user determine if unobserved confounding that would alter one's conclusions is plausible. We also propose tools for adjusted inference. A key choice we make is to examine only how the (weighted) outcome model is influenced by unobserved confounding, rather than examining how the weights have been biased by omitted confounding. One benefit of this choice is that the resulting tool applies with any weights (e.g., inverse-propensity score, matching, or covariate balancing weights). Another benefit is that we can rely on simple omitted variable bias approaches that, for example, impose no distributional assumptions on the data or unobserved confounding, and can address bias from misspecification in the observed data. We make these tools available in the weightsense package for the R computing language.
April 2025
Real effect or bias? Good practices for evaluating the robustness of evidence from comparative observational studies through quantitative sensitivity analysis for unmeasured confounding Pharmaceutical Statistics. The assumption of "no unmeasured confounders" is a critical but unverifiable assumption required for causal inference, yet quantitative sensitivity analyses to assess the robustness of real-world evidence remain under-utilized. The lack of use is likely due in part to the complexity of implementation and the often specific and restrictive data requirements for application of each method. With the advent of methods that are broadly applicable in that they do not require identification of a specific unmeasured confounder, along with publicly available code for implementation, roadblocks toward broader use of sensitivity analyses are decreasing. To spur greater application, here we offer a good practice guidance to address the potential for unmeasured confounding at both the design and analysis stages, including framing questions and an analytic toolbox for researchers. The questions at the design stage guide the researcher through steps evaluating the potential robustness of the design while encouraging gathering of additional data to reduce uncertainty due to potential confounding. At the analysis stage, the questions guide quantifying the robustness of the observed result and providing researchers with a clearer indication of the strength of their conclusions. We demonstrate the application of this guidance using simulated data based on an observational fibromyalgia study, applying multiple methods from our analytic toolbox for illustration purposes.
July 2025
Glucagon-Like Peptide-1 Receptor Agonists and Incidence of Dementia Among Older Adults With Type 2 Diabetes: A Target Trial Emulation Annals of Internal Medicine. Background: Glucagon-like peptide-1 receptor agonists (GLP-1RAs) have been shown to decrease blood glucose levels, promote weight loss, and prevent cardiovascular events. However, evidence is limited regarding their effect on dementia, although emerging observational studies, some with serious methodological limitations, have suggested large reductions in dementia associated with GLP-1RAs that may not be entirely causally related.
Objective: To compare the effect of GLP-1RAs versus dipeptidyl peptidase-4 inhibitors (DPP4is) as second-line therapy for type 2 diabetes on risk for dementia among older adults.
Design: Target trial emulation.
Setting: United States from January 2016 to December 2020.
Participants: Medicare fee-for-service beneficiaries aged 66 years or older with diabetes who used metformin and did not have dementia at baseline and initiated GLP-1RAs or DPP4is between January 2017 and December 2018.
Measurements: Onset of dementia was defined as 1 year before the date of a new dementia diagnosis. Risks were calculated at 30 months in GLP-1RA and DPP4i groups matched in a 1:2 ratio on an estimated propensity score and compared via ratios and differences.
Results: Among 2418 patients initiating GLP-1RAs and 4836 matched patients initiating DPP4is, the mean age was 71 years, and 55% were female. Over a median follow-up of 1.9 years, the outcome occurred in 96 patients in the GLP-1RA group and 217 in the DPP4i group. The estimated risk difference at 30 months was -0.93 (95% CI, -2.33 to 0.23) percentage points, and the estimated risk ratio was 0.83 (95% CI, 0.61 to 1.05). The estimated risk ratios were 0.64 (95% CI, 0.46 to 0.93) and 1.22 (95% CI, 0.74 to 1.66) among those younger than 75 years and aged 75 years or older, respectively.
Limitations: Potential residual confounding (no data on body mass index, glycemic control, or duration of diabetes), outcome misclassification, and short follow-up.
Conclusion: Among older adults with diabetes, no clear evidence was found that the incidence of dementia differed overall between patients using GLP-1RAs versus DPP4is. Under conventional statistical criteria, an effect of GLP-1RAs between a 39% decrease and a 5% increase in risk for dementia was highly compatible with the data, although estimates differed by age. Randomized trials are needed to quantify the effect of GLP-1RAs on dementia.
Primary funding source: Gregory Annenberg Weingarten, GRoW @ Annenberg.
October 2025
Post-treatment problems: What can we say about the effect of a treatment among sub-groups who (would) respond in some way? arXiv. Investigators are often interested in how a treatment affects an outcome for units responding to treatment in a certain way. We may wish to know the effect among units that, for example, meaningfully implemented an intervention, passed an attention check, or demonstrated some important mechanistic response. Simply conditioning on the observed value of the post-treatment variable introduces problematic biases. Further, the identification assumptions required of several existing strategies are often indefensible. We propose the Treatment Reactive Average Causal Effect (TRACE), which we define as the total effect of treatment in the group that, if treated, would realize a particular value of the relevant post-treatment variable. By reasoning about the effect among the "non-reactive" group, we can identify and estimate the range of plausible values for the TRACE. We demonstrate the use of this approach with three examples: (i) learning the effect of police-perceived race on police violence during traffic stops, a case where point identification may be possible; (ii) estimating effects of a community-policing intervention in Liberia, in communities that meaningfully implemented it, and (iii) studying how in-person canvassing affects support for transgender rights, among participants for whom the intervention would result in more positive feelings towards transgender people.
November 2025
Safe inference outside of randomized trials: Application of the stability-controlled quasi-experiment to the effects of three COVID-19 therapies Observational Studies. When estimating the effects of medical therapies from their use outside of randomized trials, researchers often rely on assumptions that are difficult to justify and impossible to verify. The resulting estimates may thus be far from their intended causal targets, potentially making a harmful treatment appear beneficial or vice versa. We review the stability-controlled quasi-experiment (SCQE), a method suited to settings where a treatment's prevalence changes sharply over a short period, and apply it to assess the effects of remdesivir, hydroxychloroquine, and dexamethasone on COVID-19 mortality. Rather than requiring debate about the absence (or limited strength) of unobserved confounding, about "parallel trends", or other well-known strategies, the SCQE asks users to reason about a "baseline trend" assumption. In this setting, this asks "How much could COVID-19 mortality have changed over a short period, absent the treatment change in question?" Any plausible range for this assumption yields a corresponding range of plausible causal effect estimates. Conversely, SCQE clarifies what baseline trends must be defended or refuted in order to defend or refute a given conclusion about a treatment's efficacy or harm. Using data from two hospital systems early in the COVID-19 pandemic, we show that SCQE could have enabled safe yet partially informative inferences about treatment effects before clinical trial completion, producing conclusions consistent with the results of eventual randomized trials.
December 2025
Making DAGs even more useful: using augmented causal diagrams to depict counterfactual, study design, measurement, analytical, and interventional features International Journal of Epidemiology. Since their mainstream introduction in the 1990s, causal diagrams, including directed acyclic graphs (DAGs), have been increasingly used to depict our causal knowledge of the world we study and to guide study design, analysis, and interpretation for causal inference. Beyond describing the data-generating mechanism, DAGs are typically used to select variables for confounding control. However, to do more, researchers have had to modify or augment their DAGs with additional features reflecting theoretical what-if scenarios or study-dependent processes affecting the data, analysis, and interpretation. In this journal, Mansournia et al. show how to depict balancing scores on DAGs—a welcome addition to the growing use of augmented graphs. This commentary will first place this work in the broader context of how researchers have tried to do more with augmented causal diagrams, including augmented directed acyclic graphs (ADAGs), which we use more expansively to include all such graphs in this commentary. Then, we highlight some useful features of ADAGs that depict balancing scores, focusing on the propensity score for illustration. We conclude with some remarks on the future of graphical augmentation.
2026
January 2026
Re. Prediagnostic Exposures and Cancer Survival: Can a Meaningful Causal Estimand be Specified? Epidemiology. Albers and colleagues highlighted the challenges of defining meaningful and useful estimands for effects of pre-cancer-diagnosis exposures on post-cancer-diagnosis outcomes. We aim to contribute to this conversation by discussing (i) truncation and (ii) specification of causal estimands for cancer-relevant target populations.