gcomp

Analysis

G-computation formula via Monte Carlo simulation (includes gcomptab)

. net install gcomp, from(...)

Version 1.1.0 | 2026-04-26

gcomp implements Robins' parametric g-computation formula in Stata using Monte Carlo simulation and bootstrap inference. It supports two related causal-inference workflows: causal mediation analysis and longitudinal causal-effect estimation in the presence of time-varying confounding.

This Stata-Tools release is a maintained fork of SSC gformula v1.16 beta (Rhian Daniel, 2021) with bug fixes, modernization, and removal of SSC dependencies. The companion command gcomptab formats supported mediation results into publication-ready Excel tables.

Requirements

  • Stata 16 or later
  • No external dependencies — all required functionality is bundled

Installation

capture ado uninstall gcomp
net install gcomp, from("https://raw.githubusercontent.com/tpcopeland/Stata-Tools/main/gcomp") replace

Commands

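Command  | Description
---------+---------------------------------------------------------------------------------------
gcomp    | Estimate mediation effects or longitudinal causal effects via parametric g-computation
gcomptab | Export supported gcomp mediation results to a formatted Excel table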

How It Works

The core idea

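Standard regression adjusts for confounders by including them as covariates. When a confounder is itself affected by prior exposure (the time-varying confounding problem), this approach can introduce bias: conditioning on the confounder blocks the part of the exposure effect that runs through it, while leaving it out fails to control confounding. Similarly, when a mediator-outcome confounder is affected by the exposure, standard regression-based mediation methods can give biased direct and indirect effects.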

G-computation solves both problems by:

  1. Fitting parametric models to the observed data (one model per variable you need to simulate)
  2. Simulating a copy of the population under each hypothetical scenario (e.g. "everyone treated" vs. "no one treated")
  3. Comparing outcomes across scenarios to estimate the causal effect

Bootstrap confidence intervals are obtained by repeating the entire procedure on resampled data.
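
The three steps can be sketched by hand for the simplest case: one binary exposure and one baseline confounder. This is a minimal illustration under stated assumptions, not gcomp's internal code. The data are simulated, the variable names and coefficients are hypothetical, and predicted probabilities are averaged directly instead of drawing Monte Carlo outcomes as gcomp does.

```stata
* Hypothetical toy data: binary exposure x, confounder c, binary outcome y
clear
set seed 12345
set obs 1000
generate c = rnormal()
generate x = runiform() < invlogit(0.5*c)
generate y = runiform() < invlogit(-1 + 0.4*x + 0.6*c)

* Step 1: fit a parametric outcome model to the observed data
logit y x c

* Step 2: predict every subject's outcome under each hypothetical scenario
generate x_obs = x
replace x = 1                 // "everyone treated"
predict p1, pr
replace x = 0                 // "no one treated"
predict p0, pr
replace x = x_obs             // restore the observed exposure

* Step 3: compare mean potential outcomes across scenarios (risk difference)
summarize p1, meanonly
scalar po1 = r(mean)
summarize p0, meanonly
scalar po0 = r(mean)
display "Risk difference: " scalar(po1) - scalar(po0)
```

Repeating these steps on resampled data would give bootstrap standard errors. With mediators or time-varying covariates, each intermediate variable also has to be simulated in sequence, which is the part gcomp handles.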

Two required ingredients

Every gcomp call needs:

  • commands() tells gcomp which estimation command to use for each simulated variable: logit (binary), regress (continuous), mlogit (multinomial), or ologit (ordinal)
  • equations() tells gcomp which predictors belong in each of those models

Both take a comma-separated list of variable: specification pairs, one entry per simulated variable:

commands(m: logit, y: logit)
equations(m: x c, y: m x c)
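
Read together, the two options describe one regression per simulated variable. As a rough guide only, the specification above corresponds to the standalone models below. The toy data and coefficients are hypothetical and exist only to make the snippet runnable. gcomp fits its own versions of these models internally, so running them by hand is just a way to inspect a specification before bootstrapping.

```stata
* Hypothetical toy data with the variable names used in the example
clear
set seed 1
set obs 500
generate c = rnormal()
generate x = runiform() < 0.5
generate m = runiform() < invlogit(-0.5 + 0.8*x + 0.3*c)
generate y = runiform() < invlogit(-1 + 0.5*m + 0.3*x + 0.4*c)

* commands(m: logit, ...) paired with equations(m: x c, ...)
logit m x c

* commands(..., y: logit) paired with equations(..., y: m x c)
logit y m x c
```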

Choosing a workflow

Use case                         | Core syntax pattern                                                               | What you get
---------------------------------+-----------------------------------------------------------------------------------+--------------------------------------------------------
Mediation (binary exposure)      | gcomp ..., outcome() mediation obe exposure() mediator() base_confs()             | TCE, NDE, NIE, PM, and optionally CDE
Mediation (categorical exposure) | gcomp ..., outcome() mediation oce exposure() mediator() base_confs()             | Per-level mediation contrasts
Time-varying confounding         | gcomp ..., outcome() idvar() tvar() varyingcovariates() intvars() interventions() | Potential outcomes under hypothetical interventions
Excel export                     | gcomptab, xlsx() sheet()                                                          | Publication-ready table from supported mediation results

Demo

All demo output below is generated by gcomp/demo/demo_gcomp.do. Monte Carlo sample sizes and bootstrap replications are kept low for speed; use sim(10000) samples(1000) for real analyses.

Binary-exposure mediation (OBE)

Binary exposure, binary mediator, binary outcome, continuous confounder. Reports the total causal effect (TCE), natural direct effect (NDE), natural indirect effect (NIE), and proportion mediated (PM).

Console output:
. gcomp y m x c, outcome(y) mediation obe
>     exposure(x) mediator(m)
>     commands(m: logit, y: logit)
>     equations(m: x c, y: m x c)
>     base_confs(c) sim(500) samples(50) seed(42)
G-computation procedure using Monte Carlo simulation: mediation

   Outcome variable: y
   Exposure variable(s): x
   Mediator variable(s): m
   Size of MC sample: 500
   No. of bootstrap samples: 50

   A summary of the specified parametric models:

   (for simulation under different interventions)

      Variable | Command | Prediction equation
   ------------+---------+-------------------------------------------------------
             m | logit   |  x c
             y | logit   |  m x c
   ------------------------------------------------------------------------------

   No. of subjects = 1000

   Bootstrapping:
(running _gcomp_bootstrap on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

G-computation formula estimates of the total causal effect and the natural
direct/indirect effects

 -------------------------------------------------------------------------------------
              |  G-computation      Bootstrap                         Normal-based
              |  estimate (MD)      Std. Err.      z    P>|z|     [95% Conf. Interval]
 -------------+-----------------------------------------------------------------------
       TCE    |        .056           .035098    1.6    0.111    -.0127909    .1247909
       NDE    |        .018          .0329626     .55   0.585    -.0466054    .0826054
       NIE    |        .038          .0253395    1.5    0.134    -.0116646    .0876646
       PM     |    .6785714           1.64664     .41   0.680    -2.548783    3.905926
 -------------------------------------------------------------------------------------
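
The rows of this table are linked by simple arithmetic, which can be checked with display. The sketch below copies values from the OBE table above. The scalar names are arbitrary and are not the names gcomp uses internally.

```stata
* Values copied from the OBE table above
scalar tce    = .056
scalar nde    = .018
scalar nie    = .038
scalar se_tce = .035098

* Natural direct and indirect effects sum to the total effect
display "NDE + NIE    = " scalar(nde) + scalar(nie)

* Proportion mediated is the indirect share of the total effect
display "PM = NIE/TCE = " scalar(nie)/scalar(tce)

* z statistic and normal-based 95% limits for the TCE
display "z     = " scalar(tce)/scalar(se_tce)
display "lower = " scalar(tce) - invnormal(.975)*scalar(se_tce)
display "upper = " scalar(tce) + invnormal(.975)*scalar(se_tce)
```

Because PM is a ratio with a TCE near zero in the denominator, its bootstrap standard error in this small demo is large and its interval is very wide. Treat PM with caution whenever the TCE interval includes zero.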

Controlled direct effect (CDE)

Add control(0) to fix the mediator at 0 for all subjects and estimate the CDE alongside the natural effects.

Console output:
. gcomp y m x c, outcome(y) mediation obe
>     exposure(x) mediator(m)
>     commands(m: logit, y: logit)
>     equations(m: x c, y: m x c)
>     base_confs(c) control(0) sim(500) samples(50) seed(42)
G-computation procedure using Monte Carlo simulation: mediation

   Outcome variable: y
   Exposure variable(s): x
   Mediator variable(s): m
   Size of MC sample: 500
   No. of bootstrap samples: 50

   A summary of the specified parametric models:

   (for simulation under different interventions)

      Variable | Command | Prediction equation
   ------------+---------+-------------------------------------------------------
             m | logit   |  x c
             y | logit   |  m x c
   ------------------------------------------------------------------------------

   No. of subjects = 1000

   Bootstrapping:
(running _gcomp_bootstrap on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.............................................x....    50

G-computation formula estimates of the total causal effect, the natural
direct/indirect effects, and the controlled direct effect

         Control value(s):
              m=0

 -------------------------------------------------------------------------------------
              |  G-computation      Bootstrap                         Normal-based
              |  estimate (MD)      Std. Err.      z    P>|z|     [95% Conf. Interval]
 -------------+-----------------------------------------------------------------------
       TCE    |        .014          .0368658     .38   0.704    -.0582557    .0862557
       NDE    |         .02          .0336395     .59   0.552    -.0459322    .0859322
       NIE    |       -.006          .0243236    -.25   0.805    -.0536733    .0416733
       PM     |   -.4285714           1.17437    -.36   0.715    -2.730295    1.873152
       CDE    |       -.016          .0314072    -.51   0.610    -.0775571    .0455571
 -------------------------------------------------------------------------------------

Categorical-exposure mediation (OCE)

Use oce when the exposure has more than two levels. Each non-baseline level produces its own set of mediation contrasts. gcomptab does not format oce output — review e() results directly.

Console output:
. gcomp y m x c, outcome(y) mediation oce
>     exposure(x) mediator(m)
>     commands(m: logit, y: logit)
>     equations(m: x c, y: m x c)
>     base_confs(c) sim(500) samples(50) seed(42)
G-computation procedure using Monte Carlo simulation: mediation

Warning: Option baseline() has not been specified, and therefore the baseline
will be assumed to be 0.

   Outcome variable: y
   Exposure variable(s): x
   Mediator variable(s): m
   Size of MC sample: 500
   No. of bootstrap samples: 50

   A summary of the specified parametric models:

   (for simulation under different interventions)

      Variable | Command | Prediction equation
   ------------+---------+-------------------------------------------------------
             m | logit   |  x c
             y | logit   |  m x c
   ------------------------------------------------------------------------------

   No. of subjects = 1000

   Bootstrapping:
(running _gcomp_bootstrap on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

G-computation formula estimates of the total causal effect and the natural
direct/indirect effects

         Baseline value(s):
              x=0

 -------------------------------------------------------------------------------------
              |  G-computation      Bootstrap                         Normal-based
              |  estimate (MD)      Std. Err.      z    P>|z|     [95% Conf. Interval]
 -------------+-----------------------------------------------------------------------
    TCE(1)    |       -.054         .0347287    -1.55   0.120     -.122067     .014067
    TCE(2)    |       -.092         .0302558    -3.04   0.002    -.1513002   -.0326998
 -------------+-----------------------------------------------------------------------
    NDE(1)    |       -.052         .0321308    -1.62   0.106    -.1149751    .0109751
    NDE(2)    |       -.104         .0367025    -2.83   0.005    -.1759355   -.0320645
 -------------+-----------------------------------------------------------------------
    NIE(1)    |       -.002         .0311316     -.06   0.949    -.0630168    .0590168
    NIE(2)    |        .012         .0305374      .39   0.694    -.0478522    .0718522
 -------------+-----------------------------------------------------------------------
    PM(1)     |     .037037         .6929801      .05   0.957    -1.321179    1.395253
    PM(2)     |   -.1304348         .2787365     -.47   0.640    -.6767484    .4158788
 -------------------------------------------------------------------------------------

Time-varying confounding

Panel data with 120 subjects over 3 time points. A is the time-varying treatment, L is the time-varying confounder affected by prior treatment, and the outcome is measured only at the end of follow-up (eofu). The call estimates mean potential outcomes under "always treat" (A=1) vs. "never treat" (A=0).
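
The lagged variables named in laggedvars() are ordinary columns in the long-format data. One way to build lag-1 copies within subject is sketched below on a hypothetical three-subject panel. Setting the first-visit lags to 0 is an assumption made for illustration; the demo do-file may use a different convention.

```stata
* Hypothetical long-format panel: 3 subjects, 3 visits each
clear
set seed 2026
set obs 9
generate id = ceil(_n/3)
bysort id: generate time = _n
generate A = runiform() < 0.5
generate L = rnormal()

* Lag-1 copies within subject; first-visit lags are set to 0 (an assumption)
bysort id (time): generate Alag = cond(_n == 1, 0, A[_n-1])
bysort id (time): generate Llag = cond(_n == 1, 0, L[_n-1])

list id time A Alag L Llag, sepby(id)
```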

Console output:
. gcomp outcome L0 A L Alag Llag id time, outcome(outcome)
>     idvar(id) tvar(time)
>     varyingcovariates(L) fixedcovariates(L0)
>     laggedvars(Alag Llag) lagrules(Alag: A 1, Llag: L 1)
>     commands(A: logit, outcome: logit, L: regress)
>     equations(A: L0 L, outcome: Alag Llag L0, L: Alag Llag L0)
>     intvars(A) interventions(A=1, A=0)
>     sim(120) samples(5) seed(20260421) eofu
G-computation procedure using Monte Carlo simulation: time-varying confounding

   Outcome variable: outcome
   Intervention variable(s): A
   Outcome type: binary, measured at end of follow-up
   Size of MC sample: 120
   No. of bootstrap samples: 5

   A summary of the specified parametric models:

   (for simulation under different interventions)

      Variable | Command | Prediction equation
   ------------+---------+-------------------------------------------------------
             L | regress |  Alag Llag L0
             A | logit   |  L0 L
       outcome | logit   |  Alag Llag L0
   ------------------------------------------------------------------------------

   Warning: 240 observations of the outcome variable are being ignored because
   they were recorded before the end of follow-up

   No. of subjects = 120

   Bootstrapping:
(running _gcomp_bootstrap on estimation sample)

Bootstrap replications (5)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.....

G-computation formula estimates of the expected values of the potential outcome
under each of the specified interventions and under no intervention (i.e. as
simulated under the observational regime). For comparison, the mean outcome in
the observed data is also shown.

         Specified interventions:
              Intervention 1: A=1
              Intervention 2: A=0

 ----------------------------------------------------------------------------------
              |  G-computation
              |   estimate of    Bootstrap                         Normal-based
      outcome |     mean PO      Std. Err.      z    P>|z|     [95% Conf. Interval]
 -------------+--------------------------------------------------------------------
      Int. 1  |    .1416667      .0560258     2.53   0.011     .0318581    .2514752
      Int. 2  |    .1916667      .0625278     3.07   0.002     .0691145    .3142188
 -------------+--------------------------------------------------------------------
 Obs. regime  |
   simulated  |    .1666667      .0314024     5.31   0.000     .1051191    .2282143
    observed  |    .1583333
 ----------------------------------------------------------------------------------

Export mediation results to Excel

Run gcomptab immediately after a supported mediation model (obe, linexp, specific, or baseline — not oce). The demo produces demo_gcomptab.xlsx with Normal and Percentile CI sheets.

gcomptab, xlsx("demo_gcomptab.xlsx") sheet("Normal CI") ///
    title("Table 1. Causal Mediation Analysis (Normal CIs)")

gcomptab, xlsx("demo_gcomptab.xlsx") sheet("Percentile CI") ///
    ci(percentile) title("Table 2. Mediation Results (Percentile CIs)")
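
The formatting options listed under Key Options can be combined in one call. The following is a sketch only: it assumes a supported gcomp mediation model was the most recent estimation command, and the sheet name, title, and footnote text are placeholders.

```stata
* Assumes a supported gcomp mediation model has just been run
gcomptab, xlsx("demo_gcomptab.xlsx") sheet("Formatted") ///
    title("Table 3. Mediation Results (formatted)")     ///
    decimal(2) boldp(0.05) zebra                        ///
    footnote("Bootstrap standard errors; normal-based 95% CIs.")
```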

Key Options

gcomp

Option                                   | Role
-----------------------------------------+----------------------------------------------------------------
outcome(varname)                         | Identify the outcome variable
commands(string)                         | Choose the model family for each simulated variable
equations(string)                        | Specify the predictor set for each simulated variable
mediation                                | Switch into mediation mode
exposure(varlist)                        | Identify the exposure variable(s) for mediation
mediator(varlist)                        | Identify the mediator variable(s)
base_confs(varlist)                      | List baseline confounders for mediation
control(string)                          | Set mediator level for CDE; without this, CDE is not estimated
idvar(varname) / tvar(varname)           | Identify subject and time in long data
varyingcovariates(varlist)               | List time-varying confounders affected by prior exposure
intvars(varlist) / interventions(string) | Define the variables and rules for hypothetical interventions
eofu                                     | Outcome is measured only on the last row per subject
simulations(#) / samples(#)              | Set Monte Carlo sample size and bootstrap replications
diagnostics                              | Display model-fit statistics during initial estimation
all                                      | Report all four CI types (normal, percentile, BC, BCa)
seed(#)                                  | Set random number seed for reproducibility

gcomptab

Option           | Role
-----------------+--------------------------------------------------------------
xlsx(filename)   | Excel workbook to create or update
sheet(string)    | Sheet name to create or replace
ci(string)       | Confidence-interval type: normal (default), percentile, bc, bca
title(string)    | Table title written into cell A1
labels(string)   | Override the default effect labels (backslash-separated)
decimal(#)       | Decimal places for numeric values (default 3, range 1-6)
boldp(#)         | Bold numeric cells when Wald p < cutoff
highlight(#)     | Highlight row in yellow when Wald p < cutoff
zebra            | Alternating row shading
footnote(string) | Footnote text below the table

Returned Results

After gcomp

All results are stored in e():

Scalars: e(N) (subjects), e(MC_sims) (Monte Carlo sample size), e(samples) (bootstrap replications).

Matrices: e(b) (point estimates), e(V) (variance-covariance), e(se) (standard errors), e(ci_normal) (normal CIs), and optionally e(ci_percentile), e(ci_bc), e(ci_bca) (with all). e(effects) provides an effecttab-compatible matrix (estimate, ci_lower, ci_upper, pvalue) for non-oce mediation. e(model_diagnostics) stores model-fit statistics.

Macros: e(cmd) ("gcomp"), e(analysis_type) ("mediation" or "time_varying"), e(outcome), e(exposure), e(mediator), e(mediation_type), e(scale), e(msm).

Convenience scalars (mediation, non-oce): e(tce), e(nde), e(nie), e(pm), e(cde), and their SEs (e(se_tce), etc.).

Convenience scalars (mediation, oce): e(tce_j), e(nde_j), e(nie_j), e(pm_j), e(cde_j) for each contrast j.

Time-varying mode: e(obs_data) (observed outcome prevalence).
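
Stored results can be inspected with standard Stata tools. The lines below are a sketch that assumes a non-oce gcomp mediation model has just been run. Only result names listed above are used.

```stata
* Assumes a non-oce gcomp mediation model has just been run
ereturn list                   // everything currently stored in e()
matrix list e(b)               // point estimates
matrix list e(ci_normal)       // normal-based confidence limits
display "TCE = " e(tce) " (SE " e(se_tce) ")"
display "Analysis type: `e(analysis_type)'"
```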

After gcomptab

Results are stored in r(): r(N_effects) (4 or 5), r(tce), r(nde), r(nie), r(pm), r(cde) (if applicable), r(xlsx), r(sheet), r(ci).
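
A short sketch of reading these results back, assuming gcomptab has just run after a supported mediation model. It also assumes that r(xlsx) and r(sheet) are stored as macros, which the list above does not state.

```stata
* Assumes gcomptab has just been run
return list                    // everything currently stored in r()
display "Effects exported: " r(N_effects)
display "Written to sheet `r(sheet)' of `r(xlsx)'"
```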

References

  • Robins JM. 1986. A new approach to causal inference in mortality studies with a sustained exposure period: application to control of the healthy worker survivor effect. Mathematical Modelling 7(9-12):1393-1512.
  • Daniel RM, De Stavola BL, Cousens SN. 2011. gformula: Estimating causal effects in the presence of time-varying confounding or mediation using the g-computation formula. Stata Journal 11(4):479-517.
  • Taubman SL, Robins JM, Mittleman MA, Hernán MA. 2009. Intervening on risk factors for coronary heart disease: an application of the parametric g-formula. International Journal of Epidemiology 38(6):1599-1611.
  • VanderWeele TJ. 2015. Explanation in causal inference: methods for mediation and interaction. Oxford University Press.

Version History

  • 1.1.0 (2026-04-26): Input validation and model-fit diagnostics. commands(), equations(), and related options are now validated before the bootstrap loop — mismatches produce clear error messages naming the offending variable. New diagnostics option displays model-fit statistics (N, convergence, R^2/pseudo-R^2, RMSE) for each parametric model during the initial estimation run. Diagnostics are always stored in e(model_diagnostics).
  • 1.0.3 (2026-04-22): Fix time-varying g-computation regression — varlist2 ordering had been reversed (outcome first) in v1.0.2, causing predict pred_Y to fire before time-varying confounders and treatment were sampled at each visit. Every simulated outcome came out as 1 (silent wrong results); minsim errored with r(503). Restores outcome-last ordering from v1.0.1. Adds V7.3 minsim regression test and tightens V7.1 assertions to guard against re-introduction.
  • 1.0.2 (2026-04-19): Stata-Tools fork release with bundled Excel export support via gcomptab

Author

Fork maintainer: Timothy P Copeland, Karolinska Institutet. Original command by Rhian Daniel, London School of Hygiene and Tropical Medicine.

License

MIT