gcomp
G-computation formula via Monte Carlo simulation (includes gcomptab)
Version 1.1.0 | 2026-04-26
gcomp implements Robins' parametric g-computation formula in Stata using Monte Carlo simulation and bootstrap inference. It supports two related causal-inference workflows: causal mediation analysis and longitudinal causal-effect estimation in the presence of time-varying confounding.
This Stata-Tools release is a maintained fork of SSC gformula v1.16 beta (Rhian Daniel, 2021) with bug fixes, modernization, and removal of SSC dependencies. The companion command gcomptab formats supported mediation results into publication-ready Excel tables.
Requirements
- Stata 16 or later
- No external dependencies — all required functionality is bundled
Installation
capture ado uninstall gcomp
net install gcomp, from("https://raw.githubusercontent.com/tpcopeland/Stata-Tools/main/gcomp") replace
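After installing, a quick check that Stata can locate both commands may be useful. This is a minimal sketch; it assumes the package ships a help file named gcomp.sthlp, which is not stated above.

```stata
* Confirm that Stata can find the installed command files
which gcomp
which gcomptab

* Open the help file (assumes the package bundles gcomp.sthlp)
help gcomp
```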
Commands
| Command | Description |
|---|---|
| gcomp | Estimate mediation effects or longitudinal causal effects via parametric g-computation |
| gcomptab | Export supported gcomp mediation results to a formatted Excel table |
How It Works
The core idea
Standard regression adjusts for confounders by including them in a model. But when confounders are themselves affected by prior exposure — the time-varying confounding problem — this approach can introduce bias. Similarly, when a mediator-outcome confounder is affected by the exposure, standard mediation methods fail.
G-computation solves both problems by:
- Fitting parametric models to the observed data (one model per variable you need to simulate)
- Simulating a copy of the population under each hypothetical scenario (e.g. "everyone treated" vs. "no one treated")
- Comparing outcomes across scenarios to estimate the causal effect
Bootstrap confidence intervals are obtained by repeating the entire procedure on resampled data.
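The three steps above can be sketched by hand for the simplest case: one binary exposure, one baseline confounder, and no mediator or time-varying structure. This is only an illustration on simulated data (the variable names and coefficients are made up), and it averages predicted probabilities rather than drawing Monte Carlo outcomes as gcomp does. It also omits the bootstrap step.

```stata
* Simulated data: confounder c, binary exposure x, binary outcome y
clear
set seed 12345
set obs 1000
generate c = rnormal()
generate x = runiform() < invlogit(0.5*c)
generate y = runiform() < invlogit(-1 + 0.4*x + 0.6*c)

* Step 1: fit a parametric model for the outcome
logit y x c

* Step 2: predict every subject's outcome under each scenario
generate x_obs = x
replace x = 1                     // scenario "everyone treated"
predict p1, pr
replace x = 0                     // scenario "no one treated"
predict p0, pr
replace x = x_obs                 // restore the observed exposure

* Step 3: compare the scenario means
summarize p1, meanonly
scalar m1 = r(mean)
summarize p0, meanonly
scalar m0 = r(mean)
scalar rd = m1 - m0
display "Risk difference, treated vs. untreated: " rd
```

Wrapping the whole block in a resampling loop would give a bootstrap standard error for the risk difference, which is the role samples() plays in gcomp.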
Two required ingredients
Every gcomp call needs:
- commands() tells Stata which model family to use for each simulated variable: logit (binary), regress (continuous), mlogit (multinomial), or ologit (ordinal)
- equations() tells Stata which predictors belong in each of those models
Both use a colon-separated, comma-delimited syntax:
commands(m: logit, y: logit)
equations(m: x c, y: m x c)
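One way to read the two options together: each variable named in commands() is paired with the predictor list of the same name in equations(). Conceptually, the pair above describes models of the same form as the two standalone fits below. This is a sketch on simulated data with made-up coefficients, not a description of how gcomp calls the estimators internally.

```stata
* Simulated data with the variable names used in the example
clear
set seed 2024
set obs 500
generate c = rnormal()
generate x = runiform() < 0.5
generate m = runiform() < invlogit(-0.5 + 0.8*x + 0.3*c)
generate y = runiform() < invlogit(-1 + 0.5*m + 0.3*x + 0.4*c)

* "m: logit" with "m: x c"   -> logit model for m on x and c
logit m x c

* "y: logit" with "y: m x c" -> logit model for y on m, x, and c
logit y m x c
```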
Choosing a workflow
| Use case | Core syntax pattern | What you get |
|---|---|---|
| Mediation (binary exposure) | gcomp ..., outcome() mediation obe exposure() mediator() base_confs() | TCE, NDE, NIE, PM, and optionally CDE |
| Mediation (categorical exposure) | gcomp ..., outcome() mediation oce exposure() mediator() base_confs() | Per-level mediation contrasts |
| Time-varying confounding | gcomp ..., outcome() idvar() tvar() varyingcovariates() intvars() interventions() | Potential outcomes under hypothetical interventions |
| Excel export | gcomptab, xlsx() sheet() | Publication-ready table from supported mediation results |
Demo
All demo output below is generated by gcomp/demo/demo_gcomp.do. Monte Carlo and bootstrap sample sizes are kept low for speed; use sim(10000) samples(1000) for real analyses.
Binary-exposure mediation (OBE)
Binary exposure, binary mediator, binary outcome, continuous confounder. Reports the total causal effect (TCE), natural direct effect (NDE), natural indirect effect (NIE), and proportion mediated (PM).
Console output:
. gcomp y m x c, outcome(y) mediation obe
> exposure(x) mediator(m)
> commands(m: logit, y: logit)
> equations(m: x c, y: m x c)
> base_confs(c) sim(500) samples(50) seed(42)
G-computation procedure using Monte Carlo simulation: mediation
Outcome variable: y
Exposure variable(s): x
Mediator variable(s): m
Size of MC sample: 500
No. of bootstrap samples: 50
A summary of the specified parametric models:
(for simulation under different interventions)
Variable | Command | Prediction equation
------------+---------+-------------------------------------------------------
m | logit | x c
y | logit | m x c
------------------------------------------------------------------------------
No. of subjects = 1000
Bootstrapping:
(running _gcomp_bootstrap on estimation sample)
Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.................................................. 50
G-computation formula estimates of the total causal effect and the natural
direct/indirect effects
-------------------------------------------------------------------------------------
| G-computation Bootstrap Normal-based
| estimate (MD) Std. Err. z P>|z| [95% Conf. Interval]
-------------+-----------------------------------------------------------------------
TCE | .056 .035098 1.6 0.111 -.0127909 .1247909
NDE | .018 .0329626 .55 0.585 -.0466054 .0826054
NIE | .038 .0253395 1.5 0.134 -.0116646 .0876646
PM | .6785714 1.64664 .41 0.680 -2.548783 3.905926
-------------------------------------------------------------------------------------
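As a consistency check on the table above, the printed values appear to satisfy TCE = NDE + NIE and PM = NIE / TCE on the mean-difference scale, and the normal-based interval matches the estimate plus or minus the standard normal quantile times the bootstrap standard error. The sketch below re-derives those numbers from the printed values; small discrepancies are expected because the displayed output is rounded.

```stata
* Values copied from the OBE table above
scalar tce    = .056
scalar nde    = .018
scalar nie    = .038
scalar se_tce = .035098

* Decomposition and proportion mediated
scalar sum_effects = nde + nie        // compare with TCE
scalar pm          = nie / tce        // compare with PM

* Wald z statistic and normal-based 95% interval for TCE
scalar z     = tce / se_tce
scalar ci_lo = tce - invnormal(0.975) * se_tce
scalar ci_hi = tce + invnormal(0.975) * se_tce

display "NDE + NIE = " sum_effects
display "NIE / TCE = " pm
display "z         = " z
display "95% CI    = " ci_lo ", " ci_hi
```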
Controlled direct effect (CDE)
Add control(0) to fix the mediator at 0 for all subjects and estimate the CDE alongside the natural effects.
Console output:
. gcomp y m x c, outcome(y) mediation obe
> exposure(x) mediator(m)
> commands(m: logit, y: logit)
> equations(m: x c, y: m x c)
> base_confs(c) control(0) sim(500) samples(50) seed(42)
G-computation procedure using Monte Carlo simulation: mediation
Outcome variable: y
Exposure variable(s): x
Mediator variable(s): m
Size of MC sample: 500
No. of bootstrap samples: 50
A summary of the specified parametric models:
(for simulation under different interventions)
Variable | Command | Prediction equation
------------+---------+-------------------------------------------------------
m | logit | x c
y | logit | m x c
------------------------------------------------------------------------------
No. of subjects = 1000
Bootstrapping:
(running _gcomp_bootstrap on estimation sample)
Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.............................................x.... 50
G-computation formula estimates of the total causal effect, the natural
direct/indirect effects, and the controlled direct effect
Control value(s):
m=0
-------------------------------------------------------------------------------------
| G-computation Bootstrap Normal-based
| estimate (MD) Std. Err. z P>|z| [95% Conf. Interval]
-------------+-----------------------------------------------------------------------
TCE | .014 .0368658 .38 0.704 -.0582557 .0862557
NDE | .02 .0336395 .59 0.552 -.0459322 .0859322
NIE | -.006 .0243236 -.25 0.805 -.0536733 .0416733
PM | -.4285714 1.17437 -.36 0.715 -2.730295 1.873152
CDE | -.016 .0314072 -.51 0.610 -.0775571 .0455571
-------------------------------------------------------------------------------------
Categorical-exposure mediation (OCE)
Use oce when the exposure has more than two levels. Each non-baseline level produces its own set of mediation contrasts. gcomptab does not format oce output — review e() results directly.
Console output:
. gcomp y m x c, outcome(y) mediation oce
> exposure(x) mediator(m)
> commands(m: logit, y: logit)
> equations(m: x c, y: m x c)
> base_confs(c) sim(500) samples(50) seed(42)
G-computation procedure using Monte Carlo simulation: mediation
Warning: Option baseline() has not been specified, and therefore the baseline
will be assumed to be 0.
Outcome variable: y
Exposure variable(s): x
Mediator variable(s): m
Size of MC sample: 500
No. of bootstrap samples: 50
A summary of the specified parametric models:
(for simulation under different interventions)
Variable | Command | Prediction equation
------------+---------+-------------------------------------------------------
m | logit | x c
y | logit | m x c
------------------------------------------------------------------------------
No. of subjects = 1000
Bootstrapping:
(running _gcomp_bootstrap on estimation sample)
Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.................................................. 50
G-computation formula estimates of the total causal effect and the natural
direct/indirect effects
Baseline value(s):
x=0
-------------------------------------------------------------------------------------
| G-computation Bootstrap Normal-based
| estimate (MD) Std. Err. z P>|z| [95% Conf. Interval]
-------------+-----------------------------------------------------------------------
TCE(1) | -.054 .0347287 -1.55 0.120 -.122067 .014067
TCE(2) | -.092 .0302558 -3.04 0.002 -.1513002 -.0326998
-------------+-----------------------------------------------------------------------
NDE(1) | -.052 .0321308 -1.62 0.106 -.1149751 .0109751
NDE(2) | -.104 .0367025 -2.83 0.005 -.1759355 -.0320645
-------------+-----------------------------------------------------------------------
NIE(1) | -.002 .0311316 -.06 0.949 -.0630168 .0590168
NIE(2) | .012 .0305374 .39 0.694 -.0478522 .0718522
-------------+-----------------------------------------------------------------------
PM(1) | .037037 .6929801 .05 0.957 -1.321179 1.395253
PM(2) | -.1304348 .2787365 -.47 0.640 -.6767484 .4158788
-------------------------------------------------------------------------------------
Time-varying confounding
Panel data with 120 subjects over 3 time points. A is the time-varying treatment, L is the time-varying confounder affected by prior treatment, and outcome is measured at end of follow-up (eofu). Estimates potential outcomes under "always treat" (A=1) vs "never treat" (A=0).
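The long-format layout such a call works with can be sketched as follows: one row per subject per visit, a fixed baseline covariate, and lagged copies of the treatment and confounder. The data here are placeholders with no real treatment-confounder feedback; only the structure matters. Reading Alag and Llag as one-visit lags of A and L, and setting first-visit lags to 0, are assumptions made for illustration and are not confirmed by the demo output.

```stata
* 120 subjects, 3 visits each, in long format
clear
set seed 2026
set obs 120
generate id = _n
generate L0 = rnormal()               // fixed baseline covariate
expand 3
bysort id: generate time = _n

* Placeholder time-varying confounder and treatment (illustrative only)
generate L = L0 + rnormal()
generate A = runiform() < invlogit(0.3*L)

* One-visit lags within subject; first visit set to 0 (assumption)
bysort id (time): generate Alag = cond(_n == 1, 0, A[_n-1])
bysort id (time): generate Llag = cond(_n == 1, 0, L[_n-1])

list id time L0 L A Alag Llag in 1/6, sepby(id)
```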
Console output:
. gcomp outcome L0 A L Alag Llag id time, outcome(outcome)
> idvar(id) tvar(time)
> varyingcovariates(L) fixedcovariates(L0)
> laggedvars(Alag Llag) lagrules(Alag: A 1, Llag: L 1)
> commands(A: logit, outcome: logit, L: regress)
> equations(A: L0 L, outcome: Alag Llag L0, L: Alag Llag L0)
> intvars(A) interventions(A=1, A=0)
> sim(120) samples(5) seed(20260421) eofu
G-computation procedure using Monte Carlo simulation: time-varying confounding
Outcome variable: outcome
Intervention variable(s): A
Outcome type: binary, measured at end of follow-up
Size of MC sample: 120
No. of bootstrap samples: 5
A summary of the specified parametric models:
(for simulation under different interventions)
Variable | Command | Prediction equation
------------+---------+-------------------------------------------------------
L | regress | Alag Llag L0
A | logit | L0 L
outcome | logit | Alag Llag L0
------------------------------------------------------------------------------
Warning: 240 observations of the outcome variable are being ignored because
they were recorded before the end of follow-up
No. of subjects = 120
Bootstrapping:
(running _gcomp_bootstrap on estimation sample)
Bootstrap replications (5)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.....
G-computation formula estimates of the expected values of the potential outcome
under each of the specified interventions and under no intervention (i.e. as
simulated under the observational regime). For comparison, the mean outcome in
the observed data is also shown.
Specified interventions:
Intervention 1: A=1
Intervention 2: A=0
----------------------------------------------------------------------------------
| G-computation
| estimate of Bootstrap Normal-based
outcome | mean PO Std. Err. z P>|z| [95% Conf. Interval]
-------------+--------------------------------------------------------------------
Int. 1 | .1416667 .0560258 2.53 0.011 .0318581 .2514752
Int. 2 | .1916667 .0625278 3.07 0.002 .0691145 .3142188
-------------+--------------------------------------------------------------------
Obs. regime |
simulated | .1666667 .0314024 5.31 0.000 .1051191 .2282143
observed | .1583333
----------------------------------------------------------------------------------
Export mediation results to Excel
Run gcomptab immediately after a supported mediation model (obe, linexp, specific, or baseline — not oce). The demo produces demo_gcomptab.xlsx with Normal and Percentile CI sheets.
gcomptab, xlsx("demo_gcomptab.xlsx") sheet("Normal CI") ///
title("Table 1. Causal Mediation Analysis (Normal CIs)")
gcomptab, xlsx("demo_gcomptab.xlsx") sheet("Percentile CI") ///
ci(percentile) title("Table 2. Mediation Results (Percentile CIs)")
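The formatting options listed under Key Options can be combined in a single call. The sketch below simulates a small dataset, fits the binary-exposure mediation model from the demo, and exports one sheet. The file name, sheet name, footnote text, and simulated coefficients are hypothetical, and the block assumes gcomp and gcomptab are installed.

```stata
* Simulated mediation data (made-up coefficients)
clear
set seed 42
set obs 1000
generate c = rnormal()
generate x = runiform() < 0.5
generate m = runiform() < invlogit(-0.5 + 0.8*x + 0.3*c)
generate y = runiform() < invlogit(-1 + 0.5*m + 0.3*x + 0.4*c)

* Binary-exposure mediation model, as in the demo
gcomp y m x c, outcome(y) mediation obe ///
    exposure(x) mediator(m) ///
    commands(m: logit, y: logit) ///
    equations(m: x c, y: m x c) ///
    base_confs(c) sim(500) samples(50) seed(42)

* Export with normal CIs (the default), 2 decimals, and row styling
gcomptab, xlsx("mediation_results.xlsx") sheet("Main") ///
    decimal(2) boldp(0.05) zebra ///
    title("Hypothetical Table 1. Mediation results") ///
    footnote("Hypothetical footnote: bootstrap SEs, 50 replications.")
```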
Key Options
gcomp
| Option | Role |
|---|---|
| outcome(varname) | Identify the outcome variable |
| commands(string) | Choose the model family for each simulated variable |
| equations(string) | Specify the predictor set for each simulated variable |
| mediation | Switch into mediation mode |
| exposure(varlist) | Identify the exposure variable(s) for mediation |
| mediator(varlist) | Identify the mediator variable(s) |
| base_confs(varlist) | List baseline confounders for mediation |
| control(string) | Set mediator level for CDE; without this, CDE is not estimated |
| idvar(varname) / tvar(varname) | Identify subject and time in long data |
| varyingcovariates(varlist) | List time-varying confounders affected by prior exposure |
| intvars(varlist) / interventions(string) | Define the variables and rules for hypothetical interventions |
| eofu | Outcome is measured only on the last row per subject |
| simulations(#) / samples(#) | Set Monte Carlo sample size and bootstrap replications |
| diagnostics | Display model-fit statistics during initial estimation |
| all | Report all four CI types (normal, percentile, BC, BCa) |
| seed(#) | Set random number seed for reproducibility |
gcomptab
| Option | Role |
|---|---|
| xlsx(filename) | Excel workbook to create or update |
| sheet(string) | Sheet name to create or replace |
| ci(string) | Confidence-interval type: normal (default), percentile, bc, bca |
| title(string) | Table title written into cell A1 |
| labels(string) | Override the default effect labels (backslash-separated) |
| decimal(#) | Decimal places for numeric values (default 3, range 1-6) |
| boldp(#) | Bold numeric cells when Wald p < cutoff |
| highlight(#) | Highlight row in yellow when Wald p < cutoff |
| zebra | Alternating row shading |
| footnote(string) | Footnote text below the table |
Returned Results
After gcomp
All results are stored in e():
Scalars: e(N) (subjects), e(MC_sims) (Monte Carlo sample size), e(samples) (bootstrap replications).
Matrices: e(b) (point estimates), e(V) (variance-covariance), e(se) (standard errors), e(ci_normal) (normal CIs), and optionally e(ci_percentile), e(ci_bc), e(ci_bca) (with all). e(effects) provides an effecttab-compatible matrix (estimate, ci_lower, ci_upper, pvalue) for non-oce mediation. e(model_diagnostics) stores model-fit statistics.
Macros: e(cmd) ("gcomp"), e(analysis_type) ("mediation" or "time_varying"), e(outcome), e(exposure), e(mediator), e(mediation_type), e(scale), e(msm).
Convenience scalars (mediation, non-oce): e(tce), e(nde), e(nie), e(pm), e(cde), and their SEs (e(se_tce), etc.).
Convenience scalars (mediation, oce): e(tce_j), e(nde_j), e(nie_j), e(pm_j), e(cde_j) for each contrast j.
Time-varying mode: e(obs_data) (observed outcome prevalence).
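A short sketch of how these stored results might be inspected after a binary-exposure mediation run. It assumes gcomp is installed and uses simulated data with made-up coefficients; the all option is added so that the optional CI matrices are present, on the assumption that it behaves as described under Key Options.

```stata
* Simulated mediation data (made-up coefficients)
clear
set seed 42
set obs 1000
generate c = rnormal()
generate x = runiform() < 0.5
generate m = runiform() < invlogit(-0.5 + 0.8*x + 0.3*c)
generate y = runiform() < invlogit(-1 + 0.5*m + 0.3*x + 0.4*c)

gcomp y m x c, outcome(y) mediation obe ///
    exposure(x) mediator(m) ///
    commands(m: logit, y: logit) ///
    equations(m: x c, y: m x c) ///
    base_confs(c) sim(500) samples(50) seed(42) all

* Everything gcomp left behind in e()
ereturn list

* Convenience scalars for individual effects
display "NDE = " e(nde)
display "NIE = " e(nie)

* Point estimates and two of the CI matrices
matrix list e(b)
matrix list e(ci_normal)
matrix list e(ci_percentile)
```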
After gcomptab
Results are stored in r(): r(N_effects) (4 or 5), r(tce), r(nde), r(nie), r(pm), r(cde) (if applicable), r(xlsx), r(sheet), r(ci).
References
- Robins JM. 1986. A new approach to causal inference in mortality studies with sustained exposure periods. Mathematical Modelling 7(9-12):1393-1512.
- Daniel RM, De Stavola BL, Cousens SN. 2011. gformula: Estimating causal effects in the presence of time-varying confounding or mediation using the g-computation formula. Stata Journal 11(4):479-517.
- Taubman SL, Robins JM, Mittleman MA, Hernan MA. 2009. Intervening on risk factors for coronary heart disease: an application of the parametric g-formula. International Journal of Epidemiology 38(6):1599-1611.
- VanderWeele TJ. 2015. Explanation in causal inference: methods for mediation and interaction. Oxford University Press.
Version History
- 1.1.0 (2026-04-26): Input validation and model-fit diagnostics. commands(), equations(), and related options are now validated before the bootstrap loop; mismatches produce clear error messages naming the offending variable. New diagnostics option displays model-fit statistics (N, convergence, R^2/pseudo-R^2, RMSE) for each parametric model during the initial estimation run. Diagnostics are always stored in e(model_diagnostics).
- 1.0.3 (2026-04-22): Fix time-varying g-computation regression: varlist2 ordering had been reversed (outcome first) in v1.0.2, causing predict pred_Y to fire before time-varying confounders and treatment were sampled at each visit. Every simulated outcome came out as 1 (silent wrong results); minsim errored with r(503). Restores outcome-last ordering from v1.0.1. Adds V7.3 minsim regression test and tightens V7.1 assertions to guard against re-introduction.
- 1.0.2 (2026-04-19): Stata-Tools fork release with bundled Excel export support via gcomptab
Author
Fork maintainer: Timothy P Copeland, Karolinska Institutet. Original command by Rhian Daniel, London School of Hygiene and Tropical Medicine.
License
MIT