Functions
didint() - Estimate ATT
The function didint() estimates the ATT while adjusting for covariates that may vary by state, time, or both.
DiDInt.didint — Function
didint(
outcome::Union{AbstractString, Symbol},
state::Union{AbstractString, Symbol},
time::Union{AbstractString, Symbol},
data::Union{DataFrame, RConnector.RDataFrame};
gvar::Union{AbstractString, Symbol, Nothing} = nothing,
treated_states::Union{T, Vector{T}} where T <: Union{AbstractString, Number, Nothing} = nothing,
treatment_times::Union{T, Vector{T}} where T <: Union{AbstractString, Number, Date, Nothing} = nothing,
date_format::Union{AbstractString, Nothing} = nothing,
covariates::Union{T, Vector{T}} where T <: Union{AbstractString, Symbol, Nothing} = nothing,
ccc::AbstractString = "int",
agg::AbstractString = "cohort",
weighting::AbstractString = "both",
ref::Union{Dict{<:AbstractString, <:AbstractString}, Nothing} = nothing,
freq::Union{AbstractString, Nothing} = nothing,
freq_multiplier::Number = 1,
start_date::Union{AbstractString, Number, Date, Nothing} = nothing,
end_date::Union{AbstractString, Number, Date, Nothing} = nothing,
nperm::Number = 999,
verbose::Bool = true,
seed::Number = rand(1:1000000),
use_pre_controls::Bool = false,
notyet::Union{Nothing, Bool} = nothing,
hc::Union{AbstractString, Number} = "hc1",
truejack::Bool = false,
recover::Union{Bool, Nothing} = nothing
)
The didint() function estimates the average effect of treatment on the treated (ATT) while accounting for covariates that may vary by state, time, or by both state and time simultaneously.
Details
The arguments treated_states and treatment_times are matched by position: the first element of treated_states is the state treated at the date given by the first element of treatment_times, the second element of treated_states is the state treated at the date given by the second element of treatment_times, and so on.
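The positional pairing can be sketched in plain Julia. The state labels, dates, and the column names in the commented call are made up for illustration; only the argument names treated_states and treatment_times come from the signature above.

```julia
using Dates

# Hypothetical state labels and first-treatment dates (illustration only).
treated_states  = ["34", "57", "58"]
treatment_times = [Date(2000), Date(1998), Date(1993)]

# Element i of treated_states is paired with element i of treatment_times,
# so the two vectors must have the same length.
@assert length(treated_states) == length(treatment_times)
first_treatment = Dict(zip(treated_states, treatment_times))

for (s, t) in zip(treated_states, treatment_times)
    println("state ", s, " is first treated on ", t)
end

# The same vectors would then be passed as keyword arguments, e.g.
# (df and the column names "outcome", "state", "time" are assumptions):
# result = didint("outcome", "state", "time", df;
#                 treated_states = treated_states,
#                 treatment_times = treatment_times)
```

Reordering one vector without reordering the other silently changes which state is paired with which date, so it may be safer to build both from a single list of pairs.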
Parameters
Required Parameters
- outcome::Union{AbstractString, Symbol}
  Input the name of the column which identifies the outcome of interest.
- state::Union{AbstractString, Symbol}
  Input the name of the column which identifies the state membership of the observation.
- time::Union{AbstractString, Symbol}
  Input the name of the column which identifies the date of the observation.
- data::Union{DataFrame, RConnector.RDataFrame}
  The DataFrame to be used for the analysis.
Treatment Specification
- gvar::Union{AbstractString, Symbol, Nothing} = nothing
  Name of the column which indicates time of first treatment for each state.
- treated_states::Union{T, Vector{T}} where T <: Union{AbstractString, Number, Nothing} = nothing
  A vector of strings (or a single string) noting the treated state(s).
- treatment_times::Union{T, Vector{T}} where T <: Union{AbstractString, Number, Date, Nothing} = nothing
  A vector (or single entry) denoting the associated treatment times of the 'treated_states'.
Model Specification
- ccc::AbstractString = "int"
  Specify which version of DID-INT should be used. Options are: "hom", "time", "state", "add", and "int".
- agg::AbstractString = "cohort"
  Enter the aggregation method as a string. Options are: "cohort", "simple", "state", "sgt", "none".
- weighting::AbstractString = "both"
  Specify which weighting method should be used. Options are: "both", "att", "diff", or "none".
- covariates::Union{T, Vector{T}} where T <: Union{AbstractString, Symbol, Nothing} = nothing
  A vector of covariates entered as either strings or symbols (or a single covariate string or symbol), or nothing (default).
- notyet::Bool = false
  Determine if pre-treatment periods from treated states should be used as controls.
- recover::Union{Bool, Nothing} = nothing
  Determine if zeroed-out lambda coefficients should be re-estimated while dropping any covariates that had introduced collinearity. Defaults to true for ccc = "int" and false otherwise. For any ccc option other than "int", setting recover to true or false will not affect the calculated ATTs.
- ref::Union{Dict{<:AbstractString, <:AbstractString}, Nothing} = nothing
  A dictionary specifying which category in a categorical variable should be used as the reference (baseline) category.
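As a rough sketch of how these options fit together, the snippet below checks string options against the lists documented above and bundles them into keyword arguments. The helper check_option, the covariate column names, and the reference category are hypothetical and not part of the package; the commented call assumes a DataFrame df with matching columns.

```julia
# Option lists as documented for didint().
const CCC_OPTIONS       = ("hom", "time", "state", "add", "int")
const AGG_OPTIONS       = ("cohort", "simple", "state", "sgt", "none")
const WEIGHTING_OPTIONS = ("both", "att", "diff", "none")

# Hypothetical helper: fail early with a readable message on a misspelled option.
function check_option(name, value, allowed)
    value in allowed ||
        throw(ArgumentError("$name must be one of $(join(allowed, ", ")); got $value"))
    return value
end

# Hypothetical covariate columns and reference category.
covariates = ["race", "age_group"]
ref = Dict("race" => "white")   # baseline category for the categorical column "race"

kwargs = (
    ccc        = check_option("ccc", "state", CCC_OPTIONS),
    agg        = check_option("agg", "cohort", AGG_OPTIONS),
    weighting  = check_option("weighting", "both", WEIGHTING_OPTIONS),
    covariates = covariates,
    ref        = ref,
)
println(kwargs)

# The bundle could then be splatted into the call (df and column names assumed):
# result = didint("outcome", "state", "time", df; kwargs...)
```

Keeping the options in a NamedTuple makes it straightforward to rerun the same specification with a different ccc value and compare the resulting ATTs.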
Date Processing & Period Grid Construction
- date_format::Union{AbstractString, Nothing} = nothing
  Date format (e.g. "yyyy" or "yyyy-mm-dd") to be used when parsing string dates from the time column, or from the start_date, end_date, and treatment_times arguments.
- freq::Union{AbstractString, Nothing} = nothing
  A string indicating the desired length of a period for the analysis in staggered adoption scenarios. Options are: "year", "month", "week", "day".
- freq_multiplier::Number = 1
  An integer by which the 'freq' argument should be multiplied in a staggered adoption scenario, e.g. if a two-year period is desired, set freq = "year" and freq_multiplier = 2.
- start_date::Union{AbstractString, Number, Date, Nothing} = nothing
  Any data prior to this date is dropped. This date also serves as the starting date for the period grid construction if activated.
- end_date::Union{AbstractString, Number, Date, Nothing} = nothing
  Any data after this date is dropped. This date also serves as the end date for the period grid construction if activated.
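To make the period grid idea concrete, here is a minimal sketch using only Julia's Dates standard library. It shows one plausible reading of freq = "year" with freq_multiplier = 2, and is not a description of how the package builds its grid internally; all dates are made up.

```julia
using Dates

# Parse a string date the way a date_format of "yyyy-mm-dd" would describe it.
obs_date = Date("2003-06-01", dateformat"yyyy-mm-dd")

# Hypothetical window and period length: two-year periods from 2000 to 2010.
start_date      = Date(2000, 1, 1)
end_date        = Date(2010, 1, 1)
freq_multiplier = 2
step            = Year(1) * freq_multiplier   # a two-year step

# Start date of each period in the grid.
period_starts = collect(start_date:step:end_date)

# Index of the period containing obs_date: the last period start that is <= obs_date.
period_index = searchsortedlast(period_starts, obs_date)

println(period_starts)
println("observation on ", obs_date, " falls in period ", period_index)
```

Under this reading, observations dated before start_date would get index 0, which matches the documented behaviour of dropping data outside the window only in spirit.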
Inference
- nperm::Number = 999
  The number of unique permutations (not including the initial assignment of treatment times) to be considered when performing the randomization inference.
- verbose::Bool = true
  A boolean option for displaying progress of the randomization procedure.
- seed::Number = rand(1:1000000)
  An integer to set the random seed for the randomization inference procedure.
- hc::Union{AbstractString, Number} = "hc1"
  Specify which heteroskedasticity-consistent covariance matrix estimator (HCCME) should be used. Options are 0, 1, 2, 3, and 4 (or "hc0", "hc1", "hc2", "hc3", "hc4").
- truejack::Bool = false
  When ccc is set to "add", "time", or "hom", obtaining valid jackknife standard errors requires re-estimating the DID-INT model from scratch (running the large FixedEffectsModels). In those cases the covariate effects depend on values from across states, so dropping a state changes the lambda values. This is not true for the ccc options "int" or "state".
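For readers unfamiliar with randomization inference, the following is a generic sketch of the idea behind nperm and seed, not the package's actual procedure. The outcomes, the treated flags, and the difference-in-means statistic are all assumptions chosen for brevity, and unlike the documented behaviour this sketch does not enforce unique permutations.

```julia
using Random, Statistics

# Made-up state-level outcomes; the first two states are the "treated" ones.
y       = [0.8, 0.6, 0.1, -0.2, 0.05, 0.3, -0.1, 0.0]
treated = [true, true, false, false, false, false, false, false]

# Test statistic for this sketch: treated mean minus control mean.
diff_in_means(y, t) = mean(y[t]) - mean(y[.!t])

nperm = 999
rng   = MersenneTwister(1234)   # fixed seed so the draws are reproducible

observed = diff_in_means(y, treated)

# Reshuffle the treatment labels nperm times (uniqueness is NOT enforced here).
permuted = [diff_in_means(y, shuffle(rng, treated)) for _ in 1:nperm]

# Share of reshuffled statistics at least as extreme as the observed one.
p_value = count(s -> abs(s) >= abs(observed), permuted) / nperm

println("observed = ", observed, ", RI p-value = ", p_value)
```

Fixing the seed is what makes the p-value reproducible across runs; with the default seed = rand(1:1000000), repeated calls can return slightly different p-values.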
Returns
A DataFrame of results including the estimate of the ATT as well as standard errors and p-values.
Citations
- Karim & Webb (2025). "Good Controls Gone Bad: Difference-in-Differences with Covariates". https://arxiv.org/abs/2412.14447
- MacKinnon & Webb (2020). "Randomization inference for difference-in-differences with few treated clusters". https://doi.org/10.1016/j.jeconom.2020.04.024
didint_plot() - Prepare Data for Visualization
The didint_plot() function produces a dataset in a long format that can easily be used for plotting parallel trends or event study plots.
DiDInt.didint_plot — Function
didint_plot(
outcome::Union{AbstractString, Symbol},
state::Union{AbstractString, Symbol},
time::Union{AbstractString, Symbol},
data::DataFrame;
gvar::Union{AbstractString, Symbol, Nothing} = nothing,
treated_states::Union{T, Vector{T}} where T <: Union{AbstractString, Number, Nothing} = nothing,
treatment_times::Union{T, Vector{T}} where T <: Union{AbstractString, Number, Date, Nothing} = nothing,
date_format::Union{AbstractString, Nothing} = nothing,
covariates::Union{Vector{<:AbstractString}, AbstractString, Nothing} = nothing,
ref::Union{Dict{<:AbstractString, <:AbstractString}, Nothing} = nothing,
ccc::Union{AbstractString, Vector{<:AbstractString}} = "all",
event::Bool = false,
weights::Bool = true,
ci::Number = 0.95,
freq::Union{AbstractString, Nothing} = nothing,
freq_multiplier::Number = 1,
start_date::Union{AbstractString, Number, Date, Nothing} = nothing,
end_date::Union{AbstractString, Number, Date, Nothing} = nothing,
hc::Union{AbstractString, Number} = "hc1"
)
The didint_plot() function produces a dataset in a long format that can easily be used for plotting parallel trends or event study plots.
Details
The arguments treated_states and treatment_times are matched by position: the first element of treated_states is the state treated at the date given by the first element of treatment_times, the second element of treated_states is the state treated at the date given by the second element of treatment_times, and so on.
Parameters
Required Parameters
- outcome::Union{AbstractString, Symbol}
  Input the name of the column which identifies the outcome of interest.
- state::Union{AbstractString, Symbol}
  Input the name of the column which identifies the state membership of the observation.
- time::Union{AbstractString, Symbol}
  Input the name of the column which identifies the date of the observation.
- data::DataFrame
  The DataFrame to be used for the analysis.
Treatment Specification
- gvar::Union{AbstractString, Symbol, Nothing} = nothing
  Name of the column which indicates time of first treatment for each state.
- treatment_times::Union{T, Vector{T}} where T <: Union{AbstractString, Number, Date, Nothing} = nothing
  A vector (or single entry) denoting the associated treatment times of the 'treated_states'.
Model Specifications
- ccc::Union{AbstractString, Vector{<:AbstractString}} = "all"
  Specify which versions of DID-INT should be used. Options are either "all", or any combination of: "none", "hom", "time", "state", "add", and "int".
- covariates::Union{T, Vector{T}} where T <: Union{AbstractString, Symbol} = nothing
  A vector of covariates entered as either strings or symbols (or a single covariate string or symbol), or nothing (default).
- ref::Union{Dict{<:AbstractString, <:AbstractString}, Nothing} = nothing
  A dictionary specifying which category in a categorical variable should be used as the reference (baseline) category.
Event Study Plot
- event::Bool = false
  Specify if data should be prepared for an event study plot as opposed to a parallel trends plot.
- weights::Bool = true
  Whether to use weighted means when computing event study estimates. If true, estimates are computed as weighted averages of state-level means for each period relative to treatment; if false, uses simple unweighted averages.
- treated_states::Union{T, Vector{T}} where T <: Union{AbstractString, Number, Nothing} = nothing
  A vector of strings (or a single string) noting the treated state(s).
- ci::Number = 0.95
  Define the size of confidence bands for the event study plot.
- hc::Union{AbstractString, Number} = "hc1"
  Specify which heteroskedasticity-consistent covariance matrix estimator (HCCME) should be used. Options are 0, 1, 2, 3, and 4 (or "hc0", "hc1", "hc2", "hc3", "hc4").
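The difference between weights = true and weights = false can be illustrated with a small sketch that averages state-level means by period relative to treatment. The helper event_means, the data, and the choice of weights are all hypothetical; the source does not say what the package's weights are based on, so the values here are placeholders.

```julia
# Made-up state-level mean outcomes for periods 1 through 4.
state_means  = Dict("A" => [1.0, 1.2, 2.0, 2.4],
                    "B" => [0.5, 0.7, 0.9, 1.9])
treat_period = Dict("A" => 3, "B" => 4)           # period of first treatment
weights      = Dict("A" => 100.0, "B" => 300.0)   # assumed weights (placeholder)

# Hypothetical helper: average state-level means by period relative to treatment.
function event_means(state_means, treat_period, weights; weighted = true)
    num = Dict{Int, Float64}()
    den = Dict{Int, Float64}()
    for (s, ys) in state_means
        w = weighted ? weights[s] : 1.0
        for (p, y) in enumerate(ys)
            k = p - treat_period[s]        # 0 is the first treated period
            num[k] = get(num, k, 0.0) + w * y
            den[k] = get(den, k, 0.0) + w
        end
    end
    return Dict(k => num[k] / den[k] for k in keys(num))
end

weighted_means   = event_means(state_means, treat_period, weights)
unweighted_means = event_means(state_means, treat_period, weights; weighted = false)

for k in sort(collect(keys(weighted_means)))
    println("relative period ", k, ": weighted = ", weighted_means[k],
            ", unweighted = ", unweighted_means[k])
end
```

Relative periods observed for only one state (here -3 and 1) give the same value under both settings, since a single state receives all of the weight.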
Date Processing & Period Grid Construction
- date_format::Union{AbstractString, Nothing} = nothing
  Date format (e.g. "yyyy" or "yyyy-mm-dd") to be used when parsing string dates from the time column, or from the start_date, end_date, and treatment_times arguments.
- freq::Union{AbstractString, Nothing} = nothing
  A string indicating the desired length of a period for the analysis in staggered adoption scenarios. Options are: "year", "month", "week", "day".
- freq_multiplier::Number = 1
  An integer by which the 'freq' argument should be multiplied in a staggered adoption scenario, e.g. if a two-year period is desired, set freq = "year" and freq_multiplier = 2.
- start_date::Union{AbstractString, Number, Date, Nothing} = nothing
  Any data prior to this date is dropped. This date also serves as the starting date for the period grid construction if activated.
- end_date::Union{AbstractString, Number, Date, Nothing} = nothing
  Any data after this date is dropped. This date also serves as the end date for the period grid construction if activated.
Returns
A DataFrame in one of two forms. For parallel trends plots, it contains the means, and the means residualized by the specified covariates for each of the specified common causal covariates (CCC) violations, by period for each state. For event study plots, it contains the means of the treated states by periods before/after treatment, again residualized by the specified covariates for each of the specified CCC violations.
Citations
- Karim & Webb (2025). "Good Controls Gone Bad: Difference-in-Differences with Covariates". https://arxiv.org/abs/2412.14447
- MacKinnon & Webb (2020). "Randomization inference for difference-in-differences with few treated clusters". https://doi.org/10.1016/j.jeconom.2020.04.024