Functions

didint() - Estimate ATT

The function didint() estimates the ATT while adjusting for covariates that may vary by state, time, or both.

DiDInt.didint - Function
didint(
       outcome::Union{AbstractString, Symbol},
       state::Union{AbstractString, Symbol},
       time::Union{AbstractString, Symbol},
       data::Union{DataFrame, RConnector.RDataFrame};
       gvar::Union{AbstractString, Symbol, Nothing} = nothing,
       treated_states::Union{T, Vector{T}} where T <: Union{AbstractString, Number, Nothing} = nothing,
       treatment_times::Union{T, Vector{T}} where T <: Union{AbstractString, Number, Date, Nothing} = nothing,
       date_format::Union{AbstractString, Nothing} = nothing,
       covariates::Union{T, Vector{T}} where T <: Union{AbstractString, Symbol, Nothing} = nothing,
       ccc::AbstractString = "int",
       agg::AbstractString = "cohort",
       weighting::AbstractString = "both",
       ref::Union{Dict{<:AbstractString, <:AbstractString}, Nothing} = nothing,
       freq::Union{AbstractString, Nothing} = nothing,
       freq_multiplier::Number = 1,
       start_date::Union{AbstractString, Number, Date, Nothing} = nothing,
       end_date::Union{AbstractString, Number, Date, Nothing} = nothing,
       nperm::Number = 999,
       verbose::Bool = true,
       seed::Number = rand(1:1000000),
       use_pre_controls::Bool = false,
       notyet::Union{Nothing, Bool} = nothing,
       hc::Union{AbstractString, Number} = "hc1",
       truejack::Bool = false,
       recover::Union{Bool, Nothing} = nothing
      )

The didint() function estimates the average effect of treatment on the treated (ATT) while accounting for covariates that may vary by state, time, or by both state and time simultaneously.

Details

The arguments treated_states and treatment_times are matched by position. The first element of treated_states is the state treated at the date given by the first element of treatment_times, the second element of treated_states is the state treated at the date given by the second element of treatment_times, and so on.
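The sketch below shows the positional pairing. The state identifiers and years are hypothetical, not taken from any dataset. It only checks that the two vectors line up and builds a lookup from each treated state to its treatment time. The commented call shows how the vectors would be passed to didint(), assuming DiDInt is loaded and that a DataFrame df exists with hypothetical columns "y", "state", and "year".

```julia
# Hypothetical treated states and their treatment years, matched by position
treated_states  = ["34", "57", "61"]
treatment_times = ["2014", "2014", "2016"]

# Each treated state needs exactly one treatment time
@assert length(treated_states) == length(treatment_times)

# Element i of treated_states is treated at element i of treatment_times
treatment_lookup = Dict(zip(treated_states, treatment_times))

for (s, t) in zip(treated_states, treatment_times)
    println("state ", s, " treated in ", t)
end

# Hypothetical call (requires DiDInt and a DataFrame `df` with these columns):
# didint("y", "state", "year", df;
#        treated_states = treated_states,
#        treatment_times = treatment_times,
#        date_format = "yyyy")
```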

Parameters

Required Parameters

  • outcome::Union{AbstractString, Symbol} Input the name of the column which identifies the outcome of interest.
  • state::Union{AbstractString, Symbol} Input the name of the column which identifies the state membership of the observation.
  • time::Union{AbstractString, Symbol} Input the name of the column which identifies the date of the observation.
  • data::Union{DataFrame, RConnector.RDataFrame} The DataFrame to be used for the analysis.

Treatment Specification

  • gvar::Union{AbstractString, Symbol, Nothing} = nothing Name of the column that indicates the time of first treatment for each state.
  • treated_states::Union{T, Vector{T}} where T <: Union{AbstractString, Number, Nothing} = nothing A vector (or single entry) of strings or numbers identifying the treated state(s).
  • treatment_times::Union{T, Vector{T}} where T <: Union{AbstractString, Number, Date, Nothing} = nothing A vector (or single entry) denoting the associated treatment times of the 'treated_states'.

Model Specification

  • ccc::AbstractString = "int" Specify which version of DID-INT should be used. Options are: "hom", "time", "state", "add", and "int".
  • agg::AbstractString = "cohort" Enter the aggregation method as a string. Options are: "cohort", "simple", "state", "sgt", "none".
  • weighting::AbstractString = "both" Specify which weighting method should be used. Options are: "both", "att", "diff", or "none".
  • covariates::Union{T, Vector{T}} where T <: Union{AbstractString, Symbol, Nothing} = nothing A vector of covariates entered as either strings or symbols (or a single covariate string or symbol), or, nothing (default).
  • notyet::Union{Nothing, Bool} = nothing Determine whether pre-treatment periods from treated states should be used as controls.
  • recover::Union{Bool, Nothing} = nothing Determine whether "zeroed out" lambda coefficients should be re-estimated with any covariates that introduced collinearity dropped. Defaults to true for ccc = "int" and false otherwise. For any ccc option other than "int", setting recover to true or false does not affect the calculated ATTs.
  • ref::Union{Dict{<:AbstractString, <:AbstractString}, Nothing} = nothing A dictionary specifying which category in a categorical variable should be used as the reference (baseline) category.
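The covariates and ref arguments accept several forms. The sketch below copies the argument types from the didint() signature and checks a few candidate values against them in plain Julia, without loading DiDInt. The column names "income" and "industry" and the reference category "manufacturing" are hypothetical. One consequence of the annotated type is that a literal mixing strings and symbols becomes a Vector{Any}, which does not match.

```julia
# Argument types copied from the didint() signature
const CovariatesArg = (Union{T, Vector{T}} where T <: Union{AbstractString, Symbol, Nothing})
const RefArg = Union{Dict{<:AbstractString, <:AbstractString}, Nothing}

# Hypothetical covariate column names
covs_str   = ["income", "industry"]   # Vector{String}
covs_sym   = [:income, :industry]     # Vector{Symbol}
covs_mixed = ["income", :industry]    # element types differ, so this is a Vector{Any}

# Hypothetical reference (baseline) category for a categorical column "industry"
ref = Dict("industry" => "manufacturing")

println(covs_str isa CovariatesArg)    # true
println(covs_sym isa CovariatesArg)    # true
println("income" isa CovariatesArg)    # true
println(nothing isa CovariatesArg)     # true
println(covs_mixed isa CovariatesArg)  # false: Vector{Any} does not match the annotated type
println(ref isa RefArg)                # true
```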

Date Processing & Period Grid Construction

  • date_format::Union{AbstractString, Nothing} = nothing Date format (e.g. "yyyy" or "yyyy-mm-dd") used to parse string dates in the time column and in the start_date, end_date, and treatment_times arguments.
  • freq::Union{AbstractString, Nothing} = nothing A string giving the length of each period in the analysis for staggered adoption scenarios. Options are: "year", "month", "week", "day".
  • freq_multiplier::Number = 1 An integer by which the 'freq' argument should be multiplied in a staggered adoption scenario, e.g. if a two-year period is desired, set freq = "year" and freq_multiplier = 2.
  • start_date::Union{AbstractString, Number, Date, Nothing} = nothing Data prior to this date are dropped. If period grid construction is activated, this date also serves as the start of the period grid.
  • end_date::Union{AbstractString, Number, Date, Nothing} = nothing Data after this date are dropped. If period grid construction is activated, this date also serves as the end of the period grid.
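The format strings "yyyy" and "yyyy-mm-dd" are also valid Julia Dates format strings; whether DiDInt parses them through Dates internally is an assumption. The helpers period_step, period_grid, and period_index below are hypothetical. They sketch one plausible way to build a grid of two-year periods (freq = "year", freq_multiplier = 2) between a start and end date and assign a date to its period. They are not the package's implementation.

```julia
using Dates

# "yyyy" and "yyyy-mm-dd" parse with Julia's Dates format codes
d_year = Date("2014", "yyyy")              # 2014-01-01
d_day  = Date("2016-07-15", "yyyy-mm-dd")  # 2016-07-15

# Hypothetical helper: map freq and freq_multiplier to a Dates period
function period_step(freq::AbstractString, freq_multiplier::Integer)
    freq == "year"  && return Year(freq_multiplier)
    freq == "month" && return Month(freq_multiplier)
    freq == "week"  && return Week(freq_multiplier)
    freq == "day"   && return Day(freq_multiplier)
    throw(ArgumentError("unsupported freq: $freq"))
end

# Hypothetical helper: period start dates from start_date up to end_date
period_grid(start_date::Date, end_date::Date, freq, freq_multiplier) =
    collect(start_date:period_step(freq, freq_multiplier):end_date)

# Hypothetical helper: index of the period containing d
# (the last period start on or before d), or 0 if d precedes the grid
period_index(grid::Vector{Date}, d::Date) = searchsortedlast(grid, d)

grid = period_grid(Date(2010), Date(2016, 12, 31), "year", 2)
println(grid)                        # period starts in 2010, 2012, 2014, 2016
println(period_index(grid, d_day))   # 4 (the period starting 2016-01-01)
```

Assigning a date to the last period start on or before it is a design choice made for this sketch; other binning rules are possible.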

Inference

  • nperm::Number = 999 The number of unique permutations (not including the initial assignment of treatment times) to be considered when performing the randomization inference.
  • verbose::Bool = true A boolean option for displaying progress of the randomization procedure.
  • seed::Number = rand(1:1000000) An integer to set the random seed for the randomization inference procedure.
  • hc::Union{AbstractString, Number} = "hc1" Specify which heteroskedasticity-consistent covariance matrix estimator (HCCME) should be used. Options are 0, 1, 2, 3, and 4 (or "hc0", "hc1", "hc2", "hc3", "hc4").
  • truejack::Bool = false Whether to re-estimate the DID-INT model from scratch (including the large FixedEffectsModels regressions) for each jackknife replicate. This is needed for valid jackknife standard errors when ccc is set to "add", "time", or "hom": in those cases the covariate effects depend on data from across states, so dropping a state changes the lambda estimates. This is not the case when ccc is "int" or "state".
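The nperm and seed options control a randomization inference procedure in the spirit of MacKinnon & Webb (2020): treatment is reassigned across states, the statistic is recomputed, and the observed statistic is compared with the resulting distribution. The sketch below is a toy version under stated assumptions. A difference in means of hypothetical state-level differences stands in for the DID-INT ATT. Extremeness is judged two-sided by absolute value. The p-value is taken as (1 + number at least as extreme) / (1 + number of permutations), which is one common convention and not necessarily the one DiDInt uses. The names att_stat, assignments, and ri_pvalue are hypothetical.

```julia
using Random, Statistics

# Toy statistic standing in for the DID-INT ATT: treated mean minus control mean
att_stat(d, tr) = mean(d[tr]) - mean(d[.!tr])

# Every assignment of k treated states among n states (bitmask enumeration, small n only)
function assignments(n::Int, k::Int)
    out = BitVector[]
    for mask in 0:(2^n - 1)
        count_ones(mask) == k || continue
        push!(out, BitVector([((mask >> (i - 1)) & 1) == 1 for i in 1:n]))
    end
    return out
end

# Randomization-inference p-value over unique reassignments of treatment
function ri_pvalue(d::Vector{Float64}, tr::BitVector; nperm::Integer = 999, seed::Integer = 1234)
    obs = att_stat(d, tr)
    # all unique alternatives to the observed assignment
    alts = filter(a -> a != tr, assignments(length(tr), count(tr)))
    # if there are more than nperm alternatives, draw nperm of them without replacement
    if nperm < length(alts)
        alts = shuffle(MersenneTwister(seed), alts)[1:nperm]
    end
    extreme = count(a -> abs(att_stat(d, a)) >= abs(obs), alts)
    return (extreme + 1) / (length(alts) + 1)
end

# Hypothetical state-level differences; the first two states are treated
diffs   = [2.1, 1.8, 0.3, -0.2, 0.5, 0.1, -0.4, 0.2]
treated = BitVector([true, true, false, false, false, false, false, false])

p = ri_pvalue(diffs, treated; nperm = 999, seed = 2024)
println(p)
```

With 2 of 8 states treated there are only 28 possible assignments, so even a requested nperm of 999 yields just 27 alternatives and the smallest attainable p-value is 1/28. This is the few-treated-clusters situation that motivates counting unique permutations.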

Returns

A DataFrame of results including the estimate of the ATT as well as standard errors and p-values.

Citations

  • Karim & Webb (2025). "Good Controls Gone Bad: Difference-in-Differences with Covariates". https://arxiv.org/abs/2412.14447
  • MacKinnon & Webb (2020). "Randomization inference for difference-in-differences with few treated clusters". https://doi.org/10.1016/j.jeconom.2020.04.024

didint_plot() - Prepare Data for Visualization

The didint_plot() function produces a dataset in a long format that can easily be used for plotting parallel trends or event study plots.

DiDInt.didint_plot - Function
didint_plot(
            outcome::Union{AbstractString, Symbol},
            state::Union{AbstractString, Symbol},
            time::Union{AbstractString, Symbol},
            data::DataFrame;
            gvar::Union{AbstractString, Symbol, Nothing} = nothing,
            treated_states::Union{T, Vector{T}} where T <: Union{AbstractString, Number, Nothing} = nothing,
            treatment_times::Union{T, Vector{T}} where T <: Union{AbstractString, Number, Date, Nothing} = nothing,
            date_format::Union{AbstractString, Nothing} = nothing,
            covariates::Union{Vector{<:AbstractString}, AbstractString, Nothing} = nothing,
            ref::Union{Dict{<:AbstractString, <:AbstractString}, Nothing} = nothing,
            ccc::Union{AbstractString, Vector{<:AbstractString}} = "all",
            event::Bool = false,
            weights::Bool = true,
            ci::Number = 0.95,
            freq::Union{AbstractString, Nothing} = nothing,
            freq_multiplier::Number = 1,
            start_date::Union{AbstractString, Number, Date, Nothing} = nothing,
            end_date::Union{AbstractString, Number, Date, Nothing} = nothing,
            hc::Union{AbstractString, Number} = "hc1"
           )


Details

The arguments treated_states and treatment_times are matched by position. The first element of treated_states is the state treated at the date given by the first element of treatment_times, the second element of treated_states is the state treated at the date given by the second element of treatment_times, and so on.

Parameters

Required Parameters

  • outcome::Union{AbstractString, Symbol} Input the name of the column which identifies the outcome of interest.
  • state::Union{AbstractString, Symbol} Input the name of the column which identifies the state membership of the observation.
  • time::Union{AbstractString, Symbol} Input the name of the column which identifies the date of the observation.
  • data::DataFrame The DataFrame to be used for the analysis.

Treatment Specification

  • gvar::Union{AbstractString, Symbol, Nothing} = nothing Name of the column that indicates the time of first treatment for each state.
  • treatment_times::Union{T, Vector{T}} where T <: Union{AbstractString, Number, Date, Nothing} = nothing A vector (or single entry) denoting the associated treatment times of the 'treated_states'.

Model Specifications

  • ccc::Union{AbstractString, Vector{<:AbstractString}} = "all" Specify which versions of DID-INT should be used. Options are either "all", or any combination of: "none", "hom", "time", "state", "add", and "int".
  • covariates::Union{Vector{<:AbstractString}, AbstractString, Nothing} = nothing A vector of covariate names entered as strings (or a single covariate string), or nothing (default).
  • ref::Union{Dict{<:AbstractString, <:AbstractString}, Nothing} = nothing A dictionary specifying which category in a categorical variable should be used as the reference (baseline) category.

Event Study Plot

  • event::Bool = false Specify if data should be prepared for an event study plot as opposed to a parallel trends plot.
  • weights::Bool = true Whether to use weighted means when computing event study estimates. If true, estimates are computed as weighted averages of state-level means for each period relative to treatment; if false, uses simple unweighted averages.
  • treated_states::Union{T, Vector{T}} where T <: Union{AbstractString, Number, Nothing} = nothing A vector (or single entry) of strings or numbers identifying the treated state(s).
  • ci::Number = 0.95 The confidence level of the confidence bands in the event study plot (e.g. 0.95 for 95% bands).
  • hc::Union{AbstractString, Number} = "hc1" Specify which heteroskedasticity-consistent covariance matrix estimator (HCCME) should be used. Options are 0, 1, 2, 3, and 4 (or "hc0", "hc1", "hc2", "hc3", "hc4").
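The weights option switches between weighted and simple averages of state-level means within each period relative to treatment. The sketch below illustrates that distinction on hypothetical treated-state cells. It assumes, for illustration only, that each state's weight is its number of observations in the cell; the documentation above does not say which weights DiDInt uses. The helper event_means is hypothetical.

```julia
# Hypothetical treated-state cells:
# (state, period, first treated period, mean outcome, number of observations)
cells = [
    ("A", 1, 2, 1.0, 10), ("A", 2, 2, 2.0, 10), ("A", 3, 2, 3.0, 10),
    ("B", 2, 3, 0.0, 30), ("B", 3, 3, 1.0, 30), ("B", 4, 3, 1.0, 30),
]

# Average state-level means within each period relative to treatment
function event_means(cells; weights::Bool = true)
    rel = [c[2] - c[3] for c in cells]   # period minus first treated period
    out = Dict{Int, Float64}()
    for r in sort(unique(rel))
        idx = findall(==(r), rel)
        m = [cells[i][4] for i in idx]
        # assumed weights: observation counts; otherwise equal weights
        w = weights ? Float64[cells[i][5] for i in idx] : ones(length(idx))
        out[r] = sum(w .* m) / sum(w)
    end
    return out
end

weighted   = event_means(cells; weights = true)
unweighted = event_means(cells; weights = false)
println(weighted[0], " ", unweighted[0])   # 1.25 1.5
```

State B has three times as many observations as state A, so the weighted means sit closer to B's values.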

Date Processing & Period Grid Construction

  • date_format::Union{AbstractString, Nothing} = nothing Date format (e.g. "yyyy" or "yyyy-mm-dd") used to parse string dates in the time column and in the start_date, end_date, and treatment_times arguments.
  • freq::Union{AbstractString, Nothing} = nothing A string giving the length of each period in the analysis for staggered adoption scenarios. Options are: "year", "month", "week", "day".
  • freq_multiplier::Number = 1 An integer by which the 'freq' argument should be multiplied in a staggered adoption scenario, e.g. if a two-year period is desired, set freq = "year" and freq_multiplier = 2.
  • start_date::Union{AbstractString, Number, Date, Nothing} = nothing Data prior to this date are dropped. If period grid construction is activated, this date also serves as the start of the period grid.
  • end_date::Union{AbstractString, Number, Date, Nothing} = nothing Data after this date are dropped. If period grid construction is activated, this date also serves as the end of the period grid.

Returns

If event = false, a DataFrame of means, and of means residualized by the specified covariates, by period for each state, computed for each of the specified common causal covariates (CCC) violations. If event = true, a DataFrame of the treated states' means by period before/after treatment, again residualized by the specified covariates and computed for each of the specified CCC violations.
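As a rough illustration of "means residualized by covariates", the toy sketch below regresses the outcome on one covariate with a single pooled slope. It then subtracts the fitted covariate contribution and averages raw and residualized outcomes by state and period. The single pooled slope is an assumption chosen for simplicity. The ccc options allow covariate effects to vary by state, time, or both, which this sketch does not attempt. The data and the helper cell_means are hypothetical.

```julia
using LinearAlgebra, Statistics

# Hypothetical toy panel: state, period, outcome y, one covariate x
state  = ["A", "A", "A", "A", "B", "B", "B", "B"]
period = [1, 1, 2, 2, 1, 1, 2, 2]
x = [1.0, 2.0, 3.0, 4.0, 1.0, 3.0, 2.0, 4.0]
y = [2.0, 4.5, 7.0, 8.5, 2.5, 6.0, 4.5, 9.0]

# Pooled OLS of y on an intercept and x (one common slope: an assumption)
X = hcat(ones(length(x)), x)
beta = X \ y

# Remove only the covariate contribution so outcome levels stay interpretable
y_resid = y .- beta[2] .* x

# Hypothetical helper: mean of v within each (state, period) cell
function cell_means(state, period, v)
    cells = sort(unique(zip(state, period)))
    return Dict(k => mean(v[[(s, p) == k for (s, p) in zip(state, period)]]) for k in cells)
end

raw_means   = cell_means(state, period, y)
resid_means = cell_means(state, period, y_resid)
println(raw_means[("A", 1)], " ", resid_means[("A", 1)])
```

Because the slope comes from least squares with an intercept, y_resid has zero sample covariance with x. This is the sense in which the covariate has been partialled out of the cell means.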

Citations

  • Karim & Webb (2025). "Good Controls Gone Bad: Difference-in-Differences with Covariates". https://arxiv.org/abs/2412.14447
  • MacKinnon & Webb (2020). "Randomization inference for difference-in-differences with few treated clusters". https://doi.org/10.1016/j.jeconom.2020.04.024