Synthetic controls are a generalization of the difference-in-difference approach (Abadie and Gardeazabal, 2003; Abadie et al., 2010). Difference-in-difference methods often require the researcher to manually identify a control case against which the treatment will be compared, on the basis of apparent similarity before the intervention and the plausibility that identical secular trends affect treatment and control equally after the intervention. The synthetic control method instead offers a formalized and more rigorous way to identify comparison cases: it constructs a “synthetic” control unit that represents a weighted combination of many untreated cases. Weights are calculated to maximize the similarity between the synthetic control and the treatment unit in terms of specified “matching” variables. By matching on observable characteristics of treatment and control, the method may also do a better job of matching on unobservable characteristics (though by nature this cannot be verified).
The advantages over the general difference-in-difference approach are several: a) the observable similarity of control and treatment cases is maximized, and perhaps also the similarity of unobservables, making the assumptions inherent to the difference-in-difference approach (e.g., equal secular trends) more plausible; b) the method is feasible even when no single untreated case is adequately similar to the treatment case; and c) researchers can point to a formal and objective approach to the selection of controls, rather than having to justify ad hoc decisions that could create the appearance of a thumb on the scale.
Generally, synthetic controls have been applied in the context of a single treatment case with a limited number (e.g., several dozen) of untreated cases for comparison. The Synth package has been developed for R and designed for this type of application. But the relative dearth of treatment and comparison data in such settings complicates efforts to a) develop a synthetic control that matches the treatment case, b) precisely estimate the effect of treatment, c) gauge the significance of that effect, and d) jointly incorporate multiple outcome variables.
This package is developed to address those limitations, by incorporating high-dimensional, micro-level data into the synthetic controls framework. Therefore, in addition to what Synth provides, microsynth offers several advantages and new tools:
With the advantage of a large number of smaller-scale observations, microsynth is often better able to calculate weights that provide exact matches between treatment and synthetic control units (on all variables passed to match.out [for time-variant variables] and match.covar [for time-invariant variables]). This bolsters the conceptual framework behind the synthetic control method.
To generate an additional measure of significance, microsynth can generate hundreds or thousands of placebo treatment units using random permutations of the control units (e.g., with perm = 250 and jack = TRUE). This allows estimated effects from the actual treatment unit to be compared to effects for the placebo treatment units, after standardization (if use.survey = TRUE), generating a new variance estimator and p-value. The sampling distribution of the effects from placebo treatment units is plotted, along with Synth-style plots comparing observed outcomes in the treatment and synthetic control units over time.
An omnibus statistic is calculated to assess statistical significance across multiple variables (i.e., those set to omnibus.var), as may be desired in scenarios with limited power where several outcome variables are of interest.
Results may be estimated across multiple follow-up periods (by passing a vector to end.post).
Matching variables may be specified flexibly. Time-variant variables may be aggregated across multiple time periods before matching (by passing a list to match.out or passing a value to period), helping to reduce variable sparseness and improve the likelihood of a satisfactory match.
microsynth provides parameters to assist users in finding feasible models when a plethora of matching variables and a scarcity of data make the calculation of satisfactory weights difficult. Users may set check.feas or use.backup to invoke more computationally intensive methods of calculating weights. Alternatively, difficult-to-match variables may be passed to match.out.min/match.covar.min so as to seek weights that deliver the best possible, but not necessarily exact, match on those variables.
microsynth is also backwards compatible, i.e., it can be deployed on the Synth-like case of a single treatment with a limited number of untreated cases, although the relative dearth of data should be expected to decrease matching performance and limit the usefulness of the features discussed above (see Example 8).
For these examples we will evaluate a Drug Market Intervention using the “seattledmi” dataset provided with the microsynth package. The intervention was applied to 39 blocks, which represent the treatment; the remaining 9,603 Seattle blocks are potential comparison units from which the synthetic control may be constructed. Data are available for block-level Census demographics and incidences of crime reported by the Seattle Police Department.
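The column listing that follows can be reproduced along these lines. This is only a sketch: it assumes the microsynth package is installed and that seattledmi can be loaded from it with data(); the guard simply skips the step otherwise.

```r
# Assumption: microsynth is installed and ships the seattledmi dataset.
have_pkg <- requireNamespace("microsynth", quietly = TRUE)

if (have_pkg) {
  data("seattledmi", package = "microsynth")
  # Inspect the variables available for matching and for reporting results
  print(colnames(seattledmi))
}
```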
## [1] "ID" "time" "Intervention" "i_robbery" "i_aggassau"
## [6] "i_burglary" "i_larceny" "i_felony" "i_misdemea" "i_drugsale"
## [11] "i_drugposs" "any_crime" "i_drugs" "TotalPop" "BLACK"
## [16] "HISPANIC" "Males_1521" "HOUSEHOLDS" "FAMILYHOUS" "FEMALE_HOU"
## [21] "RENTER_HOU" "VACANT_HOU"
We would like to detect whether the program was effective at reducing the incidence of crime in those neighborhoods where the intervention was applied. Before beginning examples, we will specify the mandatory minimum parameters pursuant to the dataset and our basic research design.
The bedrock of the synthetic controls research design (like any difference-in-difference method) involves comparing observations between treatment (i.e., “intervention”) areas and control areas, with observations for each unit over a certain period of time. Therefore microsynth requires that we identify the idvar, timevar, and intvar columns.
In this case, we are provided with Census block-level observation units (idvar = "ID") and quarterly observations (timevar = "time"), along with a binary variable that is 0 for all untreated groups and for the treated groups during the pre-intervention period, and 1 for treated groups at the time of intervention and later (intvar = "Intervention").
Next, the user can specify parameters relating to the beginning of the pre-intervention data (start.pre), the last time period of the pre-intervention period (end.pre), and the time(s) through which post-intervention effects ought to be estimated (end.post). For all observations up to and including end.pre, outcome variables and covariates will be used to match treatment and control. (If the data are formatted such that intvar is 0 for all control-unit observations and for treatment units before the intervention, and 1 only for treatment units from the intervention onward, then end.pre will by default be set automatically to the last period of pre-intervention data.)
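To make the expected coding of intvar concrete, here is a small base-R sketch on a hypothetical toy panel (the data frame toy and its values are invented for illustration). It shows the 0/1 pattern described above and how the last pre-intervention period can be read off that column; it is not a reproduction of microsynth's internal logic.

```r
# Hypothetical toy panel: 3 units observed for 4 periods; unit "A" is
# treated starting in period 3.
toy <- expand.grid(ID = c("A", "B", "C"), time = 1:4,
                   stringsAsFactors = FALSE)
toy$Intervention <- as.integer(toy$ID == "A" & toy$time >= 3)

# Control units are 0 throughout; the treated unit switches to 1 at period 3.
print(xtabs(Intervention ~ ID + time, data = toy))

# The last pre-intervention period is the one just before the first 1.
end.pre <- min(toy$time[toy$Intervention == 1]) - 1
print(end.pre)
```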
In this case, our study period begins at the first quarter available in the dataset; the intervention occurs after 12 quarters of pre-intervention data (end.pre = 12); and our study period continues for four quarters of post-intervention data (end.post = 16). With this dataset, end.post could also be left unassigned and would be set automatically to the latest observation in the data. Likewise, we could set end.pre = NULL: we expect the program’s effects not to occur instantaneously, and the intvar column is adequately formatted to allow microsynth to detect the intervention time automatically. Note that start.pre will default to the earliest time in the dataset.
Exact matches are not always possible, especially for variables that are sparse (i.e., have few non-zero values), that contain little variation, or for which the treatment units have values outside the range of observations from the untreated units. In these cases, variables may be moved from match.out/match.covar to match.out.min/match.covar.min so as to minimize the distance between treatment and synthetic control on those variables rather than find exact matches. Alternatively, a value may be passed to period to aggregate all variables named in match.out/match.covar over the same regular time duration; or, to set aggregation instructions in more detail, match.out/match.covar may receive a list with detailed parameters.
microsynth() provides several different ways to address this problem.
A variable can be treated such that the distance between treatment and synthetic control is minimized, even if a distance of zero is infeasible, by listing it under match.out.min (for time-variant outcome variables) or match.covar.min (for time-invariant variables). In this case, match.out, match.out.min, match.covar, and match.covar.min may each be vectors of variable names. There ought not be any overlap: each variable should appear in only one argument.
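A quick way to confirm that no variable is listed twice is to check the four vectors for duplicates before calling microsynth. The sketch below uses only base R; the holder names (exact.out, min.out, exact.covar, min.covar) and the assignment of variables to them are hypothetical, chosen for illustration.

```r
# Hypothetical assignment of seattledmi variables to the four arguments
exact.out   <- c("i_felony", "i_misdemea", "any_crime")  # for match.out
min.out     <- c("i_drugsale")                           # for match.out.min
exact.covar <- c("TotalPop", "HOUSEHOLDS", "RENTER_HOU") # for match.covar
min.covar   <- c("VACANT_HOU")                           # for match.covar.min

all.vars   <- c(exact.out, min.out, exact.covar, min.covar)
overlapped <- unique(all.vars[duplicated(all.vars)])

# An empty result means each variable appears in exactly one argument
print(overlapped)
```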
Another potential response is to aggregate the variable across multiple time periods. match.out, match.out.min, match.covar, and match.covar.min all behave similarly in this respect. Rather than being passed a vector of variable names, each may receive a list; each element of the list is named for a variable and holds a vector of the time durations across which that variable should be aggregated before matching, counting backwards from the intervention time.
Combining these approaches, if match.out.min = list("Y1" = c(1, 3, 3)), then the variable “Y1” will be used to match treatment to synthetic control on its value in the last period before the intervention (t), the sum of its values across t-1 to t-3, and the sum across t-4 to t-6, with the distance minimized on each rather than matched exactly.
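The mapping from a duration vector to time windows can be sketched in base R. The helper agg_windows() below is hypothetical (it is not part of microsynth), and it assumes that t corresponds to end.pre, here taken to be 12 as in the Seattle example.

```r
# Hypothetical helper: convert aggregation durations into time windows,
# counting backwards from the last pre-intervention period (end.pre).
agg_windows <- function(durations, end.pre) {
  # Each window ends where the previous (more recent) one began, minus one
  ends   <- end.pre - c(0, cumsum(durations)[-length(durations)])
  starts <- ends - durations + 1
  data.frame(start = starts, end = ends)
}

# Durations c(1, 3, 3): one single period, then two three-period sums
windows <- agg_windows(c(1, 3, 3), end.pre = 12)
print(windows)
```

With end.pre = 12, the three rows cover period 12 alone, periods 9 through 11, and periods 6 through 8; periods 1 through 5 are left unused because the durations sum to 7 rather than 12.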
If the dataset contains both time-variant outcome variables and time-variant predictor variables (i.e., belonging on the RHS of a regression rather than the LHS), then both 1) match.out or match.out.min and 2) result.var must be specified. match.out or match.out.min should include all time-variant variables used for matching, whether they are true outcomes or predictors; result.var should specify only the subset of those that are outcomes (for which estimated effects will be calculated).
Note: in some cases, the term “outcome variable” may be a misnomer. Though by default all time-variant variables assigned to match.out and match.out.min will be used to estimate the program effect (result.var = TRUE), this doesn’t have to be the case. result.var may be set to a vector of variable names representing a subset of the outcome variables entered into match.out and match.out.min; this is useful if the dataset includes time-variant variables that we’d like to use to match treatment and synthetic control but which we do not want to use for the purposes of evaluating the program effect.
microsynth allows for extensive configuration, for instance, relating to the mechanics of calculating weights, plotting options, and the calculation of variance estimators through permutation tests and jackknife replication groups. These aspects will be discussed in the later examples below.
In this minimal example, we will calculate and display results in the simplest way possible. This includes:
A one-sided lower-tailed test (test = "lower").
Results reported for each matched outcome variable (result.var = match.out).
Plots displayed on screen (plot_microsynth()); to save to file, specify a .csv or .xlsx as the file argument in plot_microsynth().
No results saved to file (result.file = NULL); instead, results can be viewed by inspecting the microsynth object.
As microsynth runs, it will display output relating to the calculation of weights, the matching of treatment to synthetic control, and the calculation of survey statistics (e.g., the variance estimator). The first table to display summarizes the matching properties after applying the main weights. It shows three columns: 1) characteristics of the treated areas for the time-variant and time-invariant variables, 2) characteristics of the synthetic control, and 3) characteristics of the entire population. Because this example is successful in creating a matching synthetic control, the first column and the second column will be nearly equal.
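To illustrate what the first two columns of that table mean, the following base-R sketch builds a tiny balance table by hand. All numbers are hypothetical, and the weights are found with a plain linear solve, which works here only because the toy has exactly as many control units as constraints. microsynth itself solves a much larger weighting problem over thousands of control units, so this shows the idea of an exact match, not the package's algorithm.

```r
# Hypothetical toy data: three control units (rows), three constraints
X <- cbind(Intercept = 1,
           TotalPop  = c(50, 80, 120),
           crime     = c(2, 5, 12))

# Hypothetical totals for a treatment area made up of two units
targets <- c(Intercept = 2, TotalPop = 165, crime = 12)

# Weights w such that the weighted control totals equal the targets:
# t(X) %*% w = targets
w <- solve(t(X), targets)

balance <- data.frame(Targets = targets,
                      Weighted.Control = as.vector(t(X) %*% w))
print(round(w, 3))
print(balance)
```

In this toy the two columns agree exactly, which is what an exact match looks like in the balance table.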
Note that match.out = match.out, result.var = match.out, and omnibus.var = match.out. This means that the outcome variables that we declared as match.out will all be matched on exactly, will be used to report results, and will feature in the omnibus p-value. match.covar indicates that the specified covariates will also be matched on exactly. (By setting result.var = match.out, one chart is produced per time-variant outcome variable for which we calculate results.)
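The call that follows relies on two character vectors, match.out and cov.var, whose definitions do not appear in this section. Judging from the variables listed in the printed output (four exact-match outcomes and nine exact-match covariates), they were presumably defined along these lines; treat this as a reconstruction rather than the original code.

```r
# Time-invariant Census covariates to match on exactly
# (reconstructed from the "Time-invariant covariates" line of the output)
cov.var <- c("TotalPop", "BLACK", "HISPANIC", "Males_1521", "HOUSEHOLDS",
             "FAMILYHOUS", "FEMALE_HOU", "RENTER_HOU", "VACANT_HOU")

# Time-variant outcomes to match on exactly
# (reconstructed from the "Time-variant outcomes" line of the output)
match.out <- c("i_felony", "i_misdemea", "i_drugs", "any_crime")
```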
sea1 <- microsynth(seattledmi,
idvar="ID", timevar="time", intvar="Intervention",
start.pre=1, end.pre=12, end.post=16,
match.out=match.out, match.covar=cov.var,
result.var=match.out, omnibus.var=match.out,
test="lower",
n.cores = min(parallel::detectCores(), 2))
sea1
## microsynth object
##
## Scope:
## Units: Total: 9642 Treated: 39 Untreated: 9603
## Study Period(s): Pre-period: 1 - 12 Post-period: 13 - 16
## Constraints: Exact Match: 58 Minimized Distance: 0
## Time-variant outcomes:
## Exact Match: i_felony, i_misdemea, i_drugs, any_crime (4)
## Minimized Distance: (0)
## Time-invariant covariates:
## Exact Match: TotalPop, BLACK, HISPANIC, Males_1521, HOUSEHOLDS, FAMILYHOUS, FEMALE_HOU, RENTER_HOU, VACANT_HOU (9)
## Minimized Distance: (0)
##
## Results:
## end.post = 16
## Trt Con Pct.Chng Linear.pVal Linear.Lower Linear.Upper
## i_felony 46 68.22 -32.6% 0.0109 -50.3% -8.4%
## i_misdemea 45 71.80 -37.3% 0.0019 -52.8% -16.7%
## i_drugs 20 23.76 -15.8% 0.2559 -46.4% 32.1%
## any_crime 788 986.44 -20.1% 0.0146 -32.9% -4.9%
## Omnibus -- -- -- 0.0004 -- --
## Weight Balance Table:
##
## Targets Weighted.Control All.scaled
## Intercept 39 39.000239 39.0000000
## TotalPop 2994 2994.051921 2384.7476665
## BLACK 173 173.000957 190.5224020
## HISPANIC 149 149.002632 159.2682016
## Males_1521 49 49.000000 97.3746111
## HOUSEHOLDS 1968 1968.033976 1113.5588052
## FAMILYHOUS 519 519.010767 475.1876167
## FEMALE_HOU 101 101.000957 81.1549471
## RENTER_HOU 1868 1868.020338 581.9340386
## VACANT_HOU 160 160.011485 98.4222153
## i_felony.12 14 14.000000 4.9023024
## i_felony.11 11 11.000239 4.6313006
## i_felony.10 9 9.000000 3.0740510
## i_felony.9 5 5.000000 3.2641568
## i_felony.8 20 20.000000 4.4331052
## i_felony.7 8 8.000000 3.7616677
## i_felony.6 13 13.000000 3.0012446
## i_felony.5 20 20.000718 3.1549471
## i_felony.4 10 10.000000 4.0245800
## i_felony.3 7 7.000000 3.3693217
## i_felony.2 13 13.000239 3.2803360
## i_felony.1 12 12.000000 3.4380834
## i_misdemea.12 15 15.000239 4.2470442
## i_misdemea.11 12 12.000000 4.6070317
## i_misdemea.10 12 12.000000 4.0771624
## i_misdemea.9 14 14.000000 3.7414437
## i_misdemea.8 12 12.000000 3.9679527
## i_misdemea.7 20 20.000000 4.2551338
## i_misdemea.6 16 16.000479 3.5594275
## i_misdemea.5 24 24.000000 3.5634723
## i_misdemea.4 21 21.000239 4.3360299
## i_misdemea.3 21 21.000000 4.3845675
## i_misdemea.2 14 14.000000 3.5351587
## i_misdemea.1 16 16.000000 4.1540137
## i_drugs.12 13 13.000000 1.6543248
## i_drugs.11 8 8.000000 1.5127567
## i_drugs.10 3 3.000000 1.3226509
## i_drugs.9 4 4.000000 0.9788426
## i_drugs.8 4 4.000000 1.1123211
## i_drugs.7 10 10.000000 1.0516490
## i_drugs.6 4 4.000000 1.2377100
## i_drugs.5 2 2.000000 1.2296204
## i_drugs.4 1 1.000000 1.1244555
## i_drugs.3 5 5.000000 1.3550093
## i_drugs.2 12 12.000000 1.1365899
## i_drugs.1 8 8.000239 1.3590541
## any_crime.12 272 272.001196 65.3397635
## any_crime.11 227 227.001675 64.2395769
## any_crime.10 183 183.000957 55.6929060
## any_crime.9 176 176.000479 53.2377100
## any_crime.8 228 228.000479 55.8142502
## any_crime.7 246 246.002393 55.8061605
## any_crime.6 200 200.000957 52.8291848
## any_crime.5 270 270.001436 50.6530803
## any_crime.4 250 250.000957 57.2946484
## any_crime.3 236 236.000957 58.8680772
## any_crime.2 250 250.001196 51.5429371
## any_crime.1 242 242.000957 55.1144991
##
## Results:
##
## end.post = 16
## Trt Con Pct.Chng Linear.pVal Linear.Lower Linear.Upper
## i_felony 46 68.22 -32.6% 0.0109 -50.3% -8.4%
## i_misdemea 45 71.80 -37.3% 0.0019 -52.8% -16.7%
## i_drugs 20 23.76 -15.8% 0.2559 -46.4% 32.1%
## any_crime 788 986.44 -20.1% 0.0146 -32.9% -4.9%
## Omnibus -- -- -- 0.0004 -- --
After the call to microsynth has been made, the function print() displays a brief description of the parameters used in the call along with the results (if available). Also, the function summary() can be used to display a summary of the matching between treatment, synthetic control, and the population, and the results table. The results table has one row for each of the variables passed to result.var, each of which has also been used to calculate an omnibus statistic (omnibus.var = TRUE), and two columns corresponding to the confidence interval (confidence) resulting from the variance estimator generated by linearization. The first row of the output (16) refers to the maximum post-intervention time used to compile results (end.post).
Note that the p-value of the omnibus statistic is smaller than the p-value of any individual outcome variable.
The plots are produced under default settings. By default, if no other arguments are declared in the call to plot_microsynth(), the plots will include one row for each variable passed to result.var in the original call. Likewise, values for the pre- and post-intervention periods (i.e., start.pre, end.pre, end.post) are detected automatically from the original object if not specified manually.
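As a sketch of how the fitted object might be inspected and plotted, assuming the object sea1 from the earlier call is still in the workspace and the microsynth package is installed (the guard skips everything otherwise):

```r
# Runs only if the fitted object and the package are both available
have_fit <- exists("sea1") &&
  requireNamespace("microsynth", quietly = TRUE)

# Call parameters and results table
if (have_fit) print(sea1)

# Weight balance table plus results
if (have_fit) summary(sea1)

# Default plots: one row per variable in result.var
if (have_fit) microsynth::plot_microsynth(sea1)
```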
The first plot column compares the observed outcomes among the treatment, synthetic control, and population during the pre-intervention and post-intervention periods. Outcomes are scaled by default (scale.var = "Intercept") to the number of treatment units, to facilitate comparison. The dotted red line indicates the last time period of the pre-intervention period (end.pre). Because matching was successful, the treatment and synthetic control lines track closely during the pre-intervention period; their divergence during the post-intervention period represents an estimate of the causal effect of the program (i.e., the red synthetic control line is treated as the counterfactual to the black treatment line). This difference is charted in the right plot column.
In addition to using linearization to calculate a variance estimate, microsynth can approximate the estimator’s sampling distribution by generating permuted placebo groups. When dealing with a large number of treatment and control units, there is a nearly infinite number of potential permutations. A default (perm = 250) is set because permutations are somewhat computationally intensive.
For each placebo, weights are calculated to match the placebo treatment to a new synthetic control, and an effect is estimated, generating a sampling distribution and a corresponding p-value. Because the actual treatment area is a non-random group of treatment units, while the placebo treatments are random groups, by default microsynth will standardize the placebo treatment effects to filter out potential design effects (use.survey = TRUE).
We will also generate jackknife replication groups, using as many groups as the lesser of the number of cases in the treatment group and the number of cases in the control group (jack = TRUE).
The output from this call to microsynth will be largely identical to the previous call, except for the appearance of the right column of plots. Now that permutation groups have been generated, the estimated effect under each of the placebo treatments (gray lines) will be shown along with the estimated effect of the real treatment. This displays the estimated treatment effect in the context of the estimator’s sampling distribution.
sea2 <- microsynth(seattledmi,
idvar="ID", timevar="time", intvar="Intervention",
start.pre=1, end.pre=12, end.post=c(14, 16),
match.out=match.out, match.covar=cov.var,
result.var=match.out, omnibus.var=match.out,
test="lower",
perm=250, jack=TRUE,
n.cores = min(parallel::detectCores(), 2))
Calling summary() or print() on the result identifies other changes to the results. Columns are added to display the confidence intervals (confidence = 0.9) and p-values (test = "lower") from the jackknife and permutation tests. Note that end.post = c(14, 16) in the code above, instructing results to be calculated for two different follow-up periods, ending at t=14 and t=16 respectively. One results table will be calculated for each.
## microsynth object
##
## Scope:
## Units: Total: 9642 Treated: 39 Untreated: 9603
## Study Period(s): Pre-period: 1 - 12 Post-period: 13 - 14
## Study Period(s): Pre-period: 1 - 12 Post-period: 13 - 16
## Constraints: Exact Match: 58 Minimized Distance: 0
## Time-variant outcomes:
## Exact Match: i_felony, i_misdemea, i_drugs, any_crime (4)
## Minimized Distance: (0)
## Time-invariant covariates:
## Exact Match: TotalPop, BLACK, HISPANIC, Males_1521, HOUSEHOLDS, FAMILYHOUS, FEMALE_HOU, RENTER_HOU, VACANT_HOU (9)
## Minimized Distance: (0)
##
## Results:
## end.post = 14
## Trt Con Pct.Chng Linear.pVal Linear.Lower Linear.Upper Jack.pVal
## i_felony 28 37.56 -25.5% 0.0920 -49.3% 9.7% 0.1618
## i_misdemea 19 34.57 -45.0% 0.0068 -64.3% -15.4% 0.0589
## i_drugs 11 14.58 -24.6% 0.2193 -60.5% 44.2% 0.2809
## any_crime 401 504.05 -20.4% 0.0509 -37.1% 0.7% 0.0541
## Omnibus -- -- -- 0.0113 -- -- 0.0456
## Jack.Lower Jack.Upper Perm.pVal Perm.Lower Perm.Upper
## i_felony -52.8% 17.6% 0.1400 -49.4% 2.0%
## i_misdemea -69.7% -0.2% 0.0480 -66.9% -17.1%
## i_drugs -65.0% 62.5% 0.3160 -62.2% 19.3%
## any_crime -36.4% -0.5% 0.0440 -33.0% -2.6%
## Omnibus -- -- 0.1000 -- --
##
## end.post = 16
## Trt Con Pct.Chng Linear.pVal Linear.Lower Linear.Upper Jack.pVal
## i_felony 46 68.22 -32.6% 0.0109 -50.3% -8.4% 0.0359
## i_misdemea 45 71.80 -37.3% 0.0019 -52.8% -16.7% 0.0479
## i_drugs 20 23.76 -15.8% 0.2559 -46.4% 32.1% 0.3331
## any_crime 788 986.44 -20.1% 0.0146 -32.9% -4.9% 0.0460
## Omnibus -- -- -- 0.0009 -- -- 0.0370
## Jack.Lower Jack.Upper Perm.pVal Perm.Lower Perm.Upper
## i_felony -51.6% -6.1% 0.0120 -50.9% -13.3%
## i_misdemea -59.6% -2.9% 0.0240 -55.6% -19.6%
## i_drugs -55.8% 60.4% 0.3120 -49.4% 21.7%
## any_crime -35.6% -0.9% 0.0120 -30.6% -5.6%
## Omnibus -- -- 0.0240 -- --
Now, we will add additional outcome variables and also use them to match the treatment area to the synthetic control units. We do this at the risk of model feasibility, as each variable introduces another constraint.
match.out <- c("i_robbery", "i_aggassau", "i_burglary", "i_larceny", "i_felony",
"i_misdemea", "i_drugsale", "i_drugposs", "any_crime")
In the example below, without overriding the default weight parameters, microsynth would fail to find a feasible model: weights would not be calculated, and no results or plots would be generated. But we may still attempt to estimate the model by setting check.feas = TRUE and use.backup = TRUE. This will check for feasibility and, if needed, invoke the computationally intensive LowRankQP package to calculate the weights.
Note that the additional matching variables introduce further constraints to the calculation of weights, lengthening the output. Moreover, the introduction of additional time-variant matching variables results in a poorer match on each, shown in the left column of plots, where red and dashed-black lines no longer track perfectly in the pre-intervention period.
Also note that we need not specify values for start.pre, end.pre, and end.post, as the default settings align with our intentions. Likewise, we can rely on the default for the variables used in the omnibus statistic (omnibus.var = result.var). This way we specify the minimum number of non-default arguments.
sea3 <- microsynth(seattledmi,
idvar="ID", timevar="time", intvar="Intervention",
end.pre=12,
match.out=match.out, match.covar=cov.var,
result.var=match.out, perm=250, jack=0,
test="lower", check.feas=TRUE, use.backup = TRUE,
n.cores = min(parallel::detectCores(), 2))
## Weight Balance Table:
##
## Targets Final.Weighted.Control All.scaled
## Intercept 39 39.0000000 39.0000000
## TotalPop 2994 2993.9999962 2384.7476665
## BLACK 173 172.9999994 190.5224020
## HISPANIC 149 148.9999998 159.2682016
## Males_1521 49 48.9999999 97.3746111
## HOUSEHOLDS 1968 1967.9999981 1113.5588052
## FAMILYHOUS 519 518.9999991 475.1876167
## FEMALE_HOU 101 100.9999998 81.1549471
## RENTER_HOU 1868 1867.9999976 581.9340386
## VACANT_HOU 160 159.9999998 98.4222153
## i_robbery.1.12 68 68.0000000 15.6938395
## i_aggassau.1.12 43 43.0000000 12.0737399
## i_burglary.1.12 805 805.0000000 193.3739888
## i_larceny.1.12 486 486.0000000 121.4250156
## i_felony.1.12 142 142.0000000 44.3350965
## i_misdemea.1.12 197 197.0000000 48.4284381
## i_drugsale.1.12 25 25.0000000 5.2056627
## i_drugposs.1.12 49 49.0000000 9.8693217
## any_crime.1.12 2780 2780.0000000 676.4327940
## i_robbery.12 12 11.3780112 1.7352209
## i_robbery.11 9 8.1200119 1.6057872
## i_robbery.10 4 3.8790126 1.1891724
## i_robbery.9 1 1.4903926 1.1325451
## i_robbery.8 7 6.3735828 1.2619788
## i_robbery.7 3 3.3506037 1.2296204
## i_robbery.6 3 4.2342276 1.3833230
## i_robbery.5 6 5.8637089 1.1810828
## i_robbery.4 4 5.4218880 1.4278158
## i_robbery.3 5 4.7600341 1.3064717
## i_robbery.2 8 7.4409586 1.0314250
## i_robbery.1 6 5.6875679 1.2093964
## i_aggassau.12 5 4.5410304 1.2174860
## i_aggassau.11 3 2.8597944 1.0233354
## i_aggassau.10 2 2.3494037 0.9383945
## i_aggassau.9 3 2.9482200 1.0233354
## i_aggassau.8 6 5.4558168 0.9545737
## i_aggassau.7 3 3.2969035 0.9424393
## i_aggassau.6 4 3.7611863 0.9586185
## i_aggassau.5 3 3.0746288 0.8251400
## i_aggassau.4 3 3.6486181 1.0516490
## i_aggassau.3 4 3.8827241 1.1285003
## i_aggassau.2 3 3.0690729 0.9262601
## i_aggassau.1 4 4.1126009 1.0840075
## i_burglary.12 76 78.4499795 18.8811450
## i_burglary.11 63 65.7638099 18.4928438
## i_burglary.10 57 55.9086862 16.7697573
## i_burglary.9 61 60.0254431 15.5805849
## i_burglary.8 72 67.3447013 15.4268824
## i_burglary.7 67 67.5134900 16.2560672
## i_burglary.6 57 57.5235401 15.1437461
## i_burglary.5 63 65.6461061 14.2700685
## i_burglary.4 71 70.6807842 15.4430616
## i_burglary.3 73 70.4732975 16.1266335
## i_burglary.2 79 78.1895056 15.0264468
## i_burglary.1 66 67.4806565 15.9567517
## i_larceny.12 54 51.7015032 12.1344119
## i_larceny.11 47 44.7209985 11.4710641
## i_larceny.10 36 35.7656801 9.8329185
## i_larceny.9 29 33.0946047 9.9219042
## i_larceny.8 43 42.0869055 10.0108899
## i_larceny.7 34 34.2189069 9.8612321
## i_larceny.6 44 42.5202884 9.6347231
## i_larceny.5 48 46.3312430 9.2423771
## i_larceny.4 43 43.1104549 10.4234599
## i_larceny.3 26 30.9303559 10.2818917
## i_larceny.2 47 45.4172546 9.0037337
## i_larceny.1 35 36.1018043 9.6064095
## i_felony.12 14 13.4649840 4.9023024
## i_felony.11 11 12.2487198 4.6313006
## i_felony.10 9 8.8112286 3.0740510
## i_felony.9 5 5.6915868 3.2641568
## i_felony.8 20 18.5910281 4.4331052
## i_felony.7 8 9.5571747 3.7616677
## i_felony.6 13 12.1564995 3.0012446
## i_felony.5 20 18.6556720 3.1549471
## i_felony.4 10 11.1852992 4.0245800
## i_felony.3 7 7.6267445 3.3693217
## i_felony.2 13 11.7375924 3.2803360
## i_felony.1 12 12.2734704 3.4380834
## i_misdemea.12 15 15.9973186 4.2470442
## i_misdemea.11 12 12.9415472 4.6070317
## i_misdemea.10 12 12.3463740 4.0771624
## i_misdemea.9 14 13.4441811 3.7414437
## i_misdemea.8 12 12.5298186 3.9679527
## i_misdemea.7 20 20.2577232 4.2551338
## i_misdemea.6 16 14.8067300 3.5594275
## i_misdemea.5 24 23.1921570 3.5634723
## i_misdemea.4 21 20.5812890 4.3360299
## i_misdemea.3 21 20.3293288 4.3845675
## i_misdemea.2 14 14.8119633 3.5351587
## i_misdemea.1 16 15.7615689 4.1540137
## i_drugsale.12 4 3.7383723 0.5541381
## i_drugsale.11 3 2.7324843 0.5298693
## i_drugsale.10 1 1.1002646 0.5015557
## i_drugsale.9 3 2.3763474 0.2710019
## i_drugsale.8 0 1.0056748 0.4247044
## i_drugsale.7 3 2.7028525 0.3680772
## i_drugsale.6 2 2.1738995 0.4570629
## i_drugsale.5 2 1.8652074 0.4206596
## i_drugsale.4 0 0.5045129 0.3802116
## i_drugsale.3 1 1.4287441 0.4975109
## i_drugsale.2 3 2.7283202 0.3640324
## i_drugsale.1 3 2.6433200 0.4368388
## i_drugposs.12 9 8.5638432 1.1001867
## i_drugposs.11 5 5.0224967 0.9828874
## i_drugposs.10 2 2.1844705 0.8210952
## i_drugposs.9 1 1.1416792 0.7078407
## i_drugposs.8 4 3.8532406 0.6876167
## i_drugposs.7 7 6.2881683 0.6835719
## i_drugposs.6 2 1.9836260 0.7806472
## i_drugposs.5 0 1.0329800 0.8089608
## i_drugposs.4 1 2.3331660 0.7442439
## i_drugposs.3 4 3.4342926 0.8574984
## i_drugposs.2 9 8.0074062 0.7725576
## i_drugposs.1 5 5.1546306 0.9222153
## any_crime.12 272 269.2276175 65.3397635
## any_crime.11 227 229.8320383 64.2395769
## any_crime.10 183 188.6396628 55.6929060
## any_crime.9 176 181.0969413 53.2377100
## any_crime.8 228 228.8212560 55.8142502
## any_crime.7 246 240.5669304 55.8061605
## any_crime.6 200 206.6802610 52.8291848
## any_crime.5 270 261.3992326 50.6530803
## any_crime.4 250 243.6446898 57.2946484
## any_crime.3 236 237.5358682 58.8680772
## any_crime.2 250 253.0277830 51.5429371
## any_crime.1 242 239.5277190 55.1144991
##
## Results:
##
## end.post = 16
## Trt Con Pct.Chng Linear.pVal Linear.Lower Linear.Upper Perm.pVal
## i_robbery 11 21.49 -48.8% 0.0119 -70.2% -12.0% 0.0520
## i_aggassau 12 16.46 -27.1% 0.1620 -58.5% 28.2% 0.1640
## i_burglary 245 294.10 -16.7% 0.0857 -33.2% 3.9% 0.1040
## i_larceny 145 165.45 -12.4% 0.1754 -30.5% 10.5% 0.1240
## i_felony 46 51.74 -11.1% 0.2788 -36.4% 24.2% 0.2520
## i_misdemea 45 62.75 -28.3% 0.0387 -47.5% -2.0% 0.0240
## i_drugsale 11 4.22 160.9% 0.9482 23.6% 450.7% 0.9920
## i_drugposs 9 17.15 -47.5% 0.0225 -71.1% -4.7% 0.0800
## any_crime 788 921.55 -14.5% 0.0963 -29.8% 4.1% 0.0680
## Omnibus -- -- -- 0.0129 -- -- 0.0320
## Perm.Lower Perm.Upper
## i_robbery -70.8% -17.0%
## i_aggassau -63.2% 11.3%
## i_burglary -32.2% 2.5%
## i_larceny -31.3% 6.1%
## i_felony -35.5% 15.9%
## i_misdemea -51.6% -7.9%
## i_drugsale 13.0% 313.8%
## i_drugposs -72.1% -14.9%
## any_crime -27.9% 0.0%
## Omnibus -- --
The results file now shows additional rows for the new outcome variables, and these are also displayed in plots.
Another potential response is to aggregate sparse variables across multiple time periods before using them to match to synthetic control. Rather than passing a vector of variable names to match.out and/or match.out.min, the user may pass a list; each element of the list is named for a variable and holds a vector of the time durations across which that variable should be aggregated before matching, counting backwards from the intervention time.
In our dataset, incidences of drug sale are relatively scarce, and so are aggregated every four quarters before matching ('i_drugsale' = rep(4, 3)); meanwhile, larceny is relatively common and so is matched un-aggregated ('i_larceny' = rep(1, 12)).
Each vector indicates the time durations for aggregation, starting from the period directly prior to intervention and finishing with the earliest observations in the dataset. Because our end.pre = 12, to use the full dataset, each vector in the list should sum to 12. Sums less than 12 would ignore portions of the pre-intervention data; sums greater than 12 will throw an error, as they call on more pre-intervention data than are available.
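The sum rule can be checked before the model is run. The sketch below is base R only; the list agg is hypothetical, and its last element deliberately overshoots to show what the check catches.

```r
end.pre <- 12

# Hypothetical aggregation instructions
agg <- list(i_drugsale = rep(4, 3),   # three four-quarter blocks
            i_larceny  = rep(1, 12),  # no aggregation
            i_robbery  = rep(5, 3))   # sums to 15: more than is available

# Number of pre-intervention periods each variable would draw on
covered <- sapply(agg, sum)
print(covered)

# Variables that would request more data than exist, or leave data unused
too.long  <- names(covered)[covered > end.pre]
too.short <- names(covered)[covered < end.pre]
print(too.long)
print(too.short)
```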
The aggregated variables now appear in the main weights summary table, e.g., “i_robbery.11.12”, representing the sum of reported robberies in time periods 11 and 12.
# Named list: each vector gives aggregation durations, counted back from end.pre
match.out <- list('i_robbery'  = rep(2, 6),
                  'i_aggassau' = rep(2, 6),
                  'i_burglary' = rep(1, 12),
                  'i_larceny'  = rep(1, 12),
                  'i_felony'   = rep(2, 6),
                  'i_misdemea' = rep(2, 6),
                  'i_drugsale' = rep(4, 3),
                  'i_drugposs' = rep(4, 3),
                  'any_crime'  = rep(1, 12))
sea4 <- microsynth(seattledmi,
                   idvar="ID", timevar="time", intvar="Intervention",
                   match.out=match.out, match.covar=cov.var,
                   result.var=names(match.out), omnibus.var=names(match.out),
                   end.pre=12,
                   perm=250, jack=TRUE,
                   test="lower",
                   n.cores = min(parallel::detectCores(), 2))
## Weight Balance Table:
##
## Targets Weighted.Control All.scaled
## Intercept 39 39.000439 39.000000
## TotalPop 2994 2994.026331 2384.747666
## BLACK 173 173.000878 190.522402
## HISPANIC 149 149.001317 159.268202
## Males_1521 49 49.000878 97.374611
## HOUSEHOLDS 1968 1968.018432 1113.558805
## FAMILYHOUS 519 519.001317 475.187617
## FEMALE_HOU 101 101.000000 81.154947
## RENTER_HOU 1868 1868.017554 581.934039
## VACANT_HOU 160 160.000439 98.422215
## i_robbery.11.12 21 21.000000 3.341008
## i_robbery.9.10 5 5.000000 2.321717
## i_robbery.7.8 10 10.000000 2.491599
## i_robbery.5.6 9 9.000439 2.564406
## i_robbery.3.4 9 9.000439 2.734287
## i_robbery.1.2 14 14.000439 2.240821
## i_aggassau.11.12 8 8.000439 2.240821
## i_aggassau.9.10 5 5.000000 1.961730
## i_aggassau.7.8 9 9.000000 1.897013
## i_aggassau.5.6 7 7.000000 1.783759
## i_aggassau.3.4 7 7.000000 2.180149
## i_aggassau.1.2 7 7.000000 2.010268
## i_burglary.12 76 76.001317 18.881145
## i_burglary.11 63 63.001755 18.492844
## i_burglary.10 57 57.002194 16.769757
## i_burglary.9 61 61.000000 15.580585
## i_burglary.8 72 72.000439 15.426882
## i_burglary.7 67 67.000878 16.256067
## i_burglary.6 57 57.000878 15.143746
## i_burglary.5 63 63.001317 14.270068
## i_burglary.4 71 71.000439 15.443062
## i_burglary.3 73 73.000439 16.126633
## i_burglary.2 79 79.000878 15.026447
## i_burglary.1 66 66.000439 15.956752
## i_larceny.12 54 54.000878 12.134412
## i_larceny.11 47 47.000878 11.471064
## i_larceny.10 36 36.000000 9.832918
## i_larceny.9 29 29.000000 9.921904
## i_larceny.8 43 43.000439 10.010890
## i_larceny.7 34 34.000878 9.861232
## i_larceny.6 44 44.000000 9.634723
## i_larceny.5 48 48.000439 9.242377
## i_larceny.4 43 43.002633 10.423460
## i_larceny.3 26 26.000878 10.281892
## i_larceny.2 47 47.000439 9.003734
## i_larceny.1 35 35.000000 9.606409
## i_felony.11.12 25 25.000439 9.533603
## i_felony.9.10 14 14.000000 6.338208
## i_felony.7.8 28 28.000000 8.194773
## i_felony.5.6 33 33.000439 6.156192
## i_felony.3.4 17 17.000000 7.393902
## i_felony.1.2 25 25.000000 6.718419
## i_misdemea.11.12 27 27.000000 8.854076
## i_misdemea.9.10 26 26.000000 7.818606
## i_misdemea.7.8 32 32.000878 8.223086
## i_misdemea.5.6 40 40.000878 7.122900
## i_misdemea.3.4 42 42.000000 8.720597
## i_misdemea.1.2 30 30.001317 7.689172
## i_drugsale.9.12 11 11.000000 1.856565
## i_drugsale.5.8 7 7.000000 1.670504
## i_drugsale.1.4 7 7.000000 1.678594
## i_drugposs.9.12 17 17.000000 3.612010
## i_drugposs.5.8 13 13.000439 2.960797
## i_drugposs.1.4 19 19.000439 3.296515
## any_crime.12 272 272.004827 65.339764
## any_crime.11 227 227.003511 64.239577
## any_crime.10 183 183.003511 55.692906
## any_crime.9 176 176.000878 53.237710
## any_crime.8 228 228.001755 55.814250
## any_crime.7 246 246.003950 55.806161
## any_crime.6 200 200.002633 52.829185
## any_crime.5 270 270.006583 50.653080
## any_crime.4 250 250.003511 57.294648
## any_crime.3 236 236.003511 58.868077
## any_crime.2 250 250.004389 51.542937
## any_crime.1 242 242.002194 55.114499
##
## Results:
##
## end.post = 16
## Trt Con Pct.Chng Linear.pVal Linear.Lower Linear.Upper Jack.pVal
## i_robbery 11 18.60 -40.9% 0.0427 -65.8% 2.3% 0.2601
## i_aggassau 12 16.64 -27.9% 0.1593 -59.3% 27.9% 0.3665
## i_burglary 245 314.59 -22.1% 0.0217 -36.7% -4.1% 0.3074
## i_larceny 145 168.76 -14.1% 0.1303 -31.2% 7.3% 0.3442
## i_felony 46 52.83 -12.9% 0.2555 -38.6% 23.5% 0.3721
## i_misdemea 45 56.78 -20.7% 0.1133 -42.4% 9.0% 0.3067
## i_drugsale 11 6.13 79.5% 0.8643 -18.1% 293.3% 0.7541
## i_drugposs 9 15.46 -41.8% 0.0618 -68.8% 8.5% 0.3000
## any_crime 788 961.64 -18.1% 0.0405 -32.1% -1.1% 0.3170
## Omnibus -- -- -- 0.0032 -- -- 0.3171
## Jack.Lower Jack.Upper Perm.pVal Perm.Lower Perm.Upper
## i_robbery -80.9% 83.5% 0.0840 -69.2% -8.4%
## i_aggassau -81.6% 183.5% 0.1960 -61.2% 17.3%
## i_burglary -62.2% 60.4% 0.0200 -35.6% -7.5%
## i_larceny -51.9% 53.5% 0.1040 -29.8% 3.2%
## i_felony -55.3% 69.6% 0.2240 -39.7% 17.4%
## i_misdemea -60.5% 59.2% 0.1400 -44.4% 2.8%
## i_drugsale -66.7% 866.4% 0.9360 -20.0% 157.4%
## i_drugposs -84.7% 120.8% 0.1520 -68.6% -7.4%
## any_crime -56.2% 53.3% 0.0200 -29.1% -6.6%
## Omnibus -- -- 0.0560 -- --
The aggregation of the outcome variables over time is not directly reflected in the results table, though the estimates have changed as a consequence of the aggregation.
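As a reading aid for the results table, the Pct.Chng column appears consistent with the percentage difference between the treatment total (Trt) and the synthetic control total (Con). The sketch below checks that assumed relationship for three rows copied from the table above; it is an arithmetic illustration, not a description of how microsynth computes the column internally.

```r
# Trt and Con values copied from the results table above (end.post = 16)
res <- data.frame(
  outcome = c("i_robbery", "i_burglary", "any_crime"),
  Trt     = c(11, 245, 788),
  Con     = c(18.60, 314.59, 961.64)
)

# Assumed relationship: percentage change of treatment relative to control
res$Pct.Chng <- round(100 * (res$Trt / res$Con - 1), 1)
res$Pct.Chng   # -40.9 -22.1 -18.1
```

Because Con is printed to two decimal places, recomputing other rows from the displayed values can differ from the table in the last digit.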
The following examples will demonstrate that microsynth can be used to calculate weights and variance estimators, produce results, and display charts separately, one at a time. This can be useful given the time-intensive nature of calculating weights and generating permutation groups. It allows for weights to be saved once calculated and for plots and results to be reproduced iteratively without repeating the matching process.
This call suppresses reporting of results by setting result.var = FALSE, so only weights are calculated. Note that the settings for permutation groups (perm) and jackknife replication groups (jack) are taken into account when weights are calculated, and are not referred to again in later calls that only produce plots or display results.
match.out <- c("i_felony", "i_misdemea", "i_drugs", "any_crime")
sea5 <- microsynth(seattledmi,
                   idvar="ID", timevar="time", intvar="Intervention",
                   end.pre=12,
                   match.out=match.out, match.covar=cov.var,
                   result.var=FALSE, perm=0, jack=FALSE,
                   n.cores = min(parallel::detectCores(), 2))
summary(sea5)
Appropriately, the table summarizing the main weights may be viewed, but results are unavailable.
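Since calculating weights is the time-intensive step, it may be convenient to store them on disk between sessions. The following is a minimal sketch using base R's saveRDS() and readRDS(); the object fit is a stand-in for a fitted microsynth object, and the file name is hypothetical.

```r
# Stand-in for a fitted object; in practice this would be the microsynth
# result (e.g., sea5), whose weights are held in its w element.
fit <- list(w = matrix(runif(6), nrow = 3))

# Hypothetical file location for the saved weights
weights_file <- file.path(tempdir(), "microsynth_weights.rds")
saveRDS(fit$w, weights_file)

# In a later session: reload the weights so they can be supplied through
# the w argument instead of being recalculated.
saved_w <- readRDS(weights_file)
identical(saved_w, fit$w)
```

Saving with saveRDS() keeps the object's structure intact, so the reloaded weights can be passed along exactly as they were produced.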
If weights have already been calculated, then microsynth() can also be configured to only reproduce results. Results are displayed for all outcome variables used for exact matches (result.var = match.out). Further, results can now be calculated for any single follow-up period or group of follow-up periods (end.post=c(14,16)) without having to re-calculate weights.
# Reuse the weights from sea5 rather than recalculating them
sea6 <- microsynth(seattledmi,
                   idvar="ID", timevar="time", intvar="Intervention",
                   end.pre=12, end.post=c(14, 16),
                   result.var=match.out,
                   test="lower",
                   w=sea5$w,
                   n.cores = min(parallel::detectCores(), 2))
sea6
For each follow-up period, a separate results table is provided. When saving to file, the output must therefore be saved as an XLSX rather than a CSV; each table is saved to a different XLSX tab.
## microsynth object
##
## Scope:
## Units: Total: 9642 Treated: 39 Untreated: 9603
## Study Period(s): Pre-period: - 12 Post-period: 13 - 14
## Study Period(s): Pre-period: - 12 Post-period: 13 - 16
## Constraints: Exact Match: Minimized Distance:
## Time-variant outcomes:
## Exact Match: TRUE (1)
## Minimized Distance: (0)
## Time-invariant covariates:
## Exact Match: TRUE (1)
## Minimized Distance: (0)
##
## Results:
## end.post = 14
## Trt Con Pct.Chng Linear.pVal Linear.Lower Linear.Upper
## i_felony 28 37.56 -25.5% 0.0920 -49.3% 9.7%
## i_misdemea 19 34.57 -45.0% 0.0068 -64.3% -15.4%
## i_drugs 11 14.58 -24.6% 0.2193 -60.5% 44.2%
## any_crime 401 504.05 -20.4% 0.0509 -37.1% 0.7%
## Omnibus -- -- -- 0.0094 -- --
##
## end.post = 16
## Trt Con Pct.Chng Linear.pVal Linear.Lower Linear.Upper
## i_felony 46 68.22 -32.6% 0.0109 -50.3% -8.4%
## i_misdemea 45 71.80 -37.3% 0.0019 -52.8% -16.7%
## i_drugs 20 23.76 -15.8% 0.2559 -46.4% 32.1%
## any_crime 788 986.44 -20.1% 0.0146 -32.9% -4.9%
## Omnibus -- -- -- 0.0015 -- --
One of the major differences between Synth and microsynth is that Synth requires that treatment be confined to a single unit of observation and estimates the effect on a single outcome variable; in contrast, microsynth anticipates that treatment has been applied to multiple areas and can estimate effects on multiple outcomes. But microsynth can also be applied to this simpler case.
To demonstrate, first we will create a reduced dataset with 1 treatment block and 100 control blocks.
set.seed(86872)
ids.t <- names(table(seattledmi$ID[seattledmi$Intervention==1]))
ids.c <- names(table(seattledmi$ID[seattledmi$Intervention==0]))
ids.synth <- c(sample(ids.t, 1), sample(ids.c, 100))
seattledmi.one <- seattledmi[is.element(seattledmi$ID, as.numeric(ids.synth)), ]
Then microsynth can be run on the dataset with just a single variable passed to match.out, so that the effect is estimated for only one variable, as with Synth. Due to the small size of the reduced dataset, model feasibility may be an issue (so we set use.backup = TRUE and check.feas = TRUE) and variance estimators will be less reliable.
sea8 <- microsynth(seattledmi.one,
                   idvar="ID", timevar="time", intvar="Intervention",
                   match.out=match.out[4], match.covar=cov.var,
                   result.var=match.out[4],
                   test="lower", perm=250, jack=FALSE,
                   check.feas=TRUE, use.backup=TRUE,
                   n.cores = min(parallel::detectCores(), 2))
## Weight Balance Table:
##
## Targets Final.Weighted.Control All.scaled
## Intercept 1 1.0000000 1.0000000
## TotalPop 43 45.0872376 63.9207921
## BLACK 0 0.6857716 5.0396040
## HISPANIC 0 1.1513635 4.1485149
## Males_1521 0 0.9157095 2.5148515
## HOUSEHOLDS 21 22.1250470 30.5346535
## FAMILYHOUS 14 9.6115861 13.1584158
## FEMALE_HOU 2 1.2966062 2.0594059
## RENTER_HOU 21 14.0944158 16.1089109
## VACANT_HOU 0 0.6825798 2.2079208
## any_crime.12 3 2.3875630 1.4554455
## any_crime.11 1 0.9742075 1.2574257
## any_crime.10 1 0.8642002 1.0891089
## any_crime.9 1 0.8096172 1.1188119
## any_crime.8 2 0.9482611 1.0297030
## any_crime.7 0 0.9685659 1.1089109
## any_crime.6 2 1.3311058 0.9801980
## any_crime.5 0 0.4332195 0.8514851
## any_crime.4 1 1.1243468 0.9405941
## any_crime.3 1 1.0562650 1.0396040
## any_crime.2 1 1.3648515 0.9603960
## any_crime.1 1 0.4958930 1.0297030
##
## Results:
##
## end.post = 16
## Trt Con Pct.Chng Linear.pVal Linear.Lower Linear.Upper Perm.pVal
## any_crime 2 5.53 -63.8% 0.0073 -85.3% -11.1% 0.0800
## Omnibus -- -- -- 0.1091 -- -- 0.0800
microsynth() may also be used to calculate propensity score-type weights. We will demonstrate this by transforming our panel data into a cross-sectional dataset with data corresponding to our final observed period.
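The construction of the cross-sectional dataset is not shown here. One plausible approach, assuming the cross-section is simply the rows from the final observed period with the time column dropped, is sketched below on a toy panel; the names panel and panel.cross are placeholders standing in for seattledmi and seattledmi.cross.

```r
# Toy panel standing in for seattledmi: 3 units observed over 4 periods.
# Unit 1 is treated in the final period.
panel <- data.frame(
  ID           = rep(1:3, each = 4),
  time         = rep(1:4, times = 3),
  Intervention = c(0, 0, 0, 1,  0, 0, 0, 0,  0, 0, 0, 0),
  TotalPop     = rep(c(120, 80, 95), each = 4)
)

# Keep only the final observed period and drop the time column,
# leaving one row per unit.
final_period <- max(panel$time)
panel.cross  <- panel[panel$time == final_period, colnames(panel) != "time"]
panel.cross
```

With no time variable left in the data, the call that follows omits timevar and relies only on the unit identifier, the intervention indicator, and the covariates.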
By setting match.out = FALSE, no outcome variables will be used to calculate weights, only (time-invariant) covariates (match.covar). No outcome variables need be reported (result.var = NULL). Plots are therefore inappropriate, but results (i.e., a summary of weights only) can be saved to file or viewed using summary.
sea9 <- microsynth(seattledmi.cross,
                   idvar="ID", intvar="Intervention",
                   match.out=FALSE, match.covar=cov.var,
                   result.var=NULL,
                   test="lower",
                   perm=250, jack=TRUE,
                   n.cores = min(parallel::detectCores(), 2))
## microsynth object
##
## Scope:
## Units: Total: 9642 Treated: 39 Untreated: 9603
## Study Period(s): Pre-period: - Post-period: -
## Constraints: Exact Match: 10 Minimized Distance: 0
## Time-variant outcomes:
## Exact Match: FALSE (1)
## Minimized Distance: (0)
## Time-invariant covariates:
## Exact Match: TotalPop, BLACK, HISPANIC, Males_1521, HOUSEHOLDS, FAMILYHOUS, FEMALE_HOU, RENTER_HOU, VACANT_HOU (9)
## Minimized Distance: (0)
Abadie A, Gardeazabal J (2003). “The economic costs of conflict: A case study of the Basque Country.” American Economic Review, 93(1), 113-132.
Abadie A, Diamond A, Hainmueller J (2010). “Synthetic control methods for comparative case studies: Estimating the effect of California’s tobacco control program.” Journal of the American Statistical Association, 105(490), 493-505.