MicroSynth: A Tutorial

Michael Robbins and Steven Davenport

2023-06-29

Introduction to Synthetic Controls

Synthetic controls are a generalization of the difference-in-difference approach (Abadie and Gardeazabal, 2003; Abadie et al., 2010). Difference-in-difference methods often require the researcher to manually identify a control case, against which the treatment will be compared, on the basis of apparent similarity before the intervention and the plausibility that identical secular trends affect both the treatment and control equally after the intervention. Instead, the synthetic control method offers a formalized and more rigorous method for identifying comparison cases, by constructing a “synthetic” control unit that represents a weighted combination of many untreated cases. Weights are calculated in order to maximize the similarity between the synthetic control and the treatment unit in terms of specified “matching” variables. By matching on the observable characteristics between treatment and control, the method may also do a better job of matching on the unobservable characteristics (though by nature this cannot be verified).

The advantages over the general difference-in-difference approach are several: a) the observable similarity of control and treatment cases is maximized, and perhaps also similarity of unobservables, strengthening the assumptions (e.g., equal secular trends) inherent to the difference-in-difference approach; b) the method is feasible even when there exists no single untreated case adequately similar to the treatment case; and c) researchers can point to a formal and objective approach to the selection of controls, rather than having to justify ad hoc decisions which could create the appearance of the researcher having a thumb on the scale.

Generally, synthetic controls have been applied in the context of a single treatment case with a limited number (e.g., several dozen) of untreated cases for comparison. The Synth package has been developed for R and designed for this type of application. But the relative dearth of treatment and comparison data in such settings complicates efforts to a) develop a synthetic control that matches the treatment case, b) precisely estimate the effect of treatment, c) gauge the significance of that effect, and d) jointly incorporate multiple outcome variables.

This package is developed to address those limitations, by incorporating high-dimensional, micro-level data into the synthetic controls framework. Therefore, in addition to what Synth provides, microsynth offers several advantages and new tools:

An example: Using microsynth to evaluate a Drug Market Intervention

In this example we will evaluate a Drug Market Intervention using the “seattledmi” dataset provided with the microsynth package. The intervention was applied to 39 blocks, which represent the treatment; the remaining 9,603 Seattle blocks are potential comparison units from which the synthetic control may be constructed. Data are available for block-level Census demographics and incidences of crime reported by the Seattle Police Department.

colnames(seattledmi)
##  [1] "ID"           "time"         "Intervention" "i_robbery"    "i_aggassau"  
##  [6] "i_burglary"   "i_larceny"    "i_felony"     "i_misdemea"   "i_drugsale"  
## [11] "i_drugposs"   "any_crime"    "i_drugs"      "TotalPop"     "BLACK"       
## [16] "HISPANIC"     "Males_1521"   "HOUSEHOLDS"   "FAMILYHOUS"   "FEMALE_HOU"  
## [21] "RENTER_HOU"   "VACANT_HOU"
set.seed(99199)

We would like to detect whether the program was effective at reducing the incidence of crime in those neighborhoods where the intervention was applied. Before beginning examples, we will specify the mandatory minimum parameters pursuant to the dataset and our basic research design.

Setting ID columns

The bedrock of the synthetic controls research design (like any difference-in-difference method) involves comparing observations between treatment (i.e., “intervention”) areas versus control areas, with observations for each unit over a certain period of time. Therefore microsynth requires we identify the idvar, timevar, and intvar columns.

In this case, we are provided with Census block-level observation units (idvar = "ID") and quarterly observations (timevar = "time"), along with a binary variable with 0 for all untreated groups and the treated groups during the pre-intervention period and a 1 for treated groups at the time of intervention and later (intvar = "Intervention").

Setting time parameters

Next, the user can specify parameters relating to the beginning of the pre-intervention data (start.pre), the last time period of the pre-intervention period (end.pre), and the time(s) through which post-intervention effects ought to be estimated (end.post). For all observations up to and including end.pre, outcome variables and covariates will be used to match treatment and control. (If the data are formatted such that 0s are assigned to all observations of the control units and to the pre-intervention observations of the treatment units, and 1s are assigned only to treatment units post-intervention, then end.pre will by default be set automatically to the last period of pre-intervention data.)

In this case, our study period begins at the first quarter of data available in the dataset; the intervention occurs after 12 quarters of pre-intervention data (end.pre = 12); and our study period continues for four quarters of post-intervention data (end.post = 16). With this dataset, end.post could also be left unassigned and would be automatically set to the latest observation in the data. Likewise, we can set end.pre = NULL: we do not expect the program’s effects to occur instantaneously, and the intvar column is adequately formatted to allow microsynth to detect the intervention time automatically. Note that start.pre will default to the earliest time in the dataset.
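The defaults described above can be illustrated on a toy panel. The sketch below is only an assumption about the logic, written in base R; it is not microsynth's internal code. The data frame `toy` and the helper `detect_periods()` are hypothetical names introduced for illustration.

```r
# Toy panel: 3 units observed over 5 periods; unit "A" is treated from period 4 on.
toy <- expand.grid(ID = c("A", "B", "C"), time = 1:5, stringsAsFactors = FALSE)
toy$Intervention <- as.integer(toy$ID == "A" & toy$time >= 4)

# Hypothetical helper (not part of microsynth): infer the time parameters
# from the intvar coding, following the defaults described in the text.
detect_periods <- function(data, timevar, intvar) {
  times <- sort(unique(data[[timevar]]))
  # TRUE for periods in which no unit is flagged as treated.
  all_zero <- vapply(
    times,
    function(tt) all(data[[intvar]][data[[timevar]] == tt] == 0),
    logical(1)
  )
  list(
    start.pre = min(times),            # earliest time in the data
    end.pre   = max(times[all_zero]),  # last period with no treated unit
    end.post  = max(times)             # latest observation in the data
  )
}

periods <- detect_periods(toy, timevar = "time", intvar = "Intervention")
str(periods)
```

For the toy panel, the last all-zero period is 3, so the pre-intervention window would be periods 1 to 3 and the post-intervention window periods 4 to 5.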

An aside: advanced methods for setting matching variables

Exact matches are not always possible, especially for variables that are sparse (i.e., few non-zero values), that contain little variation, or for which the treatment units have values outside of the range of observations from the untreated units. In these cases, variables may be moved from match.out/match.covar to match.out.min/match.covar.min so as to minimize the distance between treatment and synthetic control on those variables rather than find exact matches. Alternatively, a value may be set to period to aggregate all variables named in match.out/match.covar over the same regular time duration; or, to set aggregation instructions in more detail, match.out/match.covar may receive a list with detailed parameters.

microsynth() provides several different ways to address this problem. A variable can be treated such that the distance between treatment and synthetic control is minimized, even if a distance of zero is infeasible, by listing it under match.out.min (for time-variant outcome variables) or match.covar.min (for time-invariant variables). In this case, match.out, match.out.min, match.covar, and match.covar.min may each be vectors of variable names. There ought not be any overlap: each variable should appear in only one argument.
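The no-overlap rule can be checked before calling microsynth(). The sketch below is a base R illustration: the particular split of variables across the four arguments is hypothetical, and `find_overlap()` is a helper introduced here, not a microsynth function.

```r
# Hypothetical assignment of seattledmi variables to the four matching arguments.
spec <- list(
  match.out       = c("i_felony", "i_misdemea", "any_crime"),
  match.out.min   = c("i_drugs"),
  match.covar     = c("TotalPop", "HOUSEHOLDS"),
  match.covar.min = c("VACANT_HOU")
)

# Hypothetical helper: return the variables listed under more than one argument.
find_overlap <- function(spec) {
  all_vars <- unlist(spec, use.names = FALSE)
  unique(all_vars[duplicated(all_vars)])
}

overlap <- find_overlap(spec)
if (length(overlap) > 0) {
  stop("Variables listed in more than one argument: ",
       paste(overlap, collapse = ", "))
}
```

Running a check like this first gives a clear error message if, say, a sparse outcome was copied to match.out.min without being removed from match.out.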

Another potential response is to aggregate the variable across multiple time periods. match.out, match.out.min, match.covar, and match.covar.min all behave similarly in this respect. Rather than being passed a vector of variable names, each may receive a list. Each element of the list is named after a variable and holds a vector of durations, i.e., the numbers of time units across which that variable should be aggregated before matching. The durations are counted backwards from the intervention time.

Combining these approaches, if match.out.min = list("Y1" = c(1, 3, 3)), then the variable “Y1” will be used to match treatment to synthetic control on three quantities, each with minimized distance: its value at the time of the intervention (t), the sum of its values across t-1 to t-3, and the sum of its values across t-4 to t-6.
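To make the backwards counting concrete, the following base R sketch translates the durations c(1, 3, 3) into the time periods covered by each block, assuming the intervention time t equals 12. The helper `aggregation_blocks()` and the series `y1` are hypothetical and only mirror the description above; they are not part of microsynth.

```r
# Hypothetical helper: map a vector of durations to the periods in each
# aggregation block, counting backwards from the intervention time.
aggregation_blocks <- function(durations, end.pre) {
  # Last period of each block: end.pre, then shifted back by the
  # cumulative length of the preceding blocks.
  block_end   <- end.pre - c(0, cumsum(durations)[-length(durations)])
  block_start <- block_end - durations + 1
  Map(seq, block_start, block_end)
}

blocks <- aggregation_blocks(c(1, 3, 3), end.pre = 12)
blocks  # periods 12; 9 to 11; 6 to 8

# Made-up values of "Y1" for periods 1 to 12, summed within each block.
y1 <- c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8)
block_sums <- vapply(blocks, function(idx) sum(y1[idx]), numeric(1))
block_sums
```

With t = 12, the three blocks are period 12 (t), periods 9 to 11 (t-3 to t-1), and periods 6 to 8 (t-6 to t-4), so twelve period-level constraints collapse to three.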

If the dataset contains both time-variant outcome variables and time-variant predictor variables (i.e., belonging on the RHS of a regression rather than the LHS), then both 1) match.out or match.out.min and 2) result.var must be specified. match.out or match.out.min should include all time-variant variables used for matching, whether they are true outcomes or predictors; result.var should specify only the subset of those that are outcomes (for which estimated effects will be calculated).

Note: in some cases, the term “outcome variable” may be a misnomer. Though by default all time-variant variables assigned to match.out and match.out.min will be used to estimate the program effect (result.var = TRUE), this doesn’t have to be the case. result.var may be set to a vector of variable names representing a subset of the outcome variables entered into match.out and match.out.min; this is useful if the dataset includes time-variant variables that we’d like to use to match treatment and synthetic control but which we do not want to use for the purposes of evaluating the program effect.

Other parameters that may be set

microsynth allows for extensive configuration, for instance, relating to the mechanics of calculating weights, plotting options, and the calculation of variance estimators through permutation tests and jackknife replication groups. These aspects will be discussed in the later examples below.

Basic estimation and plotting

Example 1: Barebones results

In this minimal example, we will calculate and display results in the simplest way possible. This includes:

As microsynth runs, it will display output relating to the calculation of weights, the matching of treatment to synthetic control, and the calculation of survey statistics (e.g., the variance estimator). The first table to display summarizes the matching properties after applying the main weights. It shows three columns: 1) characteristics of the treated areas for the time-variant and time-invariant variables, 2) characteristics of the synthetic control, and 3) characteristics of the entire population. Because this example is successful in creating a matching synthetic control, the first column and the second column will be nearly equal.

Note that match.out = match.out, result.var = match.out, and omnibus.var = match.out. This means that the outcome variables that we declared as match.out will all be matched on exactly, will be used to report results, and will feature in the omnibus p-value. match.covar indicates that the specified covariates will also be matched on exactly. (Setting result.var = match.out yields one chart per time-variant outcome variable for which we calculate results.)
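The call below refers to two objects, match.out and cov.var, whose definitions do not appear in this section. Judging from the variables listed under "Time-variant outcomes" and "Time-invariant covariates" in the printed output, they were presumably defined along the following lines; treat this as a reconstruction rather than the authors' exact code.

```r
# Time-variant outcomes, matched exactly on each pre-intervention period.
match.out <- c("i_felony", "i_misdemea", "i_drugs", "any_crime")

# Time-invariant Census covariates, also matched exactly.
cov.var <- c("TotalPop", "BLACK", "HISPANIC", "Males_1521", "HOUSEHOLDS",
             "FAMILYHOUS", "FEMALE_HOU", "RENTER_HOU", "VACANT_HOU")
```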

sea1 <- microsynth(seattledmi, 
                   idvar="ID", timevar="time", intvar="Intervention", 
                   start.pre=1, end.pre=12, end.post=16, 
                   match.out=match.out, match.covar=cov.var, 
                   result.var=match.out, omnibus.var=match.out,
                   test="lower",
                   n.cores = min(parallel::detectCores(), 2))
sea1
##  microsynth object
## 
## Scope:
##  Units:          Total: 9642 Treated: 39 Untreated: 9603
##  Study Period(s):    Pre-period: 1 - 12  Post-period: 13 - 16
##  Constraints:        Exact Match: 58     Minimized Distance: 0
## Time-variant outcomes:
##  Exact Match: i_felony, i_misdemea, i_drugs, any_crime (4)
##  Minimized Distance: (0)
## Time-invariant covariates:
##  Exact Match: TotalPop, BLACK, HISPANIC, Males_1521, HOUSEHOLDS, FAMILYHOUS, FEMALE_HOU, RENTER_HOU, VACANT_HOU (9)
##  Minimized Distance: (0)
## 
## Results:
## end.post = 16
##            Trt    Con Pct.Chng Linear.pVal Linear.Lower Linear.Upper
## i_felony    46  68.22   -32.6%      0.0109       -50.3%        -8.4%
## i_misdemea  45  71.80   -37.3%      0.0019       -52.8%       -16.7%
## i_drugs     20  23.76   -15.8%      0.2559       -46.4%        32.1%
## any_crime  788 986.44   -20.1%      0.0146       -32.9%        -4.9%
## Omnibus     --     --       --      0.0004           --           --
summary(sea1)
## Weight Balance Table: 
## 
##               Targets Weighted.Control   All.scaled
## Intercept          39        39.000239   39.0000000
## TotalPop         2994      2994.051921 2384.7476665
## BLACK             173       173.000957  190.5224020
## HISPANIC          149       149.002632  159.2682016
## Males_1521         49        49.000000   97.3746111
## HOUSEHOLDS       1968      1968.033976 1113.5588052
## FAMILYHOUS        519       519.010767  475.1876167
## FEMALE_HOU        101       101.000957   81.1549471
## RENTER_HOU       1868      1868.020338  581.9340386
## VACANT_HOU        160       160.011485   98.4222153
## i_felony.12        14        14.000000    4.9023024
## i_felony.11        11        11.000239    4.6313006
## i_felony.10         9         9.000000    3.0740510
## i_felony.9          5         5.000000    3.2641568
## i_felony.8         20        20.000000    4.4331052
## i_felony.7          8         8.000000    3.7616677
## i_felony.6         13        13.000000    3.0012446
## i_felony.5         20        20.000718    3.1549471
## i_felony.4         10        10.000000    4.0245800
## i_felony.3          7         7.000000    3.3693217
## i_felony.2         13        13.000239    3.2803360
## i_felony.1         12        12.000000    3.4380834
## i_misdemea.12      15        15.000239    4.2470442
## i_misdemea.11      12        12.000000    4.6070317
## i_misdemea.10      12        12.000000    4.0771624
## i_misdemea.9       14        14.000000    3.7414437
## i_misdemea.8       12        12.000000    3.9679527
## i_misdemea.7       20        20.000000    4.2551338
## i_misdemea.6       16        16.000479    3.5594275
## i_misdemea.5       24        24.000000    3.5634723
## i_misdemea.4       21        21.000239    4.3360299
## i_misdemea.3       21        21.000000    4.3845675
## i_misdemea.2       14        14.000000    3.5351587
## i_misdemea.1       16        16.000000    4.1540137
## i_drugs.12         13        13.000000    1.6543248
## i_drugs.11          8         8.000000    1.5127567
## i_drugs.10          3         3.000000    1.3226509
## i_drugs.9           4         4.000000    0.9788426
## i_drugs.8           4         4.000000    1.1123211
## i_drugs.7          10        10.000000    1.0516490
## i_drugs.6           4         4.000000    1.2377100
## i_drugs.5           2         2.000000    1.2296204
## i_drugs.4           1         1.000000    1.1244555
## i_drugs.3           5         5.000000    1.3550093
## i_drugs.2          12        12.000000    1.1365899
## i_drugs.1           8         8.000239    1.3590541
## any_crime.12      272       272.001196   65.3397635
## any_crime.11      227       227.001675   64.2395769
## any_crime.10      183       183.000957   55.6929060
## any_crime.9       176       176.000479   53.2377100
## any_crime.8       228       228.000479   55.8142502
## any_crime.7       246       246.002393   55.8061605
## any_crime.6       200       200.000957   52.8291848
## any_crime.5       270       270.001436   50.6530803
## any_crime.4       250       250.000957   57.2946484
## any_crime.3       236       236.000957   58.8680772
## any_crime.2       250       250.001196   51.5429371
## any_crime.1       242       242.000957   55.1144991
## 
## Results: 
## 
## end.post = 16
##            Trt    Con Pct.Chng Linear.pVal Linear.Lower Linear.Upper
## i_felony    46  68.22   -32.6%      0.0109       -50.3%        -8.4%
## i_misdemea  45  71.80   -37.3%      0.0019       -52.8%       -16.7%
## i_drugs     20  23.76   -15.8%      0.2559       -46.4%        32.1%
## any_crime  788 986.44   -20.1%      0.0146       -32.9%        -4.9%
## Omnibus     --     --       --      0.0004           --           --

After the call to microsynth has been made, printing the returned object (sea1) displays a brief description of the parameters used in the call along with the results (if available). In addition, summary(sea1) displays a summary of the matching between treatment, synthetic control, and the population, followed by the results table. The results table has one row for each of the variables entered to result.var, plus a row for the omnibus statistic calculated from the variables in omnibus.var, and two columns corresponding to the confidence interval (confidence) resulting from the variance estimator generated by linearization. The first line of the results output (end.post = 16) refers to the maximum post-intervention time used to compile results (end.post).

Note that the p-value of the omnibus statistic is smaller than the p-value of any individual outcome variable.
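As a reading aid for the results table, the Pct.Chng column appears to be the relative difference between the treatment total (Trt) and the synthetic control total (Con). The base R sketch below recomputes it from the printed values under that assumption; because Con is printed to two decimals, agreement is only expected after rounding.

```r
# Values copied from the results table for end.post = 16.
trt <- c(i_felony = 46, i_misdemea = 45, i_drugs = 20, any_crime = 788)
con <- c(i_felony = 68.22, i_misdemea = 71.80, i_drugs = 23.76, any_crime = 986.44)

# Assumed definition: percentage change of treatment relative to control.
pct_chng <- 100 * (trt - con) / con
round(pct_chng, 1)
```

Rounded to one decimal, these values reproduce the Pct.Chng column shown above (-32.6, -37.3, -15.8, -20.1).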

plot_microsynth(sea1)

The call above produces plots under default settings. By default, if no other arguments are declared in the call to plot_microsynth(), the plots will include one row for each variable passed to result.var in the original call. Likewise, values for the duration of the pre- and post-intervention periods (i.e., start.pre, end.pre, end.post) are detected automatically from the original object if not specified manually.

The first plot column compares the observed outcomes among the treatment, synthetic control, and population during the pre-intervention and post-intervention periods. Outcomes are scaled by default (scale.var = "Intercept") to the number of treatment units, to facilitate comparison. The dotted red line indicates the last time period of the pre-intervention period (end.pre). Because matching was successful, the treatment and synthetic control lines track closely during the pre-intervention period; their divergence during the post-intervention period represents an estimate of the causal effect of the program (i.e., the red synthetic control line is treated as the counterfactual to the black treatment line). This difference is charted on the right plot column.
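The scaling can be checked against the All.scaled column of the weight balance table. The sketch below assumes that scaling means multiplying a population total by the ratio of treated units to all units (39 / 9642); that formula is an inference from the description, not a statement of microsynth's internals.

```r
n_total   <- 9642  # all blocks, from the "Units" line of the printed output
n_treated <- 39    # treated blocks

# TotalPop entry of the All.scaled column in the weight balance table.
all_scaled_totalpop <- 2384.7476665

# Under the assumed formula, the unscaled population total is recovered by
# undoing the n_treated / n_total factor.
implied_total <- all_scaled_totalpop * n_total / n_treated
implied_total

# The Intercept row is consistent with the same factor: 9642 units scale to 39.
scaled_intercept <- n_total * n_treated / n_total
scaled_intercept
```

The implied total is, to within rounding, the whole number 589583, which supports reading All.scaled as population totals expressed on the scale of a 39-block area.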

Example 2: Adding permutations and jackknife

In addition to using linearization to calculate a variance estimate, microsynth can approximate the estimator’s sampling distribution by generating permuted placebo groups. When dealing with a large number of treatment and control units, there is a near-infinite number of potential permutations. Because permutations are somewhat computationally intensive, only a limited number of them is generated (perm = 250 in the call below).

For each placebo, weights are calculated to match the placebo treatment to a new synthetic control, and an effect is estimated, generating a sampling distribution and a corresponding p-value. Because the actual treatment area is a non-random group of treatment units, while the placebo treatments are random groups, by default microsynth will standardize the placebo treatment effects to filter out potential design effects (use.survey = TRUE).
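The idea behind a permutation p-value can be sketched in a few lines of base R. The numbers below are made up for illustration, the helper `perm_pvalue_lower()` is hypothetical, and the sketch deliberately omits the standardization step that microsynth applies; it shows only the general technique of locating the observed effect within the placebo distribution.

```r
# Made-up placebo effects (proportional changes), one per placebo group.
placebo_effects <- c(-0.30, -0.10, 0.05, 0.12, -0.02,
                      0.20, -0.15, 0.08, 0.00, -0.25)

# Made-up observed effect for the real treatment group.
observed_effect <- -0.20

# One-sided ("lower") p-value: the share of placebo effects that are at
# least as negative as the observed effect.
perm_pvalue_lower <- function(observed, placebo) {
  mean(placebo <= observed)
}

p_lower <- perm_pvalue_lower(observed_effect, placebo_effects)
p_lower
```

Here two of the ten placebo effects are at least as negative as the observed one, giving 0.2. With only ten placebos the p-value can take just eleven distinct values, which is one reason to use a few hundred permutations.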

We will also generate jackknife replication groups, using as many groups as the lesser of the number of cases in the treatment group and the number of cases in the control group (jack = TRUE).

The output from this call to microsynth will be largely identical to the previous call, except for the appearance of the right column of plots. Now that permutation groups have been generated, the estimated effect under each of the placebo treatments (gray lines) will be shown along with the estimated effect of the real treatment. This displays the estimated treatment effect in the context of the estimator’s sampling distribution.

sea2 <- microsynth(seattledmi, 
                   idvar="ID", timevar="time", intvar="Intervention", 
                   start.pre=1, end.pre=12, end.post=c(14, 16),
                   match.out=match.out, match.covar=cov.var, 
                   result.var=match.out, omnibus.var=match.out, 
                   test="lower", 
                   perm=250, jack=TRUE,
                   n.cores = min(parallel::detectCores(), 2))

Printing the object or calling summary() on it reveals further changes to the results. Columns are added to display the confidence intervals (confidence = 0.9) and p-values (test = "lower") from the jackknife and permutation tests. Note that end.post = c(14, 16) in the code above instructs results to be calculated for two different follow-up periods, ending at t = 14 and t = 16 respectively. One results table is produced for each.

##  microsynth object
## 
## Scope:
##  Units:          Total: 9642 Treated: 39 Untreated: 9603
##  Study Period(s):    Pre-period: 1 - 12  Post-period: 13 - 14
##      Study Period(s):    Pre-period: 1 - 12  Post-period: 13 - 16
##  Constraints:        Exact Match: 58     Minimized Distance: 0
## Time-variant outcomes:
##  Exact Match: i_felony, i_misdemea, i_drugs, any_crime (4)
##  Minimized Distance: (0)
## Time-invariant covariates:
##  Exact Match: TotalPop, BLACK, HISPANIC, Males_1521, HOUSEHOLDS, FAMILYHOUS, FEMALE_HOU, RENTER_HOU, VACANT_HOU (9)
##  Minimized Distance: (0)
## 
## Results:
## end.post = 14
##            Trt    Con Pct.Chng Linear.pVal Linear.Lower Linear.Upper Jack.pVal
## i_felony    28  37.56   -25.5%      0.0920       -49.3%         9.7%    0.1618
## i_misdemea  19  34.57   -45.0%      0.0068       -64.3%       -15.4%    0.0589
## i_drugs     11  14.58   -24.6%      0.2193       -60.5%        44.2%    0.2809
## any_crime  401 504.05   -20.4%      0.0509       -37.1%         0.7%    0.0541
## Omnibus     --     --       --      0.0113           --           --    0.0456
##            Jack.Lower Jack.Upper Perm.pVal Perm.Lower Perm.Upper
## i_felony       -52.8%      17.6%    0.1400     -49.4%       2.0%
## i_misdemea     -69.7%      -0.2%    0.0480     -66.9%     -17.1%
## i_drugs        -65.0%      62.5%    0.3160     -62.2%      19.3%
## any_crime      -36.4%      -0.5%    0.0440     -33.0%      -2.6%
## Omnibus            --         --    0.1000         --         --
## 
## end.post = 16
##            Trt    Con Pct.Chng Linear.pVal Linear.Lower Linear.Upper Jack.pVal
## i_felony    46  68.22   -32.6%      0.0109       -50.3%        -8.4%    0.0359
## i_misdemea  45  71.80   -37.3%      0.0019       -52.8%       -16.7%    0.0479
## i_drugs     20  23.76   -15.8%      0.2559       -46.4%        32.1%    0.3331
## any_crime  788 986.44   -20.1%      0.0146       -32.9%        -4.9%    0.0460
## Omnibus     --     --       --      0.0009           --           --    0.0370
##            Jack.Lower Jack.Upper Perm.pVal Perm.Lower Perm.Upper
## i_felony       -51.6%      -6.1%    0.0120     -50.9%     -13.3%
## i_misdemea     -59.6%      -2.9%    0.0240     -55.6%     -19.6%
## i_drugs        -55.8%      60.4%    0.3120     -49.4%      21.7%
## any_crime      -35.6%      -0.9%    0.0120     -30.6%      -5.6%
## Omnibus            --         --    0.0240         --         --
plot_microsynth(sea2)

Example 3: Model feasibility while matching on more variables

Now, we will add additional outcome variables and also use them to match the treatment area to the synthetic control units. We do this at the risk of model feasibility, as each variable introduces another constraint.
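A rough count shows how quickly the constraints accumulate. The sketch below assumes one constraint for the intercept, one per time-invariant covariate, and one per outcome per pre-intervention period. That assumption reproduces the "Exact Match: 58" line printed for Examples 1 and 2, but the helper `count_exact_constraints()` is hypothetical and not part of microsynth.

```r
# Hypothetical helper: number of exact-match constraints when every outcome
# is matched on each pre-intervention period separately.
count_exact_constraints <- function(n_outcomes, n_covariates, n_pre_periods) {
  1 + n_covariates + n_outcomes * n_pre_periods
}

# Examples 1 and 2: 4 outcomes, 9 covariates, 12 pre-intervention quarters.
constraints_before <- count_exact_constraints(4, 9, 12)

# Example 3: 9 outcomes with the same covariates and pre-intervention window.
constraints_after <- count_exact_constraints(9, 9, 12)

c(before = constraints_before, after = constraints_after)
```

Under this assumption, moving from four to nine outcomes roughly doubles the number of exact constraints (58 to 118), all of which must be satisfied using the same 9,603 untreated blocks.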

match.out <- c("i_robbery", "i_aggassau", "i_burglary", "i_larceny", "i_felony", 
               "i_misdemea", "i_drugsale", "i_drugposs", "any_crime")

In the example below, without overriding the default weight parameters, microsynth would fail to find a feasible model: weights would not be calculated, and no results or plots would be generated. We can still attempt to estimate the model by setting check.feas = TRUE and use.backup = TRUE. This checks for feasibility and, if needed, invokes the computationally intensive LowRankQP package to calculate the weights.

Note that the additional matching variables introduce further constraints to the calculation of weights, lengthening the output. Moreover, the introduction of additional time-variant matching variables results in a poorer match on each, shown in the left column of plots, where red and dashed-black lines no longer track perfectly in the pre-intervention period.

Also note that we need not specify values for start.pre, end.pre, and end.post, as the default settings align with our intentions (the call below nonetheless passes end.pre = 12 explicitly). Likewise, we can rely on the default for the variables used in the omnibus statistic (omnibus.var = result.var). In this way we specify close to the minimum number of non-default arguments.

sea3 <- microsynth(seattledmi, 
                   idvar="ID", timevar="time", intvar="Intervention", 
                   end.pre=12,
                   match.out=match.out, match.covar=cov.var, 
                   result.var=match.out, perm=250, jack=0, 
                   test="lower", check.feas=TRUE, use.backup = TRUE,
                   n.cores = min(parallel::detectCores(), 2))
## Weight Balance Table: 
## 
##                 Targets Final.Weighted.Control   All.scaled
## Intercept            39             39.0000000   39.0000000
## TotalPop           2994           2993.9999962 2384.7476665
## BLACK               173            172.9999994  190.5224020
## HISPANIC            149            148.9999998  159.2682016
## Males_1521           49             48.9999999   97.3746111
## HOUSEHOLDS         1968           1967.9999981 1113.5588052
## FAMILYHOUS          519            518.9999991  475.1876167
## FEMALE_HOU          101            100.9999998   81.1549471
## RENTER_HOU         1868           1867.9999976  581.9340386
## VACANT_HOU          160            159.9999998   98.4222153
## i_robbery.1.12       68             68.0000000   15.6938395
## i_aggassau.1.12      43             43.0000000   12.0737399
## i_burglary.1.12     805            805.0000000  193.3739888
## i_larceny.1.12      486            486.0000000  121.4250156
## i_felony.1.12       142            142.0000000   44.3350965
## i_misdemea.1.12     197            197.0000000   48.4284381
## i_drugsale.1.12      25             25.0000000    5.2056627
## i_drugposs.1.12      49             49.0000000    9.8693217
## any_crime.1.12     2780           2780.0000000  676.4327940
## i_robbery.12         12             11.3780112    1.7352209
## i_robbery.11          9              8.1200119    1.6057872
## i_robbery.10          4              3.8790126    1.1891724
## i_robbery.9           1              1.4903926    1.1325451
## i_robbery.8           7              6.3735828    1.2619788
## i_robbery.7           3              3.3506037    1.2296204
## i_robbery.6           3              4.2342276    1.3833230
## i_robbery.5           6              5.8637089    1.1810828
## i_robbery.4           4              5.4218880    1.4278158
## i_robbery.3           5              4.7600341    1.3064717
## i_robbery.2           8              7.4409586    1.0314250
## i_robbery.1           6              5.6875679    1.2093964
## i_aggassau.12         5              4.5410304    1.2174860
## i_aggassau.11         3              2.8597944    1.0233354
## i_aggassau.10         2              2.3494037    0.9383945
## i_aggassau.9          3              2.9482200    1.0233354
## i_aggassau.8          6              5.4558168    0.9545737
## i_aggassau.7          3              3.2969035    0.9424393
## i_aggassau.6          4              3.7611863    0.9586185
## i_aggassau.5          3              3.0746288    0.8251400
## i_aggassau.4          3              3.6486181    1.0516490
## i_aggassau.3          4              3.8827241    1.1285003
## i_aggassau.2          3              3.0690729    0.9262601
## i_aggassau.1          4              4.1126009    1.0840075
## i_burglary.12        76             78.4499795   18.8811450
## i_burglary.11        63             65.7638099   18.4928438
## i_burglary.10        57             55.9086862   16.7697573
## i_burglary.9         61             60.0254431   15.5805849
## i_burglary.8         72             67.3447013   15.4268824
## i_burglary.7         67             67.5134900   16.2560672
## i_burglary.6         57             57.5235401   15.1437461
## i_burglary.5         63             65.6461061   14.2700685
## i_burglary.4         71             70.6807842   15.4430616
## i_burglary.3         73             70.4732975   16.1266335
## i_burglary.2         79             78.1895056   15.0264468
## i_burglary.1         66             67.4806565   15.9567517
## i_larceny.12         54             51.7015032   12.1344119
## i_larceny.11         47             44.7209985   11.4710641
## i_larceny.10         36             35.7656801    9.8329185
## i_larceny.9          29             33.0946047    9.9219042
## i_larceny.8          43             42.0869055   10.0108899
## i_larceny.7          34             34.2189069    9.8612321
## i_larceny.6          44             42.5202884    9.6347231
## i_larceny.5          48             46.3312430    9.2423771
## i_larceny.4          43             43.1104549   10.4234599
## i_larceny.3          26             30.9303559   10.2818917
## i_larceny.2          47             45.4172546    9.0037337
## i_larceny.1          35             36.1018043    9.6064095
## i_felony.12          14             13.4649840    4.9023024
## i_felony.11          11             12.2487198    4.6313006
## i_felony.10           9              8.8112286    3.0740510
## i_felony.9            5              5.6915868    3.2641568
## i_felony.8           20             18.5910281    4.4331052
## i_felony.7            8              9.5571747    3.7616677
## i_felony.6           13             12.1564995    3.0012446
## i_felony.5           20             18.6556720    3.1549471
## i_felony.4           10             11.1852992    4.0245800
## i_felony.3            7              7.6267445    3.3693217
## i_felony.2           13             11.7375924    3.2803360
## i_felony.1           12             12.2734704    3.4380834
## i_misdemea.12        15             15.9973186    4.2470442
## i_misdemea.11        12             12.9415472    4.6070317
## i_misdemea.10        12             12.3463740    4.0771624
## i_misdemea.9         14             13.4441811    3.7414437
## i_misdemea.8         12             12.5298186    3.9679527
## i_misdemea.7         20             20.2577232    4.2551338
## i_misdemea.6         16             14.8067300    3.5594275
## i_misdemea.5         24             23.1921570    3.5634723
## i_misdemea.4         21             20.5812890    4.3360299
## i_misdemea.3         21             20.3293288    4.3845675
## i_misdemea.2         14             14.8119633    3.5351587
## i_misdemea.1         16             15.7615689    4.1540137
## i_drugsale.12         4              3.7383723    0.5541381
## i_drugsale.11         3              2.7324843    0.5298693
## i_drugsale.10         1              1.1002646    0.5015557
## i_drugsale.9          3              2.3763474    0.2710019
## i_drugsale.8          0              1.0056748    0.4247044
## i_drugsale.7          3              2.7028525    0.3680772
## i_drugsale.6          2              2.1738995    0.4570629
## i_drugsale.5          2              1.8652074    0.4206596
## i_drugsale.4          0              0.5045129    0.3802116
## i_drugsale.3          1              1.4287441    0.4975109
## i_drugsale.2          3              2.7283202    0.3640324
## i_drugsale.1          3              2.6433200    0.4368388
## i_drugposs.12         9              8.5638432    1.1001867
## i_drugposs.11         5              5.0224967    0.9828874
## i_drugposs.10         2              2.1844705    0.8210952
## i_drugposs.9          1              1.1416792    0.7078407
## i_drugposs.8          4              3.8532406    0.6876167
## i_drugposs.7          7              6.2881683    0.6835719
## i_drugposs.6          2              1.9836260    0.7806472
## i_drugposs.5          0              1.0329800    0.8089608
## i_drugposs.4          1              2.3331660    0.7442439
## i_drugposs.3          4              3.4342926    0.8574984
## i_drugposs.2          9              8.0074062    0.7725576
## i_drugposs.1          5              5.1546306    0.9222153
## any_crime.12        272            269.2276175   65.3397635
## any_crime.11        227            229.8320383   64.2395769
## any_crime.10        183            188.6396628   55.6929060
## any_crime.9         176            181.0969413   53.2377100
## any_crime.8         228            228.8212560   55.8142502
## any_crime.7         246            240.5669304   55.8061605
## any_crime.6         200            206.6802610   52.8291848
## any_crime.5         270            261.3992326   50.6530803
## any_crime.4         250            243.6446898   57.2946484
## any_crime.3         236            237.5358682   58.8680772
## any_crime.2         250            253.0277830   51.5429371
## any_crime.1         242            239.5277190   55.1144991
## 
## Results: 
## 
## end.post = 16
##            Trt    Con Pct.Chng Linear.pVal Linear.Lower Linear.Upper Perm.pVal
## i_robbery   11  21.49   -48.8%      0.0119       -70.2%       -12.0%    0.0520
## i_aggassau  12  16.46   -27.1%      0.1620       -58.5%        28.2%    0.1640
## i_burglary 245 294.10   -16.7%      0.0857       -33.2%         3.9%    0.1040
## i_larceny  145 165.45   -12.4%      0.1754       -30.5%        10.5%    0.1240
## i_felony    46  51.74   -11.1%      0.2788       -36.4%        24.2%    0.2520
## i_misdemea  45  62.75   -28.3%      0.0387       -47.5%        -2.0%    0.0240
## i_drugsale  11   4.22   160.9%      0.9482        23.6%       450.7%    0.9920
## i_drugposs   9  17.15   -47.5%      0.0225       -71.1%        -4.7%    0.0800
## any_crime  788 921.55   -14.5%      0.0963       -29.8%         4.1%    0.0680
## Omnibus     --     --       --      0.0129           --           --    0.0320
##            Perm.Lower Perm.Upper
## i_robbery      -70.8%     -17.0%
## i_aggassau     -63.2%      11.3%
## i_burglary     -32.2%       2.5%
## i_larceny      -31.3%       6.1%
## i_felony       -35.5%      15.9%
## i_misdemea     -51.6%      -7.9%
## i_drugsale      13.0%     313.8%
## i_drugposs     -72.1%     -14.9%
## any_crime      -27.9%       0.0%
## Omnibus            --         --

The results file now shows additional rows for the new outcome variables, and these are also displayed in plots.

plot_microsynth(sea3)

Example 4: Provide match.out as a list (time-aggregating matching variables)

Another potential response is to aggregate sparse variables across multiple time periods before using them for matching. Rather than passing a vector of variable names to match.out and/or match.out.min, the user may pass a list. Each element of the list is named after a variable and holds a vector of durations, i.e., the numbers of time units across which that variable should be aggregated before matching. The durations are counted backwards from the intervention time.

In our dataset, incidents of drug sale are relatively scarce, and so are aggregated over blocks of four time periods before matching ('i_drugsale'=rep(4, 3)); meanwhile, larceny is relatively common and so is matched un-aggregated ('i_larceny'=rep(1, 12)).

Each vector indicates the time-durations for aggregation, starting from the period directly prior to intervention and finishing with the earliest observations in the dataset. Because end.pre = 12 here, each vector in the list should sum to 12 in order to use the full pre-intervention dataset. Sums less than 12 will ignore the earliest portion of the pre-intervention data; sums greater than 12 will throw an error, because they call on more pre-intervention data than are available.

The aggregated variables now appear in the main weights summary table, e.g., “i_robbery.11.12”, representing the sum of reported robberies in time periods 11 and 12.
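The backwards-counting rule can be sketched in base R. The helper below, block_labels(), is hypothetical and is not part of microsynth; it only reproduces the labelling convention seen in the weights table (e.g., “i_robbery.11.12”), under the assumption that blocks are laid out from end.pre towards the start of the data.

```r
# Hypothetical helper (not a microsynth function): label the aggregation
# blocks implied by a vector of durations, counting backwards from end.pre.
block_labels <- function(var, durations, end.pre) {
  # More periods requested than exist before the intervention is an error.
  stopifnot(sum(durations) <= end.pre)
  # Last period of each block: end.pre, then end.pre minus the durations used so far.
  ends <- end.pre - c(0, cumsum(durations)[-length(durations)])
  # First period of each block.
  starts <- ends - durations + 1
  # Single-period blocks keep the plain "var.t" label.
  ifelse(durations == 1,
         paste(var, ends, sep = "."),
         paste(var, starts, ends, sep = "."))
}

print(block_labels("i_robbery", rep(2, 6), end.pre = 12))
print(block_labels("i_drugsale", rep(4, 3), end.pre = 12))
print(block_labels("i_larceny", rep(1, 12), end.pre = 12))
```

A vector summing to less than end.pre simply yields fewer blocks, which leaves the earliest periods out of the labels; a vector summing to more than end.pre stops with an error in this sketch.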

match.out <- list( 'i_robbery'=rep(2, 6), 
                   'i_aggassau'=rep(2, 6), 
                   'i_burglary'=rep(1, 12), 
                   'i_larceny'=rep(1, 12), 
                   'i_felony'=rep(2, 6), 
                   'i_misdemea'=rep(2, 6), 
                   'i_drugsale'=rep(4, 3), 
                   'i_drugposs'=rep(4, 3), 
                   'any_crime'=rep(1, 12))

sea4 <- microsynth(seattledmi, 
                   idvar="ID", timevar="time", intvar="Intervention",
                   match.out=match.out, match.covar=cov.var, 
                   result.var=names(match.out), omnibus.var=names(match.out), 
                   end.pre=12,
                   perm=250, jack = TRUE,
                   test="lower",
                   n.cores = min(parallel::detectCores(), 2))
## Weight Balance Table: 
## 
##                  Targets Weighted.Control  All.scaled
## Intercept             39        39.000439   39.000000
## TotalPop            2994      2994.026331 2384.747666
## BLACK                173       173.000878  190.522402
## HISPANIC             149       149.001317  159.268202
## Males_1521            49        49.000878   97.374611
## HOUSEHOLDS          1968      1968.018432 1113.558805
## FAMILYHOUS           519       519.001317  475.187617
## FEMALE_HOU           101       101.000000   81.154947
## RENTER_HOU          1868      1868.017554  581.934039
## VACANT_HOU           160       160.000439   98.422215
## i_robbery.11.12       21        21.000000    3.341008
## i_robbery.9.10         5         5.000000    2.321717
## i_robbery.7.8         10        10.000000    2.491599
## i_robbery.5.6          9         9.000439    2.564406
## i_robbery.3.4          9         9.000439    2.734287
## i_robbery.1.2         14        14.000439    2.240821
## i_aggassau.11.12       8         8.000439    2.240821
## i_aggassau.9.10        5         5.000000    1.961730
## i_aggassau.7.8         9         9.000000    1.897013
## i_aggassau.5.6         7         7.000000    1.783759
## i_aggassau.3.4         7         7.000000    2.180149
## i_aggassau.1.2         7         7.000000    2.010268
## i_burglary.12         76        76.001317   18.881145
## i_burglary.11         63        63.001755   18.492844
## i_burglary.10         57        57.002194   16.769757
## i_burglary.9          61        61.000000   15.580585
## i_burglary.8          72        72.000439   15.426882
## i_burglary.7          67        67.000878   16.256067
## i_burglary.6          57        57.000878   15.143746
## i_burglary.5          63        63.001317   14.270068
## i_burglary.4          71        71.000439   15.443062
## i_burglary.3          73        73.000439   16.126633
## i_burglary.2          79        79.000878   15.026447
## i_burglary.1          66        66.000439   15.956752
## i_larceny.12          54        54.000878   12.134412
## i_larceny.11          47        47.000878   11.471064
## i_larceny.10          36        36.000000    9.832918
## i_larceny.9           29        29.000000    9.921904
## i_larceny.8           43        43.000439   10.010890
## i_larceny.7           34        34.000878    9.861232
## i_larceny.6           44        44.000000    9.634723
## i_larceny.5           48        48.000439    9.242377
## i_larceny.4           43        43.002633   10.423460
## i_larceny.3           26        26.000878   10.281892
## i_larceny.2           47        47.000439    9.003734
## i_larceny.1           35        35.000000    9.606409
## i_felony.11.12        25        25.000439    9.533603
## i_felony.9.10         14        14.000000    6.338208
## i_felony.7.8          28        28.000000    8.194773
## i_felony.5.6          33        33.000439    6.156192
## i_felony.3.4          17        17.000000    7.393902
## i_felony.1.2          25        25.000000    6.718419
## i_misdemea.11.12      27        27.000000    8.854076
## i_misdemea.9.10       26        26.000000    7.818606
## i_misdemea.7.8        32        32.000878    8.223086
## i_misdemea.5.6        40        40.000878    7.122900
## i_misdemea.3.4        42        42.000000    8.720597
## i_misdemea.1.2        30        30.001317    7.689172
## i_drugsale.9.12       11        11.000000    1.856565
## i_drugsale.5.8         7         7.000000    1.670504
## i_drugsale.1.4         7         7.000000    1.678594
## i_drugposs.9.12       17        17.000000    3.612010
## i_drugposs.5.8        13        13.000439    2.960797
## i_drugposs.1.4        19        19.000439    3.296515
## any_crime.12         272       272.004827   65.339764
## any_crime.11         227       227.003511   64.239577
## any_crime.10         183       183.003511   55.692906
## any_crime.9          176       176.000878   53.237710
## any_crime.8          228       228.001755   55.814250
## any_crime.7          246       246.003950   55.806161
## any_crime.6          200       200.002633   52.829185
## any_crime.5          270       270.006583   50.653080
## any_crime.4          250       250.003511   57.294648
## any_crime.3          236       236.003511   58.868077
## any_crime.2          250       250.004389   51.542937
## any_crime.1          242       242.002194   55.114499
## 
## Results: 
## 
## end.post = 16
##            Trt    Con Pct.Chng Linear.pVal Linear.Lower Linear.Upper Jack.pVal
## i_robbery   11  18.60   -40.9%      0.0427       -65.8%         2.3%    0.2601
## i_aggassau  12  16.64   -27.9%      0.1593       -59.3%        27.9%    0.3665
## i_burglary 245 314.59   -22.1%      0.0217       -36.7%        -4.1%    0.3074
## i_larceny  145 168.76   -14.1%      0.1303       -31.2%         7.3%    0.3442
## i_felony    46  52.83   -12.9%      0.2555       -38.6%        23.5%    0.3721
## i_misdemea  45  56.78   -20.7%      0.1133       -42.4%         9.0%    0.3067
## i_drugsale  11   6.13    79.5%      0.8643       -18.1%       293.3%    0.7541
## i_drugposs   9  15.46   -41.8%      0.0618       -68.8%         8.5%    0.3000
## any_crime  788 961.64   -18.1%      0.0405       -32.1%        -1.1%    0.3170
## Omnibus     --     --       --      0.0032           --           --    0.3171
##            Jack.Lower Jack.Upper Perm.pVal Perm.Lower Perm.Upper
## i_robbery      -80.9%      83.5%    0.0840     -69.2%      -8.4%
## i_aggassau     -81.6%     183.5%    0.1960     -61.2%      17.3%
## i_burglary     -62.2%      60.4%    0.0200     -35.6%      -7.5%
## i_larceny      -51.9%      53.5%    0.1040     -29.8%       3.2%
## i_felony       -55.3%      69.6%    0.2240     -39.7%      17.4%
## i_misdemea     -60.5%      59.2%    0.1400     -44.4%       2.8%
## i_drugsale     -66.7%     866.4%    0.9360     -20.0%     157.4%
## i_drugposs     -84.7%     120.8%    0.1520     -68.6%      -7.4%
## any_crime      -56.2%      53.3%    0.0200     -29.1%      -6.6%
## Omnibus            --         --    0.0560         --         --

The aggregation of the outcome variables over time is not directly reflected in the results table, though the estimates have changed as a consequence of the aggregation.

plot_microsynth(sea4)

Partial calls to microsynth

The following examples will demonstrate that microsynth can be used to calculate weights and variance estimators, produce results, and display charts separately, one at a time. This can be useful given the time-intensive nature of calculating weights and generating permutation groups. It allows for weights to be saved once calculated and for plots and results to be reproduced iteratively without repeating the matching process.

Example 5: Weights only

This example suppresses the reporting of results by setting result.var = FALSE, so that only weights are calculated. Note that the settings for permutation groups (perm) and jackknife replication groups (jack) are used when the weights are calculated, and are not referred to again in later calls that only produce plots or display results.

match.out <- c("i_felony", "i_misdemea", "i_drugs", "any_crime")

sea5 <- microsynth(seattledmi, 
                   idvar="ID", timevar="time", intvar="Intervention",
                   end.pre=12,
                   match.out=match.out, match.covar=cov.var,
                   result.var=FALSE, perm=0, jack=FALSE,
                   n.cores = min(parallel::detectCores(), 2))

summary(sea5)

Appropriately, the table summarizing the main weights may be viewed, but results are unavailable.
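Because the weights are the expensive part, it may be convenient to write them to disk and reload them in a later session. The following is a minimal sketch using base R's saveRDS() and readRDS(). The object fit is a hypothetical stand-in for a fitted object such as sea5; the only detail taken from this tutorial is that the weights live in the element w, and the contents of w shown here are made up for illustration.

```r
# Hypothetical stand-in for a fitted microsynth object; only the element
# name `w` is taken from the tutorial, its contents here are placeholders.
set.seed(1)
fit <- list(w = list(Weights = matrix(runif(6), nrow = 3)))

# Save the weights once they have been calculated.
path <- tempfile(fileext = ".rds")
saveRDS(fit$w, path)

# Later (possibly in a new session): reload them instead of re-matching.
w_saved <- readRDS(path)
print(identical(w_saved, fit$w))
```

The reloaded object could then be supplied wherever previously calculated weights are accepted.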

Example 6: Results only

If weights have already been calculated, then microsynth() can also be configured to only reproduce results. Here, results are displayed for all outcome variables used for exact matches (result.var = match.out). Further, results can now be calculated for a single follow-up period or for several (end.post=c(14,16)) without having to re-calculate weights.

sea6 <- microsynth(seattledmi, 
                   idvar="ID", timevar="time", intvar="Intervention",
                   end.pre=12, end.post=c(14, 16),
                   result.var=match.out,
                   test="lower", 
                   w=sea5$w,
                   n.cores = min(parallel::detectCores(), 2))

sea6

For each follow-up period, a separate results table is provided. If saving to file, this requires the file be saved as an XLSX rather than a CSV; each table will be saved to a different XLSX tab.

##  microsynth object
## 
## Scope:
##  Units:          Total: 9642 Treated: 39 Untreated: 9603
##  Study Period(s):    Pre-period:  - 12   Post-period: 13 - 14
##      Study Period(s):    Pre-period:  - 12   Post-period: 13 - 16
##  Constraints:        Exact Match:        Minimized Distance: 
## Time-variant outcomes:
##  Exact Match: TRUE (1)
##  Minimized Distance: (0)
## Time-invariant covariates:
##  Exact Match: TRUE (1)
##  Minimized Distance: (0)
## 
## Results:
## end.post = 14
##            Trt    Con Pct.Chng Linear.pVal Linear.Lower Linear.Upper
## i_felony    28  37.56   -25.5%      0.0920       -49.3%         9.7%
## i_misdemea  19  34.57   -45.0%      0.0068       -64.3%       -15.4%
## i_drugs     11  14.58   -24.6%      0.2193       -60.5%        44.2%
## any_crime  401 504.05   -20.4%      0.0509       -37.1%         0.7%
## Omnibus     --     --       --      0.0094           --           --
## 
## end.post = 16
##            Trt    Con Pct.Chng Linear.pVal Linear.Lower Linear.Upper
## i_felony    46  68.22   -32.6%      0.0109       -50.3%        -8.4%
## i_misdemea  45  71.80   -37.3%      0.0019       -52.8%       -16.7%
## i_drugs     20  23.76   -15.8%      0.2559       -46.4%        32.1%
## any_crime  788 986.44   -20.1%      0.0146       -32.9%        -4.9%
## Omnibus     --     --       --      0.0015           --           --
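As a quick reading aid, the Pct.Chng column appears to be the relative difference between the treated total (Trt) and the synthetic control total (Con). The sketch below checks this interpretation against the values printed in the end.post = 14 table; the formula is an inference from the printed numbers, not a statement of the package's internals, and Con is used as rounded in the table.

```r
# Values copied from the end.post = 14 results table above.
trt <- c(i_felony = 28, i_misdemea = 19, i_drugs = 11, any_crime = 401)
con <- c(i_felony = 37.56, i_misdemea = 34.57, i_drugs = 14.58, any_crime = 504.05)

# Assumed formula: percentage change of treatment relative to synthetic control.
pct_chng <- 100 * (trt / con - 1)
print(round(pct_chng, 1))
```

The rounded values can be compared with the Pct.Chng entries for the same four outcomes.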

Example 7: Plots only

If weights have already been calculated, then plot_microsynth() can be used to display plots from the original microsynth object. In this case, we limit plots to a subset of time-variant variables (plot.var=match.out[1:2]).

plot_microsynth(sea6, plot.var=match.out[1:2])

Alternative applications for microsynth

Example 8: Apply microsynth in the traditional setting of Synth

One of the major differences between Synth and microsynth is that Synth requires that the treatment be confined to a single unit of observation and estimates the effect on a single outcome variable; in contrast, microsynth anticipates that treatment has been applied to multiple areas and can estimate effects with respect to multiple outcomes. But microsynth can also be applied to this simpler case.

To demonstrate, first we will create a reduced dataset with 1 treatment block and 100 control blocks.

set.seed(86872)
ids.t <- names(table(seattledmi$ID[seattledmi$Intervention==1]))
ids.c <- names(table(seattledmi$ID[seattledmi$Intervention==0]))
ids.synth <- c(sample(ids.t, 1), sample(ids.c, 100))
seattledmi.one <- seattledmi[is.element(seattledmi$ID, as.numeric(ids.synth)), ]
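When subsetting panel data in this way, it may be worth checking how the intervention variable is coded. If it is time-variant (0 for treated units before the intervention and 1 afterwards), then the IDs of rows with Intervention==0 also include treated units. The toy panel below is entirely hypothetical and only illustrates that point, along with one way to obtain never-treated IDs using setdiff().

```r
# Hypothetical toy panel: 3 units, 4 periods; unit "a" is treated from period 3 on.
toy <- data.frame(
  ID = rep(c("a", "b", "c"), each = 4),
  time = rep(1:4, times = 3),
  Intervention = c(0, 0, 1, 1, rep(0, 8)),
  stringsAsFactors = FALSE
)

# Units that are ever treated.
ids_treated <- unique(toy$ID[toy$Intervention == 1])

# Units with at least one untreated row: includes "a" via its pre-period rows.
ids_zero_rows <- unique(toy$ID[toy$Intervention == 0])

# Units that are never treated.
ids_control <- setdiff(unique(toy$ID), ids_treated)

print(ids_zero_rows)
print(ids_control)
```

With a time-invariant intervention indicator the two ID sets would coincide, so the distinction only matters under the time-variant coding assumed here.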

Then microsynth can be run on the dataset with just a single variable passed to match.out, so that the effect is estimated for only one variable, as with Synth. Due to the small size of the reduced dataset, model feasibility may be an issue (so we set use.backup = TRUE and check.feas = TRUE) and variance estimators will be less reliable.

sea8 <- microsynth(seattledmi.one, 
                   idvar="ID", timevar="time", intvar="Intervention", 
                   match.out=match.out[4], match.covar=cov.var, 
                   result.var=match.out[4],
                   test="lower", perm=250, jack=FALSE, 
                   check.feas=TRUE, use.backup=TRUE,
                   n.cores = min(parallel::detectCores(), 2))
## Weight Balance Table: 
## 
##              Targets Final.Weighted.Control All.scaled
## Intercept          1              1.0000000  1.0000000
## TotalPop          43             45.0872376 63.9207921
## BLACK              0              0.6857716  5.0396040
## HISPANIC           0              1.1513635  4.1485149
## Males_1521         0              0.9157095  2.5148515
## HOUSEHOLDS        21             22.1250470 30.5346535
## FAMILYHOUS        14              9.6115861 13.1584158
## FEMALE_HOU         2              1.2966062  2.0594059
## RENTER_HOU        21             14.0944158 16.1089109
## VACANT_HOU         0              0.6825798  2.2079208
## any_crime.12       3              2.3875630  1.4554455
## any_crime.11       1              0.9742075  1.2574257
## any_crime.10       1              0.8642002  1.0891089
## any_crime.9        1              0.8096172  1.1188119
## any_crime.8        2              0.9482611  1.0297030
## any_crime.7        0              0.9685659  1.1089109
## any_crime.6        2              1.3311058  0.9801980
## any_crime.5        0              0.4332195  0.8514851
## any_crime.4        1              1.1243468  0.9405941
## any_crime.3        1              1.0562650  1.0396040
## any_crime.2        1              1.3648515  0.9603960
## any_crime.1        1              0.4958930  1.0297030
## 
## Results: 
## 
## end.post = 16
##           Trt  Con Pct.Chng Linear.pVal Linear.Lower Linear.Upper Perm.pVal
## any_crime   2 5.53   -63.8%      0.0073       -85.3%       -11.1%    0.0800
## Omnibus    --   --       --      0.1091           --           --    0.0800

Example 9: Cross-sectional data for propensity score-type weights

microsynth() may also be used to calculate propensity score-type weights. We will demonstrate this by transforming our panel data into a cross-sectional dataset with data corresponding to our final observed period.

seattledmi.cross <- seattledmi[seattledmi$time==16, colnames(seattledmi)!="time"]

By setting match.out = FALSE, no outcome variables will be used to calculate weights, only the (time-invariant) covariates (match.covar). No outcome variables need to be reported either (result.var = NULL). Plots are therefore inappropriate, but results (i.e., a summary of weights only) can be saved to file or viewed using summary.

sea9 <- microsynth(seattledmi.cross, 
                   idvar="ID", intvar="Intervention",
                   match.out=FALSE, match.covar=cov.var,
                   result.var=NULL, 
                   test="lower",
                   perm=250, jack=TRUE,
                   n.cores = min(parallel::detectCores(), 2))
##  microsynth object
## 
## Scope:
##  Units:          Total: 9642 Treated: 39 Untreated: 9603
##  Study Period(s):    Pre-period:  -  Post-period:  - 
##  Constraints:        Exact Match: 10     Minimized Distance: 0
## Time-variant outcomes:
##  Exact Match: FALSE (1)
##  Minimized Distance: (0)
## Time-invariant covariates:
##  Exact Match: TotalPop, BLACK, HISPANIC, Males_1521, HOUSEHOLDS, FAMILYHOUS, FEMALE_HOU, RENTER_HOU, VACANT_HOU (9)
##  Minimized Distance: (0)
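The idea behind covariate-only weights can be illustrated with a small base R sketch: find weights on the control units such that the weighted control totals reproduce the treated totals, as in the Targets and Weighted.Control columns of a weight balance table. The example uses simulated data and plain linear calibration (the minimum-norm solution of the balance equations). This is a swapped-in technique chosen for brevity, not microsynth's own procedure, and unlike a practical weighting scheme it does not prevent negative weights.

```r
set.seed(1)

# Simulated covariates for 50 hypothetical control units (including an intercept).
n_control <- 50
X <- cbind(Intercept = 1,
           pop = rpois(n_control, 60),
           renters = rpois(n_control, 15))

# Hypothetical treated-group totals to be matched (3 treated units).
targets <- c(Intercept = 3, pop = 200, renters = 50)

# Minimum-norm weights w satisfying t(X) %*% w = targets (linear calibration).
w <- X %*% solve(crossprod(X), targets)

# Weighted control totals, to be compared with the targets.
balance <- drop(crossprod(X, w))
print(round(cbind(Targets = targets, Weighted.Control = balance), 6))
```

The intercept row plays the same role as in the tables above: the weights sum to the number of treated units.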

References

Abadie A, Gardeazabal J (2003). “The economic costs of conflict: A case study of the Basque Country.” American Economic Review, 93(1), 113-132.

Abadie A, Diamond A, Hainmueller J (2010). “Synthetic control methods for comparative case studies: Estimating the effect of California’s tobacco control program.” Journal of the American Statistical Association, 105(490), 493-505.