hgwrr

The hgwrr package calibrates Hierarchical and Geographically Weighted Regression (HGWR) models on spatial data. It requires data with a spatial hierarchical structure; that is, samples are grouped by their locations. Every variable is either a group-level or a sample-level variable. Group-level variables can have fixed effects (globally constant) or spatially weighted effects (varying with location). Sample-level variables can have fixed effects or random effects (varying among groups). We denote the fixed effects by \(\beta\), the group-level spatially weighted (GLSW) effects by \(\gamma\), and the sample-level random (SLR) effects by \(\mu\). The HGWR model combines these three kinds of effects and estimates them while accounting for spatial heterogeneity.
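
As a rough sketch, the three kinds of effects can be combined in a single equation. Only \(\beta\), \(\gamma\), and \(\mu\) come from the description above; the indices \(i\) (group) and \(j\) (sample within group \(i\)), the covariate vectors \(g_i\), \(x_{ij}\), \(z_{ij}\), and the error term \(\epsilon_{ij}\) are notation assumed here for illustration.

\[
y_{ij} = g_i^{\top} \gamma_i + x_{ij}^{\top} \beta + z_{ij}^{\top} \mu_i + \epsilon_{ij}
\]

Here \(\gamma_i\) is the GLSW coefficient vector at the location of group \(i\), \(\beta\) is shared by all samples, and \(\mu_i\) is the random coefficient vector of group \(i\).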

library(hgwrr)
#> Loading required package: sf
#> Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
#> Loading required package: MASS

Usage

Model calibration

To calibrate an HGWR model, use the function hgwr().

hgwr(
  formula, data, ..., bw = "CV",
  kernel = c("gaussian", "bisquared"),
  alpha = 0.01, eps_iter = 1e-6, eps_gradient = 1e-6,
  max_iters = 1e6, max_retries = 1e6,
  ml_type = c("D_Only", "D_Beta"), verbose = 0
)

The most important parameters are explained below.

formula

This parameter specifies the model form. Recall that the three kinds of effects are GLSW, fixed, and SLR effects. They are specified in different parts of the formula.

response ~ L(GLSW) + fixed + (SLR | group)

In the formula, L() is used to mark some effects as GLSW effects, and ( | group) is used to set the SLR effects and grouping indicator. Only group-level variables can have GLSW effects.
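
To make the three parts of the formula concrete, the following sketch takes a formula apart with base R only; it does not call hgwrr and is not how the package parses formulas internally. The helper names (glsw_vars, fixed_vars, slr_and_group) are hypothetical and chosen for illustration.

```r
# Base R only: inspect the formula without fitting a model.
f <- y ~ L(g1 + g2) + x1 + (z1 | group)

# The right-hand side parses as (L(g1 + g2) + x1) + (z1 | group).
rhs <- f[[3]]
glsw_term  <- rhs[[2]][[2]]  # L(g1 + g2)
fixed_term <- rhs[[2]][[3]]  # x1
slr_term   <- rhs[[3]]       # (z1 | group)

# Variable names used in each part (function names such as L are skipped).
glsw_vars     <- all.vars(glsw_term)
fixed_vars    <- all.vars(fixed_term)
slr_and_group <- all.vars(slr_term)  # SLR variable first, grouping variable last

print(glsw_vars)
print(fixed_vars)
print(slr_and_group)
```

The index-based extraction relies on this exact three-term layout; a formula with more fixed terms would nest differently.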

data

sf objects

Since version 0.3-1, this parameter accepts sf objects. In this case, no further arguments in ... are required. Here is an example.

data(wuhan.hp)
m_sf <- hgwr(
  formula = Price ~ L(d.Water + d.Commercial) + BuildingArea + (Floor.High | group),
  data = wuhan.hp,
  bw = 299
)

data.frame objects

If the data is a plain data.frame object, an extra argument coords is required to specify the coordinates of each group. Note that the row order of coords must match that of the group variable. Here is an example.

data(mulsam.test)
m_df <- hgwr(
  formula = y ~ L(g1 + g2) + x1 + (z1 | group),
  data = mulsam.test$data,
  coords = mulsam.test$coords
)
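
Before fitting, it can help to check that the coordinates and the group variable are consistent. The sketch below uses hypothetical toy data (toy, toy_coords) in base R and assumes that group labels are the integers 1..G and that row g of the coordinate matrix belongs to group g. Whether hgwr() expects exactly this layout should be confirmed in the package documentation.

```r
# Hypothetical toy data: 3 groups with 2 samples each.
toy <- data.frame(
  group = c(1, 1, 2, 2, 3, 3),
  y     = c(2.1, 1.9, 3.4, 3.6, 0.8, 1.1)
)

# One coordinate row per group, in the same order as the group labels.
toy_coords <- cbind(U = c(0, 10, 20), V = c(0, 5, 0))

# Check: exactly one coordinate row for every group.
n_groups <- length(unique(toy$group))
stopifnot(nrow(toy_coords) == n_groups)

# Under the 1..G labelling assumption, this looks up each sample's location.
sample_coords <- toy_coords[toy$group, , drop = FALSE]
print(sample_coords)
```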

bw and kernel

Argument bw is the bandwidth used to estimate GLSW effects. It can be either of the following:

  • An integer value giving the number of nearest neighbours.
  • "CV", which lets the algorithm select the bandwidth automatically.

Argument kernel is the kernel function used to estimate GLSW effects. Currently, there are only two choices: "gaussian" and "bisquared".
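
For intuition about the two kernel choices, the sketch below implements the textbook Gaussian and bi-square weighting functions in base R. These are the standard forms used in geographically weighted regression; the exact expressions and bandwidth handling inside hgwrr are not shown in this document and may differ. The function names gaussian_kernel and bisquare_kernel are hypothetical.

```r
# d: distances to the regression point; b: bandwidth (same units as d).

# Gaussian kernel: weights decay smoothly and never reach exactly zero.
gaussian_kernel <- function(d, b) exp(-0.5 * (d / b)^2)

# Bi-square kernel: weights fall to exactly zero at and beyond the bandwidth.
bisquare_kernel <- function(d, b) ifelse(d < b, (1 - (d / b)^2)^2, 0)

d <- c(0, 0.5, 1, 2)
print(gaussian_kernel(d, b = 1))
print(bisquare_kernel(d, b = 1))
```

The practical difference is that the bi-square kernel ignores observations beyond the bandwidth entirely, whereas the Gaussian kernel gives every observation some weight.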

Results

Printing the object returned by hgwr() shows the estimates of the effects.

m_df
#> Hierarchical and geographically weighted regression model
#> =========================================================
#> Formula: y ~ L(g1 + g2) + x1 + (z1 | group)
#>  Method: Back-fitting and Maximum likelihood
#>    Data: mulsam.test$data
#> 
#> Fixed Effects
#> -------------
#>  Intercept        x1 
#>   4.056759  1.967648 
#> 
#> Group-level Spatially Weighted Effects
#> --------------------------------------
#> Bandwidth: 9.35816 (nearest neighbours)
#> 
#> Coefficient estimates:
#>  Coefficient        Min  1st Quartile     Median  3rd Quartile        Max 
#>    Intercept  -2.769060     -2.708289  -2.356463     -2.225995  -2.022646 
#>           g1   0.876505      1.253144   1.702822      1.939969   2.336628 
#>           g2   1.082775      1.279601   1.424307      1.607909   1.722892 
#> 
#> Sample-level Random Effects
#> ---------------------------
#>    Groups       Name  Std.Dev.      Corr 
#>     group  Intercept  1.032962           
#>                   z1  1.032962  0.000000 
#>  Residual             1.032962           
#> 
#> Other Information
#> -----------------
#> Number of Obs: 873
#>        Groups: group , 25

The summary() method additionally shows diagnostic information.

summary(m_df)
#> Hierarchical and geographically weighted regression model
#> =========================================================
#> Formula: y ~ L(g1 + g2) + x1 + (z1 | group)
#>  Method: Back-fitting and Maximum likelihood
#>    Data: mulsam.test$data
#> 
#> Parameter Estimates
#> -------------------
#> Fixed effects:
#>             Estimated   Sd. Err      t.val  Pr(>|t|)      
#>  Intercept   4.056759  0.203079  19.976270  0.000000  *** 
#>         x1   1.967648  0.033827  58.168658  0.000000  *** 
#> 
#> Bandwidth: 9.35816 (nearest neighbours)
#> 
#> GLSW effects:
#>             Mean Est.  Mean Sd.     ***    **     *     . 
#>  Intercept  -2.421973  0.251700  100.0%  0.0%  0.0%  0.0% 
#>         g1   1.641343  1.823056    0.0%  0.0%  0.0%  0.0% 
#>         g2   1.435709  1.506236    0.0%  0.0%  0.0%  0.0% 
#> 
#> SLR effects:
#>    Groups       Name      Mean  Std.Dev.      Corr 
#>     group  Intercept  0.000000  1.032962           
#>                   z1  1.869552  1.032962  0.000000 
#>  Residual             0.088510  1.032962           
#> 
#> 
#> Diagnostics
#> -----------
#>  rsquared  0.905066 
#>    logLik       NaN 
#>       AIC       NaN 
#> 
#> Scaled Residuals
#> ----------------
#>        Min         1Q    Median        3Q       Max 
#>  -3.408088  -0.576387  0.100854  0.734105  3.036324 
#> 
#> Other Information
#> -----------------
#> Number of Obs: 873
#>        Groups: group , 25

The significance of spatial heterogeneity in the GLSW effects can be tested with the following code.

summary(m_df, test_hetero = TRUE)
#> Hierarchical and geographically weighted regression model
#> =========================================================
#> Formula: y ~ L(g1 + g2) + x1 + (z1 | group)
#>  Method: Back-fitting and Maximum likelihood
#>    Data: mulsam.test$data
#> 
#> Parameter Estimates
#> -------------------
#> Fixed effects:
#>             Estimated   Sd. Err      t.val  Pr(>|t|)      
#>  Intercept   4.056759  0.203079  19.976270  0.000000  *** 
#>         x1   1.967648  0.033827  58.168658  0.000000  *** 
#> 
#> Bandwidth: 9.35816 (nearest neighbours)
#> 
#> GLSW effects:
#>             Mean Est.  Mean Sd.     ***    **     *     . 
#>  Intercept  -2.421973  0.251700  100.0%  0.0%  0.0%  0.0% 
#>         g1   1.641343  1.823056    0.0%  0.0%  0.0%  0.0% 
#>         g2   1.435709  1.506236    0.0%  0.0%  0.0%  0.0% 
#> 
#> SLR effects:
#>    Groups       Name      Mean  Std.Dev.      Corr 
#>     group  Intercept  0.000000  1.032962           
#>                   z1  1.869552  1.032962  0.000000 
#>  Residual             0.088510  1.032962           
#> 
#> 
#> Diagnostics
#> -----------
#>  rsquared  0.905066 
#>    logLik       NaN 
#>       AIC       NaN 
#> 
#> Scaled Residuals
#> ----------------
#>        Min         1Q    Median        3Q       Max 
#>  -3.408088  -0.576387  0.100854  0.734105  3.036324 
#> 
#> Other Information
#> -----------------
#> Number of Obs: 873
#>        Groups: group , 25

Other standard methods, such as coef(), fitted(), and residuals(), are also provided.

head(coef(m_df))
#>   Intercept       g1       g2       x1       z1
#> 1 0.9143066 2.336628 1.633698 1.967648 1.817139
#> 2 1.1269566 1.932128 1.626517 1.967648 2.305685
#> 3 1.8867179 2.027690 1.659433 1.967648 2.251592
#> 4 1.1245250 2.265663 1.536906 1.967648 1.591036
#> 5 1.7726751 2.219179 1.607909 1.967648 1.698600
#> 6 0.7008420 2.082628 1.421329 1.967648 1.855599
head(fitted(m_df))
#> [1]  3.659871  4.317510  6.929765  1.768491  0.511762 -2.023591
head(residuals(m_df))
#> [1] -0.5654830 -0.7380541  0.9197850  0.5707894 -0.3850239 -0.1648946
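
These accessors can be combined with ordinary R functions for quick checks. The following sketch refits the model from the data.frame example and derives two simple quantities; it uses only the calls shown in this document plus base R, and the variable names rmse and g1_range are hypothetical. It requires the hgwrr package to be installed.

```r
library(hgwrr)

# Refit the data.frame example so that this block runs on its own.
data(mulsam.test)
m_df <- hgwr(
  formula = y ~ L(g1 + g2) + x1 + (z1 | group),
  data = mulsam.test$data,
  coords = mulsam.test$coords
)

# Root mean squared error computed from the residuals.
rmse <- sqrt(mean(residuals(m_df)^2))

# Spread of the spatially varying coefficient of g1 across rows of coef().
g1_range <- range(coef(m_df)[, "g1"])

print(rmse)
print(g1_range)
```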

Further reading

Model comparison

Mathematical basis

The following papers give more details about the mathematical basis of the HGWR model.