Estimate coefficients of a high-dimensional GLM

This is the generic function and methods associated with the adjusted MLE for high dimensional generalized linear models (glm).

Usage

adjust_glm(glm_output, verbose = FALSE, echo = TRUE, ...)

Arguments

glm_output: An object from the class glm, for example the output from the glm function
verbose: If TRUE, print progress at every step.
echo: If TRUE, returns the original glm_output.
...: Additional arguments.

Details

The function adjust_glm is built on the glm function: the input should be from the class glm, such as the output from glm. Currently, it works with binary regressions only, for which it calls the function adjust_binary to estimate coefficients and their standard deviations. In high dimensions, the magnitude of MLE is biased upward, and the variance estimates from classical theory based on inverse Fisher information underestimates the true variability. adjust_binary adjusts the MLE and variance estimates applying the asymptotic theory assuming \(p, n\to\infty\) (\(p<n\)), and is exact when covariates \(X\) is multivariate Gaussian.

adjust_glm creates an object from class glmadj. summary.glmadj creates a data table of adjusted coefficients, estimates of the standard error and p-values using a two-sided t-test. If the covariates are multivariate Gaussian, the p-values are asymptotically from a uniform distribution.

predict.glmadj computes the prediction \(X^\top \hat{\beta}^{\mathrm{Adj}}\) for new input data. The new data should be of the same format which you used to fit the glm. If there's no new data, it outputs the estimated \(X^\top \hat{\beta}_{\mathrm{Adj}}\) for the original dataset.

Examples

# Problem size
n <- 1000L
p <- 300L
# Generate data matrix
X <- matrix(rnorm(n*p, 0, 1), n, p)
# Generate parameters
beta <- rep(c(0, 1), p / 2) / sqrt(p) * 2
#' ## No intercept model
# Sample response vector
Y <- rbinom(n, 1, 1 / (1 + exp(- X %*% beta)))
# Fit a glm, by default computes a logistic regression
fit <- glm(Y ~ X + 0, family = binomial, x = TRUE, y = TRUE)
adjusted_fit <- adjust_glm(fit)
# Print summary
print(summary(adjusted_fit), max = 40)
#> Call:
#> adjust_glm(glm_output = fit)
#> 
#>      adjusted_mle         std z.value   p.value    
#> X1    -0.00847596  0.09080651 -0.0933 0.9256328    
#> X2     0.08228799  0.09158411  0.8985 0.3689210    
#> X3     0.00225270  0.09113093  0.0247 0.9802788    
#> X4     0.12776741  0.08180704  1.5618 0.1183317    
#> X5     0.12208436  0.09093356  1.3426 0.1794124    
#> X6     0.14182345  0.09224081  1.5375 0.1241624    
#> X7     0.16738694  0.09266929  1.8063 0.0708742 .  
#> X8     0.26618628  0.08890818  2.9939 0.0027539 ** 
#>  [ reached getOption("max.print") -- omitted 292 rows ]
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Predict on new data
head(predict(adjusted_fit))
#> [1]  2.36189042  1.55461876 -1.53630085 -0.03311306 -1.12050853  2.36940721
# Extract adjusted coefficients
head(adjusted_fit$coef_adj)
#>           X1           X2           X3           X4           X5           X6 
#> -0.008475955  0.082287991  0.002252704  0.127767408  0.122084356  0.141823446 
# Extract adjusted standard error
head(adjusted_fit$std_adj)
#>        X1        X2        X3        X4        X5        X6 
#> 0.1575873 0.1589367 0.1581503 0.1419694 0.1578077 0.1600764

Estimate coefficients of a high-dimensional GLM

Usage

Arguments

Details

See also

Examples