This is the generic function and methods associated with the adjusted MLE for high dimensional generalized linear models (glm).
Details
The function adjust_glm
is built on the glm
function: the input should be from the
class glm
, such as the output from glm
.
Currently, it works with binary regressions only, for which it calls the function adjust_binary
to estimate coefficients and their standard deviations. In high dimensions,
the magnitude of MLE is biased upward, and the variance estimates from classical theory based on
inverse Fisher information underestimates the true variability. adjust_binary
adjusts the MLE and variance estimates
applying the asymptotic theory assuming \(p, n\to\infty\) (\(p<n\)), and is exact when
covariates \(X\) is multivariate Gaussian.
adjust_glm
creates an object from class glmadj
. summary.glmadj
creates a data table of adjusted coefficients,
estimates of the standard error and p-values using a two-sided t-test. If
the covariates are multivariate Gaussian, the p-values are asymptotically
from a uniform distribution.
predict.glmadj
computes the prediction \(X^\top \hat{\beta}^{\mathrm{Adj}}\) for
new input data. The new data should be of the same format which you
used to fit the glm
. If there's no new data, it outputs the estimated
\(X^\top \hat{\beta}_{\mathrm{Adj}}\) for the original dataset.
Examples
# Problem size
n <- 1000L
p <- 300L
# Generate data matrix
X <- matrix(rnorm(n*p, 0, 1), n, p)
# Generate parameters
beta <- rep(c(0, 1), p / 2) / sqrt(p) * 2
#' ## No intercept model
# Sample response vector
Y <- rbinom(n, 1, 1 / (1 + exp(- X %*% beta)))
# Fit a glm, by default computes a logistic regression
fit <- glm(Y ~ X + 0, family = binomial, x = TRUE, y = TRUE)
adjusted_fit <- adjust_glm(fit)
# Print summary
print(summary(adjusted_fit), max = 40)
#> Call:
#> adjust_glm(glm_output = fit)
#>
#> adjusted_mle std z.value p.value
#> X1 -0.00847596 0.09080651 -0.0933 0.9256328
#> X2 0.08228799 0.09158411 0.8985 0.3689210
#> X3 0.00225270 0.09113093 0.0247 0.9802788
#> X4 0.12776741 0.08180704 1.5618 0.1183317
#> X5 0.12208436 0.09093356 1.3426 0.1794124
#> X6 0.14182345 0.09224081 1.5375 0.1241624
#> X7 0.16738694 0.09266929 1.8063 0.0708742 .
#> X8 0.26618628 0.08890818 2.9939 0.0027539 **
#> [ reached getOption("max.print") -- omitted 292 rows ]
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Predict on new data
head(predict(adjusted_fit))
#> [1] 2.36189042 1.55461876 -1.53630085 -0.03311306 -1.12050853 2.36940721
# Extract adjusted coefficients
head(adjusted_fit$coef_adj)
#> X1 X2 X3 X4 X5 X6
#> -0.008475955 0.082287991 0.002252704 0.127767408 0.122084356 0.141823446
# Extract adjusted standard error
head(adjusted_fit$std_adj)
#> X1 X2 X3 X4 X5 X6
#> 0.1575873 0.1589367 0.1581503 0.1419694 0.1578077 0.1600764