This function computes the asymptotic inflation and variance of the MLE of a binary regression when both \(n\) and \(p\) are large, assuming that covariates \(X\) are multivariate Gaussian.
Arguments
- glm_output
An object from the class
glm
, for example the output from theglm
function- verbose
If
TRUE
, print progress at every step.- echo
If
TRUE
, return the inputglm_output
.- ...
Additional arguments.
Value
A list with elements
- glm_output
If
echo = TRUE
, returns the inputglm_output
.- gamma_hat
Estimated signal strength \(\gamma_0\).
- param
Estimated paramters \((\alpha_\star, \lambda_\star, \sigma_\star, b_\star)\). If the model does not contain an intercept, returns \((\alpha_\star, \lambda_\star, \sigma_\star)\).
- intercept
Does the model contain an intercept?
- tau_hat
Estimated conditional standard deviation.
- coef_adj
Adjusted MLE: this is \(\hat{\beta}^{\mathrm{MLE}} / \alpha_\star\).
- std_adj
Estimated standard error of \(\hat{\beta}_j\): \(\sigma_\star / \tau_j\).
- coef_unadj
Unadjusted MLE.
- std_unadj
Unadjusted standard error. This is output from the
glm
function.
Details
This function estimates coefficients of a binary regression by adjusting the MLE
based on the theory when \(n\) and \(p\) grows to infinity, and the covariates are multivariate Gaussian.
In this setting, the MLE of one variable \(\beta_j\) is
$$
\hat{\beta}^{\mathrm{MLE}}_j - \alpha_\star \beta_j \approx \mathcal{N}(0, \sigma_\star^2 / \tau_j^2).
$$
where \(\tau_j^2\) is the conditional variance of \(x_j\) given all the other variables.
This implies \(\hat{\beta}_j^\mathrm{Adj} = \hat{\beta}^{MLE}_j / \alpha_\star\) is unbiased for the coefficient \(\beta_j\).
adjust_binary
computes \(\hat{\beta}_j^\mathrm{Adj}\) and returns it in the vector coef_adj
.
Additional arguments
- B
Number of repetitions at each
kappa
in ProbeFrontier algorithm.- tol
tol
parameter when solving for signal strength and intercept. seesignal_strength
.
Notes
The variables should be centered to have zero mean.
The input to this function should be an object from the class
glm
, and it should includefamily
specifying the link function. Currently, the algorithm can handle logit and probit link.if estimated \(\hat{\gamma}<0.0001\), we set \(\gamma=0\).
References
The Asymptotic Distribution of the MLE in High-dimensional Logistic Models: Arbitrary Covariance, Qian Zhao, Pragya Sur and Emmanuel J. Candes, arXiv:2001.09351
Examples
# Problem size
n <- 1000L
p <- 300L
# Generate data matrix
X <- matrix(rnorm(n*p, 0, 1), n, p) / sqrt(p)
# Sample parameters
beta <- rep(c(0, 1), p / 2) * 2
# Generate response
Y <- rbinom(n, 1, 1 / (1 + exp(- X %*% beta)))
fit <- glm(Y ~ X + 0, family = binomial, x = TRUE, y = TRUE)
adjusted_fit <- adjust_binary(fit)
#> Link function: logit
#> ------ Begin Probe Frontior ------
#> kappa = 0.405 ; pi_hat = 0.3
#> kappa = 0.4575 ; pi_hat = 1
#> kappa = 0.43125 ; pi_hat = 0.7
#> kappa = 0.418125 ; pi_hat = 0.7
#> kappa = 0.4115625 ; pi_hat = 0.5
#> kappa = 0.4148438 ; pi_hat = 0.6
#> kappa = 0.4132031 ; pi_hat = 0.6
#> kappa = 0.4123828 ; pi_hat = 0.7
#> Found kappa_s = 0.4097962
#> -------
#> -- Finding signal strength and the intercept --
#> No intercept, estimated signal strength gamma_hat = 1.267636
#> ----
#> Found parameters:
#> alpha_s lambda_s sigma_s
#> 1.766048 4.471162 2.795273
# Adjusted MLE
head(adjusted_fit$coef_unadj)
#> X1 X2 X3 X4 X5 X6
#> -2.044095 1.215670 3.915436 6.992635 -5.281657 9.027532