
This function estimates the sample size \(n_s\), or equivalently the problem dimension \(\kappa_s = p/n_s\), at which the two classes in the data become separable. To locate \(\kappa_s\), it bisects the interval \([p/n, 0.5]\) until the window is smaller than eps. For each candidate sample size nn, it generates B subsamples of size nn and estimates the separable probability \(\hat{\pi}\) as the proportion of subsamples that are separable. Finally, it fits a logistic regression with \(\hat{\pi}\) as the response and \(\kappa = p/nn\) as the covariate, and returns the \(\hat{\kappa}\) at which the fitted separable probability is 0.5.
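The final step can be illustrated with a minimal sketch. It is not the package's internal code: it assumes a plain binomial `glm()` fit, takes the kappa and pi_hat values from the verbose output in the Examples section (with B = 10), and solves for the point where the linear predictor is zero. The variable names are illustrative only.

```r
# Values copied from the verbose output in the Examples section (B = 10).
B <- 10
kappa  <- c(0.355, 0.4325, 0.47125, 0.490625, 0.5003125,
            0.5051562, 0.5027344, 0.5039453, 0.5045508)
pi_hat <- c(0, 0, 0, 0.2, 0.2, 0.8, 0.4, 0.3, 0.6)

# Convert proportions to counts of separable / non-separable subsamples.
n_sep <- round(pi_hat * B)

# Assumption: an ordinary binomial GLM with kappa as the only covariate.
fit <- glm(cbind(n_sep, B - n_sep) ~ kappa, family = binomial)

# The fitted probability equals 0.5 where b0 + b1 * kappa = 0.
b <- coef(fit)
kappa_hat <- unname(-b[1] / b[2])
kappa_hat
```

The value obtained this way need not match the function's return value exactly, since the function may weight, round, or post-process the estimate differently.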

Usage

probe_frontier(X, Y, B = 10, eps = 0.001, verbose = FALSE)

Arguments

X

Covariate matrix. Each row in X is one observation.

Y

Response vector of \(+1\) and \(-1\) representing the two classes. Y has the same length as the number of rows in X.

B

Numeric. Number of subsamples to generate for each sample size.

eps

Numeric. Minimum window size. The search terminates when the search interval is smaller than eps.

verbose

Logical. Print progress if TRUE.

Value

Numeric. Estimated \(\hat{\kappa}\).
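Because \(\kappa_s = p/n_s\), the returned value can be converted back to a sample size. A small sketch, using p = 200 and the returned value 0.5 from the Examples section (the variable names are illustrative):

```r
p <- 200           # number of covariates, as in the Examples section
kappa_hat <- 0.5   # value returned by probe_frontier() in that example

# Invert kappa = p / n to recover the estimated sample size n_s.
n_s <- p / kappa_hat
n_s
#> [1] 400
```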

References

Sur, P. and Candès, E. J. (2019). A modern maximum-likelihood theory for high-dimensional logistic regression. Proceedings of the National Academy of Sciences, 116(29), 14516-14525.

Examples

# Y is independent of X, so kappa_s is approximately 0.5
n <- 1000; p <- 200
X <- matrix(rnorm(n*p, 0, 1), n, p)
Y <- 2 * rbinom(n, 1, 0.5) - 1
probe_frontier(X, Y, verbose = TRUE)
#> ------ Begin Probe Frontior ------
#> kappa =  0.355 ; pi_hat =  0 
#> kappa =  0.4325 ; pi_hat =  0 
#> kappa =  0.47125 ; pi_hat =  0 
#> kappa =  0.490625 ; pi_hat =  0.2 
#> kappa =  0.5003125 ; pi_hat =  0.2 
#> kappa =  0.5051562 ; pi_hat =  0.8 
#> kappa =  0.5027344 ; pi_hat =  0.4 
#> kappa =  0.5039453 ; pi_hat =  0.3 
#> kappa =  0.5045508 ; pi_hat =  0.6 
#> Found kappa_s =  0.5 
#> ------- 
#> [1] 0.5