Estimate the problem dimension where two classes become linearly separable
Source: R/probe_frontier.R
This function estimates the sample size \(n_s\), or equivalently the problem dimension
\(\kappa_s = p/n_s\), at which the two classes in the data become linearly separable. To locate \(\kappa_s\),
it bisects the interval \([p/n, 0.5]\) until the window size is smaller than eps. For each sample size nn,
it generates B subsamples of size nn and estimates the separable probability
\(\hat{\pi}\) as the proportion of separable subsamples.
Finally, it fits a logistic regression with \(\hat{\pi}\) as the response
and \(\kappa = p/nn\) as the covariate to determine the \(\hat{\kappa}\)
at which the separable probability is 0.5.
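The search described above can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the helper names (is_separable, estimate_pi_hat, bisect_kappa), the intercept column, the default values, and the rule for shrinking the window are all assumptions. The separability check is a heuristic: it fits a logistic regression and tests whether the fitted linear predictor classifies every observation strictly correctly, which implies separability but may miss some separable subsamples.

```r
# Heuristic check (an assumption, not necessarily the package's method):
# if the fitted linear predictor puts every observation strictly on the
# correct side, the two classes are linearly separable.
is_separable <- function(X, Y) {
  Z <- cbind(1, X)                       # intercept column (assumed)
  fit <- suppressWarnings(glm.fit(Z, (Y + 1) / 2, family = binomial()))
  beta <- fit$coefficients
  beta[is.na(beta)] <- 0                 # aliased columns, if any
  all(Y * drop(Z %*% beta) > 0)
}

# Proportion of B subsamples of size nn that pass the check.
estimate_pi_hat <- function(X, Y, nn, B) {
  hits <- vapply(seq_len(B), function(b) {
    idx <- sample.int(nrow(X), nn)       # subsample without replacement
    is_separable(X[idx, , drop = FALSE], Y[idx])
  }, logical(1))
  mean(hits)
}

# Bisect [p/n, upper] until the window is narrower than eps.
# The defaults below are placeholders, not the package's defaults.
bisect_kappa <- function(X, Y, B = 10, eps = 0.001, upper = 0.5) {
  n <- nrow(X)
  p <- ncol(X)
  lower <- p / n
  kappas <- numeric(0)
  pis <- numeric(0)
  while (upper - lower >= eps) {
    kappa <- (lower + upper) / 2
    nn <- min(n, round(p / kappa))       # sample size for this kappa
    pi_hat <- estimate_pi_hat(X, Y, nn, B)
    kappas <- c(kappas, kappa)
    pis <- c(pis, pi_hat)
    # Assumed update rule: mostly non-separable means kappa is too small.
    if (pi_hat < 0.5) lower <- kappa else upper <- kappa
  }
  data.frame(kappa = kappas, pi_hat = pis)
}

# Small usage example on pure-noise labels.
set.seed(1)
n <- 200; p <- 20
X <- matrix(rnorm(n * p), n, p)
Y <- 2 * rbinom(n, 1, 0.5) - 1
history <- bisect_kappa(X, Y, B = 5, eps = 0.05)
```

The returned data frame holds one row per bisection step, which is the input the final logistic regression step needs.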
Arguments
- X
Covariate matrix. Each row in X is one observation.
- Y
Response vector of \(+1\) and \(-1\) representing the two classes. Y has the same length as the number of rows in X.
- B
Numeric. Number of subsamples to generate for each sample size.
- eps
Numeric. Minimum window size. The search terminates when the search interval is smaller than eps.
- verbose
Print progress if TRUE.
References
A modern maximum-likelihood theory for high-dimensional logistic regression, Pragya Sur and Emmanuel J. Candes, Proceedings of the National Academy of Sciences Jul 2019, 116 (29) 14516-14525
Examples
# Y is independent of X, so kappa_s is approximately 0.5
n <- 1000; p <- 200
X <- matrix(rnorm(n*p, 0, 1), n, p)
Y <- 2 * rbinom(n, 1, 0.5) - 1
probe_frontier(X, Y, verbose = TRUE)
#> ------ Begin Probe Frontior ------
#> kappa = 0.355 ; pi_hat = 0
#> kappa = 0.4325 ; pi_hat = 0
#> kappa = 0.47125 ; pi_hat = 0
#> kappa = 0.490625 ; pi_hat = 0.2
#> kappa = 0.5003125 ; pi_hat = 0.2
#> kappa = 0.5051562 ; pi_hat = 0.8
#> kappa = 0.5027344 ; pi_hat = 0.4
#> kappa = 0.5039453 ; pi_hat = 0.3
#> kappa = 0.5045508 ; pi_hat = 0.6
#> Found kappa_s = 0.5
#> -------
#> [1] 0.5
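The final step, solving for the \(\kappa\) at which the fitted separable probability equals 0.5, can be illustrated with the kappa and pi_hat values printed above. Under the model \(\mathrm{logit}(\pi) = \beta_0 + \beta_1 \kappa\), the probability is 0.5 exactly where \(\beta_0 + \beta_1 \kappa = 0\), so \(\hat{\kappa} = -\hat{\beta}_0 / \hat{\beta}_1\). The sketch below is not the package's code: B = 10 is an assumption, and the package may weight or fit the regression differently.

```r
# Values copied from the verbose output above.
kappa  <- c(0.355, 0.4325, 0.47125, 0.490625, 0.5003125,
            0.5051562, 0.5027344, 0.5039453, 0.5045508)
pi_hat <- c(0, 0, 0, 0.2, 0.2, 0.8, 0.4, 0.3, 0.6)
B <- 10  # assumed number of subsamples per sample size

# Binomial regression on proportions, weighted by the number of trials.
fit <- glm(pi_hat ~ kappa, family = binomial(),
           weights = rep(B, length(kappa)))

# Solve b0 + b1 * kappa = 0 for kappa.
b <- coef(fit)
kappa_hat <- unname(-b[1] / b[2])
kappa_hat
```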