Journal article

Exploring dimension learning via a penalized probabilistic principal component analysis

Abstract

Establishing a low-dimensional representation of the data leads to efficient data learning strategies. In many cases, the reduced dimension needs to be explicitly stated and estimated from the data. We explore the estimation of dimension in finite samples as a constrained optimization problem, where the estimated dimension is a maximizer of a penalized profile likelihood criterion within the framework of probabilistic principal component analysis. Unlike other penalized maximization problems that require an ‘optimal’ penalty tuning parameter, we propose a data-averaging procedure whereby the estimated dimension emerges as the most favourable choice over a range of plausible penalty parameters. The proposed heuristic is compared to a large number of alternative criteria in simulations and in an application to gene expression data. Extensive simulation studies reveal that none of the methods uniformly dominates the others and highlight the importance of subject-specific knowledge in choosing statistical methods for dimension learning. Our application results also suggest that gene expression data have a higher intrinsic dimension than previously thought. Overall, our proposed heuristic strikes a good balance and is the method of choice when there are moderate deviations from the model assumptions.
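
The idea of choosing a dimension that is favoured across a range of penalty parameters can be illustrated with a minimal Python sketch. It is not the authors' criterion: the abstract does not state the penalty or the averaging rule, so the sketch assumes a hypothetical penalty proportional to the number of free parameters, and a simple majority vote over a grid of penalty values stands in for the data-averaging procedure. The function names (`ppca_profile_loglik`, `n_free_params`, `select_dimension`) and the `deltas` grid are likewise assumptions made for illustration. Only the maximized log-likelihood of probabilistic PCA for a fixed dimension follows the standard closed form.

```python
import numpy as np
from collections import Counter


def ppca_profile_loglik(eigvals, n, k):
    """Maximized probabilistic-PCA log-likelihood for latent dimension k.

    eigvals: eigenvalues of the sample covariance, sorted in descending order.
    """
    p = eigvals.size
    # MLE of the noise variance: mean of the p - k discarded eigenvalues.
    sigma2 = eigvals[k:].mean()
    return -0.5 * n * (
        p * np.log(2.0 * np.pi)
        + np.log(eigvals[:k]).sum()
        + (p - k) * np.log(sigma2)
        + p
    )


def n_free_params(p, k):
    # Loadings identified up to rotation, plus one noise variance.
    return p * k - k * (k - 1) // 2 + 1


def select_dimension(X, deltas):
    """Return the dimension chosen most often over the penalty grid, and the votes.

    ASSUMPTION: penalty = delta * n_free_params and a majority vote over deltas;
    both are illustrative stand-ins, not the published method.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    eigvals = np.linalg.eigvalsh(Xc.T @ Xc / n)[::-1]  # descending order
    eigvals = np.clip(eigvals, 1e-12, None)            # guard against log(0)

    ks = np.arange(1, p)  # candidate dimensions 1, ..., p - 1
    loglik = np.array([ppca_profile_loglik(eigvals, n, k) for k in ks])
    penalty = np.array([n_free_params(p, k) for k in ks])

    votes = Counter()
    for delta in deltas:
        k_hat = int(ks[np.argmax(loglik - delta * penalty)])
        votes[k_hat] += 1
    return votes.most_common(1)[0][0], votes
```

A call such as `select_dimension(X, np.linspace(0.5, 2.0, 7) * np.log(n) / 2)` sweeps penalties around a BIC-like value; the grid itself is an arbitrary choice for this sketch.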

Authors

Deng WQ; Craiu RV

Journal

Journal of Statistical Computation and Simulation, Vol. 93, No. 2, pp. 266–297

Publisher

Taylor & Francis

Publication Date

January 22, 2023

DOI

10.1080/00949655.2022.2100890

ISSN

0094-9655
