Harnessing the power of PCA in modern applications
Registration Link: https://cityu.zoom.us/meeting/register/tJcuc-6prz4rGtSO46fmIn05JQoRXaUSoDkm
In the era of big data, the analysis of large-scale text and network data has become a widely studied topic. Notably, many popular models for such data exhibit a low-rank signal-plus-noise structure, with the quantities of interest hidden in the signal. A fundamental challenge is to estimate these quantities accurately from empirical data. Principal Component Analysis (PCA) is a widely recognized tool for addressing this challenge. However, because of the inherent heterogeneity in data, the performance of PCA is usually unsatisfactory. In this talk, I will explain how normalization can improve the performance of PCA in the context of network membership estimation. The focus will be on the Degree-Corrected Mixed Membership (DCMM) model under severe degree heterogeneity. Specifically, I will introduce an optimal spectral algorithm for estimating network memberships (the weights of each node in different communities) that leverages Laplacian normalization and the Mixed-SCORE algorithm (Jin et al. 2022). Additionally, new random matrix theory (RMT) results on entry-wise eigenvector analysis will be discussed. These results are not only crucial to the technical aspects of our algorithm but are also of independent interest. This is joint work with Zheng Tracy Ke.