Grassmann Averages for Scalable Robust PCA
As the collection of large datasets becomes increasingly
automated, the occurrence of outliers will increase --
or, in buzzword terms, "big data implies big outliers".
While principal component analysis (PCA) is often used
to reduce the size of data, and scalable solutions exist,
it is well known that outliers can arbitrarily corrupt
the results. Unfortunately, state-of-the-art approaches
for robust PCA do not scale beyond small-to-medium-sized
datasets. To address this, we introduce the Grassmann
Average (GA), which expresses dimensionality reduction
as an average of the subspaces spanned by the data.
Because averages can be efficiently computed, we immediately
gain scalability. GA is inherently more robust than PCA,
but we show that they coincide for Gaussian data.
Because averages can be made robust, we formulate
the Robust Grassmann Average (RGA) as a form of robust PCA.
Robustness can be with respect to vectors (subspaces) or
elements of vectors; we focus on the latter and use a
trimmed average. The resulting Trimmed Grassmann Average
(TGA) is particularly appropriate for computer vision
because it is robust to pixel outliers.
The algorithm has low computational complexity and minimal
memory requirements, making it scalable to "big noisy data."
We demonstrate TGA for background modeling, video restoration,
and shadow removal. We show scalability by performing robust
PCA on the entire Star Wars IV movie, a task beyond any
currently existing method.
Work in collaboration with Aasa Feragen (DIKU) and Michael
J. Black (MPI-IS).
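
To make the idea of "averaging subspaces" concrete, here is a minimal
sketch of how a leading component could be estimated, under assumptions
the abstract does not spell out: the data are taken to be zero-mean,
each observation is treated as a one-dimensional subspace whose sign is
flipped to agree with the current estimate, and the flipped observations
are then averaged and renormalized until the estimate stops changing.
Setting trim > 0 swaps the plain average for an element-wise trimmed
average, in the spirit of TGA. The names grassmann_average and
trimmed_mean, and all parameters, are hypothetical and chosen for
illustration only; this is not the authors' implementation.

```python
import numpy as np


def trimmed_mean(a, trim):
    """Element-wise trimmed mean over rows.

    Sorts each column independently and drops the `trim` fraction of
    values at both the low and the high end before averaging.
    """
    n = a.shape[0]
    k = int(np.floor(trim * n))
    s = np.sort(a, axis=0)
    return s[k:n - k].mean(axis=0)


def grassmann_average(X, trim=0.0, n_iter=100, tol=1e-10, seed=0):
    """Estimate one leading direction of zero-mean data X (rows = samples).

    Assumption: an iterative sign-flip-and-average scheme. With trim == 0
    a plain average is used; with 0 < trim < 0.5 an element-wise trimmed
    average is used instead.
    """
    if not 0.0 <= trim < 0.5:
        raise ValueError("trim must be in [0, 0.5)")
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(X.shape[1])
    q /= np.linalg.norm(q)
    for _ in range(n_iter):
        # Flip each observation so that it points along the current estimate.
        signs = np.sign(X @ q)
        signs[signs == 0] = 1.0
        flipped = signs[:, None] * X
        # Average the flipped observations (plain or trimmed).
        if trim > 0:
            q_new = trimmed_mean(flipped, trim)
        else:
            q_new = flipped.mean(axis=0)
        norm = np.linalg.norm(q_new)
        if norm == 0:
            break
        q_new /= norm
        converged = np.linalg.norm(q_new - q) < tol
        q = q_new
        if converged:
            break
    return q
```

Calling grassmann_average(X) on centered data returns a unit vector;
passing, say, trim=0.1 discards the most extreme 10% of values at each
end of every coordinate before averaging, which is what gives
element-wise (pixel-level) robustness in this sketch. Each iteration
needs only one pass over the data, so the cost per iteration is linear
in the number of observations and dimensions.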