• Robust Matrix Completion
    In this project, we worked on the problem of matrix completion given samples from a sparsely corrupted matrix. In contrast to existing convex-optimization-based techniques, we proposed an efficient algorithm based on hard thresholding and singular value projection. Under identifiability constraints which ensure that the sparse and low-rank components of the matrix can be uniquely identified, we showed that our algorithm exactly recovers the low-rank part of the matrix (the sparse component cannot be recovered in general, since some of its entries may never be sampled). The amount of corruption we can tolerate is known to be optimal up to a constant factor, and our running time and sample complexity nearly match the best known bounds for matrix completion, a strictly easier problem. As a further consequence of our analysis, we also substantially improved the previously known running time and sample complexity bounds for non-convex matrix completion methods. We validated our algorithm on both synthetic and real-world datasets. [ArXiv]
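    As a rough illustration of the general hard-thresholding / singular-value-projection idea (this is a minimal sketch, not the algorithm or analysis from the paper; the names `robust_mc` and `hard_threshold`, the fixed threshold `zeta`, and the 1/p scaling are all illustrative choices), one can alternate between thresholding large residuals to estimate the corruptions and projecting onto rank-r matrices:

```python
import numpy as np

def hard_threshold(x, zeta):
    """Zero out every entry whose magnitude is at most zeta."""
    return x * (np.abs(x) > zeta)

def svp_rank(x, r):
    """Project onto the set of rank-r matrices via a truncated SVD."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u[:, :r] * s[:r]) @ vt[:r]

def robust_mc(M_obs, mask, r, zeta, iters=50):
    """Alternate a hard-thresholding estimate of the sparse corruptions
    with a singular-value-projection step on the observed residual.
    mask is 1 on observed entries and 0 elsewhere."""
    p = mask.mean()                       # fraction of observed entries
    L = np.zeros_like(M_obs)
    for _ in range(iters):
        S = hard_threshold(mask * (M_obs - L), zeta)       # sparse part
        L = svp_rank(L + mask * (M_obs - L - S) / p, r)    # low-rank part
    return L
```

    Each iteration costs one truncated SVD plus entrywise operations, which is what makes methods of this family much cheaper than convex surrogates based on nuclear-norm minimization.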

  • Non-Convex Outlier-Robust PCA
    Here, we worked on the problem of performing PCA when a subset of the columns of the matrix has been adversarially corrupted. Unlike standard robust PCA setups, which typically guarantee exact recovery of the entire matrix, recovering the whole matrix is impossible in this setting; we instead prove that we recover the principal subspace of the matrix. For this problem as well, we proposed an efficient algorithm based on hard thresholding and the singular value decomposition. Under suitable identifiability conditions, we showed that our algorithm recovers the true subspace in time nearly linear in the number of non-zero entries of the matrix. As in the previous project, the fraction of corrupted columns we can tolerate is optimal up to a constant factor. We empirically validated our algorithm on both synthetic and real-world anomaly detection datasets.
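    One natural thresholding-plus-SVD scheme for the column-corruption setting can be sketched as follows (again a toy illustration under simplifying assumptions, not the paper's algorithm; the name `outlier_robust_pca` and the fixed outlier budget `frac` are illustrative): repeatedly re-estimate the subspace from the currently trusted columns, then distrust the columns that fit it worst.

```python
import numpy as np

def outlier_robust_pca(X, r, frac, iters=20):
    """Estimate an r-dimensional principal subspace when up to a `frac`
    fraction of the columns of X may be arbitrarily corrupted.
    Alternates (a) SVD on the trusted columns with (b) flagging the
    columns with the largest residuals as suspected outliers."""
    n_cols = X.shape[1]
    n_out = int(frac * n_cols)           # budget of suspected outliers
    keep = np.ones(n_cols, dtype=bool)
    U = None
    for _ in range(iters):
        U = np.linalg.svd(X[:, keep], full_matrices=False)[0][:, :r]
        resid = np.linalg.norm(X - U @ (U.T @ X), axis=0)  # per-column fit
        keep = resid <= np.sort(resid)[n_cols - n_out - 1] # drop the worst
    return U
```

    The per-iteration cost is dominated by the truncated SVD, consistent with the near-linear running times the project aims for.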

  • Entity Linking and Disambiguation
    Entity linking is the task of annotating mentions of entities on web pages with their entries in a Knowledge Base; it is an important problem in semantic web search and knowledge base population. However, many entities are absent from any given Knowledge Base, so a large fraction of entity mentions in web text have no referent in it. To address this issue, we developed a model based on hierarchical non-parametric topic models that attempts to automatically discover new entities in web text and add them to the Knowledge Base. We made several optimizations to the basic Gibbs sampling algorithm, leveraging the hierarchy as well as several blocking-based heuristics. We evaluated the model both on Wikipedia and on a smaller dataset constructed from Yago and Wikipedia. [Slides] [Report]
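    To make the new-entity mechanism concrete, here is a toy collapsed Gibbs sampler for a flat Chinese-restaurant-process mixture over single-word mention contexts. This is a much-simplified stand-in for the hierarchical non-parametric model described above, and every name and hyperparameter in it is illustrative; the point is only that a non-parametric prior lets a mention open a brand-new cluster, which is how unseen entities get discovered.

```python
import numpy as np

def crp_gibbs(words, V, alpha=1.0, beta=0.1, iters=100, seed=0):
    """Collapsed Gibbs sampling for a CRP mixture: each cluster (an
    "entity") has a Dirichlet-smoothed distribution over context words
    from a vocabulary of size V, and each sweep reassigns every mention
    to an existing entity or to a brand-new one."""
    rng = np.random.default_rng(seed)
    z = list(range(len(words)))              # start: one entity per mention
    counts = [np.zeros(V) for _ in words]    # word counts per entity
    for i, w in enumerate(words):
        counts[i][w] = 1.0
    for _ in range(iters):
        for i, w in enumerate(words):
            counts[z[i]][w] -= 1.0           # remove mention i
            if counts[z[i]].sum() == 0:      # delete its entity if empty
                dead = z[i]
                counts.pop(dead)
                z = [k - 1 if k > dead else k for k in z]
            # CRP predictive: existing entities weighted by size times
            # the smoothed probability of the context word ...
            probs = [c.sum() * (c[w] + beta) / (c.sum() + V * beta)
                     for c in counts]
            probs.append(alpha / V)          # ... plus a new-entity term
            probs = np.array(probs)
            k = rng.choice(len(probs), p=probs / probs.sum())
            if k == len(counts):             # a new entity is discovered
                counts.append(np.zeros(V))
            z[i] = k
            counts[k][w] += 1.0
    return z, counts
```

    The blocking heuristics mentioned above would correspond to resampling groups of coreferent mentions jointly instead of one mention at a time, which speeds up mixing.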

  • Contour and Junction Detection in Architectural Images
    In this project, I worked on the problem of detecting contours and junctions in architectural images. The goal was to build a system that, given images of a building, automatically constructs a 3D model of it to aid architects in renovation work. We implemented the gPb algorithm, a state-of-the-art contour detector for natural images, and improved it further by exploiting the regularity of our input images. We also evaluated our algorithm on a dataset of architectural images.
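    The full gPb pipeline (multi-scale oriented gradients of local histograms plus a spectral globalization step) is involved, so the following is only a toy illustration of the per-orientation idea it builds on, not gPb itself: score each pixel by its gradient response across a bank of orientations and keep the strongest.

```python
import numpy as np

def oriented_contours(img, n_orient=8):
    """Toy multi-orientation contour response: project the image gradient
    onto n_orient directions and keep the maximum response per pixel,
    along with the orientation that achieved it."""
    gy, gx = np.gradient(img.astype(float))           # per-axis gradients
    thetas = np.linspace(0.0, np.pi, n_orient, endpoint=False)
    resp = np.stack([np.abs(np.cos(t) * gx + np.sin(t) * gy)
                     for t in thetas])
    strength = resp.max(axis=0)           # contour strength per pixel
    orient = thetas[resp.argmax(axis=0)]  # dominant orientation per pixel
    return strength, orient
```

    The dominant orientations are also what a junction detector would inspect: pixels where several strong, differently-oriented responses meet are junction candidates.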