Gradstudents

gradstudents@cs.vt.edu

August 2021

20 participants
51 discussions

PhD Defense of Khoa Doan
by Chandan Reddy 01 Aug '21

Dear all,

You are cordially invited to attend Khoa Doan’s PhD defense on 2nd August (Tomorrow) at 11:30 AM.

Title: Generative models meet similarity search: robust, heuristic-free and explainable retrieval models
Time: Monday, August 2, 2021, 11:30 AM Eastern Time
Zoom link: https://virginiatech.zoom.us/j/88669363880?pwd=UmtpdWJjalhMSE9acUNMa3lqb21v…

Committee:
Dr. Chandan K. Reddy, CS, VT (Chair)
Dr. Bimal Viswanath, CS, VT
Dr. Anuj Karpatne, CS, VT
Dr. Lifu Huang, CS, VT
Dr. Sathiya Keerthi Selvaraj, LinkedIn AI

Abstract:
The rapid growth of digital data, especially visual and textual contents, brings many challenges to the problem of finding similar data. Exact similarity search, which aims to exhaustively find all relevant items through a linear scan in a dataset, is impractical due to its high computational complexity. Approximate-nearest-neighbor (ANN) search methods, especially the Learning-to-hash or Hashing methods, provide principled approaches that balance the trade-offs between the quality of the guesses and the computational cost for web-scale databases. In this era of data explosion, it is crucial for the hashing methods to be both computationally efficient and robust to various scenarios such as the presence of noisy data or data that slightly changes over time (i.e., out-of-distribution).

This thesis focuses on the development of practical generative learning-to-hash methods and explainable retrieval models. We first identify and discuss the various components of the generative modeling framework which can be used to improve the model design and generalization of the hashing methods. We then propose an unsupervised adversarial framework and a supervised energy-based hashing network that can efficiently learn the hash functions directly from raw data. The unsupervised framework can be easily adapted to a new problem domain. We also show that the proposed generative hashing methods enjoy several appealing empirical and theoretical properties such as low-sample requirement, and out-of-distribution and data-corruption robustness.

Finally, in domains with structured data such as graphs, we show that the computational methods in generative modeling have an interesting utility beyond estimating the data distribution and describe a retrieval framework that can explain its decision by borrowing the algorithmic ideas developed in these methods. Specifically, we propose an optimal alignment algorithm that achieves a better similarity approximation for a pair of structured objects, such as graphs, while capturing the alignment between the nodes of the graphs to explain the similarity calculation. This "explainable" feature is valuable for domain experts, who also want to understand how the model makes its predictions.

--
Chandan K. Reddy
Professor
Department of Computer Science
Virginia Tech
http://www.cs.vt.edu/~reddy/