Dear all,
You are cordially invited to attend Khoa Doan’s PhD defense tomorrow,
August 2, at 11:30 AM.
Title: Generative models meet similarity search: robust, heuristic-free and
explainable retrieval models
Time: Monday, August 2, 2021, 11:30 AM Eastern Time
Zoom link:
https://virginiatech.zoom.us/j/88669363880?pwd=UmtpdWJjalhMSE9acUNMa3lqb21v…
Committee:
Dr. Chandan K. Reddy, CS, VT (Chair)
Dr. Bimal Viswanath, CS, VT
Dr. Anuj Karpatne, CS, VT
Dr. Lifu Huang, CS, VT
Dr. Sathiya Keerthi Selvaraj, LinkedIn AI
Abstract:
The rapid growth of digital data, especially visual and textual content,
brings many challenges to the problem of finding similar data. Exact
similarity search, which aims to exhaustively find all relevant items
through a linear scan in a dataset, is impractical due to its high
computational complexity. Approximate-nearest-neighbor (ANN) search
methods, especially learning-to-hash (or hashing) methods, provide
principled approaches that balance the trade-off between the quality of
the retrieved results and the computational cost for web-scale databases. In this era
of data explosion, it is crucial for hashing methods to be both
computationally efficient and robust to various scenarios, such as the
presence of noisy data or data that changes slightly over time (i.e.,
out-of-distribution data).
This thesis focuses on the development of practical generative
learning-to-hash methods and explainable retrieval models. We first
identify and discuss the various components of the generative modeling
framework that can be used to improve the model design and generalization
of the hashing methods. We then propose an unsupervised adversarial
framework and a supervised energy-based hashing network that can
efficiently learn the hash functions directly from raw data. The
unsupervised framework can be easily adapted to a new problem domain. We
also show that the proposed generative hashing methods enjoy several
appealing empirical and theoretical properties, such as a low sample
requirement and robustness to out-of-distribution and corrupted data.
Finally, in domains with structured data such as graphs, we show that the
computational methods in generative modeling are useful beyond
estimating the data distribution, and we describe a retrieval framework
that can explain its decisions by borrowing algorithmic ideas developed
in these methods. Specifically, we propose an optimal alignment algorithm
that achieves a better similarity approximation for a pair of structured
objects, such as graphs, while capturing the alignment between the nodes of
the graphs to explain the similarity calculation. This "explainable"
feature is valuable for domain experts, who also want to understand how the
model makes its predictions.
--
Chandan K. Reddy
Professor
Department of Computer Science
Virginia Tech
http://www.cs.vt.edu/~reddy/