Hello everyone,
Sorry for the late notice: if you are free, please join us for Bijaya
Adhikari's proposal (prelim exam) TODAY May 7 12pm at Torg 3160A.
You can also join through Zoom at
https://virginiatech.zoom.us/j/578961900
Meeting ID: 578 961 900
Title+abstract are below.
Best,
- Aditya
==================
***Title***
Domain-based Frameworks and Embeddings for Dynamics over Networks
*** Abstract ***
Broadly this thesis looks into network and time-series mining problems
pertaining to dynamics over networks in various domains. Which locations
and staff should we monitor in order to detect C. difficile outbreaks in
hospitals? How do we predict the peak intensity of the influenza
incidence in an interpretable fashion? How do we infer the states of all
nodes in a critical infrastructure network where failures have occurred?
Leveraging domain-based information should make it possible to answer
these questions. However, several new challenges arise, such as (a) the
presence of more complex dynamics. The dynamics over networks that we
consider are complex. For example, C. difficile spreads via both
people-to-people and surface-to-people interactions, and correlations
between failures in critical infrastructures go beyond the network
structure and depend on the geography as well. Traditional approaches
either rely on models like Susceptible-Infectious (SI) and Independent
Cascade (IC), which are too restrictive because they focus only on single
pathways, or do not incorporate a model at all, resulting in
sub-optimality. (b) handling data sparsity. Additionally, data sparsity
still persists in this space. Specifically, it is difficult to collect
the exact state of each node in the network, as it is high-dimensional and
difficult to sample from directly. (c) lack of generalizability. In many
situations, the dynamics depend on a mixture of several models or are
unknown. In such cases, methods which generalize well to unobserved (or
unknown) models are required. Current approaches often fail to tackle
these challenges as they either rely on restrictive models or require
large volumes of data.
In this thesis, we propose to leverage domain-based frameworks, which
include novel models and analysis techniques, and domain-based
low-dimensional representation learning to tackle the challenges above for
network and time-series mining tasks. By developing novel frameworks,
we can capture the complex dynamics accurately and analyze them more
efficiently. For example, to detect C. difficile outbreaks in a hospital
setting, we use a two-mode model to capture multiple pathways of
outbreaks and a discrete lattice-based optimization framework. Similarly,
we propose an information-theoretic framework which includes
geographically correlated failures in critical infrastructure networks
to infer network states. Moreover, as we use more realistic frameworks
to accurately capture and analyze the mechanistic processes themselves,
our approaches are effective even with sparse data. At the same time,
learning low-dimensional domain-aware embeddings captures domain-specific
properties (like incidence-based similarity between historical influenza
seasons) more efficiently from sparse data, which is useful for
subsequent tasks. Similarly, since domain-aware embeddings capture the
model information directly from the data without any modeling
assumptions, they generalize better to new models.
Our domain-aware frameworks and embeddings enable many applications in
critical domains. For example, our domain-aware framework for C.
difficile allows different monitoring rates for people and locations,
thus detecting more than 95% of outbreaks. Similarly, our framework for
product recommendation in E-commerce for queries with sparse engagement
data results in a 34% improvement over the current Walmart.com search
engine. Similarly, our novel framework leads to near-optimal
algorithms, with an additive approximation guarantee, for inferring network
states given partial observation of the failures in networks. By
exploiting domain-aware embeddings, we outperform non-trivial
competitors by up to 40% for influenza forecasting. Similarly,
domain-aware representations of subgraphs helped us outperform
non-trivial baselines by up to 68% in graph classification tasks.
*** Committee ***
B. Aditya Prakash (chair)
Madhav Marathe
Naren Ramakrishnan
Chandan Reddy
Jimeng Sun (Georgia Tech)
Please join us for our next DM-Meeting.
****DM-Meeting Spring 2019****
WHO: Bijaya Adhikari
WHEN: Monday, May 6th, 2019 - 11 AM
WHERE: McBryde 133C
WHAT: Bijaya will practice his preliminary proposal
Title: Domain-based Frameworks and Embeddings for Dynamics over Networks
Abstract: Broadly this thesis looks into network and time-series mining
problems pertaining to dynamics over networks in various domains. We answer
questions like which locations and staff should we monitor in order to
detect *C. difficile* outbreaks in hospitals as quickly as possible? How do
we predict the peak intensity of the influenza incidence in the current
season in an interpretable fashion? How do we select the group of queries
which are most critical to ensure high user satisfaction for an E-commerce
system? Due to ease of data collection in various domains, it is now
possible to answer these questions. However, these questions pose several
new challenges. The first challenge is to (a) incorporate more realistic
dynamics. The dynamical processes in the domains we consider are
complicated. For example, *C. difficile* spreads via both people-to-people
and surface-to-people interactions, and in e-commerce users can decide to
continue interacting with the system or arbitrarily decide to stop and
move on. Traditional transmission models like SI, IC, and SIR are too
restrictive to capture these processes. Hence, we need to take more
realistic dynamics into account to accurately answer the questions above.
The second challenge is to (b) handle data sparsity. Despite data
collection being easier and cheaper, data sparsity problems persist.
Specifically, incidence-related data such as infections and node failures
in infrastructure networks are difficult to obtain, as they are constrained
by actual events and are hence hard to observe and to sample from directly.
The third challenge is to (c) ensure robust performance. For the questions
presented above, the existing approaches are based on heuristics and fail
to provide robust performance guarantees. However, robustness is of great
importance for the critical problems we study.
In this thesis, we propose to leverage domain-based frameworks, which
include novel models and analysis techniques, and domain-based
low-dimensional representation learning to tackle the challenges above for
network and time-series mining tasks. The first challenge we must overcome
is to incorporate realistic transmission models to capture the complicated
dynamics. However, more realistic transmission models often require more
challenging analysis techniques in formulating problems and designing
algorithms. Hence, we propose to leverage domain-based frameworks,
including both the models and analysis techniques, to tackle these
problems. For example, for the *C. difficile* outbreak detection problem, we
use a two-mode disease model to capture both people-to-people and
location-to-people infection pathways of *C. difficile*, together with a
discrete lattice-based optimization framework. Similarly, we propose a
randomized user navigation model along with submodular function optimization
for critical query mining in e-commerce, and leverage geographically
correlated failure models with a Minimum Description Length formulation to
infer network states.
We overcome the second challenge on data sparsity by leveraging novel
models as well as by including domain-aware low-dimensional embeddings. As
we use more realistic models to accurately capture the dynamics, we can
generalize from the sparse data better, hence overcoming the data sparsity
problem. On the other hand, domain-aware embeddings capture the important
domain-based aspects of the data and can help guide the learning
architecture to infer the correct function with less data. We carefully
construct domain-based frameworks by incorporating realistic models and
the most relevant analysis techniques necessitated by the model. This
approach leads to meaningful problem formulations and often to near-optimal
algorithms. On the other hand, domain-based embeddings augment learning
architectures and improve the performance.
Our thesis has several applications in critical tasks in a variety of
domains. For example, our domain-aware framework for *C. difficile* allows
different monitoring rates for people and locations, which prior work
fails to do. Hence, our approach can detect more than 95% of outbreaks,
which is more than any competitor. Similarly, our framework for
product recommendation in E-commerce for queries with sparse engagement
data results in a 34% improvement over the Walmart.com search engine. It also
helped us detect the most critical queries in e-commerce, outperforming
non-trivial baselines like PageRank, most frequent queries, and so on.
Similarly, we propose near-optimal algorithms for inferring network states
given partial observation of the failures in networks. On the other hand,
by exploiting domain-aware low-dimensional representations, we outperform
non-trivial competitors by up to 40% for influenza forecasting.
Similarly, domain-aware representations of subgraphs helped us outperform
non-trivial baselines in the graph classification task.
Best regards,
*Alexander Rodriguez*
PhD Student
Department of Computer Science
Virginia Tech
_______________________________________________
Seminar website: http://people.cs.vt.edu/~arodriguez/DM-Meeting/home.html
Please contact Alexander Rodriguez (arodriguez(a)cs.vt.edu),
if interested in giving a talk.
Dm-meeting mailing list
Dm-meeting(a)cs.vt.edu
https://mailman.cs.vt.edu/mailman/listinfo/dm-meeting