September 2017 - Dm-meeting - mailman.cs.vt.edu

Thursday, Sep. 28, @ 11:00 am, Torg 3160 A, Bijaya Adhikari
by Sorour Ekhtiari Amiri 27 Sep '17

Hi everyone,

Please join us for our next DM-Meeting.

****DM-Meeting Fall 2017****

WHO: Bijaya Adhikari
WHEN: Thursday, Sep. 28, 2017 @ 11:00 am
WHERE: Torg 3160 A
WHAT: Bijaya will talk about his internship project.

Title: Mining E-Commerce Query Relations using Customer Interaction Networks

Abstract: Customer Interaction Networks (CINs) are a natural framework for representing and mining customer interactions with E-Commerce search engines. Customer interactions begin with the submission of a query formulated based on an initial product intent, followed by a sequence of product engagement and query reformulation actions. Engagement with a product (e.g., clicks) signals its relevance to the customer’s product intent. Reformulation to a new query indicates either dissatisfaction with current results or an evolution in the customer’s product intent. Analyzing such interactions within and across sessions enables us to discover various query-query and query-product relationships. In this work, we begin by studying the properties of a real-world customer interaction network developed using Walmart.com’s product search logs. We observe that CINs exhibit significantly different properties compared to other real-world networks (e.g., WWW, social networks, etc.), making it possible to mine intent relationships between queries based purely on their structural information. In particular, we show that one can formulate the problem of clustering queries with similar intents as a community detection task on CINs. Our results show that existing community detection methods already do a good job at identifying intent-based query clusters, without using any textual features. We further identify their limitations and propose improved methods for the task. Finally, we show how these relations can be exploited to a) significantly improve search quality for poorly performing queries, and b) identify the most influential (a.k.a. critical) queries whose search quality is crucial in enabling an E-Commerce search engine to satisfy the most customers. Via extensive experiments, we show that our CIN-based methods significantly outperform existing baselines in practice.

Best regards,
Sorour

Great news: Yao completes his defense!
by B. Aditya Prakash 21 Sep '17

Warmest congratulations to now Dr. (Prof.!) Zhang, who successfully finished his defense earlier today! Best wishes for a great research career ahead!

- Aditya

Beginner's Creed
by B. Aditya Prakash 15 Sep '17

Nice article in CACM recently; thought I would forward it to everyone:

https://cacm.acm.org/magazines/2017/7/218869-the-beginners-creed/fulltext

cheers,
- Aditya

Monday Sep. 18 @ 1:15 pm, Torg 3160, Yao Zhang
by Sorour Ekhtiari Amiri 14 Sep '17

Hi everyone,

Please join us for our next DM-Meeting.

****DM-Meeting Fall 2017****

WHO: Yao Zhang
*WHEN: Monday, Sep. 18, 2017 @ 1:15 pm*
*WHERE: Torgerson Hall, Room 3160*
WHAT: Yao will give his prep talk for his final defense.

Please note that we will skip our meeting next week on Thursday. The title and abstract of Yao's talk follow.

=================

Title: Optimizing and Understanding Network Structure for Diffusion

Abstract: Given a population contact network and electronic medical records of patients, how to distribute vaccines to individuals to effectively control a flu epidemic? Similarly, given the Twitter following network and tweets, how to choose the best communities/groups to stop rumors from spreading? How to find the best accounts that bridge celebrities and ordinary users? These questions are related to diffusion (aka propagation) phenomena. Diffusion can be treated as a behavior of spreading contagions (like viruses, ideas, memes, etc.) on some underlying network. It is omnipresent in areas such as social media, public health, and cyber security. Examples include diseases like flu spreading on person-to-person contact networks, memes disseminating by online adoption over online friendship networks, and malware propagating among computer networks. When a contagion spreads, network structure (like nodes/edges/groups, etc.) plays a major role in determining the outcome. For instance, a rumor, if propagated by celebrities, can go viral. Similarly, an epidemic can die out quickly if vulnerable demographic groups are successfully targeted for vaccination. Hence in this thesis, we aim to optimize and understand network structure better in light of diffusion. We optimize graph topologies by removing nodes/edges for controlling rumors/viruses from spreading, and gain a deeper understanding of a network in terms of diffusion by exploring how nodes group together for similar roles of dissemination.
We develop several novel graph mining algorithms, with different levels of granularity (node/edge level to group/community level), from model-driven and data-driven perspectives, focusing on topics like immunization on networks, graph summarization, community detection and graph embedding. In contrast to previous work, we are the first to systematically develop more realistic, implementable and data-based graph algorithms to control contagions. In addition, our thesis is also the first work to use diffusion to effectively summarize graphs and understand communities/groups of networks in a general way.

1. Model-driven. Diffusion processes are usually described using mathematical models, e.g., the Independent Cascade (IC) model in social media, and the Susceptible-Infectious-Recovered (SIR) model in epidemiology. Given such models, we propose to optimize network structure for controlling propagation (the immunization problem) in several practical and implementable settings, taking into account the presence of infections, the uncertain nature of the data and group structure of the population. We develop efficient algorithms for different interventions, such as vaccination (node removal) and quarantining (edge removal). In addition, we study the graph coarsening problem to obtain a better understanding of relations among nodes when a contagion is propagating. We seek to get a much smaller representation of a large network, while preserving its diffusive properties.

2. Data-driven. Model-driven approaches can provide ideal results if underlying diffusion models are given. However, in many situations, diffusion processes are very complicated, and it is challenging or even impossible to pick the most suited model to describe them. In addition, rapid technological development has provided an abundance of data such as tweets and electronic medical records.
Hence, in the second part of the thesis, we explore data-driven approaches for diffusion in networks, which can directly work on propagation data by relaxing modeling assumptions of diffusion. To be specific, we first develop data-driven immunization strategies to stop rumors or allocate vaccines by optimizing network topologies, using large-scale national-level diagnostic patient data with billions of flu records. Second, we propose a novel community detection problem to discover "bridge" and "celebrity" communities from social media data, and design case studies to understand roles of nodes/communities using diffusion. Finally, we study the subgraph embedding problem, which seeks to map subgraphs such as a snapshot of a cascade into a low-dimensional feature space to facilitate network analysis.

Our work has many applications in multiple areas such as epidemiology, sociology and computer science. For example, our work on efficient immunization algorithms, such as data-driven immunization, can help the CDC better allocate vaccines to control flu epidemics in major cities. Similarly, in social media, our work on understanding network structure using diffusion can lead to better community discovery, such as finding media accounts that can boost tweet promotions on Twitter.

Best regards,
Sorour

Thursday, Sep.14th @ McB 133c
by Sorour Ekhtiari Amiri 12 Sep '17

Hi everyone,

Hope you had a great summer and welcome back! In Fall 2017 we will hold our meetings from 11:00 am to 12:00 pm on Thursdays in meeting room 3160A, Torgerson Hall. We will have our first meeting this Thursday, Sep. 14th, in McB 133c *as an exception*, since the room in Torgerson Hall was reserved. This week's meeting will be about welcoming everyone back and discussing KDD papers and what people did during the summer. Hope to see you all then!

Best regards,
Sorour

Fwd: Re: [Bburg-fac] CS grad seminar: Tao Xie, UIUC
by B. Aditya Prakash 08 Sep '17

Hi guys,

FYI. Paper deadlines notwithstanding, I think you should go to this talk. Especially folks who are graduating, graduating soon, or graduating in future :-)

cheers,
- Aditya

-------- Forwarded Message --------
Subject: Re: [Bburg-fac] CS grad seminar: Tao Xie, UIUC
Date: Fri, 8 Sep 2017 10:38:47 -0400
From: Bburg-fac via Scott McCrickard <bburg-fac(a)cs.vt.edu>
Reply-To: Scott McCrickard <mccricks(a)cs.vt.edu>
To: Bburg-fac via Scott McCrickard <bburg-fac(a)cs.vt.edu>
CC: bburg-gradstudents(a)cs.vt.edu

A reminder about today's talk--hope to see everyone for this external speaker.

Bburg-fac via Scott McCrickard wrote:
>
> Hi all,
>
> A reminder about this week's CS Grad Seminar speaker, Tao Xie from UIUC. He is hosted by Na Meng and Danfeng Yao. His talk will be on Friday September 8 11:15am-12:30pm in Torg 2160.
>
> Students, particularly those interested in systems and related research areas, are encouraged to attend his student meeting on Friday 4-5 in KW2 1110.
>
> Information about his talk is below.
>
> Scott
>
>
> Title: Software Analytics: Data Analytics for Software Engineering and Security
>
> Abstract: A huge wealth of various data exists in software life cycle, including source/byte code, feature specifications, bug reports, test cases, execution traces/logs, and real-world user feedback, etc. Data plays an essential role in modern software development, because hidden in the data is information about the quality of software and services as well as the dynamics of software development. Software analytics is to utilize data-driven approaches to enable software practitioners to perform data exploration and analysis in order to obtain insightful and actionable information for completing various tasks around software systems, software users, and software development process, cutting across the areas of software engineering and security, etc.
>
> Short Bio: Tao Xie is a Professor and Willett Faculty Scholar in the Department of Computer Science at the University of Illinois at Urbana-Champaign, USA. His research interests are in software engineering, focusing on software testing, program analysis, software analytics, software security, and educational software engineering. He received a Microsoft Research Outstanding Collaborators Award, a Google Faculty Research Award, an IBM Jazz Innovation Award, and three-time IBM Faculty Awards. He is an ACM Distinguished Speaker and an IEEE Computer Society Distinguished Visitor. He is an ACM Distinguished Scientist. His homepage is at http://taoxie.cs.illinois.edu.
>
> _______________________________________________
> Bburg-fac mailing list
> Bburg-fac(a)cs.vt.edu

--
D. Scott McCrickard
Associate Professor, Department of Computer Science
Fellow, Institute for Creativity, Arts, and Technology
Virginia Tech, Blacksburg VA 24061-0902
http://www.cs.vt.edu/~mccricks/
phone: (540) 231-6698 fax: (540) 231-9218
