TUTORIAL
Social Media Mining and Analysis for Business Innovation.
Monday June 13 Morning, 2016. Tutorial Speaker: Feida Zhu, Assistant Professor in the School of Information Systems, Singapore Management University (SMU).
Abstract: Our time has been characterised by an explosion of data of all sorts. In particular, the recent blossoming of social network services has given everyone an unprecedented level of ease and fun in sharing information of all kinds. These public social data therefore reveal a surprisingly large amount of information about an individual that is otherwise unavailable. The business, consumer and social insights attainable from this big and dynamic social data are critically important and immensely valuable in a wide range of applications for both the private and public sectors. What can we tell from social data about the context of consumer behaviour, such that we can enrich the transaction-based data of traditional corporate databases? How can we unleash the power of social connections to identify potential high-value customers and perform cost-effective risk management? How can we achieve dynamic social listening on 200 million users and detect marketing opportunities in real time based on bursty events? In this tutorial, we will introduce a cluster of research results that underlie some initial answers to these questions, along with recent advances in real-life enterprise-level applications.
Biography: Feida Zhu is an assistant professor in the School of Information Systems, Singapore Management University (SMU). His research interests include large-scale data mining, text mining, graph/network mining and social network analysis. Feida is the Founding Director of the Pinnacle Lab for Analytics with China Ping An Insurance Group and the DBS-SMU Life Analytics Lab. He has published more than 80 papers in refereed international conferences and journals, including ICDE, VLDB, SIGMOD, ICDM, WWW, JMLR, TODS and TKDE. His work has won the Best Paper Award at the 2016 International Conference on Database Systems for Advanced Applications (DASFAA’16) and the Best Student Paper Awards at the 2007 IEEE International Conference on Data Engineering (ICDE’07) and the 2007 Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’07). Feida obtained his Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign (UIUC) in 2009, supervised by Prof. Jiawei Han.
Privacy Preserving Data Publishing: From K-Anonymity to Differential Privacy.
Monday June 13 Afternoon, 2016. Tutorial Speaker: Xiaokui Xiao, Associate Professor at the School of Computer Science and Engineering, Nanyang Technological University (NTU), Singapore.
Abstract: The advancement of information technologies has made it easier than ever for various organizations (e.g., hospitals, census bureaus) to create large repositories of user data (e.g., patient data, census data). Such data repositories are of tremendous research value, so there is much benefit in making them publicly available. Nevertheless, as the data are sensitive in nature, proper measures must be taken to ensure that their publication does not endanger the privacy of the individuals who contributed the data. In this tutorial, I will review the general methodologies for privacy preserving data publishing, with a focus on three classic notions of privacy (i.e., k-anonymity, l-diversity, and differential privacy) and their variants. I will summarize the techniques developed for each privacy notion, and clarify the pros and cons of each notion. I will also discuss open problems and directions for future research.
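For readers unfamiliar with two of the privacy notions named in the abstract, the following is a minimal Python sketch, not material from the tutorial itself. It checks k-anonymity over a set of quasi-identifier columns and releases a counting query with Laplace noise, a standard way to obtain epsilon-differential privacy for counts. The records, the column names, and the function names (`is_k_anonymous`, `laplace_noise`, `noisy_count`) are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    that occurs in the table is shared by at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


def laplace_noise(scale, rng):
    """Sample from Laplace(0, scale) as the difference of two
    independent exponential variables with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(records, predicate, epsilon, rng):
    """Release a count with Laplace noise of scale 1/epsilon.
    A counting query has sensitivity 1: adding or removing one
    record changes the true answer by at most 1."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical, already-generalized patient table (illustration only).
records = [
    {"age": "30-39", "zip": "479**", "disease": "flu"},
    {"age": "30-39", "zip": "479**", "disease": "cancer"},
    {"age": "40-49", "zip": "476**", "disease": "flu"},
    {"age": "40-49", "zip": "476**", "disease": "asthma"},
]

print(is_k_anonymous(records, ["age", "zip"], 2))  # True
print(is_k_anonymous(records, ["age", "zip"], 3))  # False
print(noisy_count(records, lambda r: r["disease"] == "flu", 1.0, random.Random(0)))
```

A smaller epsilon gives a larger noise scale and therefore stronger privacy at the cost of accuracy. The k-anonymity check only inspects group sizes and says nothing about how diverse the sensitive values within a group are, which is the gap l-diversity addresses.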
Biography: Xiaokui Xiao is an associate professor at the School of Computer Science and Engineering, Nanyang Technological University (NTU), Singapore. His research focuses on data management and data privacy. He received a PhD degree from the Chinese University of Hong Kong, and worked as a postdoctoral associate at Cornell University before joining NTU. He was a winner of the Hong Kong Young Scientist Award in 2009, and has had two papers invited to the TKDE special issues on “The Best of ICDE 2010” and “The Best of ICDE 2015”, respectively.
Towards Interactive Big Spatial Data Analytics.
Monday June 13 Evening, 2016. Tutorial Speaker: Feifei Li, Associate Professor at the School of Computing, University of Utah.
Abstract: Large spatial data has become ubiquitous. As a result, it is critical to provide fast, scalable, and high-throughput spatial queries and analytics for numerous applications in location-based services (LBS). Traditional spatial databases and spatial analytics systems are disk-based and optimized for IO efficiency. But increasingly, data are stored and processed in memory to achieve low latency, and CPU time becomes the new bottleneck. We will present the Simba (Spatial In-Memory Big data Analytics) system, which offers scalable and efficient in-memory spatial query processing and analytics for big spatial data. Simba is based on Spark and runs over a cluster of commodity machines. In particular, Simba extends the Spark SQL engine to support rich spatial queries and analytics through both SQL and the DataFrame API. It introduces the concept and construction of indexes over RDDs in order to work with big spatial data and complex spatial operations. Lastly, Simba implements an effective query optimizer, which leverages its indexes and novel spatial-aware optimizations to achieve both low latency and high throughput. Extensive experiments over large data sets demonstrate Simba's superior performance compared with other spatial analytics systems. Through its SQL and DataFrame API, Simba provides interactive analytics over big spatial data. When data grows too big and/or computation becomes too expensive, we will talk about achieving interactive (or at least more interactive) spatial analytics through online sampling, online aggregation, and online analytics. We will survey related work on systems that process big spatial data and on techniques for interactive and online queries and analytics.
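To illustrate the idea of online aggregation mentioned in the abstract, here is a minimal Python sketch. It is not Simba code and does not use Spark. It scans points in random order and, after each batch, reports a running estimate of the mean value inside a query rectangle together with an approximate 95% confidence half-width. The function name `online_range_mean`, the synthetic data, and the normal-approximation interval (which ignores any finite-population correction) are all assumptions made for this example.

```python
import math
import random


def online_range_mean(points, rect, batch_size, rng, z=1.96):
    """Scan (x, y, value) points in random order and yield
    (scanned, matched, estimate, half_width) after each batch.

    estimate is the running mean of `value` over points inside rect;
    half_width is a normal-approximation confidence half-width."""
    xmin, ymin, xmax, ymax = rect
    order = list(points)
    rng.shuffle(order)  # a random scan order makes each prefix a random sample
    matched, mean, m2 = 0, 0.0, 0.0  # Welford running mean and squared deviations
    for scanned, (x, y, value) in enumerate(order, start=1):
        if xmin <= x <= xmax and ymin <= y <= ymax:
            matched += 1
            delta = value - mean
            mean += delta / matched
            m2 += delta * (value - mean)
        if scanned % batch_size == 0 or scanned == len(order):
            if matched > 1:
                variance = m2 / (matched - 1)
                half_width = z * math.sqrt(variance / matched)
            else:
                half_width = float("inf")  # too few matches to bound the error
            yield scanned, matched, mean, half_width


# Synthetic data for illustration: uniform points with a random value each.
data_rng = random.Random(1)
points = [(data_rng.random(), data_rng.random(), data_rng.random()) for _ in range(1000)]
query_rect = (0.2, 0.2, 0.8, 0.8)

for scanned, matched, estimate, half_width in online_range_mean(
    points, query_rect, batch_size=100, rng=random.Random(2)
):
    print(f"scanned={scanned:4d} matched={matched:3d} mean={estimate:.4f} +/- {half_width:.4f}")
```

A user can stop the scan as soon as the interval is tight enough, which is what makes the query feel interactive on large inputs. Once every point has been scanned, the estimate equals the exact answer up to floating-point rounding.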
Biography: Feifei Li is currently an associate professor at the School of Computing, University of Utah. His research focuses on improving the scalability, the efficiency, and the effectiveness of database and big data management systems. He also works on various data security problems in these systems. He was a recipient of an NSF CAREER Award in 2011, two HP IRP awards in 2011 and 2012 respectively, a Google App Engine award in 2013, the IEEE ICDE best paper award in 2004, the IEEE ICDE 10+ Years Most Influential Paper Award in 2014, a Google Faculty award in 2015, and the SIGMOD Best Demonstration Award in SIGMOD 2015. He is/was the demo PC chair for VLDB 2014, the general co-chair for SIGMOD 2014, a PC area chair for both ICDE 2014 and SIGMOD 2015, and an associate editor for IEEE TKDE.