Big Data Lecture Series, No. 3: Online Feature Selection with Streaming Features 2013-06-13


Title: Online Feature Selection with Streaming Features


Speaker: Professor Wei Ding (丁薇), Department of Computer Science, University of Massachusetts Boston


Time: Friday, June 14, 2013, 10:00-11:30 a.m.


Venue: tyc234cc 太阳成集团, Room 313


Host: Professor Liao Xiuwu (廖貅武)


All faculty and students are welcome to attend!


Attachment:


Biography of Professor Wei Ding:


Wei Ding received her Ph.D. degree from the University of Houston in 2008. She has been an Assistant Professor at the University of Massachusetts Boston since 2008. Her main research interests include big data and data mining. She has published more than 70 refereed research papers and 1 book, and holds 1 patent. She is an Associate Editor of Knowledge and Information Systems (KAIS) and an editorial board member of the Journal of Information System Education (JISE). She is the recipient of a Best Paper Award at the IEEE International Conference on Tools with Artificial Intelligence (ICTAI) 2011, a Best Paper Award at the IEEE International Conference on Cognitive Informatics (ICCI) 2010, a Best Poster Presentation Award at the ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (SIGSPATIAL GIS) 2008, and a Best Ph.D. Work Award between 2007 and 2010 from the University of Houston. Her research projects are currently sponsored by NASA and DOE.

 

Abstract:

 

We propose a new online feature selection framework for applications with streaming features, where the full feature space is not known in advance. We define streaming features as features that arrive one by one over time while the number of training examples remains fixed. This contrasts with traditional online learning methods, which handle only sequentially added observations and pay little attention to streaming features. The critical challenges for online streaming feature selection include (1) the continuous growth of feature volumes over time; (2) a large feature space, possibly of unknown or infinite size; and (3) the unavailability of the entire feature set before learning starts.
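The streaming-feature setting described above can be illustrated with a small sketch. It is only an illustration under assumptions: the helper name `stream_features` and the toy data are hypothetical and do not come from the talk. The point it shows is that the set of examples (rows) is fixed while the features (columns) become visible one at a time.

```python
import numpy as np

def stream_features(X):
    """Yield (index, column) pairs one at a time; the rows (examples) never change."""
    for j in range(X.shape[1]):
        yield j, X[:, j]

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))  # toy data: 5 fixed training examples, 3 features

seen = []
for j, column in stream_features(X):
    seen.append(column)
    # At step j the learner can see only the first j + 1 features.
    current = np.column_stack(seen)
    print(f"step {j}: visible data has shape {current.shape}")
```

In a real application the generator would wrap whatever process produces new features, and the total number of columns would not be known when the loop starts.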

In the paper, we present a novel Online Streaming Feature Selection (OSFS) method that selects strongly relevant and non-redundant features on the fly. An efficient Fast-OSFS algorithm is proposed to improve feature selection performance. The proposed algorithms are evaluated extensively on high-dimensional datasets and in a real-world case study on impact crater detection. Experimental results demonstrate that the algorithms achieve better compactness and higher prediction accuracy than existing streaming feature selection algorithms.
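The idea of keeping relevant and non-redundant features on the fly can be sketched roughly as follows. This is not the OSFS or Fast-OSFS algorithm itself: the abstract does not spell out the statistical tests used, so this sketch substitutes simple correlation thresholds. The function names, the thresholds `relevance` and `redundancy`, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def abs_corr(a, b):
    """Absolute Pearson correlation between two 1-D arrays."""
    return abs(np.corrcoef(a, b)[0, 1])

def abs_partial_corr(x, y, z):
    """Absolute correlation of x and y after linearly removing z from both."""
    zc = z - z.mean()
    def residual(v):
        vc = v - v.mean()
        return vc - (vc @ zc) / (zc @ zc) * zc
    return abs_corr(residual(x), residual(y))

def select_streaming(feature_stream, y, relevance=0.2, redundancy=0.2):
    """Simplified stand-in for streaming selection (thresholds are assumptions)."""
    selected = {}  # feature index -> column
    for j, col in feature_stream:
        # Step 1 (relevance): discard a new feature that is barely related to y.
        if abs_corr(col, y) < relevance:
            continue
        selected[j] = col
        # Step 2 (redundancy): drop any kept feature whose link to y vanishes
        # once another kept feature is accounted for.
        for k in list(selected):
            others = [m for m in selected if m != k]
            if any(abs_partial_corr(selected[k], y, selected[m]) < redundancy
                   for m in others):
                del selected[k]
    return sorted(selected)

# Toy data (hypothetical): feature 0 drives y, feature 1 is noise,
# feature 2 is a noisier copy of feature 0.
rng = np.random.default_rng(0)
n = 1000
f0 = rng.normal(size=n)
f1 = rng.normal(size=n)
f2 = f0 + 0.5 * rng.normal(size=n)
y = f0 + 0.1 * rng.normal(size=n)

kept = select_streaming(enumerate([f0, f1, f2]), y)
print(kept)
```

On this toy data one would expect only feature 0 to survive: feature 1 fails the relevance check, and feature 2 is dropped as redundant once feature 0 is taken into account. Conditioning on a single kept feature at a time is a deliberate simplification that keeps the sketch short.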