The Computer Science Colloquium

Thursday, February 18, 4:15pm, room 9204/05



Juliana Freire
(University of Utah)

"Conquering the Digital Data Overload"

      Computing has been an enormous accelerator to science and industry alike, and it has led to an information explosion in many different fields. The unprecedented volume of data acquired by sensors, derived by simulations and analysis processes, and shared on the Web opens up new opportunities, but it also creates many challenges when it comes to managing data and its life cycle. In this talk, I will give an overview of work we have done over the past five years whose goal is to conquer this information overload. I start by presenting techniques we have developed to help users locate, organize and retrieve information in the deep Web. Such information is currently out of reach for search engines, as it typically resides in online databases and document collections (e.g., PubMed, GenBank) and is only exposed on demand, as a user fills out and submits forms. Then, I discuss the importance of maintaining detailed provenance (also referred to as lineage and pedigree) for digital data. Provenance provides important documentation that is key to preserving data, to determining the data's quality and authorship, and to understanding, reproducing, and validating results. Besides introducing the provenance infrastructure we built for the VisTrails system (http://www.vistrails.org), I describe novel uses of provenance in enabling scientists to collaboratively analyze data; in teaching science; and in supporting rich, reproducible publications.

About the speaker: Juliana Freire is an Associate Professor at the School of Computing at the University of Utah. An important theme in Professor Freire's work is the development of data management technology to address new problems introduced by emerging applications, including the Web and e-Science. Her recent research has focused on two main topics: scientific data management and Web mining. Within scientific data management, she is best known for her work in provenance and scientific workflows, and for being a co-creator of the open-source VisTrails system. In Web mining, her research has spanned several topics, including focused Web crawling, deep-Web information discovery and retrieval, and information extraction and integration. Professor Freire is an active member of the database and Web research communities, having co-authored over 90 technical papers and holding 4 U.S. patents. She is a recipient of an NSF CAREER and an IBM Faculty award. She has chaired or co-chaired several workshops and conferences, and she has participated as a program committee member in over 50 events. She is program chair for the World Wide Web Conference 2010. Her research has been funded by grants from the National Science Foundation, Department of Energy, National Institutes of Health, the University of Utah, and gifts from Microsoft Research, Yahoo! and IBM.


The Colloquium is supported by generous contributions from Bloomberg, Information Builders, Inc., and Netlogic, Inc.

       


365 Fifth Ave, New York City 10016 | Room 4319 | Phone: 212.817.8190 | Fax: 212.817.1510 | compsci@gc.cuny.edu