A variety of massive data sets exhibit an underlying
structure that can be modeled as dynamic weighted multi-digraphs.
Their sizes range from tens of
gigabytes to petabytes. These include the World Wide Web, Internet
Traffic
and Telephone Call Detail. These data sets sheer volume brings
with it a
series of computational and visualization challenges due mainly
to the
I/O and Screen Bottlenecks.
We present external memory algorithms for connectivity and
minimum spanning
trees together with heuristics for quasi-clique finding. We
describe how
hierarchy trees help us to cope in a unified manner with both,
the I/O and
screen bottlenecks. This line of research has suggested the
need to look for
"novel" graph representations in order to provide
a user or a data mining
engine with informed navigation paths.
We will discuss our results with graphs having on the order
of 200 million
vertices and several billion edges and we will point out some
mathematical
problems that have surfaced along the way.
The overall goal is to extract useful information that can be
brought into
a user's palm top and to export these techniques to other mining
domains.
Related information can be obtained by accessing
http://www.research.att.com/~abello
and
www.visdays.com
James Abello received the PhD degree in Combinatorial Algorithms from
the University of California, San Diego, and the MS degree in Operating Systems
from the University of California, Santa Barbara.
He is the the recipient of a University of California President's Postdoctoral
Fellowship in Computer Science and was awarded a UCSB Outstanding Teaching Award.
James is the co-editor of
James has published in Discrete Mathematics, Combinatorial and Computational
Geometry, Algorithms and Data Structures, Massive Data Sets, Algorithm
Animation and Visualization.
He has worked on the development of software systems like:
-
MGV(A Massive Graph Visualizer, with J. Korn),
-
AGE(An Animated Graph Environment, with T. Veatch),
-
Mirage(An interpreted language for algorithm animation, with C. Smith), and
-
a Quasi-Clique Extractor(with S. Sudarsky).
James has held several academic positions and has been a senior
member of technical staff at AT&T Shannon Laboratories and Bell Labs.
He is currently a visiting scientist at Dimacs, Rutgers University.
e-mail:
abello@dimacs.rutgers.edu,
abelloj@optonline.net