KDD 2006 August 20 - 23, 2006    Philadelphia, USA

The Twelfth Annual SIGKDD International Conference on
Knowledge Discovery and Data Mining



A General Framework for Fast and Accurate Regression by Data Summarization in Random Decision Trees
Wei Fan, Joe McCloskey, Philip Yu

A New Efficient Probabilistic Model for Mining Labeled Ordered Trees
Kosuke Hashimoto, Kiyoko Aoki-Kinoshita, Nobuhisa Ueda, Minoru Kanehisa, Hiroshi Mamitsuka

Acclimatizing Taxonomic Semantics for Hierarchical Content Classification
Lei Tang, Jianping Zhang, Huan Liu

Adaptive Event Detection using Time-Varying Poisson Processes
Alexander Ihler, Jon Hutchins, Padhraic Smyth

Aggregating Time Partitions
Taneli Mielikainen, Evimaria Terzi, Panayiotis Tsaparas

Anonymization for Sequential Releases
Ke Wang, Benjamin C. M. Fung

Assessing data mining results via swap randomization
Aris Gionis, Heikki Mannila, Taneli Mielikainen, Panayiotis Tsaparas

Beyond Streams and Graphs: Dynamic Tensor Analysis
Jimeng Sun, Dacheng Tao, Christos Faloutsos

Center-Piece Subgraphs: Problem Definition and Fast Solutions
Hanghang Tong, Christos Faloutsos

Deriving Quantitative Models for Correlation Clusters
Elke Achtert, Christian Bohm, Hans-Peter Kriegel, Peer Kroger, Arthur Zimek

Detecting outliers using transduction and statistical significance testing
Daniel Barbara, Carlotta Domeniconi, James Rogers

Discovering significant rules
Geoff Webb

Efficient Anonymity-Preserving Data Collection
Justin Brickell, Vitaly Shmatikov

Estimating the Global PageRank of Web Communities
Jason Davis, Inderjit Dhillon

Event Detection from Evolution of Click-through Data
Qiankun Zhao, Tie-Yan Liu, Sourav S Bhowmick, Wei-Ying Ma

Extracting Redundancy-Aware Top-K Patterns
Dong Xin, Hong Cheng, Xifeng Yan, Jiawei Han

Extracting Key-Substring-Group Features for Text Classification
Dell Zhang, Wee Sun Lee

Fast Mining of High Dimensional Expressive Contrast Patterns Using Zero-suppressed Binary Decision Diagrams
Elsa Loekito, James Bailey

Frequent Subgraph Mining in Outerplanar Graphs
Tamas Horvath, Jan Ramon, Stefan Wrobel

Generating Semantic Annotations for Frequent Patterns with Context Analysis
Qiaozhu Mei, Dong Xin, Hong Cheng, ChengXiang Zhai, Jiawei Han

Global Distance-Based Segmentation of Trajectories
Aris Anagnostopoulos, Michail Vlachos, Marios Hadjieleftheriou, Eamonn Keogh, Philip Yu

Group Formation in Large Social Networks: Membership, Growth, and Evolution
Lars Backstrom, Dan Huttenlocher, Jon Kleinberg, Xiangyang Lan

Hierarchical Topic Segmentation of Websites
Ravi Kumar, Kunal Punera, Andrew Tomkins

Learning Sparse Metrics via Linear Programming
Romer Rosales, Glenn Fung

Learning the Unified Kernel Machines for Classification
Steven C.H. Hoi, Edward Y. Chang, Michael R. Lyu

Learning to Rank Networked Entities
Sunny Aggarwal, Alekh Agarwal, Soumen Chakrabarti

Maximally Informative k-Itemsets and their Efficient Discovery
Arno Knobbe, Eric Ho

Measuring and Extracting Proximity in Networks
Yehuda Koren, Stephen North, Chris Volinsky

Mining Distance-based Outliers from Large Databases in Any Metric Space
Yufei Tao, Xiaokui Xiao, Shuigeng Zhou

Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach
Yiping Ke, James Cheng, Wilfred Ng

Mining Rank-Correlated Sets of Numerical Attributes
Toon Calders, Bart Goethals, Szymon Jaroszewicz

NeMoFinder: Dissecting genome wide protein-protein interactions with repeated and unique network motifs
Jin Chen, Wynne Hsu, Mong Li Lee, Seekiong Ng

New EM Derived from Kullback-Leibler Divergence
Longin Jan Latecki, Marc Sobel, Rolf Lakaemper

Orthogonal Nonnegative Matrix Tri-factorizations for Clustering
Chris Ding, Tao Li, Wei Peng

Out-of-Core Frequent Pattern Mining on a Commodity PC
Gregory Buehrer, Srinivasan Parthasarathy, Amol Ghoting

Quantifying Trends Accurately Despite Classifier Error and Class Imbalance
George Forman

Regularized Discriminant Analysis for high dimensional low sample size data
Jieping Ye, Tie Wang

ReverseTesting: An Efficient Framework to Select Amongst Classifiers under Sample Selection Bias
Wei Fan, Ian Davidson

Robust Information-theoretic Clustering
Christian Bohm, Christos Faloutsos, Jia-Yu Pan, Claudia Plant

Rule Interestingness Analysis Using OLAP Operations
Bing Liu, Kaidi Zhao, Jeffrey Benkler, Weimin Xiao

Simultaneous Record Detection and Attribute Labeling in Web Data Extraction
Jun Zhu, Zaiqing Nie, Ji-Rong Wen, Bo Zhang, Wei-Ying Ma

Spatial Scan Statistics: Approximations and Performance Study
Deepak Agarwal, Andrew McGregor, Jeff Phillips, Suresh Venkatasubramanian, Zhengyuan Zhu

Supervised Probabilistic Principal Component Analysis
Shipeng Yu, Kai Yu, Volker Tresp, Hans-Peter Kriegel, Mingrui Wu

Tensor-CUR Decompositions For Tensor-Based Data
Michael Mahoney, Mauro Maggioni, Petros Drineas

Topics over Time: A Non-Markov Continuous-Time Model of Topical Trends
Xuerui Wang, Andrew McCallum

Training Linear SVMs in Linear Time
Thorsten Joachims

Unsupervised Learning on K-partite Graphs
Bo Long, Xiaoyun Wu, Zhongfei Zhang, Philip Yu

Using Structure Indices For Efficient Approximation of Network Properties
Matthew Rattigan, Marc Maier, David Jensen

Very Sparse Random Projections
Ping Li, Trevor Hastie, Kenneth Church

Workload-Aware Anonymization
Kristen LeFevre, David DeWitt, Raghu Ramakrishnan


(alpha, k)-Anonymity: An Enhanced k-Anonymity Model for Privacy-Preserving Data Publishing
Raymond Chi-Wing Wong, Jiuyong Li, Ada Fu, Ke Wang

A Framework for Analysis of Dynamic Social Networks
Tanya Berger-Wolf, Jared Saia

A Large-Scale Analysis of Query Logs for Assessing Personalization Opportunities
Steve Wedig, Omid Madani

A Mixture Model for Contextual Text Mining
Qiaozhu Mei, ChengXiang Zhai

A New Multi-View Regression Method with an Application to Customer Wallet Estimation
Srujana Merugu, Saharon Rosset, Claudia Perlich

Algorithms for discovering bucket orders from data
Aris Gionis, Heikki Mannila, Kai Puolamaki, Antti Ukkonen

Algorithms for Storytelling
Deept Kumar, Naren Ramakrishnan, Richard Helm, Malcolm Potts

Algorithms for time series knowledge mining
Fabian Morchen

Attack Detection in Time Series for Recommendation Systems
Sheng Zhang, Amit Chakrabarti, James Ford, Fillia Makedon

Automatic Mining of Fruit Fly Embryo Images
Jia-Yu Pan, Andre Balan, Eric P. Xing, Agma Juci Machado Traina, Christos Faloutsos

Bias and Controversy: Beyond the Statistical Deviation
Hady W. Lauw, Ee-Peng Lim, Ke Wang

BLOSOM: A Framework for Mining Boolean Expressions
Lizhuang Zhao, Mohammed Zaki, Naren Ramakrishnan

CCCS: A Top-down Associative Classifier for Imbalanced Class Distribution
Bavani Arunasalam, Sanjay Chawla

CFI-Stream: Mining Closed Frequent Itemsets in Data Streams
Nan Jiang, Le Gruenwald

Classification Features for Attack Detection in Collaborative Recommender Systems
Robin Burke, Bamshad Mobasher, Chad Williams, Runa Bhaumik

Clustering Based Large Margin Classification: A Scalable Approach using SOCP Formulation
Saketha Nath, Chiranjib Bhattacharyya, M. Narasimha Murty

Clustering Pair-wise Dissimilarity Data into Partially Ordered Sets
Jinze Liu, Qi Zhang, Wei Wang, Leonard McMillan, Jan Prins

Coherent Closed Quasi-Clique Discovery from Large Dense Graph Databases
Zhiping Zeng, Jianyong Wang, Lizhu Zhou, George Karypis

Combining Linguistic and Statistical Analysis to Extract Relations from Web Documents
Fabian Suchanek, Georgiana Ifrim, Gerhard Weikum

Cryptographically Private Support Vector Machines
Sven Laur, Helger Lipmaa, Taneli Mielikainen

Discovering Interesting Patterns Through User's Interactive Feedback
Dong Xin, Xuehua Shen, Qiaozhu Mei, Jiawei Han

Dynamic, Real-time Forecasting of Online Auctions via Functional Models
Wolfgang Jank, Galit Shmueli, Shanshan Wang

Efficient Kernel Feature Extraction for Massive Data Sets
Ivor W. Tsang, Andras Kocsor, James T. Kwok

Efficient Multidimensional Data Representations Based on Multiple Correspondence Analysis
Riadh Ben Messaoud, Omar Boussaid, Sabine Loudcher Rabaseda

Evolutionary Clustering
Deepayan Chakrabarti, Ravi Kumar, Andrew Tomkins

Identifying Bridging Rules Between Conceptual Clusters
Shichao Zhang, Feng Chen, Xindong Wu

Incremental Approximate Matrix Factorization for Speeding up Support Vector Machines
Gang Wu, Edward Y. Chang, Yen-Kuang Chen, Christopher Hughes

Integration of Semantic-based Bipartite Graph Representation and Mutual Refinement Strategy for Biomedical Literature Clustering
Illhoi Yoo, Xiaohua Hu, Il-Yeol Song

K-means Clustering versus Validation Measures: A Data Distribution Perspective
Hui Xiong, Junjie Wu, Jian Chen

Linear Prediction Models with Graph Regularization for Web-page Categorization
Tong Zhang, Alexandrin Popescul, Byron Dom

Mining for Misconfigured Machines in Grid Systems
Noam Palatin, Assaf Schuster, Ran Wolff

Mining long-term search history to improve search accuracy
Bin Tan, Xuehua Shen, ChengXiang Zhai

Mining Progressive Confident Rules
Minghua Zhang, Wynne Hsu, Mong Li Lee

Mining Relational Data through Correlation-based Multiple View Validation
Hongyu Guo, Herna L. Viktor

Model Compression: Making Big, Slow Models Practical
Cristian Bucila, Rich Caruana, Alexandru Niculescu-Mizil

MONIC - Modeling and Monitoring Cluster Transitions
Myra Spiliopoulou, Irene Ntoutsi, Yannis Theodoridis, Rene Schult

Naive filterbots for robust cold-start recommendation
Seung-Taek Park, David Pennock, Omid Madani, Nathan Good, Dennis DeCoste

On Privacy Preservation against Adversarial Data Mining
Charu Aggarwal, Jian Pei, Bo Zhang

Outlier Detection by Active Learning
Naoki Abe, John Langford, Bianca Zadrozny

Outlier Detection By Sampling With Accuracy Guarantees
Mingxi Wu, Christopher Jermaine

Polynomial Association Rules with Applications to Logistic Regression
Szymon Jaroszewicz

Query-Time Entity Resolution
Indrajit Bhattacharya, Lise Getoor, Louis Licamele

Recommendation Method for Extending Subscription Periods
Tomoharu Iwata, Kazumi Saito, Takeshi Yamada

Reducing the Human Overhead in Text Categorization
Arnd Christian Konig, Eric Brill

Sampling from Large Graphs
Jure Leskovec, Christos Faloutsos

Semi-Supervised Time Series Classification
Li Wei, Eamonn Keogh

Single-Pass Online Learning: Performance, Voting Schemes and Online Feature Selection
Vitor Carvalho, William Cohen

Statistical Entity-Topic Models
David Newman, Chaitanya Chemudugunta, Padhraic Smyth, Mark Steyvers

Structure and Evolution of Online Social Networks
Ravi Kumar, Jasmine Novak, Andrew Tomkins

Summarizing Itemset Patterns Using Probabilistic Models
Chao Wang, Srinivasan Parthasarathy

Suppressing Model Overfitting in Mining Concept-Drifting Data Streams
Haixun Wang, Jian Yin, Jian Pei, Philip Yu, Jeffrey Xu Yu

Utility-Based Anonymization Using Local Recodings
Jian Xu, Wei Wang, Jian Pei, Xiaoyuan Wang, Baile Shi, Ada Fu

Visual Data Mining using Principled Projection Algorithms and Information Visualization Techniques
Dharmesh Maniyar, Ian Nabney

