Abstract:
Finding frequent patterns is the cornerstone of many classical data mining tasks such as
associations, correlations, sequences, episodes, classification, and clustering. It is also
essential to a wide range of emerging applications, like Web-log mining, click-stream
mining, network traffic analysis, stock market analysis, and sensor networks. The majority of
existing frequent pattern mining approaches require many input parameters to be set by
the user. The most widely used parameter is the minimum support threshold, which serves to
extract statistically significant patterns and to prune insignificant ones. The selection
of minimum support is somewhat arbitrary, and there is no mechanism to ensure that it
does not inadvertently remove many interesting patterns. Calculating support
counts for the candidate itemsets consumes most of the execution time of existing
techniques. A frequent pattern mining method without a minimum support threshold avoids
costly candidate-generation-and-test techniques completely, yielding major gains in
performance and efficiency. This study presents a novel method to discover maximal
frequent itemsets using a single database pass. Initially, all 2-itemsets are generated and
the association ratio among them is computed. An association ratio graph is then
constructed to facilitate the generation of maximal frequent itemsets. Efficient algorithms
are described that use this compact graph data structure to discover top-most and top-k
maximal frequent itemsets without a user-specified minimum support threshold. This
method employs breadth-first search to construct the all-path-source-to-destination
tree and then finds the top few maximal frequent itemsets by traversing all
source-to-destination paths. Results are presented that demonstrate the performance
advantages of this approach.
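As a rough illustration of the pipeline outlined above, the following Python sketch counts all 2-itemsets in one pass over the transactions, builds a weighted graph from them, and enumerates source-to-destination paths by breadth-first search. The abstract does not define the association ratio, so the sketch assumes a Jaccard-style ratio (co-occurrence count divided by the number of transactions containing either item); the function names `build_association_graph` and `all_paths_bfs` are hypothetical, and the step that ranks paths into top-k maximal frequent itemsets is not shown.

```python
from collections import Counter, deque
from itertools import combinations


def build_association_graph(transactions):
    """Build a weighted item graph from one pass over the transactions.

    ASSUMPTION: the association ratio is taken to be a Jaccard-style
    ratio; the source abstract does not give its definition.
    """
    item_count = Counter()
    pair_count = Counter()
    for transaction in transactions:  # single database pass
        items = sorted(set(transaction))
        item_count.update(items)
        pair_count.update(combinations(items, 2))  # all 2-itemsets

    graph = {item: {} for item in item_count}
    for (a, b), n_ab in pair_count.items():
        ratio = n_ab / (item_count[a] + item_count[b] - n_ab)
        graph[a][b] = ratio
        graph[b][a] = ratio
    return graph


def all_paths_bfs(graph, source, destination):
    """Enumerate all simple source-to-destination paths breadth-first."""
    paths = []
    queue = deque([[source]])
    while queue:
        path = queue.popleft()
        last = path[-1]
        if last == destination:
            paths.append(path)
            continue
        for neighbour in sorted(graph.get(last, {})):
            if neighbour not in path:  # keep paths simple (no cycles)
                queue.append(path + [neighbour])
    return paths


transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"]]
graph = build_association_graph(transactions)
paths = all_paths_bfs(graph, "a", "c")
```

Each enumerated path is a candidate itemset; how candidates are scored and reduced to the top few maximal frequent itemsets depends on details the abstract does not state.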
The frequent pattern mining framework can also be applied to other interesting data
mining problems. The task of semantic image retrieval can be turned into a frequent
pattern problem by representing image data as an association ratio graph. Employing this
graph-based structure, an effective semantic image retrieval architecture is
proposed for mining multimedia data efficiently.