Journal article

A distributed frequent itemset mining algorithm using Spark for Big Data analytics

Abstract

Frequent itemset mining is an essential step in association rule mining. In the big data era, conventional approaches to mining frequent itemsets encounter significant challenges when computing power and memory are limited. This paper proposes an efficient distributed frequent itemset mining algorithm (DFIMA) that significantly reduces the number of candidate itemsets by applying a matrix-based pruning approach. The algorithm is implemented on Spark to further improve the efficiency of iterative computation. Numerical experiments on standard benchmark datasets, comparing the proposed algorithm with an existing algorithm, parallel FP-growth, show that DFIMA has better efficiency and scalability. In addition, a case study has been carried out to validate the feasibility of DFIMA.
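
To make the terms in the abstract concrete, here is a minimal single-machine sketch of frequent itemset mining over a Boolean transaction matrix. It is not DFIMA as published: the abstract does not describe the paper's pruning rules or its Spark implementation, so the sketch substitutes the classic Apriori subset check for pruning and stores each matrix column as an integer bitmask. The function names (`build_matrix`, `frequent_itemsets`) and the sample data are hypothetical.

```python
from itertools import combinations


def build_matrix(transactions):
    """Encode transactions as a Boolean matrix, one bitmask column per item.

    Bit r of columns[item] is 1 when transaction r contains the item.
    """
    items = sorted({item for t in transactions for item in t})
    columns = {item: 0 for item in items}
    for row, transaction in enumerate(transactions):
        for item in transaction:
            columns[item] |= 1 << row
    return items, columns


def support(mask):
    """Number of transactions (set bits) covered by a column bitmask."""
    return bin(mask).count("1")


def frequent_itemsets(transactions, min_support):
    """Return {itemset tuple: support count} for all frequent itemsets.

    Assumption: Apriori-style pruning stands in for the paper's
    matrix-based pruning, which the abstract does not specify.
    """
    items, columns = build_matrix(transactions)

    # Level 1: keep only items whose column has enough set bits.
    current = {
        (item,): columns[item]
        for item in items
        if support(columns[item]) >= min_support
    }

    result = {}
    while current:
        for itemset, mask in current.items():
            result[itemset] = support(mask)

        next_level = {}
        for a, b in combinations(sorted(current), 2):
            # Join two k-itemsets only if they share the first k-1 items.
            if a[:-1] != b[:-1]:
                continue
            candidate = a + (b[-1],)
            # Prune: every k-subset of the candidate must itself be frequent.
            if any(sub not in current
                   for sub in combinations(candidate, len(a))):
                continue
            # Support of the union is the AND of the two column masks.
            mask = current[a] & current[b]
            if support(mask) >= min_support:
                next_level[candidate] = mask
        current = next_level
    return result


# Hypothetical sample data, for illustration only.
sample = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
found = frequent_itemsets(sample, min_support=2)
```

With the sample data and a minimum support of 2, the candidate (bread, butter, milk) is discarded before any support counting because its subset (butter, milk) is not frequent. Avoiding that kind of wasted counting is the general purpose of candidate pruning.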

Authors

Zhang F; Liu M; Gui F; Shen W; Shami A; Ma Y

Journal

Cluster Computing, Vol. 18, No. 4, pp. 1493–1501

Publisher

Springer Nature

Publication Date

December 1, 2015

DOI

10.1007/s10586-015-0477-1

ISSN

1386-7857
