An Empirical Comparison of Community Detection Techniques for Amazon Dataset
Lecture Notes on Data Engineering and Communications Technologies
Detecting clusters or communities in large real-world graphs, such as the Amazon dataset, information networks, and social networks, is of considerable interest. Extracting sets of nodes that score well on an objective function and “appear” to be appropriate communities for the application of interest requires approximation methods or heuristics. In this work, several network community detection approaches are analyzed and compared to determine their relative performance. We investigate a variety of well-known performance metrics used to formalize the idea of a good community, along with several approximation strategies intended to optimize these objective functions. The most widely used community detection algorithms include Louvain, Girvan-Newman (GNM), Label Propagation (LPA), and Clauset-Newman-Moore (CNM). Earlier studies have reported that Louvain gives the best overall performance in terms of both modularity and F1-score. This work investigates a dynamic, publicly accessible Amazon item dataset, the Amazon co-purchase network. The four community detection algorithms are applied to this dataset and evaluated on two metrics: modularity and F1-score. GNM achieves the best modularity, but it is not efficient for large datasets because its complexity is on the order of O(m²n), where m is the number of edges and n is the number of nodes. The other algorithms reach nearly the same range of modularity, but Louvain performs best in terms of F1-score.
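The two evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions that do not come from the publication: the function names (modularity, f1, average_best_f1) are hypothetical, the graph is a six-node toy example rather than the Amazon co-purchase network, modularity is the standard Newman formulation for an undirected, unweighted graph, and F1 is taken as the average best-match F1 between detected and ground-truth communities. The abstract does not say which F1 variant the authors used.

```python
def modularity(edges, communities):
    """Newman modularity of a partition of an undirected, unweighted graph.

    Q = sum over communities c of  L_c / m - (d_c / (2m))^2,
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the sum of the degrees of the nodes in c.
    """
    m = len(edges)
    node_to_comm = {node: i for i, comm in enumerate(communities) for node in comm}
    internal = [0] * len(communities)    # L_c
    degree_sum = [0] * len(communities)  # d_c
    for u, v in edges:
        cu, cv = node_to_comm[u], node_to_comm[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(l / m - (d / (2 * m)) ** 2 for l, d in zip(internal, degree_sum))


def f1(detected, truth):
    """F1-score of one detected community against one ground-truth community."""
    overlap = len(detected & truth)
    if overlap == 0:
        return 0.0
    precision = overlap / len(detected)
    recall = overlap / len(truth)
    return 2 * precision * recall / (precision + recall)


def average_best_f1(detected_comms, truth_comms):
    """Assumed F1 variant: match each detected community to its best-fitting
    ground-truth community and average the resulting scores."""
    best = [max(f1(d, t) for t in truth_comms) for d in detected_comms]
    return sum(best) / len(best)


# Toy graph (illustrative only): two triangles joined by the single edge (2, 3).
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
TRUTH = [{0, 1, 2}, {3, 4, 5}]

# Two candidate partitions, as a detection algorithm might return them.
GOOD = [{0, 1, 2}, {3, 4, 5}]
SHIFTED = [{0, 1, 2, 3}, {4, 5}]

if __name__ == "__main__":
    for name, partition in [("good", GOOD), ("shifted", SHIFTED)]:
        q = modularity(EDGES, partition)
        score = average_best_f1(partition, TRUTH)
        print(f"{name}: modularity={q:.3f}, F1={score:.3f}")
```

In a comparison like the one described above, the partitions would come from the detection algorithms themselves (Louvain, GNM, LPA, CNM) instead of being written by hand, and each partition would be scored with the same two functions.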
Open Access Status
This publication is not available as open access