Optimization of K-Means Algorithm for Big Data Clustering Using Computational Distribution Approach

Authors

  • Deassy Ratna Juwita Sari, Universitas Galuh, Ciamis, Indonesia
  • Nana Yudi Permana, Universitas Galuh, Ciamis, Indonesia

DOI:

https://doi.org/10.35335/jict.v14i2.138

Keywords:

Data Clustering, K-Means Algorithm, Computational Distribution Approach, Efficiency, Accuracy

Abstract

In the growing digital era, clustering big data has become a major challenge in data analysis, particularly for the widely used K-Means algorithm, which has limitations when applied to large-scale data. This study optimizes the K-Means algorithm for big data clustering using a computational distribution approach, with the goal of improving clustering efficiency and accuracy. The approach processes data in parallel across multiple computing nodes, optimizes memory usage, introduces an intelligent cluster-center selection algorithm, and optimizes communication between nodes. Implementing these optimizations improves the efficiency and accuracy of big data clustering while reducing execution time and memory consumption. Practical implications include better business decision-making and more effective marketing strategies based on more precise analysis of customer data.
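The abstract names the ingredients of the method but not the concrete procedure. The following is a minimal Python sketch of one common way to distribute K-Means, not the authors' implementation. It assumes a data-parallel scheme: each computing node holds one partition, assigns its points to the nearest center, and returns only per-cluster coordinate sums and counts. A reduce step then merges these partial results into new centers. The names `local_step`, `merge`, `distributed_kmeans`, and `per_node_sample` are hypothetical. The thread pool only stands in for real nodes. k-means++ seeding on a small per-node sample is a swapped-in technique used here as one plausible form of "intelligent" center selection.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    # Index of the closest center (ties go to the lowest index).
    return min(range(len(centers)), key=lambda j: squared_distance(point, centers[j]))


def kmeans_pp_init(sample, k, rng):
    # k-means++ seeding (assumed stand-in for "intelligent" center selection):
    # each new center is drawn with probability proportional to its squared
    # distance from the nearest center chosen so far.
    centers = [rng.choice(sample)]
    while len(centers) < k:
        weights = [min(squared_distance(p, c) for c in centers) for p in sample]
        if sum(weights) == 0:  # every sampled point already coincides with a center
            centers.append(rng.choice(sample))
        else:
            centers.append(rng.choices(sample, weights=weights, k=1)[0])
    return centers


def local_step(partition, centers):
    # "Map" step run on one node: stream over the partition once and keep
    # only k*dim running sums plus k counts in memory.
    k, dim = len(centers), len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in partition:
        j = nearest(p, centers)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts


def merge(results, k, dim):
    # "Reduce" step: combine the small per-node summaries. This is all that
    # crosses the network; raw points never leave their node.
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for part_sums, part_counts in results:
        for j in range(k):
            counts[j] += part_counts[j]
            for d in range(dim):
                sums[j][d] += part_sums[j][d]
    return sums, counts


def distributed_kmeans(partitions, k, max_iter=100, tol=1e-9, per_node_sample=100, seed=0):
    rng = random.Random(seed)
    # Each node contributes a small random sample for seeding (hypothetical parameter).
    sample = [p for part in partitions for p in rng.sample(part, min(len(part), per_node_sample))]
    centers = kmeans_pp_init(sample, k, rng)
    dim = len(centers[0])
    # Threads simulate nodes; a real deployment would use processes or a cluster framework.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        for _ in range(max_iter):
            # Broadcast the current centers to every node and collect the summaries.
            results = list(pool.map(local_step, partitions, [centers] * len(partitions)))
            sums, counts = merge(results, k, dim)
            # An empty cluster keeps its previous center.
            new_centers = [
                tuple(s / counts[j] for s in sums[j]) if counts[j] else tuple(centers[j])
                for j in range(k)
            ]
            shift = max(squared_distance(a, b) for a, b in zip(centers, new_centers))
            centers = new_centers
            if shift <= tol:
                break
    return centers
```

In this design, each node sends only k·d sums and k counts per iteration. As a result, communication cost does not grow with the number of points stored on a node. This is one reason map/reduce-style K-Means scales to large data. Because of CPython's global interpreter lock, the threads above illustrate the data flow rather than deliver real CPU parallelism.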



Published

2023-10-31

How to Cite

Sari, D. R. J., & Permana, N. Y. (2023). Optimization of K-Means Algorithm for Big Data Clustering Using Computational Distribution Approach. Jurnal ICT : Information and Communication Technologies, 14(2), 49–53. https://doi.org/10.35335/jict.v14i2.138