Optimization of K-Means Algorithm for Big Data Clustering Using Computational Distribution Approach
DOI:
https://doi.org/10.35335/jict.v14i2.138
Keywords:
Data Clustering, K-Means Algorithm, Computational Distribution Approach, Efficiency, Accuracy
Abstract
In the growing digital era, clustering big data has become a major challenge in data analysis, particularly because the well-known K-Means algorithm has limitations when handling large-scale data. This study aims to optimize the K-Means algorithm for big data clustering using a computational distribution approach in order to improve clustering efficiency and accuracy. The approach processes data in parallel across multiple computing nodes, optimizes memory usage, introduces an intelligent cluster center selection algorithm, and optimizes communication between nodes. The optimized method improves the efficiency and accuracy of big data clustering while reducing execution time and memory consumption. Practical implications include better business decision-making and more effective marketing strategies based on more precise analysis of customer data.
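The abstract does not specify the framework, the center selection rule, or the communication scheme, so the following is only a minimal sketch of one common way to distribute K-Means, not the authors' implementation. It assumes each node holds one data partition and sends back only per-cluster coordinate sums and point counts, so the data exchanged per iteration depends on the number of clusters rather than the number of points. All function names (`partial_stats`, `merge_and_update`, `distributed_kmeans`) are hypothetical, the nodes are simulated by a plain loop, and the initial centers are passed in by the caller rather than chosen by any "intelligent" selection rule.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Stats = Tuple[List[List[float]], List[int]]


def nearest(point: Sequence[float], centers: Sequence[Point]) -> int:
    """Return the index of the center closest to `point` (squared Euclidean)."""
    best, best_dist = 0, float("inf")
    for i, center in enumerate(centers):
        dist = sum((p - c) ** 2 for p, c in zip(point, center))
        if dist < best_dist:
            best, best_dist = i, dist
    return best


def partial_stats(partition: Sequence[Point], centers: Sequence[Point]) -> Stats:
    """Work done on one node: per-cluster coordinate sums and point counts."""
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for point in partition:
        j = nearest(point, centers)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += point[d]
    return sums, counts


def merge_and_update(stats: Sequence[Stats], centers: Sequence[Point]) -> List[Point]:
    """Coordinator step: merge the node statistics into new centers."""
    dim = len(centers[0])
    new_centers: List[Point] = []
    for j in range(len(centers)):
        total = sum(counts[j] for _, counts in stats)
        if total == 0:
            # Assumption: an empty cluster simply keeps its previous center.
            new_centers.append(tuple(centers[j]))
            continue
        new_centers.append(
            tuple(sum(sums[j][d] for sums, _ in stats) / total for d in range(dim))
        )
    return new_centers


def distributed_kmeans(partitions, centers, max_iter=100, tol=1e-9):
    """Iterate until no center moves by more than `tol` (squared distance)."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # In a real deployment each call below would run on a separate node.
        stats = [partial_stats(part, centers) for part in partitions]
        new_centers = merge_and_update(stats, centers)
        shift = max(
            sum((a - b) ** 2 for a, b in zip(old, new))
            for old, new in zip(centers, new_centers)
        )
        centers = new_centers
        if shift <= tol:
            break
    return centers


if __name__ == "__main__":
    parts = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 10.0), (10.0, 12.0)]]
    print(distributed_kmeans(parts, [(0.0, 0.0), (10.0, 10.0)]))
```

Because the merged sums and counts are the same however the points are split, the resulting centers match a single-node run with the same initial centers, up to floating-point rounding; only the assignment work is spread across nodes.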


Jurnal ICT : Information and Communication Technologies is licensed under a