1. Guangdong Laboratory of Artificial Intelligence and Digital Economy (Shenzhen), Shenzhen 518107, Guangdong, China
2. College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, Guangdong, China
HE Yulin (1982- ), male, Ph.D., researcher and senior engineer at the Guangdong Laboratory of Artificial Intelligence and Digital Economy (Shenzhen). His research interests include data mining and machine learning algorithms, big data processing and analysis methods, approximate computing techniques for big data, and multi-sample statistical analysis theory and methods.
HUANG Zhexue (1959- ), male, Ph.D., professor and doctoral supervisor at the Guangdong Laboratory of Artificial Intelligence and Digital Economy (Shenzhen). His research interests include data mining, machine learning, big data processing and analysis, and big data system computing techniques.
YIN Jianfei (1974- ), male, Ph.D., associate professor at the College of Computer Science and Software Engineering, Shenzhen University. His research interests include data mining, machine learning, multi-objective optimization, and artificial intelligence.
Received: 2024-08-05
Revised: 2024-10-30
Published in print: 2024-12-15
Citation: HE Yulin, YANG Jin, HUANG Zhexue, et al. Label iteration-based clustering ensemble algorithm[J]. Chinese Journal of Intelligent Science and Technology, 2024, 6(4): 466-479. DOI: 10.11959/j.issn.2096-6652.202443.
Abstract: Existing training strategies for clustering ensemble algorithms generally follow the "same data, different algorithms" paradigm and commonly suffer from limited performance on large-scale data and weak adaptability of the consensus function. To address these problems, this paper studies the "different data, same algorithm" training strategy and develops a label iteration-based clustering ensemble (LICE) algorithm. First, the algorithm trains a number of base clusterers on random sample partition (RSP) data blocks of the original dataset. Second, the base clustering results with the same number of clusters are fused under the maximum mean discrepancy (MMD) criterion, and a heuristic classifier is trained on the label-determined RSP data blocks. Third, the heuristic classifier iteratively predicts labels for the sample points in the label-undetermined RSP data blocks, and the sample points whose classification labels agree with their clustering labels are used to strengthen the classifier. Finally, a series of persuasive experiments validates the feasibility and effectiveness of the LICE algorithm. On representative datasets, the normalized mutual information, adjusted Rand index, Fowlkes-Mallows index, and purity of the LICE algorithm improve on average by 17.23%, 16.75%, 31.29%, and 12.37% at the fifth iteration relative to the initial iteration, and by 11.76%, 16.50%, 9.36%, and 14.20% in comparison with seven classical clustering ensemble algorithms. The experiments confirm that LICE is an efficient and well-founded clustering ensemble algorithm capable of handling big data clustering problems.
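The three-step workflow summarized in the abstract (cluster each RSP block with the same base algorithm, fit a classifier on label-determined blocks, then iteratively pseudo-label the remaining blocks and keep only consistently labeled points) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: all function names and design choices (`rsp_blocks`, `lice_sketch`, k-means base clusterers, a decision-tree heuristic classifier, an RBF-kernel MMD) are assumptions, and the sketch glosses over aligning cluster labels across blocks, which the paper handles through MMD-based fusion of same-k base clusterings.

```python
# Illustrative sketch of a "different data, same algorithm" label-iteration
# clustering ensemble. Names and design choices are assumptions, not the
# authors' actual implementation.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

def rsp_blocks(X, n_blocks, rng):
    """Random sample partition (RSP): shuffle rows, split into disjoint blocks."""
    idx = rng.permutation(len(X))
    return [X[part] for part in np.array_split(idx, n_blocks)]

def mmd2_rbf(X, Y, gamma=1.0):
    """Squared maximum mean discrepancy between two samples, RBF kernel."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()

def lice_sketch(X, k=2, n_blocks=6, n_labeled=2, n_iter=5, seed=0):
    rng = np.random.default_rng(seed)
    blocks = rsp_blocks(X, n_blocks, rng)
    # Step 1: one base clusterer per RSP block (same algorithm, different data).
    clusterings = [KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(B)
                   for B in blocks]
    # Step 2 (simplified): treat the first n_labeled blocks as label-determined
    # and fit a heuristic classifier on them. The paper instead fuses same-k
    # base clusterings under the MMD criterion before this step.
    X0 = np.vstack(blocks[:n_labeled])
    y0 = np.concatenate(clusterings[:n_labeled])
    clf = DecisionTreeClassifier(random_state=seed).fit(X0, y0)
    # Step 3: iteratively pseudo-label the remaining blocks; keep only points
    # whose classification label agrees with their clustering label, and
    # retrain the classifier on the enlarged labeled set.
    for _ in range(n_iter):
        keep_X, keep_y = [X0], [y0]
        for B, y_clu in zip(blocks[n_labeled:], clusterings[n_labeled:]):
            y_clf = clf.predict(B)
            agree = y_clf == y_clu
            keep_X.append(B[agree])
            keep_y.append(y_clf[agree])
        clf = DecisionTreeClassifier(random_state=seed).fit(
            np.vstack(keep_X), np.concatenate(keep_y))
    return clf
```

The agreement filter in step 3 is what drives the iteration: disagreeing points are simply dropped from the training set for that round, so the classifier is only ever reinforced by points on which clustering and classification concur. Note that `mmd2_rbf` is quadratic in the sample size, which is one reason to compute it block-wise on RSP blocks rather than on the full dataset.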