A Novel Hybrid Re-Sampling Algorithm and Its Application in Predicting Rockburst
Abstract: Rockburst occurrences are unevenly distributed and their mechanism is governed by many factors that remain poorly understood, so statistical records of rockburst form typical imbalanced data sets (IDS). Building on an analysis of re-sampling techniques, this paper proposes a novel hybrid re-sampling technique based on Automated Adaptive Selection of the Number of Nearest Neighbors (ADSNN-Hybrid RS) and applies it to rockburst prediction. The method combines the strengths of over-sampling, using an improved Synthetic Minority Over-sampling Technique (SMOTE), and under-sampling, using an improved Neighborhood Cleaning Rule (NCR). Standard SMOTE has two weaknesses: it generates synthetic minority class examples blindly, by randomly interpolating between a sample and its nearest neighbors, and it cannot handle data sets with nominal features. The proposed algorithm addresses both by adapting the neighbor selection strategy to the actual distribution of the sample set and by applying a different generation rule to each attribute type, so that the quality of the synthetic samples can be controlled. After over-sampling, the improved neighborhood cleaning rule under-samples the resulting data set to a moderate degree: it removes redundant majority class examples and noisy examples on the class border, which reduces the size of the majority class and brings the classes closer to balance. The motivation is not only to balance the training data but also to remove noisy examples lying on the wrong side of the decision border. Removing such examples may help to form better-defined class clusters and therefore allow simpler models with better generalization. The method is thus expected to handle IDS effectively and to improve classifier performance. The VCR stope rockburst cases were used as a sample IDS for classification and prediction. With the training set expanded by artificial minority class samples, the predictions agreed exactly with the actual outcomes. This indicates that the hybrid re-sampling and classification scheme is feasible for imbalanced rockburst data from engineering practice, achieves high prediction accuracy, and has good prospects for engineering application. The method can also help to identify the main controlling factors of rockburst, providing a scientific basis for rational design and safe construction in deep mining engineering.
Key words:
- rockburst
- disasters
- imbalanced dataset
- prediction
- SMOTE
- under-sampling
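The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' ADSNN-Hybrid RS implementation: the adaptive rule for choosing the number of neighbors is not given in the abstract, so a fixed `k` is assumed here, and the function names (`distance`, `nearest`, `oversample`, `clean_majority`), the mixed distance, and the toy data are all hypothetical. The over-sampling step interpolates numeric attributes and takes a majority vote on nominal ones; the cleaning step follows the general idea of a neighborhood cleaning rule.

```python
import random
from collections import Counter

# A sample is (features, label). Numeric attributes are floats, nominal ones are strings.

def distance(a, b):
    """Assumed mixed distance: squared difference for numbers, 0/1 mismatch for nominals."""
    total = 0.0
    for x, y in zip(a, b):
        if isinstance(x, str):
            total += 0.0 if x == y else 1.0
        else:
            total += (x - y) ** 2
    return total ** 0.5

def nearest(i, data, k, candidates=None):
    """Indices of the k samples closest to data[i], excluding i itself."""
    pool = range(len(data)) if candidates is None else candidates
    others = [j for j in pool if j != i]
    return sorted(others, key=lambda j: distance(data[i][0], data[j][0]))[:k]

def oversample(data, minority, k, n_new, rng):
    """SMOTE-style synthesis; needs at least two minority samples. k is fixed (assumption)."""
    pool = [j for j, s in enumerate(data) if s[1] == minority]
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(pool)
        neighbors = nearest(i, data, k, pool)
        base, mate = data[i][0], data[rng.choice(neighbors)][0]
        gap = rng.random()
        features = []
        for a, value in enumerate(base):
            if isinstance(value, str):
                # Nominal attribute: most frequent value among the sample and its neighbors.
                votes = Counter(data[j][0][a] for j in neighbors + [i])
                features.append(votes.most_common(1)[0][0])
            else:
                # Numeric attribute: random point on the segment between base and mate.
                features.append(value + gap * (mate[a] - value))
        synthetic.append((tuple(features), minority))
    return synthetic

def clean_majority(data, majority, k=3):
    """NCR-style cleaning: only majority samples are ever removed."""
    drop = set()
    for i, (_, label) in enumerate(data):
        neighbors = nearest(i, data, k)
        vote = Counter(data[j][1] for j in neighbors).most_common(1)[0][0]
        if label == majority and vote != majority:
            drop.add(i)  # majority sample outvoted by its neighbors
        elif label != majority and vote != label:
            # Misclassified minority sample: remove its majority neighbors instead.
            drop.update(j for j in neighbors if data[j][1] == majority)
    return [s for i, s in enumerate(data) if i not in drop]

# Toy data invented for illustration (not the VCR records).
data = [((1.0, "hard"), "burst"), ((1.2, "hard"), "burst"),
        ((3.0, "soft"), "none"), ((3.1, "soft"), "none"),
        ((3.2, "soft"), "none"), ((2.9, "soft"), "none"),
        ((1.1, "hard"), "none")]
rng = random.Random(0)
synthetic = oversample(data, "burst", k=1, n_new=3, rng=rng)
cleaned = clean_majority(data + synthetic, "none", k=3)
print(len(data), len(synthetic), len(cleaned))
```

In this toy set the majority sample sitting inside the minority cluster is the one removed by the cleaning step, while the synthetic samples stay inside the range spanned by the two original minority samples.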
Table 1. Classification results

Detailed accuracy by class:

| TP rate | FP rate | Precision | Recall | F-measure | Class |
|---|---|---|---|---|---|
| 1 | 0 | 1 | 1 | 1 | Rockburst |
| 1 | 0 | 1 | 1 | 1 | No rockburst |

Confusion matrix:

| a | b | <-- classified as |
|---|---|---|
| 3 | 0 | a = Rockburst |
| 0 | 2 | b = No rockburst |

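As a cross-check, the per-class figures in Table 1 can be recomputed from the confusion matrix. The sketch below assumes the usual convention for this output layout, in which rows are actual classes and columns are predicted classes; the function name `per_class_metrics` is hypothetical.

```python
# Confusion matrix from Table 1 (rows: actual class, columns: classified as).
labels = ["rockburst", "no rockburst"]
matrix = [[3, 0],
          [0, 2]]

def per_class_metrics(matrix, index):
    """TP rate, FP rate, precision, recall and F-measure for one class."""
    tp = matrix[index][index]
    fn = sum(matrix[index]) - tp
    fp = sum(row[index] for row in matrix) - tp
    tn = sum(map(sum, matrix)) - tp - fn - fp
    recall = tp / (tp + fn) if tp + fn else 0.0
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"tp_rate": recall, "fp_rate": fp_rate, "precision": precision,
            "recall": recall, "f_measure": f_measure}

results = {name: per_class_metrics(matrix, i) for i, name in enumerate(labels)}
for name, metrics in results.items():
    print(name, metrics)
```

With no off-diagonal entries, every class has a TP rate, precision, recall and F-measure of 1 and an FP rate of 0, matching the table.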
Table 2. Rockburst prediction results at VCR mining stope
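
| Sample No. | Input feature vector | Predicted output | Actual outcome |
|---|---|---|---|
| 100 | 10010100010100100000000010010001 | 01 | No rockburst |
| 101 | 10010100100100100000000010001010 | 01 | No rockburst |
| 102 | 01001010100100100010000000010001 | 10 | Rockburst |
| 103 | 10010100100100001000010000010100 | 10 | Rockburst |
| 104 | 01010001100100100000000001010010 | 10 | Rockburst |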
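The agreement between predicted and actual outcomes in Table 2 can be checked mechanically. The sketch below assumes that the two-bit output is a one-hot code with "10" meaning rockburst and "01" meaning no rockburst, which is inferred from the table rather than stated in the source; the names `ROWS` and `DECODE` are hypothetical.

```python
# Rows copied from Table 2: (sample number, input feature vector, predicted output, actual outcome).
ROWS = [
    (100, "10010100010100100000000010010001", "01", "no rockburst"),
    (101, "10010100100100100000000010001010", "01", "no rockburst"),
    (102, "01001010100100100010000000010001", "10", "rockburst"),
    (103, "10010100100100001000010000010100", "10", "rockburst"),
    (104, "01010001100100100000000001010010", "10", "rockburst"),
]

# Assumed one-hot decoding of the two-bit output, inferred from Table 2.
DECODE = {"10": "rockburst", "01": "no rockburst"}

predicted = [DECODE[output] for _, _, output, _ in ROWS]
actual = [outcome for _, _, _, outcome in ROWS]
accuracy = sum(p == a for p, a in zip(predicted, actual)) / len(ROWS)
print(f"accuracy on the {len(ROWS)} test samples: {accuracy:.2f}")
```

Under that decoding, the three rockburst cases and two non-rockburst cases line up with the 3 and 2 on the diagonal of the confusion matrix in Table 1.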