
    Prediction Model of Soils' Preconsolidation Pressure Based on Bayesian Ensemble Learning Algorithm

    Li Chao, Wang Lei, Chen Yang, Li Tianyi

    Citation: Li Chao, Wang Lei, Chen Yang, Li Tianyi, 2023. Prediction Model of Soils' Preconsolidation Pressure Based on Bayesian Ensemble Learning Algorithm. Earth Science, 48(5): 1780-1792. doi: 10.3799/dqkx.2022.450

    doi: 10.3799/dqkx.2022.450
    Funding:

    National Natural Science Foundation of China (No. 12172211)

    National Key Research and Development Program of China (No. 2019YFC1509800)

    Article information
      About the first author:

      Li Chao (1999-), male, master's student, mainly engaged in research on expansive soils. ORCID: 0000-0002-3597-0301. E-mail: M400121101@sues.edu.cn

      Corresponding author:

      Wang Lei, ORCID: 0000-0001-9423-7866. E-mail: wanglei_sjtu@sjtu.edu.cn

    • CLC number: P64

    Prediction Model of Soils' Preconsolidation Pressure Based on Bayesian Ensemble Learning Algorithm

    • Abstract: Accurately evaluating the preconsolidation pressure (PS) of soils is an important problem in geotechnical engineering practice. In this study, ensemble learning algorithms (XGBoost and RF) are adopted to capture the relationships among the soil parameters and to build a prediction model for preconsolidation pressure. Bayesian optimization is used to determine the optimal hyperparameters of each model, and the ensemble models are compared with three non-ensemble algorithms (SVR, KNN and MLP); the performance of the different models is analyzed statistically with three error metrics: the coefficient of determination R², the root mean square error RMSE, and the mean absolute percentage error MAPE. Finally, the prediction accuracy and generalization ability of each model are evaluated under 5-fold cross-validation. The results show that the XGBoost-based model achieves the highest prediction accuracy, with an RMSE of 20.80 kPa and a MAPE of 18.29%, followed by RF with 24.53 kPa and 19.15%, respectively. With PS as the regression target, the feature importance ranking is USS > VES > w > LL > PL. Therefore, on small data sets, ensemble learning algorithms outperform the other algorithms in both prediction accuracy and generalization, and they can serve as an effective tool for sensitivity analysis of geotechnical parameters.
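    For concreteness, the model-comparison workflow described in the abstract could be sketched as follows with scikit-learn and xgboost; the feature matrix X (columns LL, PL, w, VES, USS), the target vector y (PS, in kPa), and the default model settings are assumptions for illustration, not the authors' actual code or configuration.

    # Minimal sketch of the comparison workflow: five regressors scored with
    # R2, RMSE and MAPE under 5-fold cross-validation (assumed environment:
    # scikit-learn + xgboost; X holds LL, PL, w, VES, USS and y holds PS in kPa).
    from sklearn.model_selection import KFold, cross_validate
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.svm import SVR
    from sklearn.neighbors import KNeighborsRegressor
    from sklearn.neural_network import MLPRegressor
    from xgboost import XGBRegressor

    models = {
        "XGBoost": XGBRegressor(objective="reg:squarederror"),
        "RF": RandomForestRegressor(),
        "SVR": SVR(),
        "KNN": KNeighborsRegressor(),
        "MLP": MLPRegressor(max_iter=1000),
    }
    scoring = {
        "R2": "r2",
        "RMSE": "neg_root_mean_squared_error",
        "MAPE": "neg_mean_absolute_percentage_error",
    }

    def compare_models(X, y):
        """Report mean cross-validated R2, RMSE (kPa) and MAPE (%) for each model."""
        cv = KFold(n_splits=5, shuffle=True, random_state=0)
        for name, model in models.items():
            res = cross_validate(model, X, y, cv=cv, scoring=scoring)
            print(f"{name}: R2={res['test_R2'].mean():.3f}, "
                  f"RMSE={-res['test_RMSE'].mean():.2f} kPa, "
                  f"MAPE={-res['test_MAPE'].mean() * 100:.2f} %")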

       

    • Fig. 1. Flow chart of the XGBoost and RF algorithms

      Fig. 2. Schematic of 5-fold cross-validation

      Fig. 3. Histograms of the frequency distributions of the features and the label

      Fig. 4. Comparison between the regression values and the true values of the prediction models

      Fig. 5. Model scores before and after hyperparameter optimization

      Fig. 6. Error distribution of the predicted values

      Fig. 7. Regression performance on the test set before and after hyperparameter optimization

      Fig. 8. Relative importance ranking of the features

      Fig. 9. Model performance under 5-fold cross-validation

      Table 1. Pseudocode of the Bayesian optimization algorithm

      1: for t = 1, 2, …, do:
      2:   maximize the acquisition function to obtain the next evaluation point: $ {x}_{t}=\mathrm{argmax}_{x\in \mathcal{X}}\, \alpha \left(x\mid {D}_{1:t-1}\right) $;
      3:   evaluate the objective function: $ {y}_{t}=f\left({x}_{t}\right)+{\epsilon }_{t} $;
      4:   update the data set $ {D}_{1:t}=\left\{{D}_{1:t-1},\left({x}_{t},{y}_{t}\right)\right\} $ and update the probabilistic surrogate model;
      5: end for
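      One concrete way to realize the loop in Table 1 is sketched below with the Hyperopt library, whose Tree-structured Parzen Estimator plays the role of the probabilistic surrogate and acquisition maximizer; the XGBoost search space, the 5-fold CV RMSE objective, and the 50-evaluation budget are illustrative assumptions, not the authors' settings.

      # Illustrative Bayesian hyperparameter search for XGBoost with Hyperopt (TPE).
      # Search-space bounds and max_evals are assumptions for demonstration only.
      from hyperopt import Trials, fmin, hp, tpe
      from sklearn.model_selection import cross_val_score
      from xgboost import XGBRegressor

      space = {
          "n_estimators": hp.quniform("n_estimators", 100, 2000, 50),
          "learning_rate": hp.uniform("learning_rate", 0.01, 0.3),
          "max_depth": hp.quniform("max_depth", 2, 10, 1),
          "reg_lambda": hp.uniform("reg_lambda", 0.0, 5.0),
          "reg_alpha": hp.uniform("reg_alpha", 0.0, 5.0),
      }

      def tune_xgboost(X, y, max_evals=50):
          def objective(params):
              model = XGBRegressor(
                  n_estimators=int(params["n_estimators"]),
                  learning_rate=params["learning_rate"],
                  max_depth=int(params["max_depth"]),
                  reg_lambda=params["reg_lambda"],
                  reg_alpha=params["reg_alpha"],
                  objective="reg:squarederror",
              )
              # Step 3 of Table 1: evaluate the (noisy) objective, here the CV RMSE.
              return -cross_val_score(model, X, y, cv=5,
                                      scoring="neg_root_mean_squared_error").mean()

          # Steps 2 and 4 of Table 1 (acquisition maximization and surrogate update)
          # are handled internally by fmin/TPE; Trials() accumulates D_{1:t}.
          trials = Trials()
          return fmin(fn=objective, space=space, algo=tpe.suggest,
                      max_evals=max_evals, trials=trials)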

      Table 2. Basic statistics of the five feature variables and the label

      Parameter   Minimum   Maximum   Mean    Median   Standard deviation   Kurtosis   Skewness
      LL (%)      22        201.8     68.37   68.75    23.83                4.01       1.17
      PL (%)      2.7       73.9      28.49   27       7.69                 5.77       1.11
      w (%)       17.3      180.1     76.47   75       23.29                1.4        0.52
      VES (kPa)   6.9       212.9     48.72   43.05    27.29                5.4        1.7
      PS (kPa)    15.2      315.6     79.82   64.9     48.48                5.09       1.92
      USS (kPa)   5         75        19.2    16.85    10.03                2.68       1.37

      Table 3. Spearman correlation coefficients of the five feature variables and the label

             LL       PL       w        VES      PS       USS
      LL     1
      PL     0.66     1
      w      0.843    0.629    1
      VES    -0.338   -0.305   -0.447   1
      PS     -0.308   -0.207   -0.461   0.708    1
      USS    -0.105   -0.101   0.268    0.529    0.747    1
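      For reference, the descriptive statistics of Table 2 and the Spearman correlation matrix of Table 3 can be reproduced from a table of the six variables with pandas; the data frame df and its column names below are assumptions for illustration.

      # Sketch: descriptive statistics (Table 2) and Spearman correlations (Table 3)
      # computed with pandas; df is assumed to hold columns LL, PL, w, VES, PS, USS.
      import pandas as pd

      def basic_statistics(df: pd.DataFrame) -> pd.DataFrame:
          """Minimum, maximum, mean, median, standard deviation, kurtosis, skewness."""
          return pd.DataFrame({
              "Minimum": df.min(),
              "Maximum": df.max(),
              "Mean": df.mean(),
              "Median": df.median(),
              "Standard deviation": df.std(),
              "Kurtosis": df.kurtosis(),  # pandas reports excess kurtosis
              "Skewness": df.skew(),
          })

      def spearman_matrix(df: pd.DataFrame) -> pd.DataFrame:
          """Rank-based (Spearman) correlation between all pairs of variables."""
          return df.corr(method="spearman")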

      Table 4. Optimal hyperparameters of the ensemble algorithm models

      XGBoost hyperparameter   Optimal value   RF hyperparameter    Optimal value
      n_estimators             550             n_estimators         1 700
      learning_rate            0.14            max_features         auto
      max_depth                3               max_depth            9
      min_split_gain           0.101           min_samples_leaf     1
      min_split_weight         3.414           min_samples_split    2
      lambda                   2.651
      alpha                    4.522

      Table 5. Optimal hyperparameters of the non-ensemble algorithm models

      SVR hyperparameter   Optimal value   KNN hyperparameter   Optimal value   MLP hyperparameter   Optimal value
      kernel               rbf             n_neighbors          3               hidden_layer_sizes   (50, 50, 50, 50, 50)
      C                    2 000           p                    1               max_iter             100
      gamma                0.001           weights              uniform         solver               lbfgs
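      Read as xgboost/scikit-learn settings, the optima in Tables 4 and 5 translate roughly into the constructor calls sketched below; mapping the paper's parameter names onto library arguments (min_split_gain to gamma, min_split_weight to min_child_weight, lambda/alpha to reg_lambda/reg_alpha) is an interpretation made here, not something stated in the tables.

      # Sketch: the tuned models of Tables 4 and 5 expressed as library constructors.
      # The name mapping for the XGBoost split/regularization parameters is an
      # assumption; verify it against the original implementation before reuse.
      from sklearn.ensemble import RandomForestRegressor
      from sklearn.svm import SVR
      from sklearn.neighbors import KNeighborsRegressor
      from sklearn.neural_network import MLPRegressor
      from xgboost import XGBRegressor

      xgb_model = XGBRegressor(n_estimators=550, learning_rate=0.14, max_depth=3,
                               gamma=0.101, min_child_weight=3.414,
                               reg_lambda=2.651, reg_alpha=4.522)
      rf_model = RandomForestRegressor(n_estimators=1700, max_depth=9,
                                       max_features="auto",  # Table 4 value; use 1.0 on scikit-learn >= 1.3
                                       min_samples_leaf=1, min_samples_split=2)
      svr_model = SVR(kernel="rbf", C=2000, gamma=0.001)
      knn_model = KNeighborsRegressor(n_neighbors=3, p=1, weights="uniform")
      mlp_model = MLPRegressor(hidden_layer_sizes=(50, 50, 50, 50, 50),
                               max_iter=100, solver="lbfgs")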

      Table 6. Summary of error analysis

      Model      R² (training / test)   RMSE (training / test, kPa)   MAPE (training / test, %)
      XGBoost    0.945 / 0.782          11.552 / 20.8                 12.224 / 18.295
      RF         0.959 / 0.696          9.896 / 24.532                8.370 / 19.154
      SVR        0.932 / 0.633          12.840 / 26.967               6.640 / 23.047
      KNN        0.876 / 0.687          17.387 / 24.911               14.893 / 20.905
      MLP        0.809 / 0.696          21.565 / 24.536               18.813 / 20.584
      Eq. (13)   0.09                   46.254                        33.129
      Eq. (14)   0.139                  44.982                        32.589
      Note: for the empirical formulas Eq. (13) and Eq. (14), the original table reports a single value per metric.
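      The three metrics reported in Table 6 can be computed for any fitted model with standard scikit-learn functions; the sketch below expresses MAPE in percent to match the table and assumes y is PS in kPa.

      # Sketch of the error metrics in Table 6 for one fitted model and one data split.
      import numpy as np
      from sklearn.metrics import (mean_absolute_percentage_error,
                                   mean_squared_error, r2_score)

      def error_metrics(model, X, y):
          """Return R2, RMSE (same unit as y, here kPa) and MAPE (%)."""
          y_pred = model.predict(X)
          r2 = r2_score(y, y_pred)
          rmse = np.sqrt(mean_squared_error(y, y_pred))
          mape = 100.0 * mean_absolute_percentage_error(y, y_pred)
          return r2, rmse, mape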
    Publication history
    • Received: 2022-11-07
    • Published online: 2023-06-06
    • Issue date: 2023-05-25
