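    Landslide Susceptibility Prediction Considering Spatio-Temporal Division Principle of Training/Testing Datasets in Machine Learning Models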

    Huang Faming, Ouyang Weiping, Jiang Shuihua, Fan Xuanmei, Lian Zhipeng, Zhou Chuangbing

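    Citation: Huang Faming, Ouyang Weiping, Jiang Shuihua, Fan Xuanmei, Lian Zhipeng, Zhou Chuangbing, 2024. Landslide Susceptibility Prediction Considering Spatio-Temporal Division Principle of Training/Testing Datasets in Machine Learning Models. Earth Science, 49(5): 1607-1618. doi: 10.3799/dqkx.2022.357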

    Funding: General Program of the National Natural Science Foundation of China (42377164)

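      About the author: Huang Faming (1988-), male, associate professor and doctoral supervisor, mainly engaged in research on the causes and risk assessment of landslide hazards. ORCID: 0000-0002-4428-7133. E-mail: faminghuang@ncu.edu.cn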

      Corresponding author: Lian Zhipeng, E-mail: lianzhipeng1985@163.com

    CLC number: P64


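    Abstract: Training and testing datasets for landslide susceptibility prediction (LSP) models are usually divided by spatially random sampling, which inevitably introduces uncertainty into the modeling. In principle, landslide susceptibility is the spatial probability of future landslides predicted from past landslides, so it has a marked chronological character rather than being purely spatially random; it is therefore worth dividing the training/testing datasets according to the order in which the landslides occurred. Taking Wencheng County, Zhejiang Province, as the study area, 11 types of environmental factors and 128 landslides with accurately known occurrence times were obtained. The landslide and non-landslide samples, linked with the environmental factors, were then divided into training/testing datasets under two principles: landslide chronological order and spatial randomness. To keep any single split ratio from determining the results, the ratios were set to 9∶1, 8∶2, 7∶3, 6∶4 and 5∶5, giving training/testing datasets for 10 combined conditions. Finally, support vector machine (SVM), multilayer perceptron (MLP) and random forest (RF) models were trained and tested on these datasets to predict landslide susceptibility, and the uncertainty of the predictions was analyzed. The results show that: (1) the landslide susceptibility predicted by SVM, MLP and RF models trained on chronologically divided datasets has slightly lower uncertainty than that predicted by models trained on spatially random divisions, which supports the feasibility of chronological division; (2) chronological division is in effect a "deterministic" case of spatially random division that better matches how landslides actually occur, although spatially random division remains a feasible option for datasets that lack landslide occurrence times.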
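The two division principles compared in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the record layout (`id`, `date`), the helper names `chronological_split` and `random_split`, and the toy inventory are hypothetical, and only dated landslide records are split here because the abstract does not say how the non-landslide samples are assigned to each side.

```python
import random
from datetime import date


def chronological_split(records, train_fraction):
    """Train on the earliest landslides and test on the most recent ones.

    Returns (train, test, time_node), where time_node is the date of the
    earliest testing record (None if the testing set is empty). Records that
    share a date may end up on both sides of the cut.
    """
    ordered = sorted(records, key=lambda r: r["date"])
    n_train = round(len(ordered) * train_fraction)
    time_node = ordered[n_train]["date"] if n_train < len(ordered) else None
    return ordered[:n_train], ordered[n_train:], time_node


def random_split(records, train_fraction, seed=0):
    """Random division: shuffle a copy of the records, then cut at the ratio."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical inventory: one dated landslide per year, 2000-2009.
records = [{"id": i, "date": date(2000 + i, 6, 1)} for i in range(10)]

train, test, node = chronological_split(records, 0.7)
print(len(train), len(test), node)  # 7 3 2007-06-01

train_r, test_r = random_split(records, 0.7, seed=42)
print(len(train_r), len(test_r))  # 7 3
```

The chronological split guarantees that every training landslide predates every testing landslide, whereas the random split mixes periods; the "time node" returned by the first function plays the same role as the time nodes listed in Table 1.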

       

      Fig. 1. Overview of the study area

      Fig. 2. Topographic and geomorphic factors

      Fig. 3. Hydrological, basic geological and land cover factors

      Fig. 4. Landslide susceptibility predicted by the SVM model under chronological (a‒c) and spatially random (d‒f) division at training/testing ratios of 9∶1 and 7∶3

      Fig. 5. Landslide susceptibility predicted by the MLP model under chronological (a‒c) and spatially random (d‒f) division at training/testing ratios of 9∶1 and 7∶3

      Fig. 6. Landslide susceptibility predicted by the RF model under chronological (a‒c) and spatially random (d‒f) division at training/testing ratios of 9∶1 and 7∶3

      Fig. 7. ROC curves of the RF model under each division condition at training/testing ratios of 9∶1, 7∶3 and 5∶5
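Figure 7 and Table 3 evaluate the models with ROC curves and their AUC. As a hedged, dependency-free sketch (not necessarily how the authors computed it), the AUC can be obtained either as the probability that a randomly chosen landslide sample receives a higher predicted susceptibility than a randomly chosen non-landslide sample, or as the trapezoidal area under the ROC points; the two agree. The function names and the eight test scores below are hypothetical.

```python
def auc_pairwise(scores, labels):
    """AUC as P(score of a random landslide sample > score of a random
    non-landslide sample), counting ties as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both landslide (1) and non-landslide (0) samples")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def roc_points(scores, labels):
    """(FPR, TPR) points from sweeping the threshold over each distinct score, high to low."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_area(points):
    """Area under a polyline of (x, y) points by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical predicted susceptibilities for 4 landslide (1) and 4 non-landslide (0) test samples.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(auc_pairwise(scores, labels))                # 0.8125
print(trapezoid_area(roc_points(scores, labels)))  # 0.8125
```

The pairwise form is quadratic in the number of samples and is only meant to make the definition explicit; for full test sets a sort-based or library implementation would be the practical choice.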

      Table 1. Landslide training and testing sets of the machine learning models divided by the chronological principle

      | Training/testing ratio | Time node (year) | Landslide cells in training set | Landslide cells in testing set | Training∶testing samples (landslide + non-landslide) |
      |---|---|---|---|---|
      | 9∶1 | 2011 | 760 | 105 | 1520∶210 |
      | 8∶2 | 2009 | 655 | 207 | 1310∶414 |
      | 7∶3 | 2006 | 617 | 239 | 1234∶478 |
      | 6∶4 | 2005.7 | 507 | 347 | 1014∶694 |
      | 5∶5 | 2004.8 | 427 | 432 | 854∶864 |
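Because the chronological division cuts the inventory at a time node, the realised proportions in Table 1 only approximate the nominal ratios. The short check below simply re-enters the Table 1 counts (the dictionary name and the ASCII ratio keys are illustrative), computes the share of landslide cells that fall in the training set, and confirms that each sample count in the last column is exactly twice the corresponding landslide-cell count. That doubling is consistent with pairing every landslide cell with one non-landslide cell, which is an inference from the numbers rather than something stated in this extract.

```python
# Rows of Table 1: nominal ratio -> (time node in years, landslide cells in the
# training set, landslide cells in the testing set, training samples, testing samples)
TABLE_1 = {
    "9:1": (2011, 760, 105, 1520, 210),
    "8:2": (2009, 655, 207, 1310, 414),
    "7:3": (2006, 617, 239, 1234, 478),
    "6:4": (2005.7, 507, 347, 1014, 694),
    "5:5": (2004.8, 427, 432, 854, 864),
}

realised_share = {}
for ratio, (node, train_cells, test_cells, train_n, test_n) in TABLE_1.items():
    # True when the sample counts are exactly twice the landslide-cell counts.
    paired = train_n == 2 * train_cells and test_n == 2 * test_cells
    # Fraction of all landslide cells that ended up before the time node.
    realised_share[ratio] = train_cells / (train_cells + test_cells)
    print(f"{ratio}: time node {node}, training share "
          f"{realised_share[ratio]:.3f}, 1:1 pairing {paired}")
```

The realised training shares come out at about 0.879, 0.760, 0.721, 0.594 and 0.497 for the nominal 9∶1, 8∶2, 7∶3, 6∶4 and 5∶5 divisions.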

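      Table 2. Frequency ratio (FR) analysis of landslide susceptibility predicted by the SVM model under chronological and spatially random division

      | Training/testing ratio | Susceptibility level | Chronological: cells of study area (%) | Chronological: landslide cells (%) | Chronological: FR | Random: cells of study area (%) | Random: landslide cells (%) | Random: FR |
      |---|---|---|---|---|---|---|---|
      | 9∶1 | Very low | 36.69 | 1.27 | 0.035 | 38.29 | 1.16 | 0.030 |
      |  | Low | 21.93 | 5.78 | 0.264 | 21.22 | 6.71 | 0.316 |
      |  | Moderate | 15.42 | 9.13 | 0.592 | 14.97 | 8.32 | 0.556 |
      |  | High | 13.55 | 24.39 | 1.801 | 13.37 | 24.62 | 1.842 |
      |  | Very high | 12.41 | 59.42 | 4.787 | 12.14 | 59.19 | 4.874 |
      | 8∶2 | Very low | 34.53 | 1.51 | 0.044 | 36.42 | 1.86 | 0.051 |
      |  | Low | 22.12 | 5.80 | 0.262 | 22.38 | 6.61 | 0.296 |
      |  | Moderate | 16.51 | 11.37 | 0.689 | 15.65 | 9.05 | 0.578 |
      |  | High | 14.36 | 23.20 | 1.616 | 13.48 | 26.80 | 1.988 |
      |  | Very high | 12.48 | 58.12 | 4.655 | 12.08 | 55.68 | 4.611 |
      | 7∶3 | Very low | 34.63 | 1.29 | 0.037 | 34.46 | 1.75 | 0.051 |
      |  | Low | 21.68 | 5.61 | 0.259 | 22.31 | 6.78 | 0.304 |
      |  | Moderate | 16.57 | 11.33 | 0.684 | 16.39 | 7.71 | 0.471 |
      |  | High | 14.29 | 21.85 | 1.529 | 14.12 | 24.65 | 1.746 |
      |  | Very high | 12.82 | 59.93 | 4.675 | 12.73 | 59.11 | 4.643 |
      | 6∶4 | Very low | 33.59 | 1.87 | 0.056 | 35.70 | 1.99 | 0.056 |
      |  | Low | 22.17 | 5.27 | 0.238 | 21.58 | 7.14 | 0.331 |
      |  | Moderate | 17.04 | 12.65 | 0.742 | 16.18 | 9.48 | 0.586 |
      |  | High | 14.72 | 23.65 | 1.607 | 14.08 | 24.36 | 1.729 |
      |  | Very high | 12.48 | 56.56 | 4.533 | 12.46 | 57.03 | 4.576 |
      | 5∶5 | Very low | 33.82 | 2.91 | 0.086 | 34.36 | 2.44 | 0.071 |
      |  | Low | 21.96 | 8.27 | 0.376 | 21.52 | 7.68 | 0.357 |
      |  | Moderate | 17.31 | 15.83 | 0.915 | 17.28 | 15.13 | 0.876 |
      |  | High | 14.71 | 24.10 | 1.639 | 15.07 | 27.60 | 1.831 |
      |  | Very high | 12.20 | 48.89 | 4.009 | 11.77 | 47.15 | 4.004 |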
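The FR columns in Table 2 follow the usual frequency-ratio definition: the percentage of landslide cells falling in a susceptibility level divided by the percentage of all study-area cells in that level. A minimal sketch recomputing the 9∶1 chronological rows (the function and variable names are hypothetical) reproduces the tabulated FR to within about 0.001; the small residuals are presumably because the published FR was computed from unrounded percentages.

```python
def frequency_ratio(landslide_pct, area_pct):
    """FR of a susceptibility level: its share of landslide cells (%) divided by
    its share of all study-area cells (%). FR > 1 means the level holds more
    landslide cells than its area share alone would suggest."""
    return landslide_pct / area_pct


# Chronological-division, 9:1 rows of Table 2: level -> (area %, landslide %, reported FR)
ROWS_9_1 = {
    "Very low": (36.69, 1.27, 0.035),
    "Low": (21.93, 5.78, 0.264),
    "Moderate": (15.42, 9.13, 0.592),
    "High": (13.55, 24.39, 1.801),
    "Very high": (12.41, 59.42, 4.787),
}

recomputed = {}
for level, (area_pct, slide_pct, reported) in ROWS_9_1.items():
    recomputed[level] = frequency_ratio(slide_pct, area_pct)
    print(f"{level:<9} FR {recomputed[level]:.3f} (Table 2: {reported})")
```

An FR that rises monotonically from the very low to the very high level, as it does in every group of Table 2, is the pattern usually expected of a reasonable susceptibility map.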

      Table 3. AUC values of the ROC curves of the SVM, MLP and RF models under each division condition

      Values outside parentheses are for chronological division; values in parentheses are for spatially random division.

      | Model | 9∶1 | 8∶2 | 7∶3 | 6∶4 | 5∶5 |
      |---|---|---|---|---|---|
      | SVM | 0.847 (0.845) | 0.809 (0.781) | 0.824 (0.835) | 0.816 (0.811) | 0.807 (0.818) |
      | MLP | 0.818 (0.802) | 0.812 (0.794) | 0.818 (0.821) | 0.816 (0.809) | 0.788 (0.791) |
      | RF | 0.872 (0.911) | 0.853 (0.855) | 0.868 (0.901) | 0.858 (0.849) | 0.821 (0.880) |
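Table 3 can be condensed per model and division principle. The sketch below uses the mean AUC over the five ratios and the range (maximum minus minimum) as a simple spread measure; the range is an illustrative choice of this sketch, not the uncertainty measure used in the paper, whose mean and SD statistics appear in Table 4. The dictionary names are hypothetical.

```python
from statistics import mean

# AUC values re-entered from Table 3, in the order 9:1, 8:2, 7:3, 6:4, 5:5.
AUC = {
    "SVM": {"chronological": [0.847, 0.809, 0.824, 0.816, 0.807],
            "random": [0.845, 0.781, 0.835, 0.811, 0.818]},
    "MLP": {"chronological": [0.818, 0.812, 0.818, 0.816, 0.788],
            "random": [0.802, 0.794, 0.821, 0.809, 0.791]},
    "RF": {"chronological": [0.872, 0.853, 0.868, 0.858, 0.821],
           "random": [0.911, 0.855, 0.901, 0.849, 0.880]},
}

# (model, division) -> (mean AUC over the five ratios, max - min across the ratios)
summary = {}
for model, by_division in AUC.items():
    for division, values in by_division.items():
        spread = max(values) - min(values)
        summary[(model, division)] = (mean(values), spread)
        print(f"{model:<3} {division:<13} mean AUC {mean(values):.3f}  range {spread:.3f}")
```

With these inputs the AUC range across ratios is 0.040 versus 0.064 for SVM, 0.030 for both divisions of MLP, and 0.051 versus 0.062 for RF (chronological versus random), while the mean AUC of RF is higher under random division (about 0.879 versus 0.854).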

      Table 4. Mean and standard deviation (SD) of the SVM, MLP and RF predictions under each division condition

      Values outside parentheses are for chronological division; values in parentheses are for spatially random division.

      | Training/testing ratio | Mean (SVM) | Mean (MLP) | Mean (RF) | SD (SVM) | SD (MLP) | SD (RF) |
      |---|---|---|---|---|---|---|
      | 9∶1 | 0.307 (0.301) | 0.332 (0.306) | 0.302 (0.319) | 0.277 (0.276) | 0.287 (0.259) | 0.230 (0.210) |
      | 8∶2 | 0.316 (0.302) | 0.315 (0.315) | 0.318 (0.325) | 0.272 (0.270) | 0.299 (0.257) | 0.213 (0.210) |
      | 7∶3 | 0.317 (0.314) | 0.315 (0.352) | 0.314 (0.331) | 0.273 (0.271) | 0.279 (0.244) | 0.225 (0.208) |
      | 6∶4 | 0.315 (0.322) | 0.318 (0.342) | 0.328 (0.344) | 0.269 (0.267) | 0.264 (0.241) | 0.216 (0.209) |
      | 5∶5 | 0.321 (0.321) | 0.316 (0.331) | 0.325 (0.329) | 0.252 (0.265) | 0.260 (0.247) | 0.203 (0.195) |
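Table 4 reports a mean and a standard deviation for each model and division. The extract does not define the population over which they are taken; assuming they summarise the predicted landslide susceptibility index (LSI) of every grid cell in the study area, a sketch with Python's `statistics` module looks as follows. Using the population SD (`pstdev`) is also an assumption; for a raster with many cells it differs negligibly from the sample SD. The function name and the tiny raster are hypothetical.

```python
from statistics import fmean, pstdev


def lsi_mean_sd(lsi_values):
    """Mean and population SD of landslide susceptibility index (LSI) values,
    assumed here to be one value per grid cell of the study area."""
    values = list(lsi_values)
    return fmean(values), pstdev(values)


# Hypothetical LSI values of a tiny 2 x 3 raster, flattened row by row.
lsi = [0.05, 0.10, 0.20, 0.35, 0.70, 0.90]
mean_lsi, sd_lsi = lsi_mean_sd(lsi)
print(f"mean {mean_lsi:.3f}, SD {sd_lsi:.3f}")
```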
    Publication history
    • Received: 2022-07-07
    • Published online: 2024-06-04
    • Issue date: 2024-05-25
