Spike Waveform Recognition for Strong-Motion Records Based on LightGBM-SVM Stacking Algorithm
Abstract: Spikes are a common type of abnormal waveform in strong-motion records. Their generation mechanism remains unclear, and studying it requires the accumulation of large datasets, which makes reliable spike identification highly important. This study proposes a preprocessing method based on adaptive waveform scaling that extracts and enhances amplitude-variation features. Combined with time-scale discrimination criteria, it reduces the influence of amplitude differences on manual annotation. A feature representation is also introduced in which each one-dimensional record is converted into a feature vector by normalizing the cumulative distribution of its sample amplitudes, so that the spatial distribution characteristics of strong-motion records can be represented. Multiple machine learning models were trained on a highly imbalanced dataset, and their misclassifications were analyzed. A LightGBM-SVM stacking algorithm tuned by Bayesian optimization was then adopted for spike waveform recognition, achieving a Matthews correlation coefficient (MCC) above 0.86 on the test set. The results show that the proposed spike discrimination criterion is stable and generalizable. The method can serve as an auxiliary tool for screening spike waveforms in data quality assessment and provides technical support for further investigation of the mechanism that generates spikes.
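A minimal sketch of the cumulative-distribution feature representation follows. The abstract does not spell out the exact recipe, so every detail here is an assumption: amplitudes are taken in absolute value and divided by the record's peak, and the empirical CDF is read off at `n_bins` evenly spaced thresholds to give a fixed-length vector. The function name `cdf_feature_vector` and the default of 50 thresholds are hypothetical.

```python
import numpy as np


def cdf_feature_vector(record, n_bins=50):
    """Hypothetical sketch: map a 1-D record to a fixed-length vector holding the
    empirical CDF of its peak-normalized absolute amplitudes.

    Assumptions (not taken from the paper): absolute values are used, the peak
    is the normalizing scale, and the CDF is sampled at n_bins even thresholds.
    """
    amps = np.abs(np.asarray(record, dtype=float))
    if amps.size == 0:
        raise ValueError("record must contain at least one sample")
    peak = amps.max()
    if peak == 0.0:
        # An all-zero trace: every sample lies at or below every threshold.
        return np.ones(n_bins)
    normalized = np.sort(amps / peak)  # values in [0, 1]; the peak maps to 1.0
    thresholds = np.linspace(1.0 / n_bins, 1.0, n_bins)
    # side="right" counts the samples that are <= each threshold.
    counts = np.searchsorted(normalized, thresholds, side="right")
    return counts / amps.size


# A trace that is quiet except for one isolated spike versus an oscillatory trace.
spike_trace = np.zeros(1000)
spike_trace[500] = 1.0
wave_trace = np.sin(np.linspace(0.0, 20.0 * np.pi, 1000))

spike_vec = cdf_feature_vector(spike_trace)
wave_vec = cdf_feature_vector(wave_trace)
print(spike_vec[0], wave_vec[0])
```

In this toy comparison the spike trace places 999 of its 1000 samples in the lowest bin, whereas the sinusoid spreads its samples across many bins. A classifier could exploit that contrast, although the vectors used in the study may be constructed differently.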
Key words:
- spike waveform
- strong-motion record
- machine learning
- LightGBM
- SVM
- stacking algorithm
- seismology
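Table 1. Seismic event information of the dataset

| Region | Date | Magnitude | Number of records |
|---|---|---|---|
| Darfield, New Zealand | 2010-09-04 | Mw 7.0 | 97 |
| El Mayor-Cucapah, Mexico | 2010-04-04 | Mw 7.2 | 78 |
| Tokachi, Japan | 2003-09-26 | Mw 7.1 | 67 |
| Tokachi, Japan | 2003-09-26 | Mw 8.0 | 165 |
| Kii Peninsula, Japan | 2004-09-05 | MJMA 7.4 | 49 |
| Kushiro, Japan | 2004-11-29 | MJMA 7.1 | 67 |
| Wenchuan, China | 2008-05-12 | Ms 8.0 | 150 |
| Iwate-Miyagi, Japan | 2008-06-14 | MJMA 7.2 | 99 |
| Sanriku-oki, Japan | 2011-03-11 | MJMA 7.7 | 51 |
| Sanriku-oki, Japan | 2011-03-11 | MJMA 7.4 | 31 |
| Sanriku-oki, Japan | 2011-03-11 | Mw 9.0 | 128 |
| Miyagi, Japan | 2011-04-07 | MJMA 7.1 | 212 |
| Sanriku-oki, Japan | 2012-12-07 | MJMA 7.3 | 131 |
| Lushan, China | 2013-04-20 | Ms 7.0 | 49 |
| Kumamoto, Japan | 2016-04-16 | MJMA 7.3 | 136 |
| Fukushima, Japan | 2016-11-22 | MJMA 7.4 | 60 |
| Fukushima, Japan | 2021-02-13 | MJMA 7.3 | 230 |
| Fukushima, Japan | 2022-03-16 | MJMA 7.4 | 292 |
| Pazarcik, Türkiye | 2023-02-06 | Mw 7.6 | 43 |
| Elbistan, Türkiye | 2023-02-06 | Mw 7.8 | 113 |
| Noto Peninsula, Japan | 2024-01-01 | Mw 7.5 | 139 |
| Tokachi, Japan | 2008-09-11 | MJMA 7.1 | 12 |
| Sanriku-oki, Japan | 2011-03-09 | MJMA 7.3 | 23 |
| Fukushima, Japan | 2013-10-26 | MJMA 7.1 | 23 |
| Jiuzhaigou, China | 2017-08-08 | Ms 7.0 | 10 |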
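Table 2. Main parameters, optimization ranges, and optimal values of each machine learning model

| Model | Parameter | Search range | Optimal value |
|---|---|---|---|
| LightGBM | Maximum number of leaves per decision tree | [5, 100] | 12 |
| | Number of decision trees | [10, 1000] | 947 |
| | Learning rate | [0.01, 0.5] | 0.3 |
| SVM | Regularization parameter | [1, 1000] | 72 |
| | RBF kernel parameter | [-5, -1] | -1.09 |
| DNN | Number of neurons in the fully connected layer | [64, 128] | 116 |
| | Dropout rate | [0.1, 0.5] | 0.124 |
| KNN | Number of neighbors | [1, 19] | 1 |
| | Weights | {Uniform, Distance} | Uniform |
| CNN | Number of convolution kernels | [16, 64] | 17 |
| | Convolution kernel size | [3, 7] | 6 |
| | Training batch size | [16, 64] | 40 |
| | Learning rate | [0.001, 0.01] | 0.009 |
| LR | Regularization strength | [0.001, 1] | 0.945 |
| | Solver | {Liblinear, Sag, Newton-Cg, Lbfgs} | Sag |
| | Penalty type | {L1, L2} | L2 |
| NB | Smoothing parameter | [1E-9, 1E-6] | 6E-9 |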
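Table 3. Confusion matrices of the machine learning models on the test set

| Model | TP | TN | FN | FP |
|---|---|---|---|---|
| LightGBM | 32 | 577 | 3 | 1 |
| SVM | 32 | 576 | 3 | 2 |
| DNN | 31 | 576 | 4 | 2 |
| CNN | 32 | 576 | 3 | 2 |
| LR | 32 | 572 | 3 | 6 |
| NB | 30 | 558 | 5 | 20 |
| KNN | 26 | 576 | 9 | 2 |
| Stacking | 33 | 575 | 2 | 3 |

TP and TN: correctly recognized records with and without spikes; FN: spike records that were missed; FP: spike-free records misidentified as containing spikes.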
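The MCC used throughout can be computed directly from confusion-matrix counts with the standard definition, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below is illustrative; the function name `mcc` is ours. Because the formula is symmetric in FP and FN, exchanging those two counts leaves the result unchanged. Applied to the Stacking row of Table 3 (TP = 33, TN = 575, error counts 2 and 3) it gives about 0.925.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when any marginal total is zero (MCC is undefined).
    return numerator / denominator if denominator else 0.0


# Stacking row of Table 3: TP = 33, TN = 575; the two error counts are 2 and 3.
# MCC is symmetric in FP and FN, so the order of 2 and 3 does not matter.
print(round(mcc(33, 575, 2, 3), 3))  # 0.925
print(round(mcc(33, 575, 3, 2), 3))  # 0.925
```

Unlike accuracy, which would exceed 0.99 here simply because spike-free records dominate, MCC stays informative on such an imbalanced test set.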
Table 4. Statistical description of the MCC achieved by each machine learning model in recognizing records with spikes

| Model | Mean | Standard deviation | Minimum | Maximum | Median |
|---|---|---|---|---|---|
| DNN | 0.882 | 0.050 | 0.800 | 0.946 | 0.879 |
| LR | 0.856 | 0.032 | 0.808 | 0.921 | 0.857 |
| SVM | 0.901 | 0.035 | 0.846 | 0.946 | 0.899 |
| LightGBM | 0.901 | 0.059 | 0.779 | 0.951 | 0.925 |
| KNN | 0.829 | 0.041 | 0.756 | 0.889 | 0.835 |
| CNN | 0.838 | 0.058 | 0.707 | 0.898 | 0.851 |
| NB | 0.648 | 0.034 | 0.592 | 0.713 | 0.638 |
| Stacking | 0.925 | 0.036 | 0.866 | 0.965 | 0.931 |
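Table 4 summarizes each model's MCC with five statistics. The sketch below shows how such a summary could be produced from repeated evaluations, assuming one MCC value per run. The number of runs behind Table 4 and whether its standard deviation is the sample or population version are not stated here, so the sample version (`statistics.stdev`) is an assumption; the helper name `describe_mcc` is hypothetical, and the input values are illustrative, not results from the study.

```python
import statistics


def describe_mcc(values):
    """Summarize per-run MCC scores as in Table 4: mean, std, min, max, median."""
    if len(values) < 2:
        raise ValueError("need at least two runs for a standard deviation")
    return {
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation (assumption)
        "min": min(values),
        "max": max(values),
        "median": statistics.median(values),
    }


# Illustrative scores only; these are not the study's per-run results.
summary = describe_mcc([0.90, 0.92, 0.94])
print(summary)
```

Reporting the median and the range alongside the mean shows whether a model's average is pulled down by a few poor runs, as the gap between LightGBM's mean (0.901) and median (0.925) in Table 4 suggests.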
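Table 5. Confusion matrices of machine learning models for the test set

| Model | TP | TN | FN | FP |
|---|---|---|---|---|
| LightGBM | 12 | 65 | 2 | 1 |
| SVM | 12 | 65 | 2 | 1 |
| DNN | 12 | 66 | 2 | 0 |
| CNN | 12 | 65 | 2 | 1 |
| LR | 12 | 65 | 2 | 3 |
| NB | 11 | 64 | 3 | 2 |
| KNN | 12 | 65 | 2 | 1 |
| Stacking | 12 | 65 | 2 | 1 |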