Abstract:
Spikes are a common type of abnormal waveform in strong-motion records. Their generation mechanism remains unclear, and studying it requires large collections of spike-bearing records, which makes reliable spike identification important. This study proposes a preprocessing method based on adaptive waveform scaling that extracts and enhances amplitude variation features and, combined with time-scale discrimination criteria, reduces the impact of amplitude differences on manual annotation accuracy. In addition, a novel feature representation is introduced in which one-dimensional data are transformed into feature vectors by normalizing the cumulative distribution of sampling amplitudes, so that the spatial distribution characteristics of strong-motion records can be represented. Multiple machine learning models were trained on a highly imbalanced dataset, and their misclassified cases were analyzed. A LightGBM-SVM stacking model with hyperparameters tuned by Bayesian optimization was then adopted to recognize spike waveforms, achieving a Matthews correlation coefficient (MCC) exceeding 86% on the test set. These results indicate that the proposed spike discrimination criterion performs well and support its stability and generalizability. The method can serve as an auxiliary tool for spike waveform screening in data quality assessment and provide technical support for further investigation of the generation mechanism of spike waveforms.
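
As a rough illustration of two ideas named in the abstract, the sketch below shows one possible way to turn a one-dimensional record into a fixed-length vector from the normalized cumulative distribution of its sample amplitudes, and how MCC is computed from a confusion matrix. It is not the authors' implementation: the peak normalization, the use of absolute amplitudes, the bin count `n_bins`, and the function names are all assumptions made for illustration, and the confusion-matrix counts are invented numbers, not results from this study.

```python
import math
from bisect import bisect_right


def amplitude_cdf_features(record, n_bins=16):
    """Hypothetical feature map (an assumption, not the paper's exact method).

    Scales absolute amplitudes by the record's peak, then evaluates the
    empirical cumulative distribution at n_bins evenly spaced thresholds.
    """
    peak = max(abs(x) for x in record)
    if peak == 0:
        # A flat record: every sample is at or below every threshold.
        return [1.0] * n_bins
    scaled = sorted(abs(x) / peak for x in record)
    n = len(scaled)
    # Thresholds are 1/n_bins, 2/n_bins, ..., 1.
    return [bisect_right(scaled, (k + 1) / n_bins) / n for k in range(n_bins)]


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Conventionally reported as 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# A single isolated spike concentrates almost all samples near zero amplitude,
# while a smooth oscillation spreads them across the amplitude range.
spike_like = [0.0] * 99 + [10.0]
smooth = [math.sin(2 * math.pi * i / 100) for i in range(100)]

spike_features = amplitude_cdf_features(spike_like, n_bins=4)
smooth_features = amplitude_cdf_features(smooth, n_bins=4)

# Invented counts for an imbalanced set with 10 positives out of 1000 samples:
# a model that never predicts "spike" is 99% accurate but has an MCC of 0.
never_predicts_spike = mcc(tp=0, tn=990, fp=0, fn=10)
```

The last example is the reason MCC is a reasonable headline metric for a highly imbalanced dataset: unlike accuracy, it does not reward a classifier for simply predicting the majority class.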