1. Introduction
1.1 Ground motion
The 9.12 Gyeongju earthquake (12 September 2016, ML 5.8) and the Pohang earthquake (15 November 2017, ML 5.4) occurred on the Korean Peninsula, prompting a reassessment of the seismic vulnerability of buildings, bridges, and dams. Accurate seismic evaluation requires an appropriate GMPE (Ground Motion Prediction Equation) and site-effect characterization, which in turn require a large amount of properly processed ground motion data. Recorded time series contain background noise from small ambient vibrations, mechanical devices, and other sources. Ground motion data generated from such time series therefore require appropriate processing, such as high-pass filtering, to reduce the background noise. If ground motion data are generated without proper processing, the results may deviate from the true physical properties.
Numerous methods for processing ground motion data have been studied worldwide. The PEER (Pacific Earthquake Engineering Research Center) in the United States provides tools for searching, selecting, and downloading ground motion data together with comprehensive earthquake metadata, including hypocenter location, depth, and other source characteristics. The PEER database [1, 3] consists of NGA-EAST and NGA-WEST.
The objective of NGA-EAST is to develop new GMPEs for the Central and Eastern North America (CENA) region; it contains more than 27,000 ground motion records from earthquakes in the region. NGA-WEST, started in March 2010, includes a very large set of ground motions recorded worldwide from shallow crustal earthquakes in active tectonic regimes. The NGA-EAST ground motion processing method is: 1) remove the instrument response from the time series; 2) correct the baseline by removing the mean; 3) determine the high-pass filter frequency by comparing the FAS (Fourier amplitude spectrum) of the noise and signal windows; 4) apply a causal Butterworth filter [2, 3]. RESORCE (Reference database for Seismic grOund-motion pRediction in Europe) is a single integrated accelerometric database for seismic research in Europe and surrounding areas, containing ground motion data from 1,540 seismic stations and 1,814 earthquakes. Its processing method is: 1) demean and taper the beginning and end of the record; 2) add zero pads to the end of the record; 3) apply a 4-pole acausal Butterworth filter; 4) double-integrate the filtered acceleration to obtain displacement; 5) fit a polynomial of order 6 to the displacement trace; 6) subtract the second derivative of the polynomial from the acceleration [4]. In Korea, one study built a ground motion database for developing ground motion attenuation equations by collecting seismic data accumulated since the 2000s [5], and another describes how to create a ground motion flat file and how to process the time series [6]. That processing consisted of converting miniSEED to Seismic Analysis Code (SAC) format, adding metadata to the SAC files, removing the mean, applying a taper, removing the instrument response, analyzing the FAS signal-to-noise ratio (SNR) to determine the high-pass filter frequency, and applying a 5-pole acausal Butterworth filter.
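To make steps 4)-6) of the RESORCE procedure concrete, the following minimal numpy sketch double-integrates a filtered acceleration, fits an order-6 polynomial to the resulting displacement, and subtracts the second derivative of the fit. The function name, array names, and the simple cumulative-sum integration are illustrative assumptions, not the RESORCE implementation itself.

```python
import numpy as np

def polynomial_baseline_correction(acc, dt, order=6):
    """Illustrative baseline correction: fit a polynomial to the double-integrated
    displacement and subtract its second derivative from the acceleration."""
    t = np.arange(len(acc)) * dt
    vel = np.cumsum(acc) * dt                        # 4) integrate acceleration -> velocity
    disp = np.cumsum(vel) * dt                       #    integrate velocity -> displacement
    poly = np.poly1d(np.polyfit(t, disp, order))     # 5) order-6 polynomial fit to displacement
    return acc - poly.deriv(2)(t)                    # 6) subtract second derivative of the fit
```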
1.2 Traditional methods
The most important part of processing ground motion is finding an appropriate high-pass filter frequency. Three methods are commonly used: 1) comparison of the noise and signal FAS; 2) comparison of the signal FAS with a fitted f² trend line; and 3) inspection of the displacement trace after filtering.
The first method is to calculate the signal-to-noise ratio of the FAS and determine the high-pass filter frequency as the point above which the signal-to-noise ratio is at least 2 to 3. The second is to compare the signal FAS with an f² trend line. The amplitude of background noise typically increases in the low-frequency band, so if the signal FAS does not follow the f² trend line at low frequencies, background noise can be assumed to be present [7]. Fig. 1 shows an example of high-pass filter frequency determination by the first and second methods: FAS_seismic is the FAS of the signal window; FAS_noise and 3×FAS_noise are the FAS of the background-noise window and the same curve amplified by a factor of 3; the f² trend is the trend line fitted to the signal FAS; and fcHP_traditional indicates the high-pass filter frequency determined by the first and second methods. The third method is to inspect the displacement obtained from the filtered acceleration: the filtered displacements are plotted for a series of log-spaced filter frequencies, and if the start and end of the filtered displacement are horizontal with respect to a fitted straight line, the high-pass filter frequency is considered qualified. Fig. 2 presents a suite of displacements from filtered accelerations.
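A simplified sketch of the first method is shown below: the high-pass corner is taken as the lowest frequency at which the signal FAS exceeds three times the noise FAS. Real implementations usually smooth the spectra and require the ratio to hold over a band rather than at a single crossing; the window handling and threshold here are illustrative assumptions.

```python
import numpy as np

def snr_highpass_corner(signal, noise, dt=0.01, ratio=3.0):
    """Return the lowest frequency at which FAS_signal / FAS_noise first reaches `ratio`."""
    n = min(len(signal), len(noise))                  # use equal-length windows
    freqs = np.fft.rfftfreq(n, d=dt)
    fas_signal = np.abs(np.fft.rfft(signal[:n]))
    fas_noise = np.abs(np.fft.rfft(noise[:n]))
    snr = fas_signal / np.maximum(fas_noise, 1e-12)   # avoid division by zero
    above = np.where(snr >= ratio)[0]
    return freqs[above[0]] if above.size else None
```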
1.3 Deep learning method
Traditional methods are time-consuming, complicated, and subject to human error. In this paper, we address these problems using a deep learning approach. In recent years, there have been many studies on the use of deep learning in seismology, such as seismic signal detection [8-10], seismic data interpolation [11], seismic parameter prediction [12], seismic noise analysis and reduction [13], and waveform modeling using neural operators [14].
Bo Liu [15] used a deep learning approach to determine the high-pass filter frequency for ground motion, employing a pre-trained CNN model to replace visual inspection and automatically judge the reasonableness of the filtered displacement time series. However, that study only classified the filtered displacements as qualified or unqualified and did not directly determine the high-pass filter frequency.
The purpose of this study is to determine the high-pass filter frequency using a deep learning approach. To extract features from the time series, we used the Mel-spectrogram technique, and we applied transfer learning and data augmentation to improve model training. We compared ResNet [16], DenseNet [17], EfficientNet [18], ViT [19], and DeiT [20], and used three metrics (R², MAE, and RMSE) to evaluate the results obtained with each model.
2. Data and methods
2.1 Data
In this study, we constructed a dataset using NGA-EAST and Korean Peninsula earthquakes. For the NGA-EAST earthquakes, we adopted the NGA-EAST database from the PEER database, which contains over 27,000 records from 82 earthquake events at 1,271 recording stations. It includes earthquake metadata such as origin time, location, and depth; station metadata including sensitivity and name; ground motion data; and the high-pass filter frequency used to process each record. However, it does not provide the time series themselves, so we collected these from IRIS (Incorporated Research Institutions for Seismology). For the Korean Peninsula earthquakes, we used the earthquake ground-motion database based on the Korean national seismic network [5], which covers 32,000 records from 140 earthquakes of magnitude 3.0 or greater that occurred in Korea between 2003 and 2019. This database also does not provide the time series, so we collected these from NECIS (National Earthquake Comprehensive Information System). Fig. 3 shows the distribution of seismic events in the dataset, and Fig. 4 shows histograms of magnitude and epicentral distance.
The magnitudes range from 2.2 to 5.8, with an average of 4.0. The epicentral distance ranges from 3.9 to 3,511 km, with an average of 603 km. Table 1 summarizes the dataset. The total of 42,980 samples was split into training (80%), validation (10%), and testing (10%) datasets.
2.2 Preprocessing and feature extraction
To extract features from the time series, we first perform preprocessing as follows: 1) adjust the sampling rate to 100 Hz; 2) differentiate velocity to acceleration; 3) remove the mean and the instrument response; 4) convert to physical units using the sensor sensitivity. We used the Mel-spectrogram for feature extraction, considering that traditional methods for determining the high-pass filter frequency are based mainly on frequency-domain analysis. Humans are more sensitive to differences between lower frequencies than between higher frequencies, and the Mel-spectrogram accounts for this by using the Mel scale instead of linear frequency. It is widely used in speech recognition, and in seismology it has been used for noise identification [21] and earthquake magnitude prediction [22], among other tasks.
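The preprocessing steps 1)-4) above could be expressed with ObsPy roughly as in the following sketch. The file names and station metadata are hypothetical, and the use of remove_sensitivity() for the unit conversion is an assumption about the implementation, not the authors' exact code.

```python
from obspy import read, read_inventory

st = read("example_record.mseed")            # hypothetical waveform file
inv = read_inventory("example_station.xml")  # hypothetical station metadata

for tr in st:
    tr.resample(100.0)           # 1) adjust the sampling rate to 100 Hz
    tr.differentiate()           # 2) differentiate velocity to acceleration
    tr.detrend("demean")         # 3) remove the mean
    tr.remove_sensitivity(inv)   # 4) convert to physical units via the sensor sensitivity
    # full instrument-response deconvolution could instead use
    # tr.remove_response(inventory=inv, output="ACC")
```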
Fig. 5 shows the feature extraction process. 1) divide the time series into overlapping windows; 2) perform FFT (Fast Fourier Transform) on each window; 3) apply Mel scale and convert to Db scale; 4) arrange and stack according to time. The results have a two-dimensional matrix. The x-axis, y-axis, and color map represent time, frequency, and dB, respectively. Due to variations in the length of time series for each sample, the x-axis length is varied. To improve the performance of model training, we adjusted the x-axis length to be the same using an interpolation technique. The final adjusted size is 128x128, which corresponds to the same sample size for the pre-training of the model.
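A minimal sketch of this feature extraction is given below, assuming a 100 Hz acceleration trace. The FFT length, hop length, and linear time-axis interpolation are assumptions for illustration; the exact settings used in the study may differ.

```python
import numpy as np
import librosa

def mel_feature(acc, sr=100, n_mels=128, target_frames=128):
    """Mel-spectrogram in dB with the time axis interpolated to a fixed length."""
    mel = librosa.feature.melspectrogram(y=np.asarray(acc, dtype=float), sr=sr,
                                         n_fft=512, hop_length=128, n_mels=n_mels)
    mel_db = librosa.power_to_db(mel, ref=np.max)     # 3) convert to the dB scale
    # interpolate each Mel band onto a fixed grid of 128 time frames
    old_t = np.linspace(0.0, 1.0, mel_db.shape[1])
    new_t = np.linspace(0.0, 1.0, target_frames)
    return np.stack([np.interp(new_t, old_t, band) for band in mel_db])   # 128 x 128 matrix
```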
2.3 Model
We employed ResNet, DenseNet, and EfficientNet, which are CNN-based models, along with ViT and DeiT, which are transformer-based models. Increasing the number of layers in a neural network can enhance its accuracy, but it can also lead to overfitting and degradation problems, making the model difficult to train. To overcome this problem, ResNet introduces residual networks, which are easier to optimize because they provide shortcut connections that skip one or more layers. DenseNet provides dense blocks that allow each layer to access the features of all preceding layers; by connecting to all preceding layers, it can avoid vanishing gradients, propagate features directly, and reduce the number of parameters. EfficientNet proposes a compound scaling method that uniformly scales depth, width, and resolution, which efficiently reduces the number of parameters. ViT is based on the transformer architecture originally designed for natural language processing; it represents an input image as a sequence of image patches and uses a transformer encoder to extract contextual information. DeiT achieves high performance with less data and fewer computing resources by using a distillation technique in which the student model learns from a teacher model through attention.
The input to each model is a 128×128 two-dimensional matrix, and the output is a single scalar value, the high-pass filter frequency. Fig. 6 shows the architectures of the models. ResNet consists of a 7×7 convolution layer, 48 connected residual blocks, and a pooling layer; the 7×7 convolution layer performs downsampling. Each residual block consists of several convolution layers followed by batch normalization and ReLU activation, allowing the network to learn residual functions that map the input to the desired output. The pooling layer is followed by an FC2048 layer and a linear layer to reduce the dimensionality. DenseNet is similar in architecture to ResNet; its dense block consists of multiple densely connected convolution layers. EfficientNet uses a 3×3 convolution layer for downsampling, followed by a series of MBConv layers and a pooling layer. The MBConv layer has a wide-narrow-wide structure, and its output is added to its input, forming a residual shortcut; the pooling layer is followed by a linear layer. ViT consists of a linear projection of flattened patches, transformer blocks, and a linear layer: the input is divided into patches, which are flattened, passed through the linear projection, combined with position embeddings, and fed to the transformer encoder for training. DeiT is architecturally identical to ViT, except for the learning method and the number of layers. Table 2 describes the specifications of the models.
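As an illustration of adapting such a backbone to this regression task, the following PyTorch sketch takes an ImageNet-pretrained ResNet, accepts a single-channel 128×128 Mel-spectrogram, and outputs one scalar. The backbone depth (ResNet-50) and the single-channel handling are assumptions, not the exact configuration reported in Table 2.

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
# accept a single-channel spectrogram instead of a 3-channel RGB image
model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
# replace the 1000-class classification head with a single regression output (fcHP)
model.fc = nn.Linear(model.fc.in_features, 1)

x = torch.randn(8, 1, 128, 128)   # a mini-batch of Mel-spectrogram features
print(model(x).shape)             # torch.Size([8, 1])
```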
2.4 Data augmentation
The dataset size is small compared to the number of parameters in the models. Data augmentation techniques for time series include noise injection, gapping, flipping, shifting, cropping, slicing, warping, and mixup. In this study, we tested these techniques and adopted mixup, which outperformed the others; because mixup mixes not only the samples but also the labels, it has an effect similar to label smoothing. Mixup [23] creates a weighted combination of random sample pairs from the training data. For a pair of randomly selected training samples (Xi, Yi) and (Xj, Yj) and a mixing weight λ, the new sample (Xn, Yn) is expressed as:
Xn = λXi + (1 − λ)Xj
Yn = λYi + (1 − λ)Yj
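A minimal sketch of this augmentation for a regression mini-batch is shown below; drawing λ from a Beta(0.2, 0.2) distribution is an assumed hyperparameter, not necessarily the value used in the study.

```python
import numpy as np
import torch

def mixup_batch(x, y, alpha=0.2):
    """Mix a mini-batch with a randomly permuted copy of itself (regression targets)."""
    lam = np.random.beta(alpha, alpha)        # mixing weight lambda
    idx = torch.randperm(x.size(0))           # random pairing of samples within the batch
    x_mix = lam * x + (1.0 - lam) * x[idx]    # Xn = lambda * Xi + (1 - lambda) * Xj
    y_mix = lam * y + (1.0 - lam) * y[idx]    # Yn = lambda * Yi + (1 - lambda) * Yj
    return x_mix, y_mix
```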
Fig. 7 shows a simple visualization of mixup. It reduces overfitting and improves both the robustness and the generalization of the trained model. Consequently, it decreases the likelihood of the model predicting unexpected results.
2.5 Transfer learning
Despite applying mixup to mitigate the effects of the small dataset, additional techniques were needed to improve model performance. Transfer learning is a machine learning technique that uses a pre-trained model to learn a new, related task. In this study, we adopted models pre-trained on ImageNet-1K, which contains 1,431,167 images in 1,000 categories. Fig. 8 shows the transfer learning strategy. Given the limited similarity between the domains and the results of Bo Liu's study, we applied a "train entire model" strategy.
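The sketch below contrasts the "train entire model" strategy with the alternative of freezing the pretrained backbone. It continues from the earlier ResNet sketch and is illustrative only; which layers were actually left trainable beyond the stated strategy is taken from the text, not from released code.

```python
import torch.nn as nn
from torchvision import models

# start from ImageNet-1K weights and replace the head with the single regression output
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, 1)

# alternative feature-extraction strategy (not used here): freeze the backbone
# for name, p in model.named_parameters():
#     p.requires_grad = name.startswith("fc")

# "train entire model": every parameter stays trainable and is updated during fine-tuning
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable:,}")
```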
2.6 Training
We conducted training with a mini-batch size of 64 for 50 epochs on a single Nvidia GeForce RTX 4090 GPU with 24 GB of memory, using the AdamW optimizer [24], a variant of Adam, with β1 = 0.9, β2 = 0.999, and weight decay 0.01. The initial learning rate was set to 0.001 and was halved whenever no improvement in performance was observed during training. Python 3.8 and the PyTorch 2.2 deep learning framework were used as the training environment. The model input for training is a 128×128 two-dimensional array generated from a time series using the Mel-spectrogram technique, and the output is a 1×1 array describing the high-pass filter frequency determined by the traditional methods. R², RMSE, and MAE were used as performance metrics to evaluate the models:
R² = 1 − Σ(yi − ŷi)² / Σ(yi − ȳ)²
MAE = (1/n) Σ |yi − ŷi|
RMSE = √[(1/n) Σ (yi − ŷi)²]
where yi is the high-pass filter frequency from the traditional methods, ŷi is the model prediction, ȳ is the mean of yi, and n is the number of samples.
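The optimization setup described above could be configured roughly as follows, continuing from the model sketch; the MSE loss, the scheduler patience, and the validate() helper are assumptions introduced for illustration and are not stated in the text.

```python
import torch

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3,
                              betas=(0.9, 0.999), weight_decay=0.01)
# halve the learning rate when the monitored validation metric stops improving
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min",
                                                       factor=0.5, patience=5)
criterion = torch.nn.MSELoss()                  # assumed regression loss

for epoch in range(50):
    # ... one training epoch over mini-batches of 64, then a validation pass ...
    val_loss = validate(model)                  # hypothetical helper returning validation loss
    scheduler.step(val_loss)
```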
3. Results
The models were evaluated on the test set using the checkpoints with the lowest loss and highest accuracy obtained during training. ResNet showed the highest performance, with an R² of 0.986 and a loss of 0.028. Table 3 shows the accuracy and loss of each model on the validation set. Fig. 9 shows the accuracy and loss curves on the training and validation sets; the x-axis is the epoch, and the y-axis is the accuracy or loss. The solid lines show accuracy and loss on the validation set, and the dashed lines show them on the training set. Throughout training, the losses of all models decreased consistently while their accuracies increased, indicating that the models were trained effectively without overfitting or underfitting.
Fig. 10 and Table 4 show the performance of each model on the test set. ResNet achieved the highest R² (0.977) and the lowest MAE and RMSE (0.006 and 0.074, respectively). Comparing the CNN and transformer approaches, the CNN-based models (ResNet, DenseNet, EfficientNet) performed better, and the transformer-based models (ViT, DeiT) converged later. This result indicates that CNNs have a strong inductive bias toward locality [19], which allows the model to generalize well from a relatively small amount of data. Fig. 11 shows the relationship between G-FLOPs, the number of parameters, and accuracy: the circle radius represents the relative number of parameters in each model, the x-axis represents G-FLOPs, and the y-axis represents accuracy. EfficientNet required significantly less computation than ResNet, with approximately 10.25 times fewer G-FLOPs. Despite this difference in computational demands, the difference in R² between the two models was only 0.009, showing that EfficientNet provides high performance while using computational resources efficiently.
In Fig. 12, the orange solid line, fcHP_traditional, represents the frequency obtained by the traditional method, while the black dotted line, fcHP_deep, shows the frequency obtained by deep learning using ResNet. The time series used for comparison corresponds to a magnitude 3.3 earthquake near Yeonpyeong Island in April 2015, recorded at the YJD station located 93 km from the epicenter. The frequency derived from the traditional method was 0.95 Hz, whereas that obtained through deep learning was 1.05 Hz, resulting in a difference of 0.1 Hz.
Fig. 13 shows a histogram of the difference between the traditional and deep learning methods on the test set. Ninety percent of the samples have a difference within 0.63 Hz, and the deep learning method tends to underestimate the frequency relative to the traditional method.
4. Conclusions
The purpose of this study was to determine the high-pass filter frequency through a deep learning approach. To achieve this, features were extracted using the Mel-spectrogram, data augmentation was performed using mixup, and pre-trained models were utilized. Notably, this is the first study in Korea on high-pass filter frequency determination using a deep learning approach. The findings hold promise for automating ground motion processing, which could significantly contribute to ground motion prediction and earthquake disaster research in the future. However, to deploy the method in an operational system, a defense mechanism against adversarial threats caused by exceptional cases will need to be considered. We therefore believe that future work should statistically analyze the results of the traditional and deep learning methods and examine exceptional cases such as permanent displacement and large earthquakes. The conclusions of this study are as follows:
1) We evaluated each architecture using the checkpoint with the lowest loss and highest accuracy obtained during training. Among the models, ResNet had the highest performance, with the highest R² (0.977) and the lowest MAE and RMSE (0.006 and 0.074, respectively). Comparing the CNN and transformer approaches, the CNN-based models (ResNet, DenseNet, EfficientNet) performed better, and the transformer-based models (ViT, DeiT) converged later. This result indicates that CNNs have a strong inductive bias toward locality, which allows the model to generalize well from a relatively small amount of data.
2) EfficientNet required significantly less computation than ResNet, with approximately 10.25 times fewer G-FLOPs, showing that EfficientNet provides high performance while using computational resources efficiently.
3) We compared the traditional methods and the deep learning method (ResNet) on time series data from the Yeonpyeong earthquake; the deep learning result differed from the traditional result by only 0.1 Hz.
4) Deep learning methods proved to be notably accurate and efficient in comparison to traditional methods, making them suitable for the automation and systematization of ground motion processing. With future research, these methods are expected to be applied to automated ground motion systems, enabling the acquisition of consistently high-quality ground motion data.