SeizureNet: a model for robust detection of epileptic seizures based on convolutional neural network

Epilepsy is a neurological disorder that is generally detected from electroencephalogram (EEG) signals. The manual inspection of epileptic seizures is a time-consuming and laborious process. Many automatic detection algorithms based on traditional approaches have been proposed; they show good accuracy for several specific EEG classification problems but perform poorly in others. To address this issue, the authors present a novel model, named SeizureNet, for robust detection of epileptic seizures from EEG signals based on convolutional neural networks. Firstly, they utilise two convolutional neural networks to extract time-invariant features from single-channel EEG signals. Then, a fully connected layer is employed to learn high-level features. Finally, these features are supplied to a softmax layer for classification. They evaluated the model on a benchmark database provided by the University of Bonn using ten-fold cross-validation. The proposed model achieved an accuracy of 98.50–100.00% in classifying non-seizure versus seizure, 97.00–99.00% in classifying healthy, inter-ictal and ictal, and 95.84% in classifying among five EEG states.


Introduction
Epileptic seizures are caused by a disturbance in the electrical activity of the brain. Electroencephalogram (EEG) is an electrophysiological monitoring method to record the electrical activity of the brain. The EEG is the prime signal that has been widely used for detecting epilepsy [1][2][3]. The visual inspection of EEG is a time-consuming and laborious process. Hence, automatic EEG signal analysis for clinical screening is necessary for the diagnosis of epilepsy.
Recently, numerous studies have addressed the automatic detection of epileptic seizures using EEG signals and have been evaluated on the University of Bonn EEG dataset, a widely used benchmark database for seizure recognition. The published work related to EEG-based epileptic seizure detection mainly involves three classification problems: two-class classification, three-class classification, and five-class classification. The details will be introduced in a later section. The approaches fall into two major groups: traditional methods and deep learning methods. Most conventional methods first extract features from raw EEG and then feed them to a classifier. The feature extraction techniques include Fourier transform (FT) [4,5], discrete wavelet transform (DWT) [6][7][8][9][10], approximate entropy (ApEn) [7,9], Tsallis entropy [11], local mean decomposition (LMD) [12], tunable-Q wavelet transform (TQWT) [13] and so on. Besides, several methods use multiple feature extraction techniques. For example, Bhattacharyya et al. [13] utilised TQWT and k-nearest neighbour entropy (KNNE) to extract the features of EEG. Commonly used classifiers include the artificial neural network [9], decision trees [4], k-nearest neighbour (KNN) [10], support vector machine (SVM) [8] and so on. Also, many techniques adopted an optimised classifier. For example, Zhang and Chen [12] employed an SVM optimised by a genetic algorithm (GA-SVM) for classification.
The performance of these traditional techniques depends on handcrafted feature extractors, as well as on the selected classifiers. To address this issue, many approaches based on deep learning have been proposed. For example, Petrosian et al. [14] applied recurrent neural networks combined with signal wavelet decomposition to learn temporal patterns for epileptic seizure detection. Lin et al. [15] proposed a deep learning framework based on a stacked sparse autoencoder to learn sparse and high-level representations of EEG signals. Hussein et al. [1] developed an optimised deep neural network based on long short-term memory to learn the temporal dependencies in EEG data for the robust detection of epileptic seizures.
Epilepsy detection based on convolutional neural networks (CNNs) has also attracted much attention. Acharya et al. [16] implemented a 13-layer deep CNN algorithm to detect normal, preictal, and seizure classes. Liu et al. [17] also developed their models based on CNN, which show good accuracy for two-class classification but perform poorly for three-class classification. Besides, Zhao et al. [18] designed a CNN-based model that consists of three convolutional blocks and three fully-connected (FC) layers, where each convolutional block is composed of five types of layers. They achieved good experimental results on two-class and three-class classifications.
Besides, several approaches that combine CNN with traditional technologies have been proposed for seizure detection. Ullah et al. [2] implemented a system that is an ensemble of pyramidal one-dimensional (1D) CNN models, which combined CNN with a majority vote (MV). San-Segundo et al. [19] presented a model based on FT and CNN. Gao et al. [20] proposed a method based on ApEn, recurrence quantification analysis, and CNN. Türk and Özerdem [21] obtained 2D frequency-time scalograms of EEG records by using continuous wavelet transform (CWT) and then fed them into a CNN.
To the best of our knowledge, no single CNN model that uses EEG signals for epilepsy detection has so far performed well on all three classification problems of the Bonn EEG dataset. We therefore develop a novel CNN-based model, named SeizureNet, to address this issue. Unlike the above-mentioned CNN models, the proposed model employs two CNNs to extract time-invariant features from EEG signals. Experimental results indicate that the proposed model performs well in all three classification problems.
This paper is organised as follows. Section 2 describes the benchmark EEG dataset, which is used in this work, and the methodology. Section 3 introduces the experiments and the results obtained, comparing the proposed method with the state-of-the-art techniques. Finally, Section 4 gives our conclusion.

Description of EEG dataset
In this study, we use the dataset of the University of Bonn [22], a publicly available online database that is widely used for epileptic seizure recognition. The whole database consists of five subsets (denoted as A to E); each set contains 100 single-channel EEG segments of 23.6-s duration. Sets A (eyes open) and B (eyes closed) were recorded from five healthy volunteers relaxed in an awake state. Sets C, D, and E were taken from five patients with epilepsy. Of these, Sets C and D were obtained during seizure-free intervals: Set C was taken from the hippocampal formation of the opposite hemisphere of the brain, and Set D was recorded from within the epileptogenic zone. Set E contains recordings of seizure activity. All EEG signals were recorded with the same 128-channel amplifier system at a sampling rate of 173.61 Hz. Samples of the EEG signals are shown in Fig. 1.
The seizure detection on the Bonn EEG database mainly involves three classification problems: two-class, three-class, and five-class classification problems. Most of the two-class seizure recognition problems focus on classifying between non-seizure and seizure. The three-class EEG classification mainly focuses on the grouping of three different EEG categories (healthy, inter-ictal, and ictal). The five-class classification involves distinguishing among five EEG states (sets A, B, C, D, and E).
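The groupings described above can be written down as a small label map. The following Python sketch is illustrative only: the function name `make_problem` and the string notation (for example `"AB"` for sets A and B merged into one class) are assumptions introduced here, not part of the original work.

```python
def make_problem(groups):
    """Map each Bonn set letter to a class index, one index per group.

    `groups` is a list of strings; every string lists the sets that share
    one class, e.g. ["AB", "CD", "E"] for healthy / inter-ictal / ictal.
    (Hypothetical helper, named here for illustration.)
    """
    labels = {}
    for class_index, group in enumerate(groups):
        for set_name in group:
            if set_name not in "ABCDE":
                raise ValueError(f"unknown set: {set_name}")
            if set_name in labels:
                raise ValueError(f"set {set_name} appears in two groups")
            labels[set_name] = class_index
    return labels


two_class = make_problem(["ABCD", "E"])               # non-seizure vs seizure
three_class = make_problem(["AB", "CD", "E"])         # healthy, inter-ictal, ictal
five_class = make_problem(["A", "B", "C", "D", "E"])  # one class per set

print(two_class)
print(three_class)
print(five_class)
```

Smaller cases such as A versus E or AB versus E follow the same pattern with fewer sets, for example `make_problem(["AB", "E"])`.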

Architecture of the proposed model
The architecture of the proposed model is shown in Fig. 2. The overall architecture is as follows. Firstly, the single-channel EEG segment is normalised to zero mean and unit variance. Secondly, two CNNs are utilised to extract features from the EEG signal. Each CNN is composed of five convolutional layers and three max-pooling layers. The first convolutional layer is followed directly by a max-pooling layer; after that, a block of two convolutional layers followed by a max-pooling layer appears twice. The feature maps extracted by the two CNNs are concatenated. Afterwards, an FC layer is applied to learn high-level features. To prevent overfitting, a dropout layer is applied after the FC layer. Finally, the most robust EEG features are supplied to a softmax layer for classification.
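As a minimal sketch of the first step and of the layer order in one CNN branch, the Python snippet below normalises a segment and rebuilds the layer sequence from the description above. The function names are hypothetical, and the use of the population variance (dividing by n) is an assumption, since the text does not say which estimator is used.

```python
import math


def normalise(segment):
    """Scale one EEG segment to zero mean and unit variance (z-score).

    Assumption: population variance (divide by n).
    """
    n = len(segment)
    mean = sum(segment) / n
    variance = sum((v - mean) ** 2 for v in segment) / n
    if variance == 0:
        raise ValueError("a constant segment cannot be scaled to unit variance")
    std = math.sqrt(variance)
    return [(v - mean) / std for v in segment]


def branch_layout():
    """Layer order of one CNN branch, following the textual description."""
    layout = ["conv", "max-pool"]                # first conv, directly followed by pooling
    layout += ["conv", "conv", "max-pool"] * 2   # two conv layers plus pooling, twice
    return layout


z = normalise([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # toy values, not EEG data
layout = branch_layout()
print(z)
print(layout.count("conv"), layout.count("max-pool"))  # 5 3
```

The layout check confirms that the description yields five convolutional and three max-pooling layers per branch.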
Each convolutional layer performs three operations in turn: 1D convolution, batch normalisation (BN), and the rectified linear unit (ReLU) activation. In their first convolutional layers, the two CNNs use a small and a large receptive field, respectively, to extract time-invariant features from a single-channel EEG segment. This idea is inspired by DeepSleepNet [23], which is an excellent model for automatic sleep stage scoring based on raw single-channel EEG signals. The authors of DeepSleepNet report that a small filter is better at capturing temporal information, while a larger filter is better at capturing frequency information.
The specification details of the model are also shown in Fig. 2. For example, '89 conv, 16, /5' indicates that the size of the receptive field is 1 × 89, the number of kernels is 16, and the stride is 5. '4 max-pool, /4' indicates max-pooling with a pooling size of 1 × 4 and a stride of 4. '24 fc' means an FC layer with 24 neurons. '0.2 dropout' indicates a dropout rate of 20%. '2/3/5 softmax' means a softmax layer for two-class, three-class, or five-class classification.
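To make the notation concrete, the following Python sketch computes how a layer specification shortens the signal. It rests on several assumptions that the text does not state: no padding ('valid' windows), an input of about 4097 samples (23.6 s at 173.61 Hz), and the two example layers being chained one after the other purely for illustration. The helper name `output_length` is hypothetical.

```python
import math


def output_length(n_in, size, stride, padding="valid"):
    """Number of output steps of a 1D convolution or pooling layer."""
    if padding == "valid":   # no padding: only complete windows are used
        return (n_in - size) // stride + 1
    if padding == "same":    # padded so that only the stride shrinks the output
        return math.ceil(n_in / stride)
    raise ValueError(f"unsupported padding: {padding}")


# 23.6 s sampled at 173.61 Hz gives about 4097 samples per segment.
n_samples = round(23.6 * 173.61)

after_conv = output_length(n_samples, size=89, stride=5)   # '89 conv, 16, /5'
after_pool = output_length(after_conv, size=4, stride=4)   # '4 max-pool, /4'
print(n_samples, after_conv, after_pool)  # 4097 802 200
```

With 'same' padding the convolution would instead produce `output_length(4097, 89, 5, "same")`, that is 820 steps, so the actual feature-map sizes depend on the padding used in Fig. 2.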

Operations of convolutional layer
The CNN [24] is one of the most popular algorithms for deep learning. Each convolutional layer of the proposed model performs three operations: 1D convolution, BN, and ReLU activation. The 1D convolution operation filters 1D signals to learn discriminative features and is defined as follows:
$$y_k = \sum_{n=0}^{N-1} x_n h_{k-n} \tag{1}$$
where x is the input signal, h is the filter, N is the number of elements in x, and y is the output vector. The technique of BN was proposed by Ioffe and Szegedy [25] to standardise the inputs to a layer; it can be applied either to the activations of a prior layer or to the input data directly. BN takes a step towards reducing the internal covariate shift and, in doing so, dramatically accelerates the training of deep neural networks. The BN transform is as follows:
$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i \tag{2}$$
$$\sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2 \tag{3}$$
$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} \tag{4}$$
$$y_i = \gamma \hat{x}_i + \beta \tag{5}$$
where $B = \{x_{1 \ldots m}\}$ is the set of values of x over a mini-batch, $\epsilon$ is a small constant added for numerical stability, and $\gamma$ and $\beta$ are parameters that need to be learned. Convolutional and BN layers in neural networks are usually followed by a non-linear activation function. The ReLU function is the most commonly used activation function in neural networks, especially in CNNs. A model that adopts ReLU is easier to train and often achieves better performance. Equation (6) shows the ReLU function:
$$f(x) = \max(0, x) \tag{6}$$
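The three operations of a convolutional layer can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it handles a single channel, uses no padding, computes the convolution as a sliding dot product (cross-correlation, as deep learning libraries commonly do), and normalises over a flat list of values rather than per channel over a mini-batch. The default `eps` value and all function names are assumptions.

```python
import math


def conv1d(x, w, stride=1):
    """Sliding dot product of kernel w over signal x (no padding)."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k))
            for i in range(0, len(x) - k + 1, stride)]


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Standardise values with their own mean and variance, then scale and shift."""
    m = len(values)
    mu = sum(values) / m
    var = sum((v - mu) ** 2 for v in values) / m
    return [gamma * (v - mu) / math.sqrt(var + eps) + beta for v in values]


def relu(values):
    """Rectified linear unit, max(0, x), applied element-wise."""
    return [max(0.0, v) for v in values]


def conv_layer(x, w, stride=1):
    """Convolution, then BN, then ReLU, in the order given in the text."""
    return relu(batch_norm(conv1d(x, w, stride)))


signal = [1.0, 2.0, 3.0, 4.0, 5.0]            # toy input, not EEG data
print(conv1d(signal, [1.0, 0.0, -1.0]))       # [-2.0, -2.0, -2.0]
print(conv_layer([0.0, 1.0, 4.0, 9.0, 16.0], [-1.0, 1.0]))
```

In a trained network, `gamma` and `beta` are learned and the batch statistics are replaced by running averages at inference time; the sketch omits both.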

Feature fusion and classification
A max-pooling layer performs down-sampling by dividing the input into several pooling regions and computing the maximum of each region. The objective is to reduce the dimensionality of the feature maps. The feature maps extracted by the two CNNs are concatenated and fully connected with all the neurons of the FC layer. An FC layer is a common way of learning non-linear combinations of these high-level features. The FC layer is followed by a dropout layer. Dropout is a simple technique to prevent neural networks from overfitting [26].
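The pooling and fusion steps described above can be sketched as follows. The snippet is a single-channel illustration with hypothetical function names and toy values; it assumes that the feature maps of both branches are flattened to lists before they are joined, which the text does not specify.

```python
def max_pool1d(x, size, stride):
    """Maximum of each pooling region (no padding, incomplete windows dropped)."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, stride)]


def concatenate(features_a, features_b):
    """Join the flattened feature maps of the two CNN branches."""
    return list(features_a) + list(features_b)


branch_small = max_pool1d([1, 3, 2, 5, 4, 0, 7, 6], size=4, stride=4)
branch_large = max_pool1d([9, 8, 2, 1], size=4, stride=4)
fused = concatenate(branch_small, branch_large)
print(branch_small, branch_large, fused)  # [5, 7] [9] [5, 7, 9]
```

The fused vector is what would then be passed to the FC layer.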
The key idea of dropout is to randomly drop units (along with their connections) from the neural network during training. This technique can significantly reduce overfitting and gives major improvements over other regularisation methods. As shown in Fig. 2, we add a softmax layer at the end of the model to generate label predictions. Softmax is a generalisation of logistic regression to multiple classes, which computes the probability distribution over the k output classes. In our EEG classification problems, k is the total number of categories (2, 3, or 5). Suppose the ith sample is denoted by $x^{(i)}$ and its label is represented by $y^{(i)}$. The softmax function, denoted by $h_\theta(x^{(i)})$, is defined as follows:
$$h_\theta(x^{(i)}) = \begin{bmatrix} p(y^{(i)} = 1 \mid x^{(i)}; \theta) \\ p(y^{(i)} = 2 \mid x^{(i)}; \theta) \\ \vdots \\ p(y^{(i)} = k \mid x^{(i)}; \theta) \end{bmatrix} = \frac{1}{\sum_{j=1}^{k} e^{\theta_j^{T} x^{(i)}}} \begin{bmatrix} e^{\theta_1^{T} x^{(i)}} \\ e^{\theta_2^{T} x^{(i)}} \\ \vdots \\ e^{\theta_k^{T} x^{(i)}} \end{bmatrix} \tag{7}$$
where $\theta_1, \theta_2, \ldots, \theta_k$ are the parameters of the model. The output values of p lie between 0 and 1 and sum to 1.
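Dropout and softmax as described above can be illustrated with a short Python sketch. Two choices here are assumptions rather than details from the text: kept units are rescaled by 1/(1 - rate) (the common "inverted dropout" convention), and the softmax subtracts the largest input before exponentiating for numerical stability, which does not change the result. Function names and the example scores are hypothetical.

```python
import math
import random


def dropout(values, rate, training=True, rng=random):
    """Zero each unit with probability `rate` during training.

    Assumption: inverted dropout, so kept units are divided by (1 - rate).
    At inference time the values pass through unchanged.
    """
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]


def softmax(scores):
    """Turn k real-valued scores into probabilities that sum to 1."""
    shift = max(scores)                        # stabilises exp() for large scores
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)                         # fixed seed for repeatability
dropped = dropout([1.0] * 10, rate=0.2, rng=rng)
probs = softmax([2.0, 1.0, 0.1])               # toy scores for a three-class case
print(dropped)
print(probs, sum(probs))
```

The predicted class is the index of the largest probability, for example `probs.index(max(probs))`.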

Performance measures
For evaluation, we adopted ten-fold cross-validation in this study. The EEG signals are randomly divided into ten folds. Nine folds are used to train the model while the remaining fold is used for testing. This procedure is repeated ten times so that each fold serves once as the test set, and the performance is averaged over the ten evaluations. We performed experiments on 14 different data combinations (involving two-class, three-class and five-class classification problems) to evaluate the performance of the proposed model more fully. Three statistical metrics, accuracy (Acc), sensitivity (Sen) and specificity (Spe), are applied. The metric definitions are given as follows:
$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN} \tag{8}$$
$$\mathrm{Sen} = \frac{TP}{TP + FN} \tag{9}$$
$$\mathrm{Spe} = \frac{TN}{TN + FP} \tag{10}$$
where TP is true positive; TN is true negative; FP is false positive; and FN is false negative.
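The evaluation protocol above can be sketched in a few lines of Python. The function names, the fixed random seed, and the confusion counts in the example are made up for illustration and are not results from this study; the split shown is a plain random split, since the text does not say whether the folds are stratified by class.

```python
import random


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(tp, fn):
    return tp / (tp + fn)


def specificity(tn, fp):
    return tn / (tn + fp)


def ten_fold_indices(n_samples, n_folds=10, seed=0):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, folds[k]


# Example: 200 segments, as in a two-set case such as A versus E.
splits = list(ten_fold_indices(200))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 180 20

# Hypothetical confusion counts, for illustration only.
print(accuracy(95, 98, 2, 5), sensitivity(95, 5), specificity(98, 2))
```

The reported score for each metric would then be the mean of the ten per-fold values.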

Training of CNN
Training the proposed model requires its weight parameters to be learned from the EEG signals. To learn these parameters, we used conventional back-propagation with a binary cross-entropy loss function and the Adam optimiser, a stochastic gradient descent method. The hyper-parameters of the optimiser are as follows: the learning rate is 0.0005, beta1 is 0.9, and beta2 is 0.999. The dropout rate is 0.2. In all the experiments, the model was trained for 200 iterations with a batch size of 50. The proposed model was implemented in Keras, a powerful deep learning library, which runs on top of TensorFlow. We trained SeizureNet on a workstation with an Intel Core i9-9820X CPU @ 3.30 GHz, 64 GB RAM, and an Nvidia GeForce RTX 2080 Ti graphics card.
[…] from patients with epilepsy. Therefore, the performance metrics on the C versus D versus E case are all the worst.
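The optimiser settings given in the training description (learning rate 0.0005, beta1 0.9, beta2 0.999) can be illustrated with a scalar version of the Adam update rule. This is a sketch of the general algorithm, not the Keras internals: the `eps` value is an assumption because the text does not state it, and the quadratic toy objective and the function name `adam_step` are hypothetical.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.0005, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update for a single scalar parameter.

    m and v are the running first and second moment estimates; t counts
    updates starting from 1. `eps` is an assumed value.
    """
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    m_hat = m / (1.0 - beta1 ** t)          # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)          # bias-corrected second moment
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# Toy objective f(theta) = theta**2 with gradient 2 * theta.
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 201):
    theta, m, v = adam_step(theta, 2.0 * theta, m, v, t)
print(theta)
```

Because Adam normalises the gradient by its running magnitude, each early step moves the parameter by roughly the learning rate, regardless of the gradient scale.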

Performance of the proposed model
The five-class classification problem is more complicated and challenging to solve than the two-class and three-class classification problems but has advantages in numerous clinical applications. The accuracy, sensitivity, and specificity of the model on the five-class classification problem are 95.84%, 89.40%, and 97.35%, respectively.

Discussion
Numerous approaches have been proposed for the automatic detection of epileptic seizures. Among them, most methods only discuss several specific two-class or three-class EEG classification problems. Moreover, only a few methods have been presented for the five-class EEG classification problem, which is more challenging. The performance comparison with state-of-the-art methods is given in Fig. 3. To compare the differences between them in more detail, the accuracies of these algorithms are shown in Table 2.
These state-of-the-art methods can be divided into three categories: traditional techniques, CNN combined with conventional approaches, and CNN methods. In Fig. 3, they are shown in green, blue, and red, respectively. Overall, these algorithms perform well in several two-class and three-class classification problems but slightly worse in the five-class EEG classification problem.
In several two-class classification problems, such as the A versus E, C versus E, and D versus E cases, the proposed model ties for the highest accuracy. For the AB versus E case, the proposed method performs best, with an accuracy of 100%. For the B versus E case, the accuracy of the presented model, 99.00%, is slightly lower than that of other approaches. In the other two-class cases, the proposed model performs at a moderate level.
We also examine the effectiveness of the proposed model in differentiating between three distinct classes of EEG activity: healthy, inter-ictal, and ictal. For the A versus C versus E case, the proposed model ties for the highest accuracy. For the B versus D versus E case, the proposed approach performs best, with an accuracy about 1% higher than that of the other methods. In the other three-class classification problems, the model performs moderately.
Besides, the C versus D versus E case is challenging to recognise. All three sets are taken from epilepsy patients, and sets C and D are both recorded during seizure-free intervals. Notably, our accuracy is significantly higher than that of the method presented by Türk and Özerdem [21], which is comparable to our model in the other two-class and three-class EEG classification problems. This indicates that our model has a better generalisation ability.
The five-class EEG classification problem is complicated and difficult because it requires distinguishing between EEG segments that belong to the same broad category (sets A and B are both healthy, and sets C and D are both inter-ictal). The accuracy of the proposed model reaches 95.84%, about 2.2% higher than that of the other models.
Although the proposed model has obtained encouraging seizure detection results, it should be noted that this study evaluated it only on the Bonn EEG dataset. Moreover, the accuracy on the C versus D versus E case and the A versus B versus C versus D versus E case needs to be further improved.

Conclusion
This study introduces a new model for the robust detection of epileptic seizures using EEG signals based on CNN. The novelty of this proposed model is that two CNNs are utilised to learn the features from single-channel EEG epochs. Compared to previous works, the proposed model performs well in various EEG classification problems on the University of Bonn EEG database. Our future research directions may include but are not limited to the application of the proposed technique for diagnosing other disorders.

Acknowledgments
The authors acknowledge the support of the Education and Scientific Research Project for Young and Middle-aged Teachers of Fujian Province, China (JAT191153) and the High-level Basebuilding Project for Industrial Technology Innovation (1021GN204005-A06).