Volume 5, Issue 3 p. 181-194
ORIGINAL RESEARCH
Open Access

Stock index forecasting using DACLMANN: A new intelligent highly accurate hybrid ACLSTM/Markov neural network predictor

Ashkan Safari

Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran
Mohammad Ali Badamchizadeh (Corresponding Author)

Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran

Correspondence: Mohammad Ali Badamchizadeh. Email: [email protected]
First published: 29 September 2023

Abstract

The authors present the investigation of a new hybrid predictive model, the Duplex Attention-based Coupled LSTM Markov Averaged Neural Network, known as DACLMANN. The financial field, particularly the stock market, relies heavily on accurate predictive models. DACLMANN comprises four essential components: two LSTM blocks, an averagiser and a Markov neural network block. The first LSTM block is composed of two hidden layers, each containing 50 neurons, and a dense layer with 25 neurons. The second LSTM block consists of two hidden layers, each with 100 neurons, and a dense layer with 50 neurons. The averagiser plays a crucial role by averaging the closing prices and the predicted values from the first LSTM block and scaling the result by a gain factor of 0.9. These averaged values are then fed into the second LSTM block for further prediction. Finally, the predictions undergo evaluation by the Markov model, yielding the final prediction. To assess its performance, DACLMANN was tested on 22 years of stock prices for the AMZN index. It achieved an R2 of 0.76, a mean absolute error of 6.81216, a root mean square error of 8.6040, a Precision of 1, an Accuracy of 1, a Recall of 1 and an F1 of 1. Additionally, DACLMANN achieved a Mean Absolute Percentage Error of less than 0.043% and an RMSPE of less than 2.1%. These results not only demonstrate the effectiveness of the proposed model but also validate the prediction outcomes. DACLMANN offers several advantages over traditional predictive models in the stock market. By combining the strengths of the Duplex Attention-based Coupled LSTM, the averagiser and the Markov neural network, DACLMANN leverages the power of deep learning, attention mechanisms and sequential modelling. This hybrid approach enables DACLMANN to capture intricate patterns and dependencies in stock market data, leading to more accurate and reliable predictions. The robust evaluation metrics further validate the superiority of DACLMANN in predicting stock prices.

1 INTRODUCTION

Recent advances in technology, and their adoption in finance, have had a significant and growing impact on financial markets, especially international stock markets, and on the world economy. Accordingly, stock market analysis plays a critical role in understanding the market, devising trading strategies and determining the intrinsic value of stocks [1]. Moreover, predicting stock indices and stock market movements is an important skill among successful traders. It is therefore a central topic of computational quantitative finance, including the market's correlation with related news [2] and with Bitcoin (BTC), its inherent non-linearities, and stock movement trends. Besides classical analytical techniques, artificial-intelligence-based models are accurate enough to be worthy of investigation. In Ref. [3], the authors present a multi-modality graph neural network for financial time series prediction that learns from multimodal inputs such as news, related events and historical price series. An algorithm called S_I_LSTM, developed by the authors in Ref. [4], gathers data from multiple sources and relies on sentiment analysis: news, historical data and events are collected and analysed by a CNN-based sentiment method, and the technical indicators, sentiment scores and price series are then combined to produce the final prediction. The authors in Ref. [5] proposed a stock market characterisation framework based on market style clustering and its effect on market prediction. In that framework, stock time series are divided into windows of different lengths; hierarchical clustering then groups the windows into market style categories, and a distance measure differentiates among rotating patterns of market styles to verify their usability. In addition, a stock price prediction framework built on the identified market styles was presented to forecast future price trends. A sentiment index that predicts stock trends from weighted textual content and financial irregularities, combined with models such as support vector machine, decision tree, gradient-boosting decision tree, Naive Bayes, K-nearest neighbour, random forest and logistic regression, is described in Ref. [6]. Although these models and algorithms are used in stock market forecasting, LSTM-based neural networks are the techniques that can capture the challenging relations within stock price series and find the correlations needed for accurate, near-real prediction. Ref. [7] used an LSTM model to predict stock prices with a window size of 60 days, evaluated with the Mean Absolute Percentage Error (MAPE) metric. A secondary decomposition, multi-factor analysis and attention-based long short-term memory (ALSTM) are combined in Ref. [8]: two decomposition algorithms preprocess the initial stock price series to capture non-linear features and filter noise, the multi-factor analysis complements the original data, and, in the prediction stage, an attention layer is added to the LSTM model to increase the weight of the relevant information. Ref.
[9] presented a framework that feeds future return predictions into portfolio construction strategies: an LSTM model is trained on monthly stock price data, and the resulting predictions are used for portfolio weighting. A CNN- and LSTM-based stock prediction framework called stock sequence array convolutional LSTM is employed in Ref. [10]. In this framework, a sequence array of the historical data and its leading indicators is constructed and used as the CNN input image; the algorithm then extracts feature vectors via the convolutional and pooling layers as the LSTM input vector. Refs. [11] and [12] developed attention-based LSTM hybrid predictive models for index forecasting based on stock price data; these ALSTM models achieved RMSPEs of 4% and 8.49%, respectively. Also, Refs. [13] and [14] described quantum-based intelligent systems and the application of Quantum Neural Networks (QNN) in prediction and other forecasting fields. A hybrid neural network known as HTPNN is developed in Ref. [15] to predict stock volatility. In this model, a distributed model maps the news to a vector space, and a space auto-encoder optimally reduces the dimensions of the word-vector matrix to retain only the useful text information. HTPNN then uses deep convolutional layers (DCLs) to capture text features and an LSTM to capture the pattern of stock prices; the DCLs select the most relevant news, and in the next phase the news and prices are combined to perform the prediction. HTPNN was examined on the Dow Jones industrial index, where it provides a prediction accuracy of 69.51%. Ref. [16] built a deep Q-network (DQN) that performs prediction using stock chart images. Deep Q-networks are neural networks that use deep Q-learning to approximate value models. In Ref. [16], the DQN is combined with a convolutional neural network approximator that takes stock chart images as the predictor's input: the gathered image data are converted to 32 × 32 × 1 greyscale inputs for the CNN. The CNN comprises six hidden layers. The first four are convolutional layers, each followed by a rectified linear unit (ReLU), and the remaining two are fully connected (FC) layers, where ReLU is applied only after the fifth layer. The four convolutional layers have, respectively, sixteen 5 × 5 × 1 filters, sixteen 5 × 5 × 16 filters, thirty-two 5 × 5 × 16 filters and thirty-two 5 × 5 × 32 filters, all with stride one and zero padding. An ensemble deep learning model that merges two recurrent neural networks followed by a fully connected neural network is presented in Ref. [17]. This method uses the S&P 500 as a test case, with 121 trading dates split into 83 days of training data, 19 days of validation data and 19 days of test data, and yields a 57.55% reduction in prediction error. The hybrid GARCH-LSTM model is another method used in stock prediction [18]. It is based on a non-linear filtering approach to mitigate volatility clustering: root-type functions transform the original left-skewed, peaked volatility distribution into a right-shifted, higher-volume distribution. The model uses LSTM as its core implementation and predicts the realised volatility of the S&P 500. The author in Ref.
[19] proposes an attention-based graph learning kernel network (AGKN), a framework that blends information from firms correlated with the target stock for price prediction. The model has a stock-axis attention module that extracts dynamic, asymmetric correlations via the kernel method and a graph-based learning module that integrates the associated information. As a final step, a transformer encoder acquires information from further levels for correlation aggregation and prediction. Moreover, frequency-decomposition-based algorithms have been investigated in several studies. CEEMD-CNN-LSTM and EMD-CNN-LSTM are proposed in Ref. [20] for stock index forecasting; both are based on CNN, LSTM and empirical mode decomposition (EMD), and the study concludes that these CNN/LSTM-based methods perform better with CEEMD than with EMD. An LSTM-P neural network model is used in Ref. [21] to predict Bitcoin and gold prices between the years 2016 and 2021. Long Short-Term Memory Projection (LSTM-P) is an LSTM variant that adds a projection layer to further optimise the speed and performance of the LSTM network. In this model, a wavelet-transform-based noise reduction method removes the high-frequency noise components from the data; the LSTM-P model is then trained and predicts the price. Ref. [22] performs stock index prediction by using complete ensemble EMD with adaptive noise (CEEMDAN) to decompose the stock index into intrinsic mode functions (IMFs). The augmented Dickey–Fuller test assesses the stationarity and trend term of each IMF; an autoregressive moving average (ARMA) model is applied to the stationary time series, while an LSTM model extracts abstract features from the non-stationary series. As a last step, the results of each block are combined to perform the final forecast. The whale optimisation algorithm (WOA), LightGBM and CEEMDAN are combined into the hybrid algorithm WOA-LightGBM-CEEMDAN in Ref. [23]. The model uses grey correlation analysis of the hog futures price index system to identify influencing factors; the WOA then optimises the LightGBM model parameters, and the residual sequence is decomposed and reconstructed with the CEEMDAN method to build the joint WOA-LightGBM-CEEMDAN model. Ref. [24] employs CuDNNLSTM algorithms, including random forest and LSTM methodologies, on the S&P 500 index between the years 1993 and 2018. The random forest includes 1000 decision trees, each with a maximum depth of 10, while the LSTM is configured with a categorical cross-entropy loss, the RMSProp optimiser (learning rate η = 0.001), a batch size of 512, early stopping with a patience of 10 epochs and validation-loss monitoring with a validation split of 0.2. SFLA uses the shuffled frog leaping algorithm to furnish a competitive random search [25]; it is a population-based metaheuristic that combines memetic benefits with particle swarm optimisation, and LSTM–SFLA adds a mutation-and-crossover correction approach to outperform the basic SFLA. Two models are presented in Ref. [26]: fast recurrent neural networks (FastRNNs), and a second model combining FastRNNs, CNNs and bi-directional LSTM, used for stock index prediction and for forecasting abrupt changes in stock prices, respectively.
The stock data time interval in the aforementioned model is 1 min. A hybrid algorithm based on a one-dimensional CNN and LSTM is defined in Ref. [27]: BiCuDNNLSTM-1dCNN is a hybrid deep learning model comprising a bidirectional CUDA deep neural network LSTM (CuDNNLSTM) and a one-dimensional CNN for timely and efficient prediction of stock prices. Ref. [28] uses an LSTM decision support system for stock market swing trading. The system generates a report comprising the predicted values of the company's stock for the next 30 days together with technical indicators such as the money flow index (MFI) and the relative strength index (RSI); Ref. [28] reports that the system performs with a root mean square error (RMSE) of 4.13. Deriving the autoregressive fractionally integrated moving average-LSTM (ARFIMA-LSTM) model, Ref. [29] deploys index price prediction on the stock market. ARFIMA-LSTM is a hybrid recurrent network in which the ARFIMA component captures the linear tendencies and the LSTM network the non-linear ones; as noted in Ref. [29], the model has an RMSE of 0.053. To deal with non-linearity in stock markets, Ref. [30] presented a model based on the prediction rule ensembles (PRE) technique and a deep neural network (DNN). First, the stock indicators are computed; then PRE selects the rules with the lowest RMSEs; next, the DNN is tuned on the data; and the final prediction is made from the combined PRE and DNN results. As noted in Ref. [30], the lowest RMSE of this model is 5.60. Moreover, a CNN-based model using company-specific headlines is employed in Ref. [31] to perform forecasting. The model was tested with self-learnt, static and non-static word embeddings, as well as single-width and multi-width convolutional layers, and the optimal configuration, a multi-width non-static implementation, achieves a classification accuracy of 61.4%. Support vector regression (SVR) and FA methodologies are used by the authors in Ref. [32] to forecast a stock index with a hybrid MFA-SVR model. Refs. [33] and [34] developed fuzzy-based forecasting methodologies. Also, deep reinforcement learning-based optimal scheduling methodologies can be used in continuous action control to obtain an optimal scheduling strategy [35].

Overall, the new intelligent algorithm DACLMANN operates smoothly on a rich amount of data with an RMSPE of 2.1% and a MAPE of 0.043%. The proposed model deploys two LSTM blocks, each with different numbers of hidden and dense layers and neurons, tested on AMZN datasets of historical prices between the years 2000 and 2022. The remainder of the paper is organised as follows:

The principles and the DACLMANN operating framework are provided in Sections 2 and 3, respectively. Section 4 illustrates the experiments and results, the comparison is presented in Section 5, and Sections 6 and 7 discuss future work and the conclusion, respectively.

2 PRINCIPLES

2.1 Neural networks

Neural networks are a biologically inspired programming paradigm that enables a computer to learn from real-world observational data. They are series of algorithms that recognise the relationships in a dataset via a process that mimics the way the human brain operates. Accordingly, neural networks refer to systems of organic or artificial neurons. A neural network has multiple types of layers, and different arrangements of these layers form a wide range of architectures such as LSTM, Markov chain, Hopfield, recurrent and feed-forward neural networks. Networks of this kind are also universal approximators [36]. The model developed in this paper uses an LSTM neural network.

2.2 LSTM

LSTM networks are a type of recurrent neural network adept at learning order dependence in sequence prediction problems. An LSTM can process complete data sequences as well as single data points. LSTM networks are commonly used for classifying, processing and making predictions based on time series data; accordingly, they can be used in a wide range of time-series-based fields, including stock markets. The structure of the LSTM network is presented in Figure 1.

Here, xt is the input; ft and it are the forget gate and input gate, respectively; the cell update and cell state are denoted by $\widetilde{C}_t$ and Ct; and Ot and ht are the output gate and the output, respectively.

FIGURE 1. An LSTM network structure.

2.3 Markov Chain neural network

A Markov chain is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state reached in the previous event. It is a mathematical system that undergoes transitions from one state to another according to certain probabilistic rules: knowledge of the previous state suffices to determine the probability distribution of the current state, and likewise the current state determines the distribution of the next state. A Markov network is a collection of random variables possessing the Markov property, described by an undirected graph. The Markov property is formulated as (1):
$P\left({X}_{n}={i}_{n}\mid {X}_{n-1}={i}_{n-1}\right)=P\left({X}_{n}={i}_{n}\mid {X}_{0}={i}_{0},{X}_{1}={i}_{1},\ldots ,{X}_{n-1}={i}_{n-1}\right)$ (1)
where P denotes the probability, and the random variables and possible states are denoted by Xn and in, respectively. The proposed model uses the Markov chain to refine the future prediction states based on the former and current prediction states.
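To make the Markov property concrete, the following minimal Python sketch (ours, not from the paper) estimates a first-order up/down transition matrix from a closing-price list, which is the kind of table the model produces later (see Table 3):

```python
import numpy as np

def estimate_transitions(prices):
    """Estimate a first-order up/down transition matrix from a price series.

    Returns a 2x2 matrix T where T[i, j] is the empirical probability of
    moving to state j given state i (0 = downside, 1 = upside).
    """
    moves = (np.diff(prices) > 0).astype(int)      # 1 = upside, 0 = downside
    counts = np.zeros((2, 2))
    for prev, curr in zip(moves[:-1], moves[1:]):  # count state-to-state moves
        counts[prev, curr] += 1
    return counts / counts.sum(axis=1, keepdims=True)

# Toy usage on a short synthetic price list
prices = [10.0, 10.5, 10.2, 10.8, 11.0, 10.7, 10.9]
print(estimate_transitions(prices))
```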

3 DACLMANN: STRUCTURE AND FRAMEWORK

DACLMANN stands for Duplex Attention-based Coupled LSTM Markov Averaged Neural Network. It utilises two different LSTM blocks, a Markov neural network and a specialised averagiser. Figure 2 shows an overview of how DACLMANN is developed and utilised in this paper.

FIGURE 2. The overall process of DACLMANN.

In Figure 2, the user enters the initial and final dates as input data. These inputs are passed to the database through Yahoo APIs, which are used in this model to obtain the price data lists for the entered dates. The gathered data undergoes processing through the data processing unit (DPU) and is standardised in subsequent steps. The processed data is then fed into the first LSTM block, whose predictions are transferred to the averagiser and the second LSTM block. Finally, the predicted data is processed through the Markov chain neural network and displayed to the user. The algorithms of DACLMANN and the averagiser are depicted in Figure 3. In Figure 3a, the LSTM initialises weighted neurons and keeps correcting the weights until the output matches the target. The predicted data is then processed by the averagiser and transferred to the next LSTM block; the weighting process is repeated, and the predictions pass through the Markov neural network before being shown to the user. Figure 3b illustrates the operation of the averagiser: the predicted data from the first LSTM block is averaged with the initial close-price dataset and scaled by a gain factor of 0.9, determined through trial and error to achieve the best prediction performance. The attention mechanism is implemented as dot-product attention: attention weights are computed by taking the dot product of the LSTM output with itself and normalising them with a softmax activation; the context vector is obtained by taking the dot product of the attention weights and the LSTM output; finally, the context vector and the LSTM output are concatenated and passed through the dense layers to make predictions (a sketch is given after Figure 3). The resulting new dataset is subsequently transferred to the second LSTM block.

FIGURE 3. (a) The algorithm of DACLMANN; (b) the averagiser.
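As an illustration of this attention step, the following TensorFlow sketch (a minimal reconstruction under our assumptions about tensor shapes, not the authors' released code) computes the dot-product attention weights, the context vector and the concatenation described above:

```python
import tensorflow as tf

def dot_product_attention(lstm_out):
    """Self dot-product attention over an LSTM output sequence, as described
    in the text. lstm_out is assumed to have shape (batch, timesteps, units)."""
    scores = tf.matmul(lstm_out, lstm_out, transpose_b=True)  # (batch, T, T)
    weights = tf.nn.softmax(scores, axis=-1)                  # normalised attention weights
    context = tf.matmul(weights, lstm_out)                    # (batch, T, units)
    return tf.concat([context, lstm_out], axis=-1)            # fed to the dense layers

# Shape check with a dummy LSTM output (batch=2, T=5, units=50)
out = dot_product_attention(tf.random.normal((2, 5, 50)))
print(out.shape)  # (2, 5, 100)
```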

The properties of the LSTM blocks are summarised in Table 1 below:

TABLE 1. Properties of the LSTM models utilised in DACLMANN.

LSTM block      Input (layers/neurons)   Hidden (layers/neurons)   Dense (layers/neurons)   Output (layers/neurons)
LSTM1           1 / 2                    2 / 50                    1 / 25                   1 / 1
LSTM2           1 / 2                    2 / 100                   1 / 50                   1 / 1
Weights LSTM1   -                        40,800 (32.3%)            5050 (4.0%)              -
Weights LSTM2   -                        80,400 (63.7%)            51 (0.0%)                -
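For reference, the layer and neuron counts of Table 1 can be mirrored in Keras roughly as follows. The use of Keras and the 60-day input window are our assumptions (the framework is not named in the paper), and the exact trainable-parameter counts depend on implementation details the table does not specify:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_block(hidden_units, dense_units, window=60):
    """One DACLMANN LSTM block per Table 1: two hidden LSTM layers,
    one dense layer and a single output neuron."""
    return keras.Sequential([
        keras.Input(shape=(window, 1)),                    # window of closing prices
        layers.LSTM(hidden_units, return_sequences=True),  # hidden layer 1
        layers.LSTM(hidden_units),                         # hidden layer 2
        layers.Dense(dense_units),                         # dense layer
        layers.Dense(1),                                   # output neuron
    ])

lstm1 = build_block(hidden_units=50, dense_units=25)   # LSTM1 row of Table 1
lstm2 = build_block(hidden_units=100, dense_units=50)  # LSTM2 row of Table 1
```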
DACLMANN is formulated as three sequences, LSTM1, the averagiser and LSTM2, as follows:
${i}_{1t}=\sigma \left({W}_{1i}{h}_{1t-1}+{U}_{1i}{x}_{1t}+{b}_{1i}\right)$ (2)
${f}_{1t}=\sigma \left({W}_{1f}{h}_{1t-1}+{U}_{1f}{x}_{1t}+{b}_{1f}\right)$ (3)
${O}_{1t}=\sigma \left({W}_{1o}{h}_{1t-1}+{U}_{1o}{x}_{1t}+{b}_{1o}\right)$ (4)
${\widetilde{C}}_{1t}=\sigma \left({W}_{1}{h}_{1t-1}+{U}_{1}{x}_{1t}+{b}_{1}\right)$ (5)
${C}_{1t}=\left({f}_{1t}\odot {C}_{1t-1}\right)+\left({i}_{1t}\odot {\widetilde{C}}_{1t}\right)$ (6)
${h}_{1t}={O}_{1t}\odot \mathrm{tanh}\left({C}_{1t}\right)$ (7)
where i1t and f1t are the input gate and forget gate, respectively; the output gate and memory cell candidate are denoted by O1t and ${\widetilde{C}}_{1t}$; and C1t, b1 and h1t denote the memory cell, bias and output of LSTM1, respectively. In the next sequence, h1t is averaged with the initial closing prices of the test set and scaled by a gain of 0.9, determined by trial and error as the value at which the system attains its nearest prediction and best performance. The averaged values then replace the initial data in the entire dataset, yielding a new series x2t, the input of LSTM2:
${x}_{2t}=\left(\frac{{h}_{1t}+{x}_{1t}}{2}\right)\times 0.9$ (8)
${i}_{2t}=\sigma \left({W}_{2i}{h}_{2t-1}+{U}_{2i}\left(\frac{{h}_{1t}+{x}_{1t}}{2}\times 0.9\right)+{b}_{2i}\right)$ (9)
${f}_{2t}=\sigma \left({W}_{2f}{h}_{2t-1}+{U}_{2f}\left(\frac{{h}_{1t}+{x}_{1t}}{2}\times 0.9\right)+{b}_{2f}\right)$ (10)
${O}_{2t}=\sigma \left({W}_{2o}{h}_{2t-1}+{U}_{2o}\left(\frac{{h}_{1t}+{x}_{1t}}{2}\times 0.9\right)+{b}_{2o}\right)$ (11)
${\widetilde{C}}_{2t}=\sigma \left({W}_{2}{h}_{2t-1}+{U}_{2}\left(\frac{{h}_{1t}+{x}_{1t}}{2}\times 0.9\right)+{b}_{2}\right)$ (12)
${C}_{2t}=\left({f}_{2t}\odot {C}_{2t-1}\right)+\left({i}_{2t}\odot {\widetilde{C}}_{2t}\right)$ (13)
${h}_{2t}={O}_{2t}\odot \mathrm{tanh}\left({C}_{2t}\right)$ (14)
where i2t and f2t are the input gate and forget gate, respectively; the output gate and memory cell candidate are denoted by O2t and ${\widetilde{C}}_{2t}$; and C2t, b2 and h2t denote the memory cell, bias and final output of LSTM2, respectively.
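Equation (8), the averagiser, reduces to a one-line NumPy operation; the sketch below is a direct transcription (the function name is ours):

```python
import numpy as np

GAIN = 0.9  # the gain factor determined by trial and error in the paper

def averagiser(h1, x1, gain=GAIN):
    """Equation (8): average the LSTM1 predictions h1 with the observed
    closing prices x1, then scale by the gain factor."""
    return ((np.asarray(h1) + np.asarray(x1)) / 2.0) * gain

# The output x2 = averagiser(h1, close_prices) becomes the input of LSTM2.
```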

In the next phase, the final predicted data is passed to the Markov neural network. Figure 4 illustrates the Markov neural network algorithm.

FIGURE 4. Markov chain neural network algorithm.

Based on Figure 4, R is the order of the Markov chain; Pr and H are the closing price list and its length, respectively; and P is the probability distribution over N distinct prices. The algorithm operates according to Equation (1) and forecasts the future prediction state from the data available in the current prediction state. The pseudocode of DACLMANN is presented below.

Algorithm 1. Pseudocode of DACLMANN

  • Input: $\mathbf{IPrc}_{1}=\left[iprc_{11},iprc_{12},\ldots ,iprc_{1\,\text{Testset\_Length}}\right]$

  • Given parameters: ${W}_{1f},{U}_{1f},{b}_{1f},{W}_{1\widetilde{c}},{U}_{1\widetilde{c}},{b}_{1\widetilde{c}},{W}_{1i},{U}_{1i},{b}_{1i},{W}_{1o},{U}_{1o},{b}_{1o}$

  • Initialise ${h}_{10},{C}_{10}=\overrightarrow{0}$

  • for t = 1, …, Testset_Length do

  •   Calculate ${f}_{1t},{\widetilde{C}}_{1t},{i}_{1t}$

  •   Update cell state ${C}_{1t}$

  •   Calculate ${O}_{1t},{h}_{1t}$

  • end for

  • Output: $\mathbf{h}_{1}=\left[{h}_{11},{h}_{12},\ldots ,{h}_{1\,\text{Testset\_Length}}\right],\;{h}_{1i}\in {\mathbb{R}}^{n}$

  • for t = 1, …, Testset_Length do

  •   Calculate $\mathbf{IPrc}_{2}[t]=\left(\frac{\mathbf{h}_{1}[t]+\mathbf{IPrc}_{1}[t]}{2}\right)\times 0.9$

  • end for

  • Input: $\mathbf{IPrc}_{2}=\left[iprc_{21},iprc_{22},\ldots ,iprc_{2\,\text{Testset\_Length}}\right]$

  • Given parameters: ${W}_{2f},{U}_{2f},{b}_{2f},{W}_{2\widetilde{c}},{U}_{2\widetilde{c}},{b}_{2\widetilde{c}},{W}_{2i},{U}_{2i},{b}_{2i},{W}_{2o},{U}_{2o},{b}_{2o}$

  • Initialise ${h}_{20},{C}_{20}=\overrightarrow{0}$

  • for t = 1, …, Testset_Length do

  •   Calculate ${f}_{2t},{\widetilde{C}}_{2t},{i}_{2t}$

  •   Update cell state ${C}_{2t}$

  •   Calculate ${O}_{2t},{h}_{2t}$

  • end for

  • Output: $\mathbf{h}_{2}=\left[{h}_{21},{h}_{22},\ldots ,{h}_{2\,\text{Testset\_Length}}\right],\;{h}_{2i}\in {\mathbb{R}}^{n}$

  • for Counter := 0 to R−1 do

  •   C[Counter] := Pr[H−R+Counter]

  • end for

  • for h := R to H−1 do

  •   IS_CONTEXT := TRUE

  •   for Counter := 0 to R−1 do

  •     if Pr[h−R+Counter] != C[Counter] then

  •       IS_CONTEXT := FALSE

  •       break

  •     end if

  •   end for

  •   if IS_CONTEXT then

  •     P[Pr[h]] := P[Pr[h]] + 1

  •   end if

  • end for

  • PREDICTION := 0

  • MAX := P[0]

  • for i := 1 to N−1 do

  •   if P[i] > MAX then

  •     MAX := P[i]

  •     PREDICTION := i

  •   end if

  • end for

  • if MAX > 0 then

  •   return PREDICTION

  • end if

  • return −1

  • end
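The Markov stage of the pseudocode (the context-matching loops) translates directly into Python. The sketch below is our transcription; it assumes the price list has been discretised so that exact context matches are meaningful:

```python
def markov_predict(prices, order):
    """Order-R Markov prediction, following the pseudocode above: find every
    earlier occurrence of the most recent length-R context in the discretised
    price list, count which price follows each occurrence, and return the most
    frequent successor, or -1 if the context never occurred before."""
    H = len(prices)
    context = prices[H - order:]                # C: the R most recent prices
    counts = {}                                 # P: successor frequencies
    for h in range(order, H):
        if prices[h - order:h] == context:      # IS_CONTEXT check
            counts[prices[h]] = counts.get(prices[h], 0) + 1
    if not counts:                              # MAX == 0 in the pseudocode
        return -1
    return max(counts, key=counts.get)          # PREDICTION := argmax of P

# Toy usage on an up/down-coded series with R = 2
print(markov_predict([1, 0, 1, 1, 0, 1, 1, 0, 1, 1], order=2))  # -> 0
```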

4 EXPERIMENTS AND RESULTS

To test DACLMANN, the proposed model is applied to forecasting the international stock index "AMZN". During the test, 33,636 data points were imported into the model and filtered down to 5607 records forming the dataset, of which 1121 were selected for testing over the years 2000 to 2022. Further information on the experiment is given in Table 2 below:

TABLE 2. The characteristics of the dataset used in DACLMANN.
Data information
Index name AMZN
Total number of data 33636
Number of filtered data 5607
Number of tested and predicted data 1121
Date
Initial date 1/3/2000
Final date 4/12/2022
Duration 22 years
Traded dates 5607
Traded dates selected for testing 1121
Mean of the prices during 22 years [$]
Mean of opening price 634.2495
Mean of high price 641.2394
Mean of low price 626.4783
Mean of closing price 607.9739
Mean of hybridised closing price 563.15
Mean of volume 6300565
Variance of the prices during 22 years
Variance of opening price 918997.1
Variance of high price 938976.6
Variance of low price 896540.7
Variance of closing price 801841.6
Variance of hybridised closing price 646523.1
Variance of volume 2.12E+13

The adjusted closing price, open price, high price, low price and traded volume data between the years 2000 and 2022 are visualised in Figure 5.

FIGURE 5. AMZN (a) adjusted close, (b) open, (c) high and (d) low prices over 22 years.

Accordingly, the proposed algorithm analysed the data based on the closing prices and the traded volumes, whose mean over 22 years is 6,300,565. The closing prices averaged with the predictions of the LSTM1 block are denoted hybridised closing prices; they have a mean of $563.15 and a variance of 646,523.1 over 2000 to 2022. Based on the results gathered in Table 2, Figure 6 presents the data/time-step window.

FIGURE 6. The data/time-step window.

As Figure 6 implies, the gathered data is filtered down to the closing price, which forms the input of the LSTM1 block. The filtering process operates on the five types of data gathered by the model, and the closing price is then passed on to the subsequent prediction stages. The trading data between 2017 and 2022 is selected as the test set. The output of LSTM1 is processed by the averagiser to become the input of the LSTM2 block. Accordingly, the same amount of data is analysed with the LSTM neural networks: 5607 trading dates, of which 1121 form the test dataset. The result of the model is illustrated in Figure 7. A sketch of the data acquisition and split follows below.
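For readers reproducing the experiment, the data acquisition and split can be sketched as follows. The yfinance package is one common Python wrapper for the Yahoo Finance API; its use here, like the exact split index, is our assumption rather than the authors' code:

```python
import yfinance as yf  # assumed wrapper for the Yahoo APIs named in the paper

# Study period from Table 2: 3 January 2000 to 12 April 2022
data = yf.download("AMZN", start="2000-01-03", end="2022-04-12")
close = data["Close"]  # filter the five data types down to the closing price

# Hold out roughly 1121 of the 5607 trading dates (about 20%) as the test set
split = len(close) - 1121
train, test = close[:split], close[split:]
```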

FIGURE 7. (a), (b) and (c) The prediction results using DACLMANN.

As illustrated in Figure 7, the model used the data between 2000 and 2017 as the train set and the data from 2017 to 2022 as the test set to perform the prediction. The prediction tracks the real values, which shows the low error and high accuracy of the developed model. As the next step, the final predicted data is fed into the developed Markov neural network model, and the result is presented in Table 3.

TABLE 3. The results of the Markov NN model.

Prior state   Next state: Downside   Next state: Upside
Downside      0.586326               0.413674
Upside        0.442393               0.557607
Accordingly, the Markov chain model presents the following states:
  1. If DACLMANN predicts a downside in price, it will likely occur with a probability of 0.586326.

  2. If DACLMANN predicts an upside in price, it will likely occur with a probability of 0.557607.

To measure the accuracy of the developed model, the key performance indicators of Precision, Accuracy, Recall, F1 score, R2, MAE and RMSE, as well as MAPE (%) and RMSPE (%), are utilised.

4.1 Root mean square error (RMSE)

RMSE is a commonly used metric to measure the average magnitude of the errors between predicted and actual values in regression problems. It is calculated by taking the square root of the average of squared differences between predicted and actual values. RMSE provides a measure of how well a regression model fits the data, with lower values indicating better performance. RMSE is expressed in the same units as the target variable. RMSE is formulated as follows:
$\text{RMSE}=\sqrt{\frac{1}{n}\sum {\left(Prc_{\text{Observed}}-Prc_{\text{Predicted}}\right)}^{2}}$ (15)
where n is the number of data points, and PrcObserved and PrcPredicted are the observed and predicted values of the stock price, respectively.

4.2 R2 (coefficient of determination)

R2, also known as the coefficient of determination, is a statistical measure that represents the proportion of the variance in the dependent variable (target variable) that is predictable from the independent variables (features) in a regression model. It ranges from 0 to 1, where 0 indicates that the model does not explain the variability of the target variable, and 1 indicates that the model perfectly predicts the target variable as defined below:
${R}^{2}=1-\frac{\sum {\left(Prc_{\text{Observed}}-Prc_{\text{Predicted}}\right)}^{2}}{\sum {\left(Prc_{\text{Observed}}-Prc_{\text{Mean}}\right)}^{2}}$ (16)

in which $Prc_{\text{Mean}}$ is the average of the index prices.

4.3 Mean absolute error (MAE)

MAE is another metric used to evaluate the performance of regression models. It measures the average absolute difference between predicted and actual values. Unlike RMSE, MAE does not penalise large errors heavily, as it only considers the absolute differences. MAE is also expressed in the same units as the target variable as given below:
$\text{MAE}=\frac{1}{n}\sum \left\vert Prc_{\text{Observed}}-Prc_{\text{Predicted}}\right\vert $ (17)

4.4 Accuracy

Accuracy is a commonly used metric to evaluate the performance of classification models. It represents the ratio of correct predictions to the total number of predictions. It is calculated by dividing the number of correct predictions by the total number of predictions and multiplying by 100 to obtain a percentage. Accuracy is suitable when the classes in the dataset are balanced, meaning they have roughly the same number of instances. Accuracy is modelled as follows:
$\text{Accuracy}=\frac{Prc_{\text{TP}}+Prc_{\text{TN}}}{Prc_{\text{TP}}+Prc_{\text{TN}}+Prc_{\text{FP}}+Prc_{\text{FN}}}$ (18)
where $Prc_{\text{TP}}$ and $Prc_{\text{TN}}$ are the correctly predicted positive and negative instances, respectively, and the incorrectly predicted positive and negative instances are denoted by $Prc_{\text{FP}}$ and $Prc_{\text{FN}}$.

4.5 Precision

Precision is a metric used in binary classification to measure the proportion of true positive predictions (correctly predicted positive instances) out of the total predicted positive instances. Precision focuses on the accuracy of positive predictions. It is calculated by dividing the number of true positives by the sum of true positives and false positives, as given below:
$\text{Precision}=\frac{Prc_{\text{TP}}}{Prc_{\text{TP}}+Prc_{\text{FP}}}$ (19)

4.6 Recall

Recall, also known as sensitivity or true positive rate, is a metric used in binary classification to measure the proportion of true positive predictions out of the total actual positive instances. Recall focuses on the ability of a model to find all positive instances. It is calculated by dividing the number of true positives by the sum of true positives and false negatives. Recall is calculated as given below:
$\text{Recall}=\frac{Prc_{\text{TP}}}{Prc_{\text{TP}}+Prc_{\text{FN}}}$ (20)

4.7 F1 score

The F1 score is a metric that combines precision and recall into a single value. It is the harmonic mean of precision and recall, providing a balanced measure of a model's performance. The F1 score ranges from 0 to 1, where 1 indicates the best possible performance, and is given below:
${F}_{1}=2\,\frac{\text{Precision}\times \text{Recall}}{\text{Precision}+\text{Recall}}$ (21)

4.8 Mean absolute percentage error (MAPE)

It is a metric used to measure the accuracy of a forecast or prediction model. It calculates the average percentage difference between predicted and actual values. A lower MAPE indicates higher accuracy as expressed below:
$\text{MAPE}\,(\%)=\frac{100}{n}\sum\limits _{i=1}^{n}\left\vert \frac{Prc_{i,\text{actual}}-Prc_{i,\text{predicted}}}{Prc_{i,\text{actual}}}\right\vert $ (22)

4.9 Root mean square percentage error

RMSPE stands for Root Mean Square Percentage Error. It is another metric that is used to evaluate the accuracy of a forecast or prediction model. RMSPE measures the percentage difference between predicted and actual values. However, instead of calculating the mean absolute percentage difference, RMSPE calculates the root mean square of the percentage differences.
$\text{RMSPE}\,(\%)=100\sqrt{\frac{1}{n}\sum\limits _{i=1}^{n}{\left(\frac{Prc_{i,\text{actual}}-Prc_{i,\text{predicted}}}{Prc_{i,\text{actual}}}\right)}^{2}}$ (23)
where n is the total number of data points, and Prci,actual and Prci,predicted denote the actual and predicted prices, respectively. Based on Equation (23), two factors drive the model accuracy: the number of gathered data points, and how near the predicted values are to the actual ones, which pushes the RMSPE towards zero. In this paper, DACLMANN processes 33,636 data points, and the results (Figure 7) show that the predicted values are near the actual prices. As a result, the MAPE and RMSPE of DACLMANN are about 0.043% and 2.1%, respectively, meaning that for every 100 data points analysed, about 98 are tracked by the developed model, leading to an accurate prediction. In addition, Precision, Accuracy, Recall and F1 score all achieved the value of 1.0, and the MAE, RMSE and R2 were calculated as 6.81216, 8.6040 and 0.76, respectively.
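A compact NumPy transcription of Equations (15) to (23) is given below for reference; the function names are ours:

```python
import numpy as np

def regression_metrics(actual, predicted):
    """Equations (15)-(17), (22) and (23) on price series."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    err = actual - predicted
    return {
        "RMSE": np.sqrt(np.mean(err ** 2)),                                  # Eq. (15)
        "R2": 1 - np.sum(err ** 2) / np.sum((actual - actual.mean()) ** 2),  # Eq. (16)
        "MAE": np.mean(np.abs(err)),                                         # Eq. (17)
        "MAPE (%)": 100 * np.mean(np.abs(err / actual)),                     # Eq. (22)
        "RMSPE (%)": 100 * np.sqrt(np.mean((err / actual) ** 2)),            # Eq. (23)
    }

def classification_metrics(actual_dir, predicted_dir):
    """Equations (18)-(21) on up/down direction labels (1 = up, 0 = down)."""
    a, p = np.asarray(actual_dir), np.asarray(predicted_dir)
    tp = np.sum((p == 1) & (a == 1))
    tn = np.sum((p == 0) & (a == 0))
    fp = np.sum((p == 1) & (a == 0))
    fn = np.sum((p == 0) & (a == 1))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "Accuracy": (tp + tn) / (tp + tn + fp + fn),           # Eq. (18)
        "Precision": precision,                                # Eq. (19)
        "Recall": recall,                                      # Eq. (20)
        "F1": 2 * precision * recall / (precision + recall),   # Eq. (21)
    }
```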

The same dataset was also analysed with a plain LSTM model, and the results are illustrated in Figures 8-10.

As Figures 8-10 show, the more data the LSTM analyses, the lower its RMSPE. DACLMANN, however, behaves differently: its RMSPE remains nearly constant at around 2.1% and its MAPE at around 0.043%. As Figure 8 shows, the real values are not fully tracked by the LSTM predictions, so its RMSPE is higher. The measured RMSPE of the LSTM model is 6%, meaning that in every 100 data points, about 94 are nearly tracked by the prediction. Figure 11 presents the performance of the model at different gains.

FIGURE 8. Result of the dataset analysed by LSTM over 22 years (2000-2022).

FIGURE 9. Result of the dataset analysed by LSTM over 15 years (2007-2022).

FIGURE 10. Result of the dataset analysed by LSTM over one year (2021-2022).

FIGURE 11. Result of the dataset analysed by DACLMANN with gains of (a) 0.8, (b) 0.7, (c) 0.6, (d) 0.5, (e) 0.4 and (f) 0.3.

5 COMPARISON

To compare the intelligent models and evaluate their performance, the metrics of MAPE (%), MAE, RMSPE (%), RMSE and R2 are used, and the comparison is performed between eight models and DACLMANN. The results confirm that the model developed in this paper has a lower MAPE than the other previously published models. The comparison is given in Table 4.

TABLE 4. Comparison between the proposed model and the other approaches.

Developed model              MAPE (%)   MAE       RMSPE (%)   RMSE     R2
LSTM-P [21]                  4.81       -         -           0.03     -
LSTM-SFLA [25]               2.62       -         -           2.73     -
LSTM-decision support [28]   1.21       3.24      -           4.13     -
EGARCH [37]                  16.99      -         -           5.073    -
VAR [38]                     7.65       -         -           -        -
AEI-DNET [39]                0.40       10.01     -           12.09    -
StockNet-C [40]              0.82       69.93     -           0.08     -
CEEMDAN-LSTM [41]            0.16       3.21      -           4.82     -
DACLMANN, gain of 0.9        0.043      6.81216   2.1         8.6040   0.76
DACLMANN, gain of 0.8        2.49       6.05      3.23        7.64     0.76
DACLMANN, gain of 0.7        2.78       5.86      3.66        7.50     0.76
DACLMANN, gain of 0.6        2.71       4.91      3.57        6.27     0.71
DACLMANN, gain of 0.5        3.39       5.12      4.35        6.41     0.57
DACLMANN, gain of 0.4        2.40       2.40      3.11        3.74     0.77
DACLMANN, gain of 0.3        3.31       3.1       3.86        3.67     0.61

6 FUTURE WORKS

Although LSTM-based classical models perform within an acceptable range, they can only handle a limited amount of big data; given a significantly larger volume, they run short of memory or execution time and perform inefficiently. A superior technology with higher processing speed and the ability to analyse large amounts of big data, namely quantum technology, is therefore required to address this challenge. As intelligent systems are increasingly utilised in stock market analysis, QNNs and Quantum Long Short-Term Memory models can be developed. Being quantum-based, they offer ultrafast processing with big data analysis capability, which should drive the RMSPE towards the near-zero values that are acceptable and dependable in market analysis.

7 CONCLUSION

An intelligent system, the Duplex Attention-based Coupled LSTM Markov Averaged Neural Network (DACLMANN), was developed and investigated in this paper. The performance of DACLMANN was evaluated on the international stock market, with the AMZN index serving as a test bench for the model. The evaluation metrics for DACLMANN were as follows: an R2 of 0.76, an MAE of 6.81216 and an RMSE of 8.6040, as well as a Precision of 1, an Accuracy of 1, a Recall of 1 and an F1 of 1. DACLMANN utilised 22 years of data for analysis and achieved a MAPE of less than 0.043% and an RMSPE of less than 2.1%. These results validate the accuracy of the algorithm relative to other intelligent and classical methodologies. The paper's innovative contributions include the development of an intelligent hybrid LSTM-based predictive algorithm with a forecasting accuracy of more than 99% and a nearly constant RMSPE and MAPE across different amounts of gathered data. Additionally, a new data filtering method was introduced, and the prediction distribution was found to be similar to the target distribution.

CONFLICT OF INTEREST STATEMENT

The authors declare that they have no conflict of interest.

DATA AVAILABILITY STATEMENT

The datasets generated during the current study are available from the authors on reasonable request via email.