LSTM for Timeseries Forecasting
Anton A. Nesterov | an (at) vski.sh
Version 1.0
In this notebook we'll explore a Long Short-Term Memory (LSTM) network for forecasting tasks, using the Bitcoin price for the BTC-EUR pair as an example.
How LSTMs work
An LSTM maintains a cell state and three gates that control the flow of information. These gates allow the network to learn which information is important to keep or discard over long sequences, thus overcoming the vanishing gradient problem seen in simple RNNs.
Forget Gate
Decides what information to discard from the cell state. It passes the previous hidden state ($h_{t-1}$) and the current input ($x_t$) through a sigmoid layer.
Input Gate
Decides what new information to store in the cell state. This involves a sigmoid layer to decide which values to update and a tanh layer to create a vector of new candidate values.
Output Gate
Decides what the next hidden state should be. The output is a filtered version of the cell state.
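Putting the three gates together: in the standard formulation, with $\sigma$ the sigmoid function, $x_t$ the current input, $h_{t-1}$ the previous hidden state, and $C_{t-1}$ the previous cell state, the gate activations and state updates are:

$$
\begin{aligned}
f_t &= \sigma(W_f [h_{t-1}, x_t] + b_f) && \text{forget gate} \\
i_t &= \sigma(W_i [h_{t-1}, x_t] + b_i) && \text{input gate} \\
\tilde{C}_t &= \tanh(W_C [h_{t-1}, x_t] + b_C) && \text{candidate values} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{cell state update} \\
o_t &= \sigma(W_o [h_{t-1}, x_t] + b_o) && \text{output gate} \\
h_t &= o_t \odot \tanh(C_t) && \text{new hidden state}
\end{aligned}
$$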
Python Implementation
We will use TensorFlow and Keras to build the model. Keras has a ready-to-use LSTM layer class.
! pip install -q numpy pandas matplotlib scikit-learn tensorflow
import warnings ; warnings.filterwarnings('ignore')
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.optimizers import AdamW
from IPython.display import display, Markdown
print_df = lambda df: display(Markdown(df.to_markdown()))
The Dataset
We have Bitcoin price data for 2018–2025, aggregated at a 1-day interval.
Download Dataset
df = pd.read_csv('bitcoin.csv')
df['timeOpen'] = pd.to_datetime(df['timeOpen'], unit='ms')
print_df( df.head(2) )
|   | timeOpen            | timeClose     | timeHigh      | timeLow       | priceOpen | priceHigh | priceLow | priceClose | volume      |
|--:|---------------------|---------------|---------------|---------------|----------:|----------:|---------:|-----------:|------------:|
| 0 | 2025-09-07 12:00:00 | 1757332799999 | 1757330880000 | 1757246580000 |    110221 |    111591 |   110212 |     111168 | 24618007520 |
| 1 | 2025-09-06 12:00:00 | 1757246399999 | 1757172540000 | 1757231700000 |    110651 |    111275 |   110024 |     110225 | 21500719036 |
df = df.sort_values('timeOpen')
print_df( df.head(2) )
|      | timeOpen            | timeClose     | timeHigh      | timeLow       | priceOpen | priceHigh | priceLow | priceClose | volume     |
|-----:|---------------------|---------------|---------------|---------------|----------:|----------:|---------:|-----------:|-----------:|
| 2550 | 2018-08-22 12:00:00 | 1535025599999 | 1534946642000 | 1535019540000 |   6486.25 |   6816.79 |  6310.11 |    6376.71 | 4668110000 |
| 2549 | 2018-08-23 12:00:00 | 1535111999999 | 1535105640000 | 1535025840000 |   6371.34 |   6546.54 |  6371.34 |    6534.88 | 3426180000 |
df.plot(x='timeOpen', y='priceClose')
plt.show()
Univariate LSTM
LSTMs require the data to be in a specific format: `[samples, time_steps, features]`. Univariate means that we use a single feature for prediction. In this case, for example, we simply extrapolate closing prices with the LSTM:
closing_prices = df['priceClose'].values.reshape(-1, 1)
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(closing_prices)
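With feature_range=(0, 1), MinMaxScaler rescales each value using the column minimum and maximum, which keeps inputs in the range where the sigmoid and tanh activations are most responsive:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$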
def create_dataset(dataset, time_step=60):
    """Slice a series into sliding windows of length `time_step`,
    with the value immediately after each window as the target."""
    dataX, dataY = [], []
    for i in range(len(dataset) - time_step - 1):
        a = dataset[i:(i + time_step), 0]  # window of `time_step` past values
        dataX.append(a)
        dataY.append(dataset[i + time_step, 0])  # next value to predict
    return np.array(dataX), np.array(dataY)
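As a quick sanity check (not part of the pipeline), here is what the windowing produces on a tiny toy series:

toy = np.arange(10, dtype=float).reshape(-1, 1)
toy_X, toy_y = create_dataset(toy, time_step=3)
print(toy_X.shape, toy_y.shape)  # (6, 3) (6,)
print(toy_X[0], toy_y[0])        # [0. 1. 2.] 3.0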
X, y = create_dataset(scaled_data)
X = X.reshape(X.shape[0], X.shape[1], 1)
training_size = int(len(X) * 0.8)
test_size = len(X) - training_size
X_train, X_test = X[0:training_size], X[training_size:len(X)]
y_train, y_test = y[0:training_size], y[training_size:len(y)]
print(f"X_train shape: {X_train.shape}, y_train shape: {y_train.shape}")
X_train shape: (1992, 60, 1), y_train shape: (1992,)
model = Sequential([
    # Stacked LSTMs: the first two return full sequences so the next
    # LSTM layer receives a sequence as input
    LSTM(units=50, return_sequences=True, input_shape=(60, 1)),
    Dropout(0.2),
    LSTM(units=50, return_sequences=True),
    Dropout(0.2),
    LSTM(units=50),  # final LSTM returns only the last hidden state
    Dropout(0.2),
    Dense(units=1)   # single-value regression head
])
model.compile(optimizer=AdamW(learning_rate=0.001), loss='mean_squared_error')
history = model.fit(
X_train,
y_train,
validation_data=(X_test, y_test),
epochs=50,
batch_size=64,
verbose=1
)
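The returned `history` object records per-epoch losses; plotting them is a quick overfitting check (a sketch, the exact curves will vary between runs):

# Training vs. validation loss per epoch
plt.figure(figsize=(8, 4))
plt.plot(history.history['loss'], label='Training loss')
plt.plot(history.history['val_loss'], label='Validation loss')
plt.xlabel('Epoch')
plt.ylabel('MSE')
plt.legend()
plt.show()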
Visualizing Predictions
train_predict = model.predict(X_train)
test_predict = model.predict(X_test)
train_predict = scaler.inverse_transform(train_predict)
test_predict = scaler.inverse_transform(test_predict)
y_train_actual = scaler.inverse_transform(y_train.reshape(-1, 1))
y_test_actual = scaler.inverse_transform(y_test.reshape(-1, 1))
time_step = 60
plt.figure(figsize=(16, 8))
# Plot original data
plt.plot(scaler.inverse_transform(scaled_data), label='Original Price', color='black')
# Plot training predictions
# We need to shift the training predictions for plotting
train_predict_plot = np.empty_like(scaled_data)
train_predict_plot[:, :] = np.nan
train_predict_plot[time_step:len(train_predict) + time_step, :] = train_predict
plt.plot(train_predict_plot, label='Train Predictions', color='orange')
# Plot testing predictions
# We need to shift the test predictions for plotting
test_predict_plot = np.empty_like(scaled_data)
test_predict_plot[:, :] = np.nan
test_predict_plot[len(train_predict) + time_step:len(scaled_data) - 1, :] = test_predict
plt.plot(test_predict_plot, label='Test Predictions', color='green')
# Final plot settings
plt.title('BTC-EUR Price Prediction')
plt.xlabel('Time (Days)')
plt.ylabel('Price (EUR)')
plt.legend()
plt.grid(True)
plt.show()
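The plot is a useful eyeball test, but a numeric score makes runs comparable. A minimal sketch computing RMSE on the inverse-transformed predictions:

from sklearn.metrics import mean_squared_error

# RMSE in price units, computed on the unscaled values
train_rmse = np.sqrt(mean_squared_error(y_train_actual, train_predict))
test_rmse = np.sqrt(mean_squared_error(y_test_actual, test_predict))
print(f"Train RMSE: {train_rmse:.2f} EUR, Test RMSE: {test_rmse:.2f} EUR")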
Multivariate LSTM
Extrapolating the price with an LSTM usually works better than a plain regression model. However, market prices can rarely be predicted from a single variable; a production model would include additional variables that are believed to impact the price. Just for this example, let's try to predict the closing price from the other values we have.
Beware: this is just an example. In reality we would bring in other market data, including sentiment analysis and other indicators that may impact the price.
multivariate_df = df[['priceOpen', 'priceHigh', 'priceLow', 'volume', 'priceClose']]
print_df( multivariate_df.head(3) )
|      | priceOpen | priceHigh | priceLow | volume      | priceClose |
|-----:|----------:|----------:|---------:|------------:|-----------:|
| 2550 |   6486.25 |   6816.79 |  6310.11 | 4.66811e+09 |    6376.71 |
| 2549 |   6371.34 |   6546.54 |  6371.34 | 3.42618e+09 |    6534.88 |
| 2548 |   6551.52 |   6719.96 |  6498.64 | 4.09782e+09 |    6719.96 |
num_features = multivariate_df.shape[1]
multivariate_df.shape
(2551, 5)
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(multivariate_df)
scaled_data[1, 4]
np.float64(0.027459766439369378)
training_size = int(len(scaled_data) * 0.8)
train_data = scaled_data[0:training_size, :]
test_data = scaled_data[training_size:len(scaled_data), :]
def create_multivariate_dataset(dataset, target_index, time_step=60):
    """Like create_dataset, but each window keeps all features;
    the target is the `target_index` column of the next row."""
    dataX, dataY = [], []
    for i in range(len(dataset) - time_step - 1):
        a = dataset[i:(i + time_step), :]  # window with all features
        dataX.append(a)
        dataY.append(dataset[i + time_step, target_index])  # next target value
    return np.array(dataX), np.array(dataY)
time_step = 60
X_train, y_train = create_multivariate_dataset(train_data, 4, time_step)
X_test, y_test = create_multivariate_dataset(test_data, 4, time_step)
print(f"X_train shape: {X_train.shape}, y_train shape: {y_train.shape}")
X_train shape: (1979, 60, 5), y_train shape: (1979,)
model2 = Sequential([
    LSTM(units=50, return_sequences=True, input_shape=(60, 5)),  # 5 features per timestep
Dropout(0.2),
LSTM(units=50, return_sequences=True),
Dropout(0.2),
LSTM(units=50),
Dropout(0.2),
Dense(units=1)
])
model2.compile(optimizer=AdamW(learning_rate=0.001), loss='mean_squared_error')
history2 = model2.fit(
X_train,
y_train,
validation_data=(X_test, y_test),
epochs=50,
batch_size=64,
verbose=1
)
Visualizing Predictions
test_predict = model2.predict(X_test)
# The scaler was fit on 5 columns, so inverse_transform expects 5 columns.
# Repeat the single predicted column 5 times, invert, then keep column 4
# (priceClose).
prediction_copies = np.repeat(test_predict, num_features, axis=-1)
test_predict_actual = scaler.inverse_transform(prediction_copies)[:, 4]
y_test_reshaped = y_test.reshape(-1, 1)
y_test_copies = np.repeat(y_test_reshaped, num_features, axis=-1)
y_test_actual = scaler.inverse_transform(y_test_copies)[:, 4]
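The column-repeat trick above works, but a cleaner alternative (shown as a sketch; equivalent here because MinMaxScaler scales each column independently) is to fit a dedicated scaler on the target column only:

# Fit a separate scaler on the target column only
target_scaler = MinMaxScaler(feature_range=(0, 1))
target_scaler.fit(multivariate_df[['priceClose']])
test_predict_actual = target_scaler.inverse_transform(test_predict)[:, 0]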
plt.figure(figsize=(16, 8))
plt.plot(y_test_actual, label='Actual Price', color='blue')
plt.plot(test_predict_actual, label='Predicted Price', color='red', linestyle='--')
plt.title('BTC-EUR Price Prediction (Multivariate)')
plt.xlabel('Time (Days in Test Set)')
plt.ylabel('Price (EUR)')
plt.legend()
plt.grid(True)
plt.show()
Further Improvements
Setting feature engineering aside, there are two approaches that almost always help:
Hyperparameter Tuning - experiment with different numbers of LSTM layers, units per layer, dropout rates, and learning rates to find the best combination for your data.
Ensemble Models - train multiple models and average their predictions to get a more accurate and robust forecast (see the sketch below).
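As a minimal sketch of the ensemble idea, predictions from several independently trained models (the names model_a, model_b, model_c are hypothetical; any models with the same input shape work) can simply be averaged:

# Naive ensemble: average predictions of independently trained models
def ensemble_predict(models, X):
    preds = [m.predict(X) for m in models]
    return np.mean(preds, axis=0)

# e.g. avg_pred = ensemble_predict([model_a, model_b, model_c], X_test)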