LSTM for Timeseries Forecasting
Anton A. Nesterov | an (at) vski.sh
Version 1.0
In this notebook we'll explore a Long Short-Term Memory (LSTM) network for forecasting tasks, using the Bitcoin price for the BTC-EUR pair as an example.
How LSTMs work
An LSTM maintains a cell state and three gates that control the flow of information. These gates allow the network to learn which information is important to keep or discard over long sequences, thus overcoming the vanishing gradient problem seen in simple RNNs.
Forget Gate
Decides what information to discard from the cell state. It passes the previous hidden state ($h_{t-1}$) and the current input ($x_t$) through a sigmoid layer.
Input Gate
Decides what new information to store in the cell state. This involves a sigmoid layer to decide which values to update and a tanh layer to create a vector of new candidate values.
Output Gate
Decides what the next hidden state should be. The output is a filtered version of the cell state.
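Putting the three gates together: in the standard formulation, with $\sigma$ the sigmoid function, $x_t$ the current input, $h_{t-1}$ the previous hidden state, and $C_{t-1}$ the previous cell state, the gate activations and state updates are:

$$
\begin{aligned}
f_t &= \sigma(W_f [h_{t-1}, x_t] + b_f) && \text{forget gate} \\
i_t &= \sigma(W_i [h_{t-1}, x_t] + b_i) && \text{input gate} \\
\tilde{C}_t &= \tanh(W_C [h_{t-1}, x_t] + b_C) && \text{candidate values} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{cell state update} \\
o_t &= \sigma(W_o [h_{t-1}, x_t] + b_o) && \text{output gate} \\
h_t &= o_t \odot \tanh(C_t) && \text{new hidden state}
\end{aligned}
$$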
Python Implementation
We will use TensorFlow and Keras to build the model. Keras has a ready-to-use LSTM layer class.
! pip install -q numpy pandas matplotlib scikit-learn tensorflow
import warnings ; warnings.filterwarnings('ignore')
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.optimizers import AdamW
from IPython.display import display, Markdown
print_df = lambda df: display(Markdown(df.to_markdown()))
The Dataset
We have Bitcoin price data for 2018–2025, aggregated at a 1-day interval.
Download Dataset
df = pd.read_csv('bitcoin.csv')
df['timeOpen'] = pd.to_datetime(df['timeOpen'], unit='ms')
print_df( df.head(2) )
|   | timeOpen            | timeClose     | timeHigh      | timeLow       | priceOpen | priceHigh | priceLow | priceClose | volume      |
|--:|---------------------|---------------|---------------|---------------|----------:|----------:|---------:|-----------:|------------:|
| 0 | 2025-09-07 12:00:00 | 1757332799999 | 1757330880000 | 1757246580000 |    110221 |    111591 |   110212 |     111168 | 24618007520 |
| 1 | 2025-09-06 12:00:00 | 1757246399999 | 1757172540000 | 1757231700000 |    110651 |    111275 |   110024 |     110225 | 21500719036 |
df = df.sort_values('timeOpen')
print_df( df.head(2) )
|      | timeOpen            | timeClose     | timeHigh      | timeLow       | priceOpen | priceHigh | priceLow | priceClose | volume     |
|-----:|---------------------|---------------|---------------|---------------|----------:|----------:|---------:|-----------:|-----------:|
| 2550 | 2018-08-22 12:00:00 | 1535025599999 | 1534946642000 | 1535019540000 |   6486.25 |   6816.79 |  6310.11 |    6376.71 | 4668110000 |
| 2549 | 2018-08-23 12:00:00 | 1535111999999 | 1535105640000 | 1535025840000 |   6371.34 |   6546.54 |  6371.34 |    6534.88 | 3426180000 |
df.plot(x='timeOpen', y='priceClose')
plt.show()
Univariate LSTM
LSTMs require the data to be in a specific format: `[samples, time_steps, features]`. Univariate means that we use a single feature for prediction. In this case, for example, we simply extrapolate closing prices with the LSTM:
closing_prices = df['priceClose'].values.reshape(-1, 1)
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(closing_prices)
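With feature_range=(0, 1), MinMaxScaler rescales each value using the column minimum and maximum, which keeps inputs in the range where the sigmoid and tanh activations are most responsive:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$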
def create_dataset(dataset, time_step=60):
    """Slice a series into sliding windows of length `time_step`,
    with the value immediately after each window as the target."""
    dataX, dataY = [], []
    for i in range(len(dataset) - time_step - 1):
        a = dataset[i:(i + time_step), 0]  # window of `time_step` past values
        dataX.append(a)
        dataY.append(dataset[i + time_step, 0])  # next value to predict
    return np.array(dataX), np.array(dataY)
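As a quick sanity check (not part of the pipeline), here is what the windowing produces on a tiny toy series:

toy = np.arange(10, dtype=float).reshape(-1, 1)
toy_X, toy_y = create_dataset(toy, time_step=3)
print(toy_X.shape, toy_y.shape)  # (6, 3) (6,)
print(toy_X[0], toy_y[0])        # [0. 1. 2.] 3.0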
X, y = create_dataset(scaled_data)
X = X.reshape(X.shape[0], X.shape[1], 1)
training_size = int(len(X) * 0.8)
test_size = len(X) - training_size
X_train, X_test = X[0:training_size], X[training_size:len(X)]
y_train, y_test = y[0:training_size], y[training_size:len(y)]
print(f"X_train shape: {X_train.shape}, y_train shape: {y_train.shape}")
X_train shape: (1992, 60, 1), y_train shape: (1992,)
model = Sequential([
    # Stacked LSTMs: the first two return full sequences so the next
    # LSTM layer receives a sequence as input
    LSTM(units=50, return_sequences=True, input_shape=(60, 1)),
    Dropout(0.2),
    LSTM(units=50, return_sequences=True),
    Dropout(0.2),
    LSTM(units=50),  # final LSTM returns only the last hidden state
    Dropout(0.2),
    Dense(units=1)   # single-value regression head
])
model.compile(optimizer=AdamW(learning_rate=0.001), loss='mean_squared_error')
history = model.fit(
X_train,
y_train,
validation_data=(X_test, y_test),
epochs=50,
batch_size=64,
verbose=1
)
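The returned `history` object records per-epoch losses; plotting them is a quick overfitting check (a sketch, the exact curves will vary between runs):

# Training vs. validation loss per epoch
plt.figure(figsize=(8, 4))
plt.plot(history.history['loss'], label='Training loss')
plt.plot(history.history['val_loss'], label='Validation loss')
plt.xlabel('Epoch')
plt.ylabel('MSE')
plt.legend()
plt.show()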
Visualizing Predictions
train_predict = model.predict(X_train)
test_predict = model.predict(X_test)
train_predict = scaler.inverse_transform(train_predict)
test_predict = scaler.inverse_transform(test_predict)
y_train_actual = scaler.inverse_transform(y_train.reshape(-1, 1))
y_test_actual = scaler.inverse_transform(y_test.reshape(-1, 1))
time_step = 60
plt.figure(figsize=(16, 8))
# Plot original data
plt.plot(scaler.inverse_transform(scaled_data), label='Original Price', color='black')
# Plot training predictions
# We need to shift the training predictions for plotting
train_predict_plot = np.empty_like(scaled_data)
train_predict_plot[:, :] = np.nan
train_predict_plot[time_step:len(train_predict) + time_step, :] = train_predict
plt.plot(train_predict_plot, label='Train Predictions', color='orange')
# Plot testing predictions
# We need to shift the test predictions for plotting
test_predict_plot = np.empty_like(scaled_data)
test_predict_plot[:, :] = np.nan
test_predict_plot[len(train_predict) + time_step:len(scaled_data) - 1, :] = test_predict
plt.plot(test_predict_plot, label='Test Predictions', color='green')
# Final plot settings
plt.title('BTC-EUR Price Prediction')
plt.xlabel('Time (Days)')
plt.ylabel('Price (EUR)')
plt.legend()
plt.grid(True)
plt.show()
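The plot is a useful eyeball test, but a numeric score makes runs comparable. A minimal sketch computing RMSE on the inverse-transformed predictions:

from sklearn.metrics import mean_squared_error

# RMSE in price units, computed on the unscaled values
train_rmse = np.sqrt(mean_squared_error(y_train_actual, train_predict))
test_rmse = np.sqrt(mean_squared_error(y_test_actual, test_predict))
print(f"Train RMSE: {train_rmse:.2f} EUR, Test RMSE: {test_rmse:.2f} EUR")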
Multivariate LSTM
Extrapolating the price with an LSTM usually works better than a plain regression model. However, market prices can rarely be predicted from a single variable; a production model would include additional variables that are believed to impact the price. Just for this example, let's try to predict the closing price from the other values we have.
Beware: this is just an example. In reality we would bring in other market data, including sentiment analysis and other indicators that may impact the price.
multivariate_df = df[['priceOpen', 'priceHigh', 'priceLow', 'volume', 'priceClose']]
print_df( multivariate_df.head(3) )
|      | priceOpen | priceHigh | priceLow | volume      | priceClose |
|-----:|----------:|----------:|---------:|------------:|-----------:|
| 2550 |   6486.25 |   6816.79 |  6310.11 | 4.66811e+09 |    6376.71 |
| 2549 |   6371.34 |   6546.54 |  6371.34 | 3.42618e+09 |    6534.88 |
| 2548 |   6551.52 |   6719.96 |  6498.64 | 4.09782e+09 |    6719.96 |
num_features = multivariate_df.shape[1]
multivariate_df.shape
(2551, 5)
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(multivariate_df)
scaled_data[1, 4]
np.float64(0.027459766439369378)
training_size = int(len(scaled_data) * 0.8)
train_data = scaled_data[0:training_size, :]
test_data = scaled_data[training_size:len(scaled_data), :]
def create_multivariate_dataset(dataset, target_index, time_step=60):
    """Like create_dataset, but each window keeps all features;
    the target is the `target_index` column of the next row."""
    dataX, dataY = [], []
    for i in range(len(dataset) - time_step - 1):
        a = dataset[i:(i + time_step), :]  # window with all features
        dataX.append(a)
        dataY.append(dataset[i + time_step, target_index])  # next target value
    return np.array(dataX), np.array(dataY)
time_step = 60
X_train, y_train = create_multivariate_dataset(train_data, 4, time_step)
X_test, y_test = create_multivariate_dataset(test_data, 4, time_step)
print(f"X_train shape: {X_train.shape}, y_train shape: {y_train.shape}")
X_train shape: (1979, 60, 5), y_train shape: (1979,)
model2 = Sequential([
    LSTM(units=50, return_sequences=True, input_shape=(60, 5)),  # 5 features per timestep
Dropout(0.2),
LSTM(units=50, return_sequences=True),
Dropout(0.2),
LSTM(units=50),
Dropout(0.2),
Dense(units=1)
])
model2.compile(optimizer=AdamW(learning_rate=0.001), loss='mean_squared_error')
history2 = model2.fit(
X_train,
y_train,
validation_data=(X_test, y_test),
epochs=50,
batch_size=64,
verbose=1
)
Visualizing Predictions
test_predict = model2.predict(X_test)
# The scaler was fit on 5 columns, so inverse_transform expects 5 columns.
# Repeat the single predicted column 5 times, invert, then keep column 4
# (priceClose).
prediction_copies = np.repeat(test_predict, num_features, axis=-1)
test_predict_actual = scaler.inverse_transform(prediction_copies)[:, 4]
y_test_reshaped = y_test.reshape(-1, 1)
y_test_copies = np.repeat(y_test_reshaped, num_features, axis=-1)
y_test_actual = scaler.inverse_transform(y_test_copies)[:, 4]
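The column-repeat trick above works, but a cleaner alternative (shown as a sketch; equivalent here because MinMaxScaler scales each column independently) is to fit a dedicated scaler on the target column only:

# Fit a separate scaler on the target column only
target_scaler = MinMaxScaler(feature_range=(0, 1))
target_scaler.fit(multivariate_df[['priceClose']])
test_predict_actual = target_scaler.inverse_transform(test_predict)[:, 0]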
plt.figure(figsize=(16, 8))
plt.plot(y_test_actual, label='Actual Price', color='blue')
plt.plot(test_predict_actual, label='Predicted Price', color='red', linestyle='--')
plt.title('BTC-EUR Price Prediction (Multivariate)')
plt.xlabel('Time (Days in Test Set)')
plt.ylabel('Price (EUR)')
plt.legend()
plt.grid(True)
plt.show()
Further Improvements
Setting feature engineering aside, there are two approaches that almost always help:
Hyperparameter Tuning - experiment with different numbers of LSTM layers, units per layer, dropout rates, and learning rates to find the best combination for your data.
Ensemble Models - train multiple models and average their predictions to get a more accurate and robust forecast (see the sketch below).
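As a minimal sketch of the ensemble idea, predictions from several independently trained models (the names model_a, model_b, model_c are hypothetical; any models with the same input shape work) can simply be averaged:

# Naive ensemble: average predictions of independently trained models
def ensemble_predict(models, X):
    preds = [m.predict(X) for m in models]
    return np.mean(preds, axis=0)

# e.g. avg_pred = ensemble_predict([model_a, model_b, model_c], X_test)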