Predicting Stock Prices with a PyTorch Transformer: A Demo Using Dummy Data

Predicting stock prices is a challenging task that has attracted the attention of researchers and practitioners alike. With the advent of deep learning techniques, many models have been proposed to tackle this problem. One such model is the Transformer, which has achieved state-of-the-art results in many natural language processing tasks. In this blog post, we will walk you through an example of using a PyTorch Transformer to predict the next 5 days of stock prices given the previous 10 days.

Getting Started

First, let’s import the necessary libraries:

import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np

Generating Dummy Stock Price Data

For this example, we will generate 200 days of dummy stock prices, drawn uniformly at random between 0 and 100:

num_days = 200
stock_prices = np.random.rand(num_days) * 100

Preprocessing the Data

We will prepare the input and target sequences for our model:

input_seq_len = 10
output_seq_len = 5
num_samples = num_days - input_seq_len - output_seq_len + 1

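# Build sliding windows. Stacking into a single NumPy array first avoids the slow
# (and warned-about) conversion of a Python list of arrays into a tensor.
src_data = torch.tensor(np.array([stock_prices[i:i+input_seq_len] for i in range(num_samples)])).unsqueeze(-1).float()  # (num_samples, 10, 1)
tgt_data = torch.tensor(np.array([stock_prices[i+input_seq_len:i+input_seq_len+output_seq_len] for i in range(num_samples)])).unsqueeze(-1).float()  # (num_samples, 5, 1)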
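
To make the windowing rule concrete, here is a minimal sketch on a tiny hand-written series. The values and the names prices, input_len, and output_len are illustrative assumptions, not part of the example above; only the slicing pattern is the same.

```python
# Toy price series: 8 "days" with easy-to-read values (illustrative only).
prices = [10, 11, 12, 13, 14, 15, 16, 17]
input_len, output_len = 3, 2

# Same counting rule as in the post: one sample per valid start index.
num_windows = len(prices) - input_len - output_len + 1

# Each input window is followed immediately by its target window.
inputs = [prices[i:i + input_len] for i in range(num_windows)]
targets = [prices[i + input_len:i + input_len + output_len] for i in range(num_windows)]

for x, y in zip(inputs, targets):
    print(x, "->", y)
```

With 8 days, 3 input days, and 2 output days, this gives 4 windows. The last window ends exactly at the last day of the series.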

Creating a Custom Transformer Model

We will create a custom Transformer model for stock price prediction:

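class StockPriceTransformer(nn.Module):
    def __init__(self, d_model, nhead, num_layers, dropout):
        super().__init__()
        self.input_linear = nn.Linear(1, d_model)
        # Pass the layer counts by keyword: the third positional argument of
        # nn.Transformer only sets the number of encoder layers.
        self.transformer = nn.Transformer(
            d_model, nhead,
            num_encoder_layers=num_layers,
            num_decoder_layers=num_layers,
            dropout=dropout,
        )
        self.output_linear = nn.Linear(d_model, 1)

    def forward(self, src, tgt):
        # src: (src_len, batch, 1), tgt: (tgt_len, batch, 1)
        src = self.input_linear(src)
        tgt = self.input_linear(tgt)
        # Causal mask: each target position may attend only to itself and earlier positions
        tgt_mask = self.transformer.generate_square_subsequent_mask(tgt.size(0)).to(tgt.device)
        output = self.transformer(src, tgt, tgt_mask=tgt_mask)
        output = self.output_linear(output)
        return output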

d_model = 64
nhead = 4
num_layers = 2
dropout = 0.1

model = StockPriceTransformer(d_model, nhead, num_layers, dropout=dropout)
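
As a quick check of the tensor layout, the sketch below runs a bare nn.Transformer on random inputs. The sizes are small illustrative assumptions rather than the values used in this post. It shows that, with the default batch_first=False, inputs are laid out as (sequence length, batch size, d_model) and that the output has one vector per target position.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Small illustrative sizes; these are not the values used in the post.
d_model, nhead = 8, 2
transformer = nn.Transformer(
    d_model=d_model,
    nhead=nhead,
    num_encoder_layers=1,
    num_decoder_layers=1,
    dim_feedforward=16,
    dropout=0.0,
)

# Default layout (batch_first=False): (sequence length, batch size, d_model).
src = torch.randn(10, 4, d_model)  # 10 source steps, batch of 4
tgt = torch.randn(3, 4, d_model)   # 3 target steps, batch of 4

out = transformer(src, tgt)
print(out.shape)  # same leading dimension as tgt
```

This layout is why the training loop transposes each batch from (batch, sequence, features) to (sequence, batch, features) before calling the model.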

Training the Model

We will set up the training parameters, loss function, and optimizer:

epochs = 100
lr = 0.001
batch_size = 16

criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=lr)

Now, we will train the model with a training loop:

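for epoch in range(epochs):
    for i in range(0, num_samples, batch_size):
        # nn.Transformer expects (seq_len, batch, features) by default
        src_batch = src_data[i:i+batch_size].transpose(0, 1)
        tgt_batch = tgt_data[i:i+batch_size].transpose(0, 1)

        optimizer.zero_grad()  # clear gradients from the previous step
        output = model(src_batch, tgt_batch[:-1])
        loss = criterion(output, tgt_batch[1:])
        loss.backward()        # backpropagate
        optimizer.step()       # update the weights

    # Loss of the last batch in this epoch
    print(f"Epoch {epoch+1}/{epochs}, Loss: {loss.item()}")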

Predicting the Next 5 Days of Stock Prices

Finally, we will predict the next 5 days of stock prices using the trained model:

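model.eval()  # disable dropout for inference

# Last 10 days as the source sequence, shape (10, 1, 1)
src = torch.tensor(stock_prices[-input_seq_len:]).unsqueeze(-1).unsqueeze(1).float()
# Placeholder target buffer, filled in one step at a time, shape (5, 1, 1)
tgt = torch.zeros(output_seq_len, 1, 1)

with torch.no_grad():
    for i in range(output_seq_len):
        prediction = model(src, tgt[:i+1])
        tgt[i] = prediction[-1]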

output = tgt.squeeze().tolist()
print("Next 5 days of stock prices:", output)

In this prediction loop, we use the autoregressive decoding approach (model(src, tgt[:i+1])) to generate the output sequence step by step, as the output at each step depends on the previous outputs.
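
The control flow of that loop can be sketched without PyTorch. In the snippet below, toy_model is a hypothetical stand-in for a trained model, not the Transformer from this post: it returns one value per target position, and each value depends only on the target positions up to that point. The loop itself mirrors the one above.

```python
def toy_model(src, tgt):
    """Hypothetical stand-in for a trained model: one output value per tgt position.

    Output position k is the mean of src plus the mean of tgt[:k+1], so each
    output depends only on target positions up to and including k.
    """
    src_mean = sum(src) / len(src)
    return [src_mean + sum(tgt[:k + 1]) / (k + 1) for k in range(len(tgt))]


src = [1.0, 2.0, 3.0]
steps = 3
tgt = [0.0] * steps  # placeholder buffer, filled in one step at a time

for i in range(steps):
    prediction = toy_model(src, tgt[:i + 1])  # feed the first i+1 positions
    tgt[i] = prediction[-1]                   # keep only the newest position

print(tgt)
```

Each iteration feeds a target slice that is one position longer than the last, and only the final element of the returned sequence is new information, which is why it is the one written back into tgt.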


In this blog post, we demonstrated how to predict stock prices using a PyTorch Transformer model. We generated dummy stock price data, preprocessed it, created a custom Transformer model, trained the model, and predicted the next 5 days of stock prices. This example serves as a starting point for developing more sophisticated stock price prediction models using deep learning techniques.



Important questions and answers about the PyTorch Transformer used in this post:

  1. Why do we pass tgt_batch[:-1] to the model and use tgt_batch[1:] to compare with the output during training?

    We do this because we are using a technique called “teacher forcing” during training: the true target sequence, rather than the model’s own predictions from earlier time steps, is fed to the decoder as input. Shifting the two slices by one position means the output at position k is compared with the true value at position k+1, so the model learns to predict the next step from the steps before it. Teacher forcing typically makes training faster and more stable.

  2. What determines the number of time steps generated by the model?

    The number of time steps generated is determined by the output_seq_len variable. The target windows built during preprocessing contain output_seq_len prices, and the inference loop runs output_seq_len times, producing one price per iteration from the previous input_seq_len prices.

  3. Why do we use the last position of the prediction at every step during inference?

    We use the last position of the prediction at every step during inference because we are generating the next stock prices one at a time in an autoregressive manner. The new prediction will always be at the last position of the output sequence, so we take the last position of the prediction and append it to the target sequence.

  4. Why is the sequence length in the output the same as the sequence length in tgt?

    The Transformer decoder produces exactly one output vector for each position of the tgt sequence it receives, so the output always has the same length as tgt. This is why, during inference, we lengthen the tgt slice by one position at each step and read the new prediction from its last position.

  5. Should we use ground truth during inference?

    During inference, you generally do not have access to the ground truth, as the goal is to make predictions for future data points that are not yet known. The purpose of training a model is to enable it to make accurate predictions when ground truth is not available.

  6. Can we use the previous 4 days’ stock prices as the initial target sequence during inference?

    Yes. You can use the previous 4 days’ known prices as the initial target sequence, provided that src then covers the 10 days before them, so that the next 5 days are predicted from the last 14 days in total. This gives the decoder real values as context instead of placeholders.

  7. Does the value in the “tgt” parameter matter besides the sequence length?

    Yes, the values in the tgt parameter do matter. The decoder attends to both the encoded source sequence (src) and the values in tgt, so different tgt values generally lead to different predictions. This is why training feeds the true prices as tgt, and why inference writes each new prediction back into tgt before the next step.
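
The one-position shift from question 1 can be illustrated with plain lists. This is a minimal sketch with made-up prices, and it assumes a causal mask is applied in the decoder so that output position k can only see decoder inputs up to position k.

```python
# One target window of 5 known prices (illustrative values).
target_window = [101.0, 102.0, 103.0, 104.0, 105.0]

decoder_input = target_window[:-1]   # what the decoder is given (teacher forcing)
expected_output = target_window[1:]  # what the loss compares the output against

# Under a causal mask, output position k may only look at decoder_input[:k+1].
for k, expected in enumerate(expected_output):
    visible = decoder_input[:k + 1]
    print(f"position {k}: sees {visible} -> should predict {expected}")
```

Both slices have the same length, which matches question 4: the model returns one output per decoder input position, and each output is scored against the price one day later.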

Author: robot learner
Reprint policy: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. If you reproduce this article, please credit the source: robot learner.