Using MindsDB to forecast financial time series data: A practical guide

Forecasting financial time series data is essential for making informed decisions about investments, business operations, and financial planning.

Financial time series data refers to data collected over time, such as stock prices, interest rates, exchange rates, and other economic indicators.

By forecasting this data, analysts and decision-makers can anticipate future trends, identify risks and opportunities, and develop strategies to optimize financial outcomes.

Accurate financial forecasting can help organizations improve their risk management strategies and identify potential areas of financial instability. By anticipating financial trends and risks, decision-makers can take proactive steps to mitigate losses and maximize gains.

Therefore, forecasting financial time series data plays a crucial role in financial decision-making, risk management, and overall financial planning.

Why Machine Learning?

Machine Learning (ML) plays an increasingly important role in financial forecasting. ML algorithms are used to analyze large volumes of historical financial data, identify patterns, and predict future trends. ML models can also learn from new data and adjust their predictions over time, which makes them particularly useful for analyzing financial time series data.

Some common applications of ML in financial forecasting include:

  • Predicting stock prices and market trends.

  • Forecasting exchange rates and interest rates.

  • Identifying credit risk and predicting loan defaults.

In this article, we will walk through forecasting stock prices with MindsDB.

Prerequisites

To follow along with this article, you should have some basic knowledge of the following:

  • Data Analysis (The Pandas Library).

  • Machine Learning.

  • SQL.

  • Stocks.

What is MindsDB?

MindsDB is an open-source, automated machine-learning platform that simplifies the creation and deployment of machine-learning models. MindsDB provides an easy-to-use interface that allows users to build, test, and deploy machine learning models without requiring advanced machine learning or programming knowledge. One of the unique features of MindsDB is its ability to learn from existing data and make predictions based on that data. MindsDB can also automate the process of selecting the best machine learning algorithm for a given problem and tuning the hyperparameters of those algorithms to achieve the best possible performance.

MindsDB uses various techniques, including deep learning and natural language processing, to create accurate and scalable models for multiple applications. These applications include time series forecasting, natural language processing, fraud detection, and customer churn prediction. MindsDB can be integrated into existing workflows and systems, such as databases and APIs, which makes it easy to incorporate machine learning into existing applications. One of the main advantages of using MindsDB for financial forecasting is its ability to handle time series data, a common data type in finance.

In this article, we will use the S&P 500 stock data from Kaggle to build our forecasting model on MindsDB. The first step in building the model is to preprocess the data.

Preprocessing the Data

The S&P 500 stock data contains the following columns: “date”, “open”, “high”, “low”, “close”, “volume”, and “Name”. Here is a brief explanation of each column:

  • date: The date of the stock price data in the format "YYYY-MM-DD".

  • open: The opening price of the stock on that date.

  • high: The highest price of the stock on that date.

  • low: The lowest price of the stock on that date.

  • close: The closing price of the stock on that date.

  • volume: The number of shares traded on that date.

  • Name: The stock symbol or ticker (e.g. AAPL for Apple Inc.).

To preprocess the data, we will use Pandas, a popular open-source data analysis and manipulation library for the Python programming language.

Installing and importing the Library

The first step is to install the Pandas library. To do that, run the following command:

pip install pandas

# If in a jupyter notebook environment
!pip install pandas

After installing, import the library:

import pandas as pd

Reading in the data

After importing the library, the next step is to read the downloaded CSV file:

df = pd.read_csv("main.csv")
df.head() # to display the first few rows of the data
df.info() # provides a summary of the data

The first few rows of the data would look like this:

Getting the summary of the data would be:

Preprocessing

The summary shows that some rows contain null values. Since we cannot work with null data, we drop those rows:

df.dropna(inplace=True)   # To remove null data
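
Before dropping rows, it can help to see how many null values each column holds and how many rows the drop removes. The following is a minimal sketch on a tiny hand-made frame (the values are illustrative and not taken from the Kaggle file):

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the stock data; the numbers are made up for illustration.
sample = pd.DataFrame({
    "open": [10.0, np.nan, 12.0],
    "close": [10.5, 11.2, np.nan],
    "Name": ["AAPL", "AAPL", "AAPL"],
})

# Count the missing values in each column.
nulls_per_column = sample.isna().sum()

# dropna() removes every row that has at least one missing value.
cleaned = sample.dropna()
rows_dropped = len(sample) - len(cleaned)

print(nulls_per_column)
print("rows dropped:", rows_dropped)
```

If the number of dropped rows is large relative to the dataset, filling the gaps may be worth considering instead of dropping them.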

Next, since we’re working with time series data, we convert the date column to a datetime object and set it as the index:

df['date'] = pd.to_datetime(df['date']) # Convert the 'date' column to a datetime object
df.set_index('date', inplace=True) # Set the 'date' column as the index
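
One practical benefit of a datetime index is that rows can be selected by date range. The sketch below illustrates this on a small hand-made sample (the values are illustrative); it also sorts the index, which is an added step not shown in the original snippet:

```python
import pandas as pd

# Hand-made sample, deliberately out of chronological order.
sample = pd.DataFrame({
    "date": ["2018-02-07", "2018-02-05", "2018-02-06"],
    "close": [159.54, 156.49, 163.03],
    "Name": ["AAPL", "AAPL", "AAPL"],
})

# Convert the strings to datetimes, index by date, and sort chronologically.
sample["date"] = pd.to_datetime(sample["date"])
sample = sample.set_index("date").sort_index()

# With a sorted DatetimeIndex, date strings can be used for label-based slicing.
window = sample.loc["2018-02-06":"2018-02-07"]
print(window)
```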

For this article, we will select the stocks of only a few companies: AAPL, GOOGL, AMZN, FB, and MSFT.

subset = ['AAPL', 'GOOGL', 'AMZN', 'FB', 'MSFT']
df = df[df['Name'].isin(subset)]

Now, let’s generate descriptive statistics for the selected data using the describe function:

print(df.describe())

As a further check, we can look for duplicate rows and remove them:

# Check for duplicates:
print(df.duplicated().sum())

# remove duplicates
df.drop_duplicates(inplace=True)
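
One caveat worth keeping in mind: duplicated() and drop_duplicates() compare column values only and ignore the index. Because the date is now the index, two different trading days with identical values in every column would be treated as duplicates. The sketch below, on made-up data, shows the difference when the date is moved back into a column for the comparison:

```python
import pandas as pd

# Two different trading days that happen to share identical column values
# (made-up numbers, for illustration only).
frame = pd.DataFrame(
    {
        "close": [100.0, 100.0],
        "volume": [5000, 5000],
        "Name": ["AAPL", "AAPL"],
    },
    index=pd.to_datetime(["2018-01-02", "2018-01-03"]).rename("date"),
)

# The index is ignored, so the second day is flagged as a duplicate.
flagged_without_date = int(frame.duplicated().sum())

# reset_index() turns the date into a column, so it takes part in the comparison.
flagged_with_date = int(frame.reset_index().duplicated().sum())

print(flagged_without_date, flagged_with_date)
```

For real price data such collisions are unlikely, but checking duplicates before setting the index (or on reset_index()) avoids the issue entirely.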

At the end of the preprocessing, we should be left with 6295 non-null rows. To save the preprocessed data we have now, we use the to_csv function:

df.to_csv("stocks_demo.csv")
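
Because the date is the index, to_csv writes it as the first column of the file, so it is available when the file is uploaded later. A minimal sketch of this behaviour, using an in-memory buffer instead of a file and illustrative values:

```python
import io

import pandas as pd

frame = pd.DataFrame(
    {"close": [156.49, 163.03], "Name": ["AAPL", "AAPL"]},
    index=pd.to_datetime(["2018-02-05", "2018-02-06"]).rename("date"),
)

# Write to an in-memory buffer; the named index becomes the first CSV column.
buffer = io.StringIO()
frame.to_csv(buffer)
csv_text = buffer.getvalue()
header = csv_text.splitlines()[0]

# Reading the text back with the date parsed restores the datetime index.
restored = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")

print(header)
print(restored)
```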

The next step is to build the machine learning model with MindsDB.

Building with MindsDB

As explained earlier, MindsDB is an open-source automated machine learning (ML) platform that simplifies the process of building ML models. We will be using the MindsDB cloud editor. Register for a free account to get started:

The interface of the editor would look similar to the screenshot below:

Uploading the Data

MindsDB has a variety of integrations that allow it to work with many different data sources. These integrations enable users to connect quickly and import data from databases such as MongoDB, PostgreSQL, MySQL, and IBM DB2, as well as from cloud-based services such as Google Sheets. For this article, we will work with the online editor. To get started, we will upload the preprocessed data. Click on the “Add” button and select the “Upload a File” option:

Keep note of the name given to our Datasource. It is essential because it will be used later. After a successful upload, we will be presented with an interface like this:

Comment out line 7 and click on “Run”:

SELECT * FROM files.stocks LIMIT 10;

Our result at the bottom would look similar to this:

Creating the Model

Now that we have our data uploaded to the MindsDB editor, we can create a model to help make predictions. In this tutorial, the model predicts the closing price (the close column). The direction of the stock price movement (e.g., up or down) can then be derived by comparing the predicted close with the previous close.
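
As a minimal sketch of turning closing prices into an up/down direction, the pandas snippet below computes the day-over-day change per ticker. The prices are made up for illustration, and the change and direction columns and the label_direction helper are hypothetical names that are not part of the dataset or of MindsDB:

```python
import pandas as pd

# Made-up closing prices for two tickers over three days.
prices = pd.DataFrame({
    "date": pd.to_datetime(["2018-02-05", "2018-02-06", "2018-02-07"] * 2),
    "Name": ["AAPL"] * 3 + ["MSFT"] * 3,
    "close": [156.49, 163.03, 159.54, 88.00, 91.33, 89.61],
})

# Sort so that each ticker's rows are in chronological order.
prices = prices.sort_values(["Name", "date"]).reset_index(drop=True)

# Day-over-day change within each ticker; the first day has no previous close.
prices["change"] = prices.groupby("Name")["close"].diff()


def label_direction(change):
    """Map a price change to a label; 'n/a' marks a missing previous close."""
    if pd.isna(change):
        return "n/a"
    if change > 0:
        return "up"
    if change < 0:
        return "down"
    return "flat"


prices["direction"] = prices["change"].apply(label_direction)
print(prices)
```

The CREATE MODEL statement that follows trains on the close column directly, so a step like this would be applied to the predictions afterwards.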

CREATE MODEL 
  mindsdb.prices
FROM files
  (SELECT * FROM files.stocks)
PREDICT close;

In the code above, we use the CREATE MODEL statement to create a model named prices. The FROM clause selects the uploaded file stocks (our Datasource name) as the training data. The last line uses the PREDICT keyword to set the close column as our target variable. After running the above command, you should get something similar to this:

Observing the Model

After creating the model, we can check the status of the model to know if the model is ready to use or if it is still training:

SELECT *
FROM mindsdb.models
WHERE name='prices';

The query above returns a description of our model, prices. To know whether the model is still training or complete, check the STATUS column of the result in the output screen:

In the screenshot, we can see that the status of this model is complete. That means the model is done with training.

Using the Model for Prediction

Now that our model is done with training, let us test it. We will use the AAPL stock and predict its closing price for the 13th of April, 2023, providing the values from the previous day as of the close of trade:

SELECT close, 
      close_explain 
FROM mindsdb.prices
WHERE date= '2023-04-13'
AND open= '161.22'
AND high= '162.06'
AND low= '159.78'
AND close= '160'
AND volume= '62000000'
AND Name= 'AAPL';

Running this with our model will give us the following:

More analysis, preprocessing, and exploration can be done on the data for a better and more accurate prediction. Also, the model hyper-parameters can be tuned more finely.

Conclusion

MindsDB is an open-source automated machine-learning tool that simplifies and speeds up the process of building and deploying machine-learning models. It provides an easy-to-use interface that allows users to train models without requiring a deep understanding of complex machine-learning algorithms.

As we have seen, it is possible to build a basic model without deep knowledge of the machine learning domain. In conclusion, MindsDB is a go-to machine learning tool for building ML models on the fly. To learn more about the features that MindsDB offers, you can check out their documentation.

All the commands and data used have been uploaded to GitHub.

Happy Building 🚀✨!