I want to start a personal project but am struggling to formulate my business problem into a model. I’d appreciate any advice on how to approach this issue and what models or techniques to research.
The goal is to model the n-day forward moving average of a metric, using data from previous days and the latest forward moving average available on a given day.
For example, if today is day 30, I want to predict the forward moving average of a metric up to 360 days ahead. I currently have the actual average values for the forward moving averages from 1 to 30 days, and I also have data for these averages over the past 5 years.
The objective is to define ranges for all forward moving averages from the first unknown horizon (31 days) out to 360 days.
I’m unsure which type of model to use or how to structure the problem, as the goal is not to predict a single value but rather a range of values for every horizon from 31 to 360 days.
It sounds like a fascinating project! Here are a few steps and models that might help you get started:
Approach
Understand the Problem: You want to predict a range of values for the forward moving average from day 31 to day 360. This is essentially a time series forecasting problem, but with a focus on predicting ranges instead of single point estimates.
Data Preparation: Make sure your data is clean and well-structured. You have a good historical dataset, so you might start by visualizing it to understand any trends, seasonality, or patterns.
Feature Engineering: Create features that capture the essence of your problem. Some useful features might include:
Lagged values of the moving average (e.g., values from the past few days/weeks/months)
Day of the year to account for seasonality
Rolling statistics (mean, standard deviation) over different windows
Any external factors that might influence the metric you’re studying
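As a rough illustration of the features listed above, here is a minimal sketch using only the Python standard library. The function names, the lag choices, the 30-day window and the `t % period` stand-in for a calendar day are all assumptions for illustration, not part of the original question.

```python
from statistics import mean, stdev

def forward_average(values, start, horizon):
    """Mean of values[start:start + horizon]; None if the window runs past the data."""
    if start + horizon > len(values):
        return None
    return mean(values[start:start + horizon])

def build_features(values, t, lags=(1, 7, 30), window=30, period=365):
    """Features known at index t, built only from values[:t] (no look-ahead)."""
    if t < max(max(lags), window):
        raise ValueError("not enough history at index t")
    history = values[t - window:t]
    features = {f"lag_{k}": values[t - k] for k in lags}  # lagged values
    features["roll_mean"] = mean(history)                  # rolling mean
    features["roll_std"] = stdev(history)                  # rolling standard deviation
    features["day_of_year"] = t % period                   # assumed stand-in for a real date
    return features

# Toy series that repeats 0..9, only to show the shapes involved.
values = [float(i % 10) for i in range(400)]
feats = build_features(values, t=100)
target = forward_average(values, start=100, horizon=30)
```

Each row of a training set would pair `feats` for a given day with the forward average that starts on that day, so the features never see the values they are meant to predict.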
Model Suggestions
ARIMA (AutoRegressive Integrated Moving Average): A classic model for time series forecasting, ARIMA can be useful for capturing the underlying patterns in your data. However, it may not handle complex non-linear relationships well.
Exponential Smoothing (ETS): This method is good for capturing trends and seasonality. Models like Holt-Winters can be particularly useful if your data exhibits these characteristics.
Machine Learning Models:
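Random Forests/Gradient Boosting: These can capture complex relationships and interactions in your data. Feature importances give some insight into what drives the predictions, although the ensembles themselves are harder to interpret than a linear model.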
LSTM (Long Short-Term Memory) Networks: If your data is highly sequential, LSTMs, a type of recurrent neural network, might be very effective. They are good at capturing long-term dependencies in time series data.
Probabilistic Models: Since you want to predict a range of values, consider models that output a distribution instead of a point estimate. Examples include:
Quantile Regression: This allows you to directly predict quantiles of your target distribution.
Bayesian Methods: Bayesian linear regression or Bayesian neural networks can provide probabilistic forecasts.
Prophet: Developed by Facebook, Prophet is designed for forecasting time series data and can provide uncertainty intervals.
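To make the idea of predicting a range concrete, here is a simple sketch that uses empirical quantiles of historical outcomes. It is a nonparametric stand-in rather than quantile regression proper. It assumes the change from the known 30-day forward average to the h-day forward average behaves similarly across years, and the names `horizon_range`, `known`, `lower` and `upper` are made up for this example. The series is synthetic noise, not real data.

```python
import random
from statistics import quantiles

def forward_average(values, start, horizon):
    """Mean of values[start:start + horizon]."""
    window = values[start:start + horizon]
    return sum(window) / len(window)

def horizon_range(values, known=30, horizon=90, lower=0.1, upper=0.9):
    """Empirical (lower, upper) quantiles of FMA(horizon) - FMA(known) over past start days."""
    diffs = [
        forward_average(values, s, horizon) - forward_average(values, s, known)
        for s in range(len(values) - horizon + 1)
    ]
    cuts = quantiles(diffs, n=100, method="inclusive")  # 99 percentile cut points
    return cuts[round(lower * 100) - 1], cuts[round(upper * 100) - 1]

# Synthetic five-year daily series, for illustration only.
rng = random.Random(0)
series = [100 + rng.gauss(0, 5) for _ in range(5 * 365)]

latest_fma_30 = forward_average(series, len(series) - 30, 30)
lo, hi = horizon_range(series, known=30, horizon=90)
range_90 = (latest_fma_30 + lo, latest_fma_30 + hi)  # rough 10%-90% range for the 90-day average
```

Repeating the call for every horizon from 31 to 360 gives one band per horizon. A quantile regression model would refine this by letting the band depend on features instead of being the same for every start day.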
Structuring the Problem
Multi-Step Forecasting: Since you need predictions up to 360 days ahead, you’ll need a strategy for multi-step forecasting. You can:
Build a separate model for each future day (though this might be resource-intensive).
Use a single model to predict multiple steps ahead, which can be done using techniques like sequence-to-sequence models.
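The first option, one model per future horizon, can be sketched as below. This assumes a deliberately simple model: a one-feature least-squares line per horizon that maps the known 30-day forward average to the h-day forward average. The function names and the chosen horizons are illustrative only.

```python
def forward_average(values, start, horizon):
    """Mean of values[start:start + horizon]."""
    window = values[start:start + horizon]
    return sum(window) / len(window)

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b

def fit_direct_models(values, known=30, horizons=(60, 180, 360)):
    """Direct strategy: fit one independent model per forecast horizon."""
    models = {}
    for h in horizons:
        starts = range(len(values) - h + 1)
        xs = [forward_average(values, s, known) for s in starts]  # what is known at day 30
        ys = [forward_average(values, s, h) for s in starts]      # what we want at day h
        models[h] = fit_line(xs, ys)
    return models

def predict(models, known_fma):
    """Point forecast of each horizon's forward average from the known one."""
    return {h: a + b * known_fma for h, (a, b) in models.items()}

# Toy example: a pure linear trend, where the relationship is exact.
trend = [float(i) for i in range(1000)]
models = fit_direct_models(trend)
forecast = predict(models, known_fma=100.0)
```

With 330 horizons this means 330 small fits, which is cheap for a model this simple. The cost only becomes a concern with heavier models.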
Evaluation Metrics: Choose appropriate metrics to evaluate your model. Since you’re predicting ranges, consider metrics that assess the accuracy of these intervals, such as the coverage probability or the width of the prediction intervals.
Cross-Validation: Use cross-validation techniques that respect the temporal order of your data. Time series split methods can be useful here.
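A minimal sketch of both ideas, time-ordered splits and interval metrics, might look like this. The helper names `expanding_splits` and `coverage_and_width` are hypothetical. scikit-learn's `TimeSeriesSplit` offers a ready-made version of the splitting logic.

```python
def expanding_splits(n_samples, n_splits=3, test_size=50):
    """Yield (train, test) index ranges; the training block always precedes the test block."""
    for i in range(n_splits):
        test_end = n_samples - (n_splits - 1 - i) * test_size
        test_start = test_end - test_size
        if test_start <= 0:
            raise ValueError("not enough samples for the requested splits")
        yield range(0, test_start), range(test_start, test_end)

def coverage_and_width(actuals, lowers, uppers):
    """Share of actual values inside [lower, upper], and the mean interval width."""
    hits = [lo <= y <= hi for y, lo, hi in zip(actuals, lowers, uppers)]
    widths = [hi - lo for lo, hi in zip(lowers, uppers)]
    return sum(hits) / len(hits), sum(widths) / len(widths)

splits = list(expanding_splits(200, n_splits=3, test_size=50))
coverage, width = coverage_and_width(
    actuals=[1, 5, 10], lowers=[0, 0, 0], uppers=[2, 4, 20]
)
```

A nominal 80% interval should cover roughly 80% of held-out actuals. Among models that reach the target coverage, narrower intervals are preferable. Because forward averages from neighbouring start days share most of their underlying values, leaving a gap between the training and test blocks may be worth considering.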
I have been testing critical levels, auto trendlines, and a bespoke relative strength indicator based on the main one used on r/realdaytrading vs SPY.
It is currently in forward testing, so it does not operate on all stocks, but when it works, it has a hit rate of more than 75%, and occasionally more than 80%.
If you’re attempting to get away from moving averages, this might be worth looking into. I discovered that moving averages had a low hit rate and frequently produced false signals.