StockAI Blog
Machine Learning for DSE Stock Prediction: How XGBoost Predicts Bangladesh Stock Prices

Sarah Ali

Apr 23, 2026 · 14 min read · Stock Analysis, Machine Learning

Can a machine learning model actually predict stock prices on the Dhaka Stock Exchange? It's the question every DSE investor asks when they see AI-powered trading tools promising an edge.

The short answer: machine learning doesn't predict exact prices, but it can identify statistical patterns that give you a meaningful probability advantage. And on DSE — where retail investors often rely on tips and rumors — that edge matters.

In this guide, we'll break down exactly how XGBoost machine learning models predict 5-day stock returns on DSE, what features drive those predictions, and how you can use them in your trading strategy. No hype, no false promises — just the mechanics of how it works under the hood.

What Is XGBoost and Why Does It Work for Stock Prediction?

XGBoost (eXtreme Gradient Boosting) is a machine learning algorithm that builds decision trees sequentially. Unlike a single decision tree that makes one guess, XGBoost builds hundreds of trees, each one fitted to the errors left by the trees before it. The final prediction is the sum of all the trees' outputs, each scaled by a learning rate.

Think of it like this: the first tree looks at stock data and makes a rough prediction. The second tree looks at where the first tree was wrong and focuses on those errors. The third tree does the same for the combined errors of trees one and two. After 200 trees, the combined prediction can capture complex, non-linear patterns that no single shallow tree could represent.
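
To make the residual-fitting idea concrete, here is a toy gradient-boosting loop in plain Python. It is a simplified sketch, not XGBoost itself: it uses one input feature, depth-1 trees ("stumps") and squared-error loss, and leaves out XGBoost's regularization, second-order gradients and deeper trees. The names `fit_stump` and `gradient_boost` are illustrative.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 tree: the single split on x that best fits the residuals (least squares)."""
    best = None
    # Candidate splits: every distinct x except the largest, so both sides are non-empty.
    for split in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= split]
        right = [r for x, r in zip(xs, residuals) if x > split]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, split, left_mean, right_mean)
    if best is None:  # all x identical: fall back to a constant prediction
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, split, left_mean, right_mean = best
    return lambda x: left_mean if x <= split else right_mean


def gradient_boost(xs, ys, n_trees=200, learning_rate=0.1):
    """Each new stump is fitted to the errors (residuals) of the ensemble built so far."""
    base = sum(ys) / len(ys)  # start from the mean target
    trees = []
    preds = [base] * len(ys)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        # Shrink each tree's contribution by the learning rate.
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]
    # Final prediction: base value plus the scaled sum of all trees' outputs.
    return lambda x: base + learning_rate * sum(t(x) for t in trees)


xs = list(range(1, 11))
ys = [0.0 if x <= 5 else 1.0 for x in xs]  # a simple step pattern
model = gradient_boost(xs, ys)
print(round(model(2), 3), round(model(8), 3))  # close to 0.0 and 1.0
```

In this toy case each round removes 10% of the remaining error, which is why many small corrective trees add up to an accurate fit.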

Why XGBoost specifically for stock prediction?

  • Handles non-linear relationships — Stock prices are influenced by dozens of factors interacting in complex ways. Linear models miss these interactions; tree-based models capture them naturally.
  • Built-in feature importance — You can see which indicators the model relies on most, so it is not a complete black box.
  • Robust to noisy data — DSE data has gaps, anomalies, and volatility spikes. XGBoost handles missing values natively and regularizes to avoid overfitting.
  • Proven on DSE — Published research on Dhaka Stock Exchange data reports ~85% direction accuracy for XGBoost on trend prediction, outperforming traditional methods.

How the ML Prediction System Works on DSE

The Stock-AI.live platform runs 392+ individual XGBoost models — one for each actively traded stock on DSE. Each model is trained on that stock's unique price history, sector behavior, and fundamental profile. A bank stock model is different from a pharmaceutical model because their price drivers are different.

Here's the pipeline from raw data to prediction:

  1. Data Collection — Stock prices, DSEX index data, news articles, and company fundamentals are scraped and stored every 60 seconds during market hours (Sunday–Thursday, 10:30–14:30 BDT).
  2. Feature Engineering — 30 features are calculated from raw data across 7 categories: technical indicators, market context, momentum and volume, news and sector data, fundamentals, lagged returns, and baseline return/volatility/volume features.
  3. Model Training — XGBoost trains on time-ordered data (80% train, 20% test — never shuffled) with walk-forward validation to prevent data leakage.
  4. Prediction — The model predicts the 5-day forward return and converts it to a BUY/SELL/HOLD signal using volatility-adjusted thresholds.
  5. Tracking — Every prediction is saved and compared against actual results 5 trading days later to track real accuracy.

The 30 Features That Drive Predictions

The most critical part of any machine learning system is the input data. Here are the 30 features grouped by category:

Technical Indicators (9 features)

These capture price trends, momentum, and volatility from the stock's own price chart:

  • RSI (14) — Relative Strength Index measures overbought/oversold conditions. Values above 70 suggest the stock may be overbought; below 30, oversold.
  • SMA 20 & EMA 20 — Simple and Exponential Moving Averages smooth out daily noise to reveal the underlying trend. EMA reacts faster to recent price changes.
  • Distance from SMA 20 — How far the current price is from the 20-day average. Large distances often precede a move back toward the average (mean reversion).
  • MACD, Signal, Histogram — Moving Average Convergence Divergence detects trend changes. A cross of MACD above its signal line is traditionally read as bullish, and a cross below as bearish.
  • Bollinger Band Position & Width — Where the price sits within its Bollinger Bands and how wide those bands are (wide = volatile, narrow = quiet).
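
A minimal pure-Python sketch of the nine technical features for the latest day. It assumes conventional parameters (MACD 12/26/9, Bollinger Bands at 20 days and 2 standard deviations) and a simple-average RSI rather than Wilder's smoothing; the post does not give the platform's exact formulas, so `technical_features` and its keys are hypothetical.

```python
import statistics


def sma(values, window):
    """Simple moving average of the last `window` values."""
    return sum(values[-window:]) / window


def ema_series(values, span):
    """EMA series with alpha = 2 / (span + 1), seeded with the first value."""
    alpha = 2 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def rsi(closes, period=14):
    """RSI from simple averages of the last `period` gains and losses (not Wilder's smoothing)."""
    changes = [b - a for a, b in zip(closes[-period - 1:-1], closes[-period:])]
    avg_gain = sum(c for c in changes if c > 0) / period
    avg_loss = sum(-c for c in changes if c < 0) / period
    if avg_loss == 0:
        return 100.0  # no losses in the window: maximally overbought
    return 100 - 100 / (1 + avg_gain / avg_loss)


def technical_features(closes):
    """Nine technical features for the most recent close; expects at least 20 closes."""
    sma20 = sma(closes, 20)
    macd_line = [f - s for f, s in zip(ema_series(closes, 12), ema_series(closes, 26))]
    signal_line = ema_series(macd_line, 9)
    band = 2 * statistics.pstdev(closes[-20:])  # Bollinger Bands at +/- 2 standard deviations
    upper, lower = sma20 + band, sma20 - band
    return {
        "rsi_14": rsi(closes, 14),
        "sma_20": sma20,
        "ema_20": ema_series(closes, 20)[-1],
        "dist_sma_20": (closes[-1] - sma20) / sma20,
        "macd": macd_line[-1],
        "macd_signal": signal_line[-1],
        "macd_hist": macd_line[-1] - signal_line[-1],
        # 0 = at the lower band, 1 = at the upper band (0.5 if the bands have zero width)
        "bb_position": (closes[-1] - lower) / (upper - lower) if upper > lower else 0.5,
        "bb_width": (upper - lower) / sma20,
    }


closes = [100 + i for i in range(40)]  # steady uptrend
features = technical_features(closes)
print(features["rsi_14"], round(features["bb_position"], 2))
```

In a steady uptrend the RSI sits at 100 and the price is near the top of its Bollinger Band, the "overbought" situation the RSI bullet describes.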

Market Context (1 feature)

Relative return vs DSEX — Is this stock outperforming or underperforming the broad DSEX index? A stock rising while DSEX falls is showing relative strength. This uses 1-day lagged data to prevent leakage.
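
A minimal sketch of this feature, assuming it is the stock's daily percentage return minus the DSEX daily return, shifted back one day so that row t only uses data through day t−1. `lagged_relative_return` is a hypothetical name.

```python
def pct_returns(prices):
    """Daily percentage returns; the first day has no prior price, so it is None."""
    return [None] + [(b - a) / a * 100 for a, b in zip(prices, prices[1:])]


def lagged_relative_return(stock_prices, dsex_levels, lag=1):
    """Stock return minus DSEX return, shifted so row t only sees data up to day t - lag."""
    if lag < 1:
        raise ValueError("lag must be at least 1 to avoid same-day leakage")
    stock_r = pct_returns(stock_prices)
    index_r = pct_returns(dsex_levels)
    relative = [None if s is None or i is None else s - i for s, i in zip(stock_r, index_r)]
    return [None] * lag + relative[:-lag]


# The +2% move against a flat DSEX on the second day shows up in the third row's feature.
print(lagged_relative_return([100, 102, 101], [5000, 5000, 5050]))
```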

Momentum & Volume (3 features)

  • Rate of Change (10-day) — Percentage price change over 10 days, capturing medium-term momentum.
  • Momentum (5-day) — Short-term directional strength.
  • Volume change — Sudden volume spikes often precede significant price moves.
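
The three momentum and volume features reduce to simple arithmetic on recent closes and volumes. This sketch assumes 5-day momentum is the raw price difference (some definitions use a percentage instead) and volume change is day-over-day; the function names are illustrative.

```python
def rate_of_change(closes, period=10):
    """Percentage price change over `period` days."""
    past = closes[-1 - period]
    return (closes[-1] - past) / past * 100


def momentum(closes, period=5):
    """Raw price difference over `period` days (assumed definition)."""
    return closes[-1] - closes[-1 - period]


def volume_change(volumes):
    """Day-over-day percentage change in traded volume."""
    return (volumes[-1] - volumes[-2]) / volumes[-2] * 100


closes = list(range(100, 112))  # 100, 101, ..., 111
print(round(rate_of_change(closes), 2), momentum(closes), volume_change([1000, 1500]))
```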

News & Sentiment (4 features)

This group takes the model beyond pure technical analysis, combining two news features with two sector-level features:

  • News buzz (5-day) — How many news articles mentioned this stock in the last 5 trading days. More coverage = more attention = potential price impact.
  • News sentiment (5-day) — Whether recent news is positive, negative, or neutral for the stock.
  • Sector relative return — How the stock's sector is performing relative to other sectors (1-day lagged).
  • Sector momentum — 5-day momentum of the stock's sector as a whole (1-day lagged).
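
A sketch of the news and sector calculations under stated assumptions: articles are tagged with a trading-day index and a sentiment score in [-1, 1], and "sector relative return" is read as the sector peers' average return (excluding the stock itself, as described in the leakage section) minus the average return of stocks in other sectors. The tickers, helper names and score scale are hypothetical; per the post, the caller would pass the previous day's returns to keep the sector features 1-day lagged.

```python
def news_features(articles, today, window=5):
    """articles: (trading_day_index, sentiment) pairs, sentiment assumed in [-1, 1].
    Counts and averages articles from the last `window` trading days up to `today`."""
    recent = [s for day, s in articles if today - window < day <= today]
    buzz = len(recent)
    sentiment = sum(recent) / buzz if buzz else 0.0  # neutral when there is no news
    return buzz, sentiment


def sector_relative_return(returns, sector_of, stock):
    """Peer average within the stock's sector (excluding the stock) minus other sectors' average."""
    sector = sector_of[stock]
    peers = [r for s, r in returns.items() if sector_of[s] == sector and s != stock]
    others = [r for s, r in returns.items() if sector_of[s] != sector]
    if not peers or not others:
        return None
    return sum(peers) / len(peers) - sum(others) / len(others)


articles = [(3, 1.0), (8, 0.5), (9, -0.5), (10, 1.0)]
print(news_features(articles, today=10))  # only days 6-10 count

returns = {"BANK_A": 1.0, "BANK_B": 2.0, "BANK_C": 4.0, "PHARMA_D": -1.0}
sectors = {"BANK_A": "Bank", "BANK_B": "Bank", "BANK_C": "Bank", "PHARMA_D": "Pharma"}
print(sector_relative_return(returns, sectors, "BANK_A"))  # (2 + 4) / 2 - (-1) = 4.0
```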

Fundamental Data (7 features)

Long-term value drivers that technical analysis ignores:

  • P/E ratio (log-scaled) — Price-to-earnings ratio, adjusted for scale. Extremely high or low P/E relative to history can signal overvaluation or undervaluation.
  • EPS — Earnings per share, the fundamental profit metric.
  • NAV per share — Net Asset Value, how much the company's assets are worth per share.
  • Dividend yield — Annual dividend as a percentage of current price. High-yield stocks behave differently from growth stocks.
  • Market cap (log-scaled) — Company size affects volatility and predictability. Small caps are more volatile.
  • Free float percentage — How much stock is available for public trading. Low float = more volatile.
  • Beta — Sensitivity to market movements. High-beta stocks amplify DSEX moves.
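
A sketch of three of the fundamental transformations. The post does not say which log transform is used, so the signed `log1p` here is an assumption (it tolerates zero and negative values, such as a loss-making company's P/E). Beta is computed as the sample covariance of aligned daily stock and market returns divided by the market's sample variance.

```python
import math


def log_scale(value):
    """Signed log1p scaling: compresses large values and keeps the sign of negative ones."""
    return math.copysign(math.log1p(abs(value)), value)


def dividend_yield(annual_dividend, price):
    """Annual dividend as a percentage of the current price."""
    return annual_dividend / price * 100


def beta(stock_returns, market_returns):
    """Beta = Cov(stock, market) / Var(market), over aligned daily returns."""
    n = len(market_returns)
    mean_s = sum(stock_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum((s - mean_s) * (m - mean_m) for s, m in zip(stock_returns, market_returns)) / (n - 1)
    var = sum((m - mean_m) ** 2 for m in market_returns) / (n - 1)
    return cov / var


market = [1.0, -1.0, 2.0, 0.5]
stock = [2 * m for m in market]  # moves exactly twice as much as the market
print(round(beta(stock, market), 6), dividend_yield(5, 100), round(log_scale(15.0), 3))
```

A beta of 2 means the stock amplifies DSEX moves twofold, the "high-beta" case in the list above.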

Lagged Returns (3 features)

  • 1-day, 3-day, 5-day lagged returns — Recent past performance. These capture short-term serial correlation patterns that exist in emerging markets like DSE.

Baseline Features (3 features)

  • Daily return — Today's percentage change.
  • Volatility (20-day) — Standard deviation of returns over 20 days. Critical for adjusting signal thresholds.
  • Volume ratio — Today's volume vs 5-day average. Unusual volume often signals institutional activity.
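
A sketch of the three lagged returns and three baseline features. Assumptions: a "lagged return" is read as the daily return from 1, 3 and 5 trading days before today (the post could also mean cumulative returns over those windows), volatility uses the sample standard deviation, and the 5-day volume average includes today. `lag_and_baseline_features` is a hypothetical name.

```python
import statistics


def daily_returns(closes):
    """Day-over-day percentage returns."""
    return [(b - a) / a * 100 for a, b in zip(closes, closes[1:])]


def lag_and_baseline_features(closes, volumes):
    """Expects at least 21 closes and 5 volumes, most recent last."""
    rets = daily_returns(closes)
    return {
        "daily_return": rets[-1],                       # today's % change
        "ret_lag_1": rets[-2],                          # return 1 trading day before today
        "ret_lag_3": rets[-4],                          # return 3 trading days before today
        "ret_lag_5": rets[-6],                          # return 5 trading days before today
        "volatility_20": statistics.stdev(rets[-20:]),  # sample std dev of the last 20 daily returns
        "volume_ratio": volumes[-1] / (sum(volumes[-5:]) / 5),  # today vs 5-day mean (incl. today)
    }


closes = [100 * 1.01 ** i for i in range(30)]  # +1% every day, so volatility is ~0
volumes = [100] * 29 + [300]                   # volume spike today
f = lag_and_baseline_features(closes, volumes)
print(round(f["ret_lag_5"], 6), round(f["volatility_20"], 6), round(f["volume_ratio"], 4))
```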

How Predictions Become BUY/SELL/HOLD Signals

The model doesn't predict a stock price. It predicts the 5-day forward return — the percentage gain or loss expected over the next 5 trading days. This is then converted to a signal using volatility-adjusted thresholds, not fixed percentages.

Why volatility-adjusted? Because a 2% predicted return means something very different for a stable utility stock (where daily moves are 0.5%) versus a volatile small-cap stock (where daily swings of 3% are normal).

The threshold formula:

threshold = max(1.5%, stock_volatility × 1.5)

Examples:

  • Low-volatility stock (1% daily vol) → threshold = 1.5% (the minimum). A predicted return above +1.5% = BUY, below -1.5% = SELL, anything in between = HOLD.
  • Medium-volatility stock (1.5% daily vol) → threshold = 2.25%. Needs a stronger signal to trigger BUY or SELL.
  • High-volatility stock (3% daily vol) → threshold = 4.5%. Only very strong predictions trigger signals on volatile stocks.
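
The threshold rule and the examples above translate directly into a few lines of Python. This is a sketch: `signal_threshold` and `to_signal` are illustrative names, and volatility is taken as the 20-day standard deviation of daily returns, in percent.

```python
def signal_threshold(daily_vol_pct, floor_pct=1.5, multiplier=1.5):
    """threshold = max(1.5%, stock_volatility x 1.5), all values in percent."""
    return max(floor_pct, daily_vol_pct * multiplier)


def to_signal(predicted_return_pct, daily_vol_pct):
    """Map a predicted 5-day return to BUY/SELL/HOLD using the volatility-adjusted threshold."""
    threshold = signal_threshold(daily_vol_pct)
    if predicted_return_pct > threshold:
        return "BUY"
    if predicted_return_pct < -threshold:
        return "SELL"
    return "HOLD"


for vol in (1.0, 1.5, 3.0):
    print(vol, signal_threshold(vol), to_signal(2.0, vol))
```

The same +2% forecast is a BUY for the 1%-volatility stock but a HOLD for the other two, because it does not clear their higher thresholds.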

Each signal also gets a confidence level:

  • High confidence — Predicted return is more than 1.5× the threshold. The model is very sure about direction.
  • Medium confidence — Predicted return is between 1× and 1.5× the threshold.
  • Low confidence — Predicted return's magnitude is at or below 1× the threshold, inside the HOLD band. Direction is uncertain; treat these HOLD signals with extra caution.
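
A sketch of the confidence bands as a ratio of the predicted return's magnitude to the threshold. It assumes, as one reading of the list above, that Low covers everything at or below 1× the threshold; the strict-versus-inclusive boundary handling is also an assumption, chosen to match the strict BUY/SELL comparison.

```python
def confidence(predicted_return_pct, threshold_pct):
    """Confidence band from |predicted return| / threshold."""
    ratio = abs(predicted_return_pct) / threshold_pct
    if ratio > 1.5:
        return "High"    # more than 1.5x the threshold
    if ratio > 1.0:
        return "Medium"  # between 1x and 1.5x the threshold
    return "Low"         # at or below the threshold


# Threshold of 2.25% (a stock with 1.5% daily volatility):
print([confidence(r, 2.25) for r in (4.0, -3.0, 1.0)])
```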

Preventing Data Leakage: The Silent Killer of Stock Models

Many amateur ML stock models fail not because of bad algorithms but because of data leakage: accidentally training the model on information from the future. Such a model looks great in backtesting but fails badly in live trading.

Here's how the DSE prediction system prevents leakage:

  • Time-based train/test split — Data is never shuffled. The model trains on older data and tests on newer data, mimicking real trading where you can only use past information.
  • Lagged features — Market context features (relative return, sector data) use 1-day lagged values, so the features for day t only include index and sector data up to day t−1. The model never sees market information that would not yet be available when the prediction is made.
  • Self-exclusion in sector averages — When computing sector-relative features, the stock itself is excluded from the sector average calculation.
  • Walk-forward validation — Instead of a single train/test split, the system uses 5 rolling windows (120 rows train, 30 rows test) that simulate how the model would have performed across different time periods.
  • Minimum data requirement — Models need at least 100 rows of clean data after removing NaN values. Stocks with insufficient history are excluded.
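
The time-based split and the walk-forward windows can be sketched as index arithmetic. This assumes the five 120/30 windows roll forward by one test block and end at the most recent row; the post does not say exactly where the windows are anchored, and the function names are hypothetical.

```python
def chronological_split(rows, train_frac=0.8):
    """Time-ordered split: older rows train, newer rows test. Never shuffled."""
    cut = int(len(rows) * train_frac)
    return rows[:cut], rows[cut:]


def walk_forward_windows(n_rows, n_windows=5, train_size=120, test_size=30):
    """Return (train_range, test_range) pairs; each test block immediately follows its training block."""
    needed = train_size + n_windows * test_size
    if n_rows < needed:
        raise ValueError(f"need at least {needed} rows, got {n_rows}")
    start = n_rows - needed  # anchor the last window at the newest row
    windows = []
    for k in range(n_windows):
        train_start = start + k * test_size  # slide forward by one test block each time
        train_end = train_start + train_size
        windows.append((range(train_start, train_end), range(train_end, train_end + test_size)))
    return windows


train, test = chronological_split(list(range(10)))
print(train, test)  # [0, 1, 2, 3, 4, 5, 6, 7] [8, 9]
for train_idx, test_idx in walk_forward_windows(300):
    print(f"train {train_idx.start}-{train_idx.stop - 1}, test {test_idx.start}-{test_idx.stop - 1}")
```

Because every test block starts right where its training block ends, no window ever trains on rows that come after the rows it is scored on.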

Accuracy Tracking: Real Results, Not Backtests

Most stock prediction tools show you backtested results. The DSE ML system tracks live prediction accuracy — every prediction is saved and automatically compared against the actual price 5 trading days later.

Here's how it works:

  1. When you request a prediction for ACI, the model generates a predicted return and saves it with the current price as the baseline.
  2. 5 trading days later (remember, DSE trades Sunday–Thursday), the system fetches ACI's actual closing price.
  3. It calculates the actual return and compares direction: did the stock move in the predicted direction?
  4. Predictions where the actual return is near zero (within ±0.5%) are excluded — these are essentially flat moves where direction doesn't matter.
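
Steps 2 to 4 can be sketched as below: counting forward five Sunday–Thursday sessions, then scoring direction while skipping near-flat outcomes. Public holidays are ignored here (a simplifying assumption; a real DSE calendar would skip them), and the function names are hypothetical.

```python
from datetime import date, timedelta

DSE_WEEKEND = {4, 5}  # date.weekday(): Friday = 4, Saturday = 5


def add_trading_days(start, n):
    """Step forward n Sunday-Thursday sessions (holidays not handled)."""
    day = start
    while n > 0:
        day += timedelta(days=1)
        if day.weekday() not in DSE_WEEKEND:
            n -= 1
    return day


def direction_accuracy(records, flat_band=0.5):
    """records: (predicted_return_pct, actual_return_pct) pairs.
    Outcomes within +/- flat_band percent are excluded as flat moves."""
    scored = [(p, a) for p, a in records if abs(a) > flat_band]
    if not scored:
        return None
    hits = sum(1 for p, a in scored if (p > 0) == (a > 0))
    return hits / len(scored)


print(add_trading_days(date(2026, 4, 23), 5))  # Thursday -> the following Thursday: 2026-04-30
records = [(2.0, 3.1), (1.8, -1.2), (-2.5, -4.0), (3.0, 0.2)]  # the last one is flat and skipped
print(direction_accuracy(records))  # 2 of 3 directions correct
```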

The key metric is direction accuracy: what percentage of predictions correctly identified the direction (up or down)? Academic studies on DSE data report around 85% direction accuracy for XGBoost on binary trend prediction. Real-world live accuracy is typically lower, around 60–70%, because markets are inherently noisy and models degrade over time.

Model Retraining: Staying Current

Stock market patterns aren't static. A model trained on 2024 data may perform poorly in 2026 because market conditions change. The DSE ML system handles this with automatic retraining:

  • Weekly retraining — Every Sunday at 3:00 AM BDT, the top 50 stocks by trading value are retrained with fresh data, hyperparameter tuning, and walk-forward validation.
  • Auto-train on prediction — If you request a prediction for a stock with no model, one is trained automatically before predicting.
  • Manual training — Admin users can trigger training for any stock via the API, with optional tuning and walk-forward validation.

Hyperparameter tuning searches across 6 parameters (number of trees, tree depth, learning rate, subsample ratio, column sample ratio, and minimum child weight) with 20 random iterations, optimizing for direction accuracy rather than mean squared error.
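
A minimal sketch of random search scored by direction accuracy. The six keys follow XGBoost's scikit-learn-style parameter names, but the candidate values are assumptions (the post does not list ranges), and `evaluate` is a hypothetical callback that would train on time-ordered data and return predicted and actual returns for the validation period.

```python
import random

# Candidate values are illustrative assumptions, not the platform's actual ranges.
SEARCH_SPACE = {
    "n_estimators": [100, 200, 300, 500],   # number of trees
    "max_depth": [3, 4, 5, 6],              # tree depth
    "learning_rate": [0.01, 0.03, 0.05, 0.1],
    "subsample": [0.6, 0.8, 1.0],           # row sample ratio per tree
    "colsample_bytree": [0.6, 0.8, 1.0],    # column sample ratio per tree
    "min_child_weight": [1, 3, 5],
}


def direction_accuracy(predicted, actual):
    """Share of predictions whose sign matches the actual return's sign."""
    hits = sum(1 for p, a in zip(predicted, actual) if (p > 0) == (a > 0))
    return hits / len(actual)


def random_search(evaluate, n_iter=20, seed=0):
    """Try n_iter random configurations; keep the one with the best direction accuracy."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
        predicted, actual = evaluate(params)
        score = direction_accuracy(predicted, actual)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy stand-in for training and validation: shallower trees happen to score better here.
def fake_evaluate(params):
    actual = [1.0, -0.5, 2.0, -1.5]
    predicted = actual if params["max_depth"] <= 4 else [-a for a in actual]
    return predicted, actual


print(random_search(fake_evaluate))
```

Selecting on direction accuracy rather than mean squared error rewards configurations that get the sign right, which is what the BUY/SELL/HOLD signals depend on.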

How to Use ML Predictions in Your DSE Strategy

Machine learning predictions are a tool, not a crystal ball. Here's how to use them responsibly:

1. Use Signals as a Starting Point, Not an Ending Point

A BUY signal means the model detected statistical patterns suggesting upward movement. It doesn't mean the stock will definitely go up. Use the signal to narrow your research — then do your own fundamental analysis, read recent news, and check the company's financial health before investing.

2. Pay Attention to Confidence Levels

High-confidence signals have historically been more reliable than low-confidence ones. A high-confidence BUY signal on a low-volatility stock is stronger than a low-confidence BUY on a volatile small-cap.

3. Diversify Across Multiple Signals

Don't put all your capital on a single prediction. Use the Top Picks feature to see the highest-predicted-return stocks across DSE, then build a diversified portfolio from multiple high-confidence signals.

4. Check Accuracy History

Before trusting a stock's signal, check its prediction accuracy. If a stock's model has 55% direction accuracy, its signals are barely better than a coin flip. If it has 70%+, the model has a meaningful edge for that stock.
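
Whether a hit rate is "barely better than a coin flip" also depends on how many predictions it rests on. One quick check, sketched here as an added illustration rather than a platform feature, is the probability that a model guessing at random (p = 0.5) would score at least as well by chance:

```python
from math import comb


def chance_of_at_least(hits, n, p=0.5):
    """P(X >= hits) for X ~ Binomial(n, p): how often pure guessing does at least this well."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(hits, n + 1))


for hits, n in [(55, 100), (7, 10), (70, 100)]:
    print(f"{hits}/{n} correct: chance level {chance_of_at_least(hits, n):.4f}")
```

Seven correct calls out of ten (70%) happen by luck roughly 17% of the time, about as often as 55 out of 100, while 70 out of 100 almost never happens by chance. A 70%+ accuracy figure is more meaningful when it rests on many tracked predictions.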

5. Respect the 5-Day Horizon

The model predicts 5-day returns, not intraday moves or long-term trends. Don't hold a position for 3 weeks based on a 5-day prediction. After 5 trading days, the model's statistical edge diminishes.

XGBoost vs. Other ML Models for Stock Prediction

You might wonder why XGBoost instead of neural networks or LSTM? Here's the practical comparison for DSE data:

Model | Strengths | Weaknesses | Best For
XGBoost | Fast training, feature importance, handles small datasets, robust | Can't capture very long sequences | DSE with limited data per stock
LSTM | Captures long-term temporal patterns | Needs huge data, slow training, black box | Markets with decades of clean data
Random Forest | Simple, robust, less overfitting | Lower accuracy than XGBoost | Baseline comparisons
Transformer | State-of-the-art for sequences | Very data-hungry, complex | Research, not production (yet)
LightGBM | Faster than XGBoost, similar accuracy | Slightly more prone to overfitting on small data | When speed matters most

For DSE specifically, XGBoost strikes the right balance. Each stock has limited historical data (DSE is still developing as a market), so models that need very large training sets (LSTM, Transformers) are impractical. XGBoost can deliver strong results with a few thousand rows of daily data per stock while remaining interpretable through feature importance.

Feature Importance: What Actually Drives DSE Predictions?

One of XGBoost's biggest advantages is that you can see which features matter most. Across DSE stocks, the most important features typically are:

  1. Volatility (20-day) — The single most important feature. High-volatility stocks are harder to predict and need stronger signals.
  2. RSI (14) — Overbought/oversold conditions are strong reversal signals, especially on DSE where momentum tends to overshoot.
  3. MACD histogram — Trend change detection. When histogram flips sign, predictions often shift direction.
  4. Volume ratio — Unusual volume is one of the strongest leading indicators on DSE, often signaling institutional activity before price moves.
  5. Distance from SMA 20 — Mean reversion is a powerful force on DSE. Stocks far from their moving average tend to snap back.

News sentiment and fundamental features (P/E, dividend yield) rank lower on average but can be decisive for specific stocks. A negative news spike on a pharmaceutical company, for example, can override technical indicators.

Limitations: What ML Cannot Do on DSE

Being honest about limitations is more valuable than overpromising:

  • Cannot predict black swan events — Regulatory shocks, political instability, or global market crashes are unpredictable. No ML model saw the 2020 COVID crash coming.
  • Struggles with low-liquidity stocks — Stocks that trade infrequently have sparse data and noisy price movements. Models for these stocks have lower accuracy.
  • Degrades during regime changes — When DSE fundamentally changes (new regulations, major policy shifts), historical patterns may no longer apply until the model is retrained on new data.
  • Direction accuracy ≠ profit — Even if direction is correct, the magnitude may be wrong. A prediction of +3% that delivers +0.5% is "correct" in direction but barely profitable after transaction costs.
  • Overfitting risk on small datasets — Stocks with limited history may have models that memorize patterns rather than learning generalizable ones. Walk-forward validation helps detect this.

Getting Started with ML Predictions on DSE

Ready to see machine learning predictions for DSE stocks? Here's how to access them:

  1. Browse predictions — Visit stock-ai.live and search any DSE stock to see its latest ML prediction, signal, and confidence level.
  2. Check top picks — The Top Picks page shows stocks with the highest predicted 5-day returns across DSE.
  3. Review accuracy — Each stock's prediction page shows historical direction accuracy, so you know which models to trust.
  4. Use the API — For programmatic access, the ML prediction API is available with Pro and Enterprise API keys.

Key Takeaways

  • XGBoost predicts 5-day forward returns for 392+ DSE stocks using 30 features across technical, fundamental, and sentiment categories.
  • Predictions are converted to BUY/SELL/HOLD signals using volatility-adjusted thresholds — a 2% predicted return means different things for different stocks.
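  • Every prediction is tracked and verified against actual prices 5 trading days later, providing real accuracy data instead of backtest-only claims.
  • The top 50 stocks by trading value are retrained weekly to adapt to changing market conditions on DSE; other models are trained automatically when first requested or manually by admins.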
  • ML predictions are a tool for narrowing research, not a replacement for your own analysis. Use high-confidence signals as a starting point, then verify with fundamentals and news.

Machine learning on the Dhaka Stock Exchange is still early, but it's already providing a statistical edge that individual investors couldn't access before. The key is understanding what the models can and cannot do — and using them as one input in a broader investment strategy.