Hybrid Deep Learning for Solar Energy Holding Capacity in Bangladesh

Bangladesh sits in a high-irradiance belt, yet most regional planners still size solar deployments from yearly-average heuristics. The gap between installed and holding capacity — the share of nameplate kW the grid can reliably absorb under local weather — is large and seasonal. This work replaces those heuristics with a stacked ensemble that ingests environmental and weather signals and predicts holding capacity for every region.

Motivation

Classical methods (regression on monthly mean irradiance, or physics-based clear-sky models) break down for two reasons. First, the underlying signal is strongly non-linear: humidity, cloud type, and aerosol load interact in ways that linear models can’t capture. Second, the regional variance is huge — coastal cyclone seasons behave nothing like the dry north.

We needed a model that is:

Non-linear and robust to missingness.
Strong on both stable inland regions and high-variance coastal ones.
Cheap to retrain as new sensor data arrives.

Data

We assembled a panel of region-month observations covering 2014 – 2024, with the following features per row:

Feature	Source	Notes
Solar irradiance	NASA POWER, BMD ground stations	Daily GHI, monthly averages
Temperature	BMD	Mean, max, min
Relative humidity	BMD	Affects atmospheric scattering
Wind speed	BMD	Cools panels, raises real-world output
Cloud cover	NASA MERRA-2	Total + low-cloud fractions
Rainfall	BMD	Proxy for monsoon intensity
Cyclone incidents	BMD storm catalog	Binary per region-month
Drought intensity	BAMIS SPI	Standardised Precipitation Index

The target is holding capacity ($C_h$) in MW, derived from utility-side curtailment logs and SCADA dispatch records.

Methodology

Stacked generalisation

We chose stacked generalisation because it combines diverse hypothesis classes through a learned aggregator — exactly the setup that works when no single model dominates across regions. Five base learners feed their out-of-fold predictions into a small neural network meta-learner:

$$\hat{y} = f_{\text{meta}}\bigl(\,f_1(x),\, f_2(x),\, f_3(x),\, f_4(x),\, f_5(x)\,\bigr)$$

where $f_1 \dots f_5$ are the five base learners (Random Forest, XGBoost, Gradient Boosting, AdaBoost, and a feedforward NN) and $f_{\text{meta}}$ is a two-layer MLP that learns the optimal blend weights per region.
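For readers who want to try the idea quickly, scikit-learn's built-in `StackingRegressor` handles the out-of-fold bookkeeping internally. The sketch below is a simplified stand-in, not the pipeline used in this work: it uses synthetic data, only three of the five base learners, and an `MLPRegressor` in place of the TensorFlow meta-learner. All hyperparameters are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import (
    AdaBoostRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in for the region-month panel (NOT the real data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = 3.0 * X[:, 0] + np.sin(X[:, 1]) + 0.1 * rng.normal(size=200)

stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
        ("gbr", GradientBoostingRegressor(n_estimators=50, random_state=0)),
        ("ada", AdaBoostRegressor(n_estimators=50, random_state=0)),
    ],
    # Plays the role of f_meta: a small MLP trained on out-of-fold predictions.
    final_estimator=MLPRegressor(
        hidden_layer_sizes=(64, 32), max_iter=2000, random_state=0
    ),
    cv=5,  # 5-fold cross-validation generates the out-of-fold predictions
)
stack.fit(X, y)

y_hat = stack.predict(X[:5])           # blended prediction, shape (5,)
base_preds = stack.transform(X[:5])    # per-base-learner predictions, shape (5, 3)
```

`transform` exposes the individual base predictions, which is handy for checking how far the blend departs from any single model.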

Loss function

We use a region-aware mean-squared error that down-weights the noisier coastal panel:

$$\mathcal{L}(\theta) = \frac{1}{N} \sum_{i=1}^{N} w_{r(i)} \, \bigl(\hat{y}_i - y_i\bigr)^2 + \lambda\,\lVert \theta \rVert_2^2$$

with $w_{r} \in [0.5, 1.0]$ per region $r$, and $\lambda = 10^{-4}$ for L2 regularisation. The weights are learned end-to-end from a small validation set, so the model adapts to the noise structure of each region without hand-tuning.
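As a worked instance of the loss above, here is a minimal NumPy sketch. The function name, the argument layout, and the use of clipping to keep $w_r$ inside $[0.5, 1.0]$ are assumptions made for illustration. It evaluates the loss for fixed weights and does not show how the weights are learned.

```python
import numpy as np


def region_weighted_loss(y_hat, y, region_idx, w, theta, lam=1e-4):
    """(1/N) * sum_i w[r(i)] * (y_hat_i - y_i)^2 + lam * ||theta||_2^2."""
    y_hat = np.asarray(y_hat, dtype=float)
    y = np.asarray(y, dtype=float)
    # Assumption: the [0.5, 1.0] range is enforced by clipping.
    w = np.clip(np.asarray(w, dtype=float), 0.5, 1.0)
    per_sample = w[np.asarray(region_idx)] * (y_hat - y) ** 2
    l2_penalty = lam * np.sum(np.asarray(theta, dtype=float) ** 2)
    return float(per_sample.mean() + l2_penalty)


# Toy check: region 0 keeps full weight, region 1 (say, coastal) gets 0.5.
loss = region_weighted_loss(
    y_hat=[10.0, 12.0, 20.0, 26.0],
    y=[11.0, 12.0, 22.0, 22.0],
    region_idx=[0, 0, 1, 1],
    w=[1.0, 0.5],
    theta=[1.0, -2.0],
)
# Weighted squared errors are 1, 0, 2, 8 (mean 2.75); the penalty is 1e-4 * 5.
print(round(loss, 4))  # 2.7505
```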

Feature engineering

The biggest accuracy gains came not from the model but from features:

Humidity-corrected irradiance: multiply GHI by an empirical attenuation curve in $RH$.
Diurnal phase: sine/cosine of the hour-of-day so models can express sunrise/sunset cleanly.
Seasonal one-hot: explicit monsoon / pre-monsoon / dry markers.
Rolling cyclone intensity: three-month EWMA of storm counts.
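The four features above can be sketched in pandas as follows. Everything specific here is an assumption for illustration: the column names (`region`, `date`, `month`, `hour`, `ghi`, `rh`, `cyclones`), the linear humidity attenuation with `rh_coeff` (a placeholder, not the empirical curve), the month-to-season mapping, and reading "three-month EWMA" as `span=3`.

```python
import numpy as np
import pandas as pd

# Assumed month -> season mapping (illustrative, not taken from the study).
SEASON_BY_MONTH = {
    3: "pre_monsoon", 4: "pre_monsoon", 5: "pre_monsoon",
    6: "monsoon", 7: "monsoon", 8: "monsoon", 9: "monsoon", 10: "monsoon",
    11: "dry", 12: "dry", 1: "dry", 2: "dry",
}
SEASONS = pd.CategoricalDtype(["pre_monsoon", "monsoon", "dry"])


def add_features(df: pd.DataFrame, rh_coeff: float = 0.3) -> pd.DataFrame:
    # Sort so the per-region EWMA runs in time order.
    out = df.sort_values(["region", "date"]).copy()

    # Humidity-corrected irradiance; placeholder linear attenuation in RH.
    out["ghi_rh"] = out["ghi"] * (1.0 - rh_coeff * out["rh"] / 100.0)

    # Diurnal phase: cyclic encoding so hour 23 sits next to hour 0.
    angle = 2.0 * np.pi * out["hour"] / 24.0
    out["hour_sin"] = np.sin(angle)
    out["hour_cos"] = np.cos(angle)

    # Seasonal one-hot; the categorical dtype keeps all three columns
    # even when a season is absent from the data.
    season = out["month"].map(SEASON_BY_MONTH).astype(SEASONS)
    out = out.join(pd.get_dummies(season, prefix="season", dtype=float))

    # Rolling cyclone intensity: EWMA of storm counts, computed per region.
    out["cyclone_ewma"] = out.groupby("region")["cyclones"].transform(
        lambda s: s.ewm(span=3, adjust=False).mean()
    )
    return out


toy = pd.DataFrame({
    "region": ["Khulna"] * 3 + ["Rangpur"] * 3,
    "date": pd.to_datetime(["2020-04-01", "2020-05-01", "2020-06-01"] * 2),
    "month": [4, 5, 6] * 2,
    "hour": [6, 12, 18] * 2,
    "ghi": [5.0, 6.0, 4.0, 5.5, 6.5, 4.5],
    "rh": [70.0, 80.0, 90.0, 50.0, 60.0, 70.0],
    "cyclones": [0, 1, 0, 0, 0, 0],
})
feats = add_features(toy)
```

Grouping by region before the EWMA matters: without it, storm counts from one region would bleed into the next region's rolling average.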

Results

Per-model accuracy

Test accuracy by model:

Model	Test accuracy
Random Forest	92%
XGBoost	92.8%
Gradient Boosting	91.4%
AdaBoost	88.6%
Neural Net	90.5%
Stacked (ours)	95.1%

Five base learners individually reach 88 – 93%. The stacked meta-learner clears 95% by routing each region to the model that handles its noise best.

The meta-learner consistently picks XGBoost for stable inland regions (Rangpur, Rajshahi) and shifts weight toward the NN for coastal regions (Chittagong, Khulna) where the noise distribution is heavier-tailed.

Regional error breakdown

Mean absolute error by region:

Region	MAE (MW)
Rangpur	1.4
Rajshahi	1.6
Dhaka	2.1
Sylhet	2.3
Khulna	3.2
Barishal	3.5
Chittagong	4.1

Inland regions are easy; the coastal belt — with cyclones and salt-spray-driven panel degradation — drives the bulk of the residual.
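A per-region breakdown like this is a short groupby over a predictions table. The sketch below assumes a DataFrame with hypothetical columns `region`, `y_true`, and `y_pred` (all in MW). The numbers are toy values, not the study's data.

```python
import pandas as pd


def mae_by_region(df: pd.DataFrame) -> pd.Series:
    """Mean absolute error per region, sorted from easiest to hardest."""
    abs_err = (df["y_pred"] - df["y_true"]).abs()
    return abs_err.groupby(df["region"]).mean().sort_values()


toy = pd.DataFrame({
    "region": ["Rangpur", "Rangpur", "Chittagong", "Chittagong"],
    "y_true": [100.0, 110.0, 200.0, 210.0],
    "y_pred": [101.0, 108.0, 204.0, 205.0],
})
region_mae = mae_by_region(toy)
# Rangpur: (1 + 2) / 2 = 1.5 MW; Chittagong: (4 + 5) / 2 = 4.5 MW
```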

Learning curve

[Figure: validation accuracy over training epochs, Stacked (ours) vs. best base learner (XGBoost).]

The stacked model gains the most in the first 40 epochs as the meta-learner discovers per-region blend weights, then settles.

Headline numbers

Metric	Value
Test accuracy (R²-style)	95.1%
Mean absolute error	2.4 MW
Root mean squared error	3.1 MW
Training time (1 GPU)	18 min
Inference per region	< 2 ms
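For reference, the three error metrics in the table can be computed with scikit-learn as sketched below. One assumption is flagged loudly: "R²-style" accuracy is read here as the coefficient of determination, which the text does not confirm. The inputs are toy values.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def headline_metrics(y_true, y_pred):
    """Return R^2, MAE and RMSE; MAE and RMSE share the target's units (MW)."""
    return {
        "r2": float(r2_score(y_true, y_pred)),  # assumed meaning of "R²-style"
        "mae_mw": float(mean_absolute_error(y_true, y_pred)),
        "rmse_mw": float(np.sqrt(mean_squared_error(y_true, y_pred))),
    }


metrics = headline_metrics(
    y_true=[10.0, 20.0, 30.0, 40.0],
    y_pred=[12.0, 18.0, 33.0, 39.0],
)
# Errors are 2, -2, 3, -1: MAE = 2.0, MSE = 4.5, R^2 = 1 - 18/500 = 0.964
```

RMSE is never smaller than MAE, and a wide gap between the two points to a few large misses rather than uniform error.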

Implementation

The pipeline is written in Python: scikit-learn for the base learners (XGBoost comes from its own package), TensorFlow for the meta-learner, and a thin Pandas layer for feature engineering.

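import numpy as np
from sklearn.base import clone
from sklearn.ensemble import (
    RandomForestRegressor,
    GradientBoostingRegressor,
    AdaBoostRegressor,
)
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPRegressor
from xgboost import XGBRegressor

# X_train and y_train are NumPy arrays: engineered features and C_h in MW.
# build_meta_nn() is the project helper that returns the compiled Keras meta-learner.

base = {
    "rf": RandomForestRegressor(n_estimators=200, max_depth=15, n_jobs=-1),
    "xgb": XGBRegressor(n_estimators=300, max_depth=6, learning_rate=0.05),
    "gbr": GradientBoostingRegressor(n_estimators=200, max_depth=5),
    "ada": AdaBoostRegressor(n_estimators=150, learning_rate=0.8),
    "nn":  MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=400),
}

# Fold count and seed are illustrative; the original settings are not given here.
kfold = KFold(n_splits=5, shuffle=True, random_state=42)

# Out-of-fold (OOF) predictions: every training row is predicted by a model
# that never saw it, so the meta-learner is not trained on leaked targets.
oof = {name: np.zeros(len(X_train)) for name in base}
for name, model in base.items():
    for tr, va in kfold.split(X_train):
        fold_model = clone(model)       # fresh, unfitted copy per fold
        fold_model.fit(X_train[tr], y_train[tr])
        oof[name][va] = fold_model.predict(X_train[va])
    model.fit(X_train, y_train)         # refit on all rows for use at inference

# Meta-learner ingests the OOF predictions, one column per base learner.
meta_X = np.column_stack(list(oof.values()))
meta = build_meta_nn()              # 2-layer MLP, 64 → 32 → 1
meta.fit(meta_X, y_train, epochs=100, batch_size=64, validation_split=0.2)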

The meta-network is intentionally small. We don’t need it to learn the data again — we only need it to learn which base learner to trust where.
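At prediction time the same two stages run in sequence: each base learner predicts, the predictions are stacked column-wise in the training order, and the meta-learner produces the final estimate. The helper below is a hypothetical sketch (the name `stack_predict` is not from the project). It uses small scikit-learn stand-ins so it runs on its own, and for brevity the toy meta-model is fitted on in-sample predictions rather than out-of-fold ones.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor


def stack_predict(base_models, meta_model, X):
    """Two-stage prediction for a fitted stack.

    base_models: dict of name -> fitted regressor, in the SAME insertion
    order that was used to build the meta-learner's training matrix.
    """
    meta_X = np.column_stack([m.predict(X) for m in base_models.values()])
    # ravel() flattens (n, 1) outputs, as a Keras model would return.
    return np.asarray(meta_model.predict(meta_X)).ravel()


# Toy stand-ins for the real base learners and meta-learner.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, 2.0, 3.0])
base_models = {
    "lin": LinearRegression().fit(X, y),
    "tree": DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y),
}
meta_X = np.column_stack([m.predict(X) for m in base_models.values()])
meta_model = LinearRegression().fit(meta_X, y)

preds = stack_predict(base_models, meta_model, X[:4])
```

The column order is the part that is easy to get wrong: if the base predictions are stacked in a different order than during training, the meta-learner silently blends the wrong models.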

What we’d do next

Replace the meta-MLP with a gating network conditioned explicitly on region embeddings. Anecdotally, the current MLP already learns this, but a gating formulation would be more interpretable.
Add satellite cloud-top imagery through a small CNN feature extractor — the current cloud features are coarse averages.
Add quantile regression heads so planners get an interval, not a point estimate.
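The quantile-head idea in the list above rests on the pinball (quantile) loss, which penalises under- and over-prediction asymmetrically. A minimal NumPy sketch follows. It is a standard formulation, not something already implemented in this pipeline, and the toy numbers are made up.

```python
import numpy as np


def pinball_loss(y_true, y_pred, q):
    """Mean pinball loss for quantile level q in (0, 1)."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    # Under-prediction (diff > 0) costs q per unit; over-prediction costs 1 - q.
    return float(np.mean(np.maximum(q * diff, (q - 1.0) * diff)))


y_true = [10.0, 20.0]
y_pred = [12.0, 16.0]   # over-predicts by 2 MW, under-predicts by 4 MW
loss_q90 = pinball_loss(y_true, y_pred, q=0.9)  # (0.2 + 3.6) / 2 = 1.9
loss_q10 = pinball_loss(y_true, y_pred, q=0.1)  # (1.8 + 0.4) / 2 = 1.1
```

Training one head at q = 0.1 and another at q = 0.9 would give planners an approximate 80% interval around the point estimate.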

The full paper appeared at IEEE QPAIN 2026. If you’re working on adjacent problems — utility-scale forecasting, microgrid sizing, curtailment optimisation — I’d love to compare notes.