ID: 23FE10CSE00299

Energy Consumption Prediction at Data Centres Using LSTM-RNN Optimised by Ninja Optimisation Algorithm

A multi-horizon forecasting framework using a Bidirectional stacked LSTM-RNN with NiOA-based hyperparameter optimisation.

Introduction

Data centres are amongst the fastest-growing contributors to global energy consumption and carbon emissions. Continuous computational workloads generate high-dimensional power demand signals. Accurate forecasting of energy increments enables proactive scheduling, cooling optimisation, and intelligent resource management.

This project proposes a reproducible deep learning model for multi-horizon energy increment prediction. The core model is a Bidirectional Deep Recurrent Neural Network (DRNN) built from LSTM layers, with hyperparameters optimised by the Ninja Optimisation Algorithm (NiOA). NiOA is a population-based meta-heuristic inspired by the adaptive hunting strategies of ninjas; it balances exploration and exploitation across iterative refinement cycles.

Problem Statement

Conventional carbon footprint estimation methods are inaccurate and fail to capture the dynamic, time-evolving power consumption patterns of data centres. Despite the availability of advanced deep learning architectures and optimisation frameworks, accurate real-time prediction of carbon emissions in data centres remains challenging because of high-dimensional sensor data, complex temporal dynamics, and the difficulty of hyperparameter tuning, which is inefficient and suboptimal when done manually. Hence, a real-time, scalable, and accurate predictive system is essential for optimising energy use and minimising carbon emissions in data centres.

The problem addressed in this work is the multi-horizon prediction of short-to-medium-term energy increments in workstation-level computing environments. Given a sensor measurement at time t, the prediction target is:
ΔEₖ(t) = E(t+k) − E(t),
where E(t) is the cumulative energy reading at time t and k is the forecast horizon in seconds.
The model learns to predict this target variable using a sliding window of the preceding 120 rows of data.

Objectives

Design a Bidirectional DRNN architecture for multi-horizon cumulative energy increment prediction.
Implement gap-aware target computation to handle recording interruptions of up to 65.8 days.
Optimise hyperparameters automatically using NiOA.
Apply log1p variance-stabilising transformation and Huber loss to address extreme target skewness.
Evaluate across multiple forecast horizons: k = 60 s, 300 s, 900 s, and 1800 s.
Benchmark against Classical ML (LR, SVR, XGBoost, MLP), Deep Learning (Vanilla LSTM, CNN-LSTM), ARIMA, and DRNN+Optuna.

Proposed Methodology

Dataset

Source: IEEE DataPort, Data Server Energy Consumption (Estrada et al., SEIT 2022)
Duration: August–December 2021 (~5 months)
Raw rows: 3,178,051 at 1-second resolution
Max gap: 5,683,031 s (65.8 days)

Features (17): Voltage, Current, Power, Energy, Frequency, PF, Sensor Temp, CPU %, CPU Power, CPU Temp, GPU, GPU Power, GPU Temp, RAM, RAM Power, Hour, Weekday
Target: ΔEₖ(t) = E(t+k) − E(t)
Horizons: k = 1s, 60s, 300s, 900s, 1800s

Pipeline Flow — 10 Stages

Load CSV with fault tolerance

Corrupt lines skipped; Spanish column names mapped to English equivalents.

Timestamp parsing & chronological sort

Forward-fill imputation; duplicate removal; strict temporal ordering.

Gap-aware target computation

Targets rejected where the elapsed time between the two readings exceeds 2·k seconds (recording gap) or ΔE < 0 (counter reset).
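The rejection rule above can be sketched in a few lines. This is a minimal illustration, not the project's code: the function name gap_aware_targets is hypothetical, and it assumes E(t+k) is read from the first row recorded at or after t+k, which the source does not specify.

```python
from bisect import bisect_left


def gap_aware_targets(times, energy, k):
    """Return ΔE_k per row, or None where the target is rejected.

    times  : sorted timestamps in seconds
    energy : cumulative energy readings (kWh), same length as times
    k      : forecast horizon in seconds
    """
    targets = []
    n = len(times)
    for i in range(n):
        # Assumption: use the first row recorded at or after t + k.
        j = bisect_left(times, times[i] + k, lo=i)
        if j >= n:
            targets.append(None)  # horizon runs past the end of the data
            continue
        elapsed = times[j] - times[i]
        delta = energy[j] - energy[i]
        if elapsed > 2 * k or delta < 0:
            targets.append(None)  # recording gap or counter reset
        else:
            targets.append(delta)
    return targets
```

Because the lookup is driven by timestamps rather than row offsets, a long recording gap yields a rejected target instead of a delta that spans the gap.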

Z-score outlier removal (features only)

|z| ≥ 3 removed from feature columns; target column excluded.

Target capping + log1p transform

99.5th percentile cap; np.log1p applied → near-Gaussian range [0, ~6.04]. Inverse np.expm1 at evaluation time restores kWh units.
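A minimal sketch of the cap and transform round trip, using only the standard library. The helper names are hypothetical, and two details are assumptions not stated in the source: the cap is estimated with a nearest-rank percentile, and it is fitted on training targets only.

```python
import math


def fit_cap(train_targets, q=0.995):
    """Nearest-rank percentile of the training targets (assumed cap rule)."""
    ordered = sorted(train_targets)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def to_log_space(delta_e, cap):
    """Clip the target at the cap, then apply log1p."""
    return math.log1p(min(delta_e, cap))


def from_log_space(z):
    """Inverse transform used at evaluation time to restore kWh."""
    return math.expm1(z)
```

With a cap near 417 kWh, log1p maps the targets into roughly [0, 6.04], which matches the range quoted above; expm1 undoes the transform exactly up to floating-point error.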

Chronological 70/15/15 split

Strictly time-ordered. No shuffling. Verified by timestamp boundary assertions.

StandardScaler (train-only fit)

Fit on training split only; transform applied to val and test — prevents leakage.
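The split and the train-only scaling can be illustrated without any ML library. This is a sketch under assumptions: the function names are hypothetical, split boundaries are computed with integer arithmetic, and the scaler uses the population standard deviation, which is what scikit-learn's StandardScaler does.

```python
from statistics import fmean, pstdev


def chronological_split(rows, train_pct=70, val_pct=15):
    """Split time-ordered rows into train/val/test without shuffling."""
    n = len(rows)
    i_train = n * train_pct // 100
    i_val = n * (train_pct + val_pct) // 100
    return rows[:i_train], rows[i_train:i_val], rows[i_val:]


def fit_scaler(train_column):
    """Estimate mean and std from the training split only."""
    mu = fmean(train_column)
    sigma = pstdev(train_column) or 1.0  # guard against constant columns
    return mu, sigma


def transform(column, mu, sigma):
    """Apply training statistics to any split (train, val or test)."""
    return [(x - mu) / sigma for x in column]
```

Validation and test values are scaled with the training mean and standard deviation, so no statistic from later timestamps reaches the model during fitting.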

Sliding window sequence generation

Window = 120 timesteps × 17 features → X shape (N, 120, 17). Stride tricks for memory efficiency.
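One way to build the windows without copying data is NumPy's sliding_window_view (available in NumPy 1.20 and later). The source only says "stride tricks", so the exact call is an assumption, as is the alignment of each target with the last row of its window; make_windows is a hypothetical name.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(features, targets, window=120):
    """features: (T, F) array; targets: (T,) array aligned row by row."""
    # With axis=0 the result has shape (T - window + 1, F, window).
    views = sliding_window_view(features, window_shape=window, axis=0)
    X = views.transpose(0, 2, 1)  # -> (N, window, F), still a view
    y = targets[window - 1:]      # assumed: target of the last row in each window
    return X, y
```

For the 17-feature dataset this gives X with shape (N, 120, 17). The array is a read-only view of the original buffer, so memory use stays close to that of the unwindowed features until a batch is materialised.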

NiOA hyperparameter search (30% subset)

6 agents × 7 evaluations = 42 function evaluations. Trial time limit: 1,800 s.

Final training on full training split

Huber loss (δ=1.0), AdamW, EarlyStopping (patience=7), batch=16. Best val epoch weights restored.
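For reference, the per-sample Huber loss with δ = 1.0 can be written as follows. This is the standard definition, shown as a plain function rather than the framework call used in training.

```python
def huber(residual, delta=1.0):
    """Quadratic for small residuals, linear beyond delta."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)
```

Since the transformed targets lie in roughly [0, 6.04], any residual larger than 1 in log1p space is penalised linearly, which limits the influence of the remaining outliers compared with MSE.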

Model Architecture — Bidirectional Stacked DRNN

Input ( 120 × 17 )

→ Bidirectional LSTM ( units, return_sequences=True )

→ [ LSTM ( units ) + Dropout ] × (layers − 1)

→ BatchNormalization

→ GlobalMaxPooling1D

→ Dense ( 64, ReLU ) → Dropout

→ Dense ( 25, ReLU ) → Dense ( 1 )

Bidirectional: captures both forward and backward temporal patterns
Huber loss δ=1.0: robust to residual outliers in log1p space
AdamW: weight decay decoupled from the gradient-based update
GlobalMaxPooling1D: aggregates most salient temporal features

NiOA Hyperparameter Search Space

Parameter	Range / Values	Type
LSTM layers	2 – 3	Integer
Units per layer	64, 128	Categorical
Dropout rate	0.30 – 0.60	Float
Optimiser	AdamW	Categorical
Learning rate	5×10⁻⁵ – 5×10⁻⁴	Log-uniform
Batch size	32	Categorical
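The table can be read as a sampling specification. The sketch below draws one candidate configuration from it; sample_config and the dictionary keys are hypothetical names, and this shows only how an agent's starting position might be drawn, not NiOA's update rules.

```python
import math
import random


def sample_config(rng):
    """Draw one configuration from the search space in the table above."""
    lr_lo, lr_hi = 5e-5, 5e-4
    return {
        "lstm_layers": rng.randint(2, 3),   # integer, inclusive bounds
        "units": rng.choice([64, 128]),     # categorical
        "dropout": rng.uniform(0.30, 0.60), # float
        "optimiser": "AdamW",               # single fixed choice
        # Log-uniform: sample uniformly in log space, then exponentiate.
        "learning_rate": math.exp(rng.uniform(math.log(lr_lo), math.log(lr_hi))),
        "batch_size": 32,                   # single fixed choice
    }
```

Sampling the learning rate in log space gives each decade of the range equal probability, which is the usual reason for declaring it log-uniform.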

Algorithm Details

6 agents · 7 evaluation rounds · 42 total evaluations · 1800 s trial limit · Exploration decays with iteration progress · Exploitation probability increases 0.30 → 0.90
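The source gives only the endpoints of the exploitation schedule (0.30 and 0.90). As an illustration, the sketch below assumes a linear ramp over the 7 evaluation rounds; the actual NiOA schedule may differ, and exploitation_probability is a hypothetical name.

```python
def exploitation_probability(round_idx, n_rounds=7, p_start=0.30, p_end=0.90):
    """Assumed linear ramp from p_start (round 0) to p_end (last round)."""
    if n_rounds < 2:
        return p_end
    frac = round_idx / (n_rounds - 1)
    return p_start + (p_end - p_start) * frac


# Budget check: 6 agents evaluated once per round over 7 rounds.
N_AGENTS, N_ROUNDS = 6, 7
TOTAL_EVALUATIONS = N_AGENTS * N_ROUNDS
```

Under this assumption, early rounds mostly explore new regions of the search space, while the final rounds mostly refine the best configurations found so far.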

Pipeline Evolution — v1 → v3

Version	Target	Loss	Log1p	NiOA Limit	Train MSE	Val MSE	Root Cause Fixed
v1.0	Row-based ΔE	MSE	No	420 s	0.007	1,672	—
v2.0	Time-based ΔEₖ	MSE	No	420 s	~1×10⁻⁵	1.42×10⁻⁵	Gap corruption
v3.0	ΔEₖ + cap + log1p	Huber δ=1	Yes	1800 s	7.86×10⁻⁶	6.59×10⁻⁶	Skewness + time limit increase

v1 Critical Bug

The row-based target energy.shift(-k) ignores timestamps. Across the 65.8-day gap, ΔE equals the total energy accumulated during the gap (~417 kWh), which led to the large train/val MSE gap (0.007 vs 1,672).

v2 Remaining Issue

Positive counter-reset artefacts pass the gap and negative-delta filters. ~10-17% of test samples contain ~417 kWh spurious values, dominating MAE.

v3 Improvements

Log1p transform + Huber loss eliminate MSE collapse on skewed targets. NiOA time limit 420→1800 s allows 4-6 epochs per trial for meaningful comparison.

Experimental Results

NiOA-Optimised Hyperparameters — All Trained Horizons

Horizon k	LSTM Layers	Units	Dropout	Learning Rate	Best Val Loss (Huber)	Total Params
k = 60 s	3	64	0.3062	4.665×10⁻⁴	7.021×10⁻⁶	130,483
k = 300 s	2	64	0.3963	1.752×10⁻⁴	7.090×10⁻⁶	97,459
k = 900 s	2	64	0.4387	2.292×10⁻⁴	8.070×10⁻⁶	97,459
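The Total Params column can be reproduced from the architecture with standard layer formulas. The sketch assumes the usual Keras conventions: LSTM layers with bias, bidirectional outputs concatenated, and BatchNormalization counted with its four per-channel vectors (two trainable, two moving statistics). The function names are hypothetical.

```python
def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    return input_dim * units + units


def total_params(n_features, units, layers):
    total = 2 * lstm_params(n_features, units)  # bidirectional first layer
    width = 2 * units                           # forward and backward outputs concatenated
    for _ in range(layers - 1):                 # stacked unidirectional LSTM layers
        total += lstm_params(width, units)
        width = units
    total += 4 * width                          # BatchNormalization: gamma, beta, moving mean, moving variance
    total += dense_params(width, 64)
    total += dense_params(64, 25)
    total += dense_params(25, 1)
    return total
```

With 17 input features and 64 units, total_params(17, 64, 2) evaluates to 97,459 and total_params(17, 64, 3) to 130,483, matching the table.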

NiOA-DRNN Test Set Metrics — Multi-Horizon Summary

Full = full test set (with artefacts); Clean = clean subset (artefacts removed).
Horizon k	Test Samples	% Artefacts	Full MAE (kWh)	Full RMSE (kWh)	Full R²	Full sMAPE (%)	Clean MAE (kWh)	Clean RMSE (kWh)	Clean R²	Clean sMAPE (%)
k = 60 s	401,610	10.61%	44.253	135.838	−0.119	107.23	0.00114	0.00182	−0.082	96.22
k = 300 s	401,338	15.42%	64.311	163.765	−0.182	76.86	0.00296	0.00328	−0.265	54.41
k = 900 s	401,227	16.10%	67.142	167.328	−0.192	61.02	0.00584	0.00642	+0.061	34.36
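For reading the table, the four metrics can be sketched as below. The sMAPE variant shown (mean of |y − ŷ| divided by the average of |y| and |ŷ|, in percent) is an assumption, since the source does not state which definition was used.

```python
import math


def mae(y_true, y_pred):
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """1 - SS_res / SS_tot; negative when worse than predicting the mean."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


def smape(y_true, y_pred, eps=1e-12):
    """Symmetric MAPE in percent (assumed variant, bounded by 200%)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        denom = (abs(y) + abs(p)) / 2.0
        total += abs(y - p) / max(denom, eps)
    return 100.0 * total / len(y_true)
```

Two properties help interpret the results. R² is negative whenever the model's squared error exceeds that of a constant mean prediction. With this sMAPE variant, near-zero targets inflate the score: a prediction of 0.0003 kWh against an actual of 0.0001 kWh gives 100% even though the absolute error is only 0.0002 kWh.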

Per-Horizon Detailed Results with Post-hoc Filtered Evaluation Plots

k=60s, Summary Metrics

Full MAE: 44.25 kWh (artefact-dominated)
Full RMSE: 135.84 kWh (artefact-dominated)
Clean MAE: 0.00114 kWh (89.4% clean samples)
Clean sMAPE: 96.22% (high share of near-zero targets)
Max plausible ΔE: 16.67 Wh = 0.01667 kWh (1000 W × 60 s / 3600)
Artefacts removed: 42,628 (10.61% of test set)
Clean samples: 358,982 (89.39% retained)

60-Second Horizon Performance Comparison

k=60s, Plot Description

Left panel: Full test scatter shows all predictions near zero while actuals span 0–417 kWh.
Centre panel: Clean subset scatter (n=358,982) shows MAE=0.0011 kWh with tight clustering near the perfect-prediction line for small ΔE values (0–0.010 kWh).
Right panel: Residual distribution on clean subset is highly concentrated with peak near 0 (slight negative bias indicating mild under-prediction).

k=300s, Summary Metrics

Full MAE: 64.31 kWh (artefact-dominated)
Full RMSE: 163.77 kWh (artefact-dominated)
Clean MAE: 0.00296 kWh (84.6% clean samples)
Clean sMAPE: 54.41% (improving with horizon)
Max plausible ΔE: 83.33 Wh = 0.08333 kWh (1000 W × 300 s / 3600)
Artefacts removed: 61,890 (15.42% of test set)
Clean samples: 339,448 (84.58% retained)

300-Second Horizon Performance Comparison

k=300s, Plot Description

Left panel: Full test scatter shows the same pattern as k=60: the artefact spike at ~417 kWh makes all predictions cluster near zero relative to the actuals.
Centre panel: Clean subset shows MAE=0.0030 kWh. The scatter is slightly wider than at k=60.
Right panel: Residual distribution shows a multi-modal structure. The discrete energy consumption levels of the workstation create discrete residual bands, visible as separate peaks in the distribution.

k=900s, Summary Metrics

Full MAE: 67.14 kWh (artefact-dominated)
Full RMSE: 167.33 kWh (artefact-dominated)
Clean MAE: 0.00584 kWh (83.9% clean samples)
Clean sMAPE: 34.36% (best across horizons)
Max plausible ΔE: 250.00 Wh = 0.25 kWh (1000 W × 900 s / 3600)
Artefacts removed: 64,596 (16.10% of test set)
Clean samples: 336,631 (83.90% retained)

900-Second Horizon Performance Comparison

Post-hoc Filtered Evaluation Plot, k = 900 s (Actual Output)

k=900s, Best sMAPE Performance

The 34.36% clean-subset sMAPE at k=900 is the best achieved across all trained horizons. A likely explanation is that longer forecast windows have a higher signal-to-noise ratio: the 15-minute energy increment shows stronger temporal autocorrelation, enabling the BiLSTM to learn more predictive patterns from the 120-step look-back window.

Benchmarking Status

Model	Category	Status	Notes
NiOA-DRNN	Deep Learning	Done (k = 60, 300, 900 s)	Log1p + Huber + BiLSTM + NiOA, proposed model
Linear Regression	Classical ML	Done	MAE=72.77 (100k row cap)
SVR	Classical ML	Done	MAE=72.77 (50k row cap, O(n²))
XGBoost	Classical ML	Done	MAE=72.77 (full data, early stop)
MLP (sklearn)	Classical ML	Done	MAE=72.79 (100k row cap)
Vanilla LSTM	Deep Learning	Pending	Fixed hyperparameters, same loss
CNN-LSTM	Deep Learning	Pending	Conv1D + MaxPool + LSTM
ARIMA	Statistical	Pending	Univariate only; rolling forecast
DRNN + Optuna	Deep Learning	Pending	Identical arch; TPE sampler; equal 42-trial budget

All pending models are fully implemented and ready to run on canonical frozen splits.
The NiOA-DRNN already outperforms all four completed classical models on full-test-set MAE (44.25–67.14 kWh versus 72.77–72.79 for the classical models).

Identified Limitations & Remediation

Positive Artefact Spikes

Meter counter-reset events produce large positive ΔE (ΔE ≈ 417 kWh). They are temporally close (elapsed < 2*k) and positive (not caught by negative-delta filter). ~10–17% of test samples contain these. They dominate MAE.

Fix

Add a physical plausibility filter before sequence generation: remove any row where ΔEₖ exceeds (1000 W × k s) / 3600 Wh, which is k / 3600 kWh (about 0.0167 kWh at k = 60 s).
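The filter reduces to a unit conversion: watts times seconds gives joules, and 3.6 × 10⁶ J equals 1 kWh. The sketch below assumes the 1000 W ceiling named above; the function names are hypothetical.

```python
def max_plausible_delta_kwh(k_seconds, max_power_w=1000.0):
    """Upper bound on energy drawn in k seconds at max_power_w, in kWh."""
    joules = max_power_w * k_seconds  # W * s = J
    return joules / 3.6e6             # 3.6e6 J = 1 kWh


def is_plausible(delta_kwh, k_seconds, max_power_w=1000.0):
    """Keep only non-negative increments within the physical bound."""
    return 0.0 <= delta_kwh <= max_plausible_delta_kwh(k_seconds, max_power_w)
```

Applied before sequence generation, this check would reject the ~417 kWh spikes at every horizon while keeping the clean-subset increments, which are several orders of magnitude smaller.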

References

Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.
Schuster, M., & Paliwal, K. K. (1997). Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11), 2673–2681.
El-Kenawy, E. S. M. et al. (2024). NiOA: A novel metaheuristic algorithm modelled on the stealth and precision of Japanese Ninjas. Journal of Artificial Intelligence Engineering and Practice, 1, 17–35.
BenGhorbal, A. et al. (2025). Predicting carbon dioxide emissions using deep learning and Ninja metaheuristic optimization algorithm. Scientific Reports, 15:4021.
Yassen, M. A. et al. (2025). Renewable energy forecasting using optimized quantum temporal model based on Ninja optimization algorithm. Scientific Reports, 15:14714.
Estrada, R. et al. (2022). Learning-based energy consumption prediction. Procedia Computer Science, 203, 272–279. (Original dataset source)

Academic Credits

Project Guide

Dr Shishir Singh Chauhan

Student

Anwesha Singh

23FE10CSE00299
