Building a solar power forecasting system with machine learning in Python brings together renewable energy management and AI in a practical application relevant to India’s growing solar infrastructure. Grid operators, solar plant owners, and battery storage managers all need accurate solar generation forecasts to optimise their operations. This guide covers the complete pipeline, from weather API data collection to ML model training and deployment.
Table of Contents
- Why Solar Power Forecasting Matters
- Data Sources and Weather APIs for India
- Feature Engineering for Solar Forecasting
- ML Model Selection and Training
- Complete Python Implementation
- Model Evaluation Metrics
- Deployment for Indian Solar Systems
- Frequently Asked Questions
Why Solar Power Forecasting Matters
Accurate solar power forecasting using machine learning is critical for:
- Grid balancing: NLDC/SLDC operators need day-ahead solar forecasts to schedule backup generation. Forecasting errors cost Indian grid operators Rs 500-2,000 crore annually.
- Battery management: Predictive charging/discharging of battery storage using tomorrow’s solar forecast extends battery life by 10-20%.
- Trading/scheduling: RE generators must submit day-ahead generation schedules to the relevant load despatch centre (RLDC or SLDC) under CERC IEGC regulations. Accurate forecasts reduce deviation settlement charges.
- Maintenance planning: Schedule panel cleaning and maintenance on forecast low-generation days to minimise opportunity cost.
Data Sources and Weather APIs for India
Key data sources for Indian solar forecasting:
- OpenWeatherMap API (free tier): 5-day forecast in 3-hour steps, including cloud cover, humidity, temperature and wind. Available globally, including India. Free for 1000 calls/day.
- India Meteorological Department (IMD): Government agency providing meteorological gridded data. Registration required. Free for research.
- Solargis (commercial): Best-in-class solar irradiance data for India at 15-minute resolution. Paid API, Rs 15,000-50,000/year for commercial use.
- NASA POWER API (free): Historical and near-real-time solar radiation data at any location globally. Excellent for training datasets.
- PVGIS (European Commission, free): Historical solar radiation data with hourly resolution for India. Best free historical dataset for panel-level calculations.
# Fetch weather data using the OpenWeatherMap 5-day / 3-hour forecast API
import requests
import pandas as pd
from datetime import datetime, timezone

API_KEY = 'your_openweathermap_api_key'
LAT, LON = 18.5204, 73.8567  # Pune, Maharashtra

def get_weather_forecast(lat, lon):
    url = "https://api.openweathermap.org/data/2.5/forecast"
    params = {
        'lat': lat, 'lon': lon,
        'appid': API_KEY,
        'units': 'metric'
    }
    r = requests.get(url, params=params, timeout=10)
    r.raise_for_status()  # Fail loudly on a bad key or an exceeded quota
    data = r.json()
    records = []
    for item in data['list']:
        records.append({
            # 'dt' is a Unix timestamp in UTC; keep it timezone-aware
            'datetime': datetime.fromtimestamp(item['dt'], tz=timezone.utc),
            'temp_c': item['main']['temp'],
            'clouds_pct': item['clouds']['all'],
            'humidity': item['main']['humidity'],
            'wind_speed': item['wind']['speed'],
            'description': item['weather'][0]['description']
        })
    return pd.DataFrame(records)

df_forecast = get_weather_forecast(LAT, LON)
print(df_forecast.head())
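For the historical side of the training set, the NASA POWER API listed above can be queried in a similar way. The sketch below is a rough illustration, not a verified client: the endpoint path, the `ALLSKY_SFC_SW_DWN` parameter name, the `-999` fill value and the `YYYYMMDDHH` key format are assumptions about the public hourly API and should be checked against the current NASA POWER documentation. `build_power_request` and `parse_power_hourly` are hypothetical helper names.

```python
from datetime import datetime

# Assumed endpoint for hourly point data; verify against the NASA POWER docs.
POWER_URL = "https://power.larc.nasa.gov/api/temporal/hourly/point"
FILL_VALUE = -999  # Assumed marker for missing data

def build_power_request(lat, lon, start, end, parameter="ALLSKY_SFC_SW_DWN"):
    """Return (url, params) for an hourly query; dates are YYYYMMDD strings."""
    params = {
        "parameters": parameter,
        "community": "RE",
        "latitude": lat,
        "longitude": lon,
        "start": start,
        "end": end,
        "format": "JSON",
    }
    return POWER_URL, params

def parse_power_hourly(payload, parameter="ALLSKY_SFC_SW_DWN"):
    """Turn the JSON payload into a sorted list of (datetime, value) pairs,
    dropping entries that carry the fill value."""
    series = payload["properties"]["parameter"][parameter]
    rows = []
    for key, value in series.items():
        if value == FILL_VALUE:
            continue
        rows.append((datetime.strptime(key, "%Y%m%d%H"), float(value)))
    return sorted(rows)

# Offline example with a hand-made payload in the assumed response shape
sample = {"properties": {"parameter": {"ALLSKY_SFC_SW_DWN": {
    "2023060112": 812.5, "2023060113": -999, "2023060111": 640.0}}}}
power_rows = parse_power_hourly(sample)
```

With network access, `url, params = build_power_request(18.52, 73.86, '20230101', '20231231')` followed by `requests.get(url, params=params, timeout=30).json()` would supply the payload. Check the units reported in the response before mixing these values with other irradiance sources.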
Feature Engineering for Solar Forecasting
Raw weather data needs to be transformed into features that capture solar physics:
import numpy as np
import pandas as pd
from pvlib import location

def engineer_solar_features(df, lat=18.52, lon=73.86, altitude=560):
    """Add solar position and derived features to weather dataframe"""
    site = location.Location(lat, lon, altitude=altitude, tz='Asia/Kolkata')
    df = df.copy()  # Leave the caller's dataframe untouched
    # pvlib assumes UTC for timezone-naive timestamps, so make the index explicit.
    # Assumption: naive timestamps are already in Indian Standard Time.
    if df.index.tz is None:
        df.index = df.index.tz_localize('Asia/Kolkata')
    else:
        df.index = df.index.tz_convert('Asia/Kolkata')
    # Solar position features
    solar_pos = site.get_solarposition(df.index)
    df['solar_elevation'] = solar_pos['elevation']
    df['solar_azimuth'] = solar_pos['azimuth']
    df['cos_zenith'] = np.cos(np.radians(solar_pos['zenith']))
    # Clear-sky irradiance (maximum possible)
    clearsky = site.get_clearsky(df.index)
    df['ghi_clearsky'] = clearsky['ghi']
    df['dni_clearsky'] = clearsky['dni']
    # Cloud modifier: simple linear heuristic, full cloud cover keeps 30% of clear-sky GHI
    df['cloud_modifier'] = (1 - df['clouds_pct'] / 100) * 0.7 + 0.3
    df['estimated_ghi'] = df['ghi_clearsky'] * df['cloud_modifier']
    # Temperature correction factor for panel efficiency
    # Panel efficiency drops about 0.4%/C above 25C (ambient temperature used as a proxy)
    df['temp_correction'] = 1 - 0.004 * (df['temp_c'] - 25).clip(lower=0)
    # Time-based cyclical features
    df['hour_sin'] = np.sin(2 * np.pi * df.index.hour / 24)
    df['hour_cos'] = np.cos(2 * np.pi * df.index.hour / 24)
    df['doy_sin'] = np.sin(2 * np.pi * df.index.dayofyear / 365)
    df['doy_cos'] = np.cos(2 * np.pi * df.index.dayofyear / 365)
    return df
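The sine/cosine pairs in the function above exist because a raw hour value treats 23:00 and 00:00 as 23 units apart even though they are neighbours. A minimal sketch of the idea, using only the standard library (`encode_hour` and `cyclic_distance` are illustrative names, not part of pvlib or pandas):

```python
import math

def encode_hour(hour):
    """Map an hour of day (0-23) to a point on the unit circle."""
    angle = 2 * math.pi * hour / 24
    return math.sin(angle), math.cos(angle)

def cyclic_distance(hour_a, hour_b):
    """Euclidean distance between two hours in the encoded (sin, cos) space."""
    return math.dist(encode_hour(hour_a), encode_hour(hour_b))

d_midnight = cyclic_distance(23, 0)   # neighbours across midnight
d_morning = cyclic_distance(0, 1)     # neighbours within the same day
d_opposite = cyclic_distance(0, 12)   # midnight vs noon, the farthest pair
```

In the encoded space the step across midnight is the same size as any other one-hour step, which is what a tree or neural model needs in order to treat the daily cycle as continuous. The same reasoning applies to the day-of-year pair.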
ML Model Selection and Training
Several ML architectures work well for solar forecasting:
- Gradient Boosting (XGBoost/LightGBM): Best overall performance for day-ahead forecasting. Fast training, handles non-linear relationships well. RMSE typically 8-12% of rated capacity.
- Random Forest: Good baseline, robust to outliers (important for monsoon anomalies in India). RMSE typically 10-15%.
- LSTM (Long Short-Term Memory): Best for capturing temporal patterns (multi-day cloud patterns). Requires more data (1+ years). RMSE typically 7-10% with sufficient data.
- Linear Regression with solar physics features: Simple, interpretable baseline. RMSE 15-20% but useful for understanding relationships.
For Indian conditions, a hybrid physics + ML approach (use pvlib for clear-sky baseline, then train ML to predict the cloud correction factor) often outperforms pure ML.
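One way to set up the hybrid approach is to train on a normalised target instead of raw power. The sketch below is an assumption about how this could be done, not a prescribed method: it divides measured power by a clear-sky power estimate to get a ratio (often called a clear-sky index), which the ML model would learn to predict, and multiplies back at forecast time. `clearsky_kw` stands for a clear-sky power estimate you would derive from pvlib's clear-sky GHI and your plant capacity; the function names, the `min_clearsky_kw` cut-off and the 1.5 cap are hypothetical.

```python
import numpy as np

def to_clearsky_index(power_kw, clearsky_kw, min_clearsky_kw=0.05):
    """Convert measured power to a clear-sky ratio; NaN where the sun is too low."""
    power_kw = np.asarray(power_kw, dtype=float)
    clearsky_kw = np.asarray(clearsky_kw, dtype=float)
    ratio = np.full_like(power_kw, np.nan)
    valid = clearsky_kw > min_clearsky_kw
    ratio[valid] = power_kw[valid] / clearsky_kw[valid]
    return np.clip(ratio, 0.0, 1.5)  # cap sensor spikes; NaN passes through

def from_clearsky_index(ratio, clearsky_kw):
    """Convert a predicted ratio back to power in kW; NaN ratios become 0 kW."""
    ratio = np.nan_to_num(np.asarray(ratio, dtype=float), nan=0.0)
    return np.maximum(ratio * np.asarray(clearsky_kw, dtype=float), 0.0)

clearsky_kw = np.array([0.0, 2.0, 4.0, 5.0])
measured_kw = np.array([0.0, 1.0, 3.0, 5.0])
k = to_clearsky_index(measured_kw, clearsky_kw)     # training target
restored_kw = from_clearsky_index(k, clearsky_kw)   # forecast-time inverse
```

Rows with a NaN ratio would be dropped before training, in the same way the night-time rows are filtered out elsewhere in this guide. The ratio varies far less with time of day and season than raw power does, which leaves the model to learn mainly the cloud effect.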
Complete Python Implementation
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error
import joblib

# Feature columns
FEATURES = [
    'cos_zenith', 'solar_elevation', 'estimated_ghi',
    'temp_c', 'temp_correction', 'clouds_pct', 'humidity',
    'wind_speed', 'hour_sin', 'hour_cos', 'doy_sin', 'doy_cos',
    'ghi_clearsky'
]
TARGET = 'power_kw'  # Actual measured solar output

def train_forecasting_model(df):
    # Filter daytime only (elevation > 5 degrees)
    df_day = df[df['solar_elevation'] > 5].copy()
    df_day = df_day.dropna(subset=FEATURES + [TARGET])
    X = df_day[FEATURES]
    y = df_day[TARGET]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, shuffle=False  # Time series: no shuffle!
    )
    model = GradientBoostingRegressor(
        n_estimators=200,
        max_depth=4,
        learning_rate=0.05,
        subsample=0.8,
        random_state=42
    )
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_pred = np.maximum(y_pred, 0)  # Solar output can't be negative
    mae = mean_absolute_error(y_test, y_pred)
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    print(f"MAE: {mae:.3f} kW, RMSE: {rmse:.3f} kW")
    # Save model
    joblib.dump(model, 'solar_forecast_model.pkl')
    return model, X_test, y_test, y_pred

def forecast_tomorrow(model, lat, lon):
    """Generate tomorrow's solar forecast at the weather API's 3-hour time step"""
    df_weather = get_weather_forecast(lat, lon)
    df_weather = df_weather.set_index('datetime')
    df_features = engineer_solar_features(df_weather, lat, lon)
    # Filter to tomorrow's date, using the same timezone as the index
    now = pd.Timestamp.now(tz=df_features.index.tz)
    tomorrow = (now + pd.Timedelta(days=1)).date()
    df_tomorrow = df_features[df_features.index.date == tomorrow]
    forecast_kw = model.predict(df_tomorrow[FEATURES])
    forecast_kw = np.maximum(forecast_kw, 0)
    # The model was trained on daytime rows only, so force night-time output to zero
    forecast_kw = np.where(df_tomorrow['solar_elevation'] > 5, forecast_kw, 0.0)
    return pd.Series(forecast_kw, index=df_tomorrow.index, name='forecast_kw')
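The single chronological split above gives one error estimate from one period. A possible extension, not part of the original pipeline, is walk-forward validation, where the model is repeatedly trained on an expanding window and tested on the block that follows it. The helper below is a hypothetical, dependency-free sketch that yields index ranges; scikit-learn's `TimeSeriesSplit` offers a ready-made alternative.

```python
def walk_forward_splits(n_samples, n_folds=4):
    """Yield (train_range, test_range) pairs in which every test block
    comes strictly after the data used for training."""
    if n_folds < 1 or n_samples < n_folds + 1:
        raise ValueError("need at least n_folds + 1 samples")
    block = n_samples // (n_folds + 1)
    for fold in range(1, n_folds + 1):
        train_end = block * fold
        # The last fold absorbs any leftover rows
        test_end = n_samples if fold == n_folds else train_end + block
        yield range(0, train_end), range(train_end, test_end)

splits = list(walk_forward_splits(10, n_folds=4))
```

Each pair can be applied to the training data as `X.iloc[train.start:train.stop]` and `X.iloc[test.start:test.stop]`. Averaging the error across folds shows whether the model holds up across seasons or only in the final 20% of the data.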
Model Evaluation Metrics
Standard metrics for solar forecasting in India:
- nRMSE (normalised RMSE): RMSE as % of installed capacity. Target: below 10% for day-ahead, below 5% for hour-ahead
- MAE (Mean Absolute Error): Average absolute error in kW. More interpretable than RMSE for operational use
- Skill Score: Improvement over naive persistence forecast (use yesterday’s generation as forecast)
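- CERC metric: India’s CERC IEGC allows 15% deviation for RE generators. That band is assessed for each scheduling time block, while nRMSE is an average over many blocks, so a model below 10% nRMSE will usually, but not always, stay inside it. Check the regulations currently in force for the exact band.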
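The first three metrics can be computed in a few lines of NumPy. This is a sketch under stated assumptions: the skill score is taken as 1 minus the ratio of model RMSE to persistence RMSE, which is one common definition, and `capacity_kw` is the installed capacity you supply. The function names and sample numbers are illustrative only.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error in the units of the inputs (kW here)."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def mae(actual, predicted):
    """Mean absolute error in kW."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean(np.abs(actual - predicted)))

def nrmse_pct(actual, predicted, capacity_kw):
    """RMSE expressed as a percentage of installed capacity."""
    return 100.0 * rmse(actual, predicted) / capacity_kw

def skill_score(actual, predicted, persistence):
    """1.0 is a perfect forecast, 0.0 matches persistence, negative is worse."""
    return 1.0 - rmse(actual, predicted) / rmse(actual, persistence)

actual = [0.0, 2.0, 4.0, 3.0]       # measured kW
model_pred = [0.0, 2.5, 3.5, 3.0]   # model forecast
yesterday = [0.0, 1.0, 5.0, 2.0]    # persistence: same hours on the previous day

model_mae = mae(actual, model_pred)
model_nrmse = nrmse_pct(actual, model_pred, capacity_kw=5.0)
model_skill = skill_score(actual, model_pred, yesterday)
```

For a fair comparison, compute all of these on daytime rows only, since including night-time zeros lowers every error figure without reflecting any forecasting ability.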
Deployment for Indian Solar Systems
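# Simple Flask API for solar forecast deployment
from flask import Flask, jsonify
import joblib
# forecast_tomorrow() is the function from the implementation section above;
# define it in this file or import it from the module where you saved it.

app = Flask(__name__)
model = joblib.load('solar_forecast_model.pkl')

@app.route('/forecast/<lat>/<lon>')
def get_forecast(lat, lon):
    try:
        lat, lon = float(lat), float(lon)
    except ValueError:
        return jsonify({'error': 'lat and lon must be numbers'}), 400
    forecast = forecast_tomorrow(model, lat, lon)
    return jsonify({
        'location': {'lat': lat, 'lon': lon},
        'forecast': [
            {'time': str(t), 'power_kw': float(p)}
            for t, p in forecast.items()
        ]
    })

if __name__ == '__main__':
    # Flask's built-in server is meant for development; use a WSGI server in production
    app.run(host='0.0.0.0', port=5000)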
Frequently Asked Questions
How much training data is needed for a solar ML model?
Minimum 6 months, ideally 1-2 years of hourly data. For Indian systems, ensure your training data covers at least one complete monsoon season (June-September) as monsoon cloud patterns are drastically different from clear-sky winter/summer months. More data always helps, especially for capturing rare weather events.
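A quick check along these lines can confirm that a dataset meets both conditions before training. It is a sketch with thresholds taken from the answer above (roughly six months of span, and every month from June to September present); `check_training_coverage` is a hypothetical helper.

```python
from datetime import datetime, timedelta

MONSOON_MONTHS = {6, 7, 8, 9}  # June-September, as described above

def check_training_coverage(timestamps, min_days=180):
    """Return (ok, problems) for a collection of datetime stamps."""
    if not timestamps:
        return False, ["no data"]
    problems = []
    span_days = (max(timestamps) - min(timestamps)).days
    if span_days < min_days:
        problems.append(f"only {span_days} days of data, need {min_days}")
    missing = MONSOON_MONTHS - {t.month for t in timestamps}
    if missing:
        problems.append(f"missing monsoon months: {sorted(missing)}")
    return not problems, problems

start = datetime(2023, 1, 1)
winter_only = [start + timedelta(days=d) for d in range(120)]   # Jan to Apr
full_year = [start + timedelta(days=d) for d in range(365)]

ok_short, problems_short = check_training_coverage(winter_only)
ok_full, problems_full = check_training_coverage(full_year)
```

With a pandas dataframe, `list(df.index.to_pydatetime())` gives a suitable input.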
Can I use LSTM for a small 5 kW residential solar system?
Yes, but gradient boosting (XGBoost) typically performs as well as or better than LSTM for short-horizon (1-24 hour) forecasting, with less computational complexity. LSTMs show advantages for multi-day (2-7 day) forecasts, where sequential temporal patterns matter more.
What is the best free weather API for solar forecasting in India?
OpenWeatherMap free tier (1000 calls/day) combined with NASA POWER historical data is the best free combination for Indian solar forecasting. For serious applications, Solargis or Tomorrow.io provide significantly more accurate irradiance forecasts at a cost.
Does the model need retraining after installation?
Yes. Solar panels degrade 0.4-0.7% per year, dust accumulation patterns change seasonally, and panel orientation may shift slightly. Retrain your model quarterly using the latest 3-6 months of data. Implement automated retraining with performance monitoring triggers.
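A performance-monitoring trigger can be as simple as comparing recent error against the error measured at training time. The sketch below assumes daily nRMSE values are already being logged; the 7-day window and the 1.25 tolerance factor are illustrative choices, not recommendations from any standard, and `should_retrain` is a hypothetical name.

```python
from statistics import mean

def should_retrain(daily_nrmse_pct, baseline_nrmse_pct, window=7, tolerance=1.25):
    """Return True when the mean nRMSE of the last `window` days
    exceeds the training-time baseline multiplied by `tolerance`."""
    if len(daily_nrmse_pct) < window:
        return False  # not enough evidence yet
    recent = mean(daily_nrmse_pct[-window:])
    return recent > baseline_nrmse_pct * tolerance

stable = [8.1, 7.9, 8.4, 8.0, 8.2, 7.8, 8.3]
drifting = [8.1, 9.5, 10.2, 11.0, 11.4, 12.1, 12.6]

retrain_stable = should_retrain(stable, baseline_nrmse_pct=8.0)
retrain_drifting = should_retrain(drifting, baseline_nrmse_pct=8.0)
```

A trigger like this can run alongside the quarterly schedule, so that a sudden change such as heavy soiling or a failed string prompts retraining or inspection before the next scheduled cycle.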