Crop Yield Prediction: IoT Sensor Data and Machine Learning

Crop yield prediction using IoT sensor data and machine learning lets Indian farmers and agricultural planners forecast harvest volumes weeks in advance, optimise inputs, and improve supply chain logistics. This guide covers building an end-to-end crop yield prediction system with ESP32 sensor nodes, a Python ML pipeline, and practical deployment under Indian field conditions.

Importance of Yield Prediction in India
Key Sensor Data for Yield Models
Hardware Setup
Data Collection Pipeline
Machine Learning Model
Python ML Code
Field Deployment
Frequently Asked Questions

Importance of Yield Prediction in India

India produces more than 330 million tonnes of food grains annually. Even a 5% improvement in yield prediction accuracy can translate into better government procurement planning, lower post-harvest losses (commonly cited at 15-30%, depending on the crop), and higher farmer income. Key use cases:

Government: State Agricultural Departments use yield forecasts for MSP procurement planning and food security buffers
Banks: Kisan Credit Card sanctioning can use predicted yield to assess repayment capacity
Commodity traders: Mandi price forecasting based on supply predictions
Agri-input companies: Fertilizer and pesticide demand planning
Farmers: Sell-forward decisions, input optimisation

Key Sensor Data for Yield Models

Yield depends on multiple interacting factors. The most predictive sensor variables are:

Parameter	Sensor	Yield impact
Soil moisture (root zone)	Capacitive soil moisture sensor	High (water stress can cause 20-40% yield loss)
Air temperature (min/max)	BME280	High (heat stress at flowering is critical)
Relative humidity	BME280 or SHT10	Medium (disease risk, pollination)
Solar radiation (light-intensity proxy)	LDR or BH1750	High (drives photosynthesis and biomass)
Rainfall	Tipping-bucket rain gauge	High (water balance)
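
A capacitive soil moisture probe does not output moisture directly: it returns a raw ADC value that has to be calibrated for your soil. A minimal sketch, assuming a simple two-point linear calibration (one reading in air-dry soil, one in saturated soil from the same field); the calibration numbers are hypothetical and should be measured per probe and per soil type. The result is a 0-100% index across the calibrated range, not a true volumetric water content.

```python
def raw_to_moisture_pct(raw: int, raw_dry: int, raw_wet: int) -> float:
    """Map a raw capacitive-probe ADC reading to a 0-100 % moisture index.

    raw_dry: reading in air-dry soil; raw_wet: reading in saturated soil.
    Common capacitive modules read lower when wetter (raw_wet < raw_dry),
    but the formula works whichever way round the probe behaves.
    """
    if raw_dry == raw_wet:
        raise ValueError("dry and wet calibration points must differ")
    pct = (raw_dry - raw) / (raw_dry - raw_wet) * 100.0
    # Clamp readings that fall slightly outside the calibrated range
    return max(0.0, min(100.0, pct))


# Hypothetical calibration points for one probe on a 12-bit ESP32 ADC
RAW_DRY, RAW_WET = 3200, 1400
print(raw_to_moisture_pct(2300, RAW_DRY, RAW_WET))
```

A two-point linear map is the simplest option; if you need agronomic thresholds such as field capacity, calibrating against gravimetric soil samples from the field is more accurate.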

Hardware Setup

Recommended Sensors from Zbotic

A field node consists of:

ESP32 (data collection and WiFi transmission)
BME280 (temperature, humidity, atmospheric pressure)
2x capacitive soil moisture sensors (at 15 cm and 30 cm depth)
BH1750 light intensity sensor (as a solar radiation proxy)
DS3231 RTC for accurate timestamps
Solar power (5 W panel + 10 Ah LiPo battery)
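
Whether the 10 Ah battery carries the node through cloudy monsoon spells depends on its average current draw. A rough worst-case sketch, assuming no solar input at all and hypothetical current figures (about 150 mA for 10 s while WiFi is active on each 30-minute wake, and 1 mA for the whole board while sleeping); real ESP32 boards vary widely, so measure your own node with a USB power meter before relying on the result.

```python
def daily_consumption_mah(wakes_per_day: int, active_s: float,
                          active_ma: float, sleep_ma: float) -> float:
    """Estimate the node's daily charge use in mAh from duty-cycle figures."""
    active_h = wakes_per_day * active_s / 3600.0
    sleep_h = 24.0 - active_h
    return active_h * active_ma + sleep_h * sleep_ma


def autonomy_days(battery_mah: float, usable_fraction: float, daily_mah: float) -> float:
    """Days the battery alone can run the node (ignores the solar panel)."""
    return battery_mah * usable_fraction / daily_mah


# Hypothetical figures: one wake every 30 minutes = 48 wakes per day
daily = daily_consumption_mah(wakes_per_day=48, active_s=10, active_ma=150, sleep_ma=1.0)
print(f"Daily use: {daily:.1f} mAh")
# Assume only 80% of the nominal capacity is usable over the battery's life
print(f"Autonomy without sun: {autonomy_days(10_000, 0.8, daily):.0f} days")
```

Sleep current usually dominates the budget, so cutting power to the sensors between readings matters more than shortening the WiFi burst.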

Data Collection Pipeline

The ESP32 sends sensor readings every 30 minutes to a central server:

ESP32 reads all sensors and timestamps the readings using the RTC
Data is sent via WiFi (or a LoRa gateway) to an MQTT broker
InfluxDB stores the time-series data on a Raspberry Pi or cloud VM
A Python ML pipeline queries InfluxDB weekly for model training and prediction
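
The broker-to-database step needs an agreed message format. A minimal sketch, assuming a hypothetical JSON payload published by each node (`node_id`, epoch-seconds `ts`, and numeric readings) and converting it to InfluxDB line protocol (measurement, tags, fields, nanosecond timestamp). The measurement name `field_node` is also an assumption. In a real deployment this function would be called from your MQTT subscriber, or a tool such as Telegraf's MQTT consumer plugin could do the conversion instead.

```python
import json


def escape_key(text: str) -> str:
    """Escape commas, equals signs and spaces, which line protocol treats specially."""
    return text.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def payload_to_line(payload: str, measurement: str = "field_node") -> str:
    """Convert one node's JSON MQTT payload into an InfluxDB line-protocol record."""
    data = json.loads(payload)
    node = escape_key(str(data.pop("node_id")))
    ts_ns = int(data.pop("ts")) * 1_000_000_000  # epoch seconds -> nanoseconds
    # Write every remaining reading as a float field, in a stable (sorted) order
    fields = ",".join(f"{escape_key(k)}={float(v)}" for k, v in sorted(data.items()))
    return f"{measurement},node_id={node} {fields} {ts_ns}"


msg = ('{"node_id": "field3-n1", "ts": 1718000000, '
       '"air_temp": 31.2, "humidity": 64, "soil_15cm": 38.5}')
print(payload_to_line(msg))
```

Tagging by `node_id` keeps per-node series separate in InfluxDB, which makes it easy to average the 2-3 nodes of a field later.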

Minimum training data: 2 complete crop seasons (roughly 6-8 months of in-season data for most Indian crops, since a typical season lasts 3-4 months). Historical IMD weather station data lets you bootstrap a model immediately, then refine it as field sensor data accumulates.
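
One way to bootstrap is to back-fill days or variables the field node did not record with readings from the nearest IMD station. A minimal sketch, assuming both sources have already been parsed into per-day dictionaries with matching variable names (the names here are hypothetical; IMD downloads use their own formats). Station values describe a wider area than one field, so each merged day is tagged with its source.

```python
from datetime import date


def merge_with_station(sensor: dict, station: dict) -> dict:
    """Merge per-day readings, preferring field-sensor values over station values.

    Both inputs map a date to a dict of variables, e.g. {"max_temp": 34.1}.
    Each merged row gets a 'source' tag so back-filled days can be identified.
    """
    merged = {}
    for day in sorted(set(sensor) | set(station)):
        sensor_row = sensor.get(day, {})
        station_row = station.get(day, {})
        row = {**station_row, **sensor_row}  # sensor values win where both exist
        if not sensor_row:
            row["source"] = "station"
        elif set(station_row) - set(sensor_row):
            row["source"] = "mixed"  # station filled variables the node lacks
        else:
            row["source"] = "sensor"
        merged[day] = row
    return merged


# Hypothetical readings: the node has no rain gauge and missed 2 July
sensor = {date(2024, 7, 1): {"max_temp": 33.0, "min_temp": 25.0}}
station = {date(2024, 7, 1): {"max_temp": 32.5, "rainfall_mm": 12.0},
           date(2024, 7, 2): {"max_temp": 31.0, "rainfall_mm": 4.0}}
for day, row in merge_with_station(sensor, station).items():
    print(day, row)
```

Keeping the `source` tag lets you later test whether the model performs worse on back-filled days, and drop them once enough sensor data exists.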

Machine Learning Model

For crop yield prediction, a Random Forest Regressor offers a good balance of accuracy and interpretability. The common model choices compare as follows:

Linear regression: Simple baseline; works well with 3-5 features and historical yield data
Random Forest: Handles non-linear interactions, is robust to missing data, and reports feature importances
XGBoost: Often the most accurate with large datasets (5+ years, multiple farms)
LSTM: Suited to sequential time-series patterns (monsoon progression, crop phenology stages)
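
Before trusting a Random Forest, it helps to know what a simple linear baseline achieves on held-out seasons; if the forest does not clearly beat it, the extra complexity is not paying off. A minimal one-feature sketch using plain least squares, assuming hypothetical per-field-season data (seasonal rainfall in mm against yield in kg/acre); with several features you would use a library regressor instead.

```python
def fit_simple_ols(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("feature has no variance; cannot fit a slope")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mae(y_true: list[float], y_pred: list[float]) -> float:
    """Mean absolute error in the target's units (kg/acre here)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical training seasons: seasonal rainfall (mm) vs yield (kg/acre)
train_rain = [420, 510, 600, 680, 750]
train_yield = [1350, 1480, 1640, 1700, 1820]
slope, intercept = fit_simple_ols(train_rain, train_yield)

# Evaluate on held-out seasons, never on the training rows
test_rain, test_yield = [550, 640], [1530, 1690]
baseline_pred = [slope * r + intercept for r in test_rain]
print(f"Baseline MAE: {mae(test_yield, baseline_pred):.0f} kg/acre")
```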

Python ML Code

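import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupShuffleSplit
from sklearn.metrics import mean_absolute_error, r2_score
import joblib

# Load daily sensor data from CSV (exported from InfluxDB or ThingSpeak).
# Assumed columns: date, field_id, season, max_temp, min_temp, avg_temp,
# avg_humidity, avg_soil_moisture, total_rainfall, avg_light_lux,
# days_since_sowing, actual_yield_kg_per_acre
df = pd.read_csv('farm_data_2years.csv', parse_dates=['date'])
# Cumulative and rolling features below assume rows are in date order per field
df = df.sort_values(['field_id', 'date']).reset_index(drop=True)

# Feature engineering
# Growing degree days with a 10 C base temperature (adjust the base for your crop)
df['growing_degree_days'] = ((df['max_temp'] + df['min_temp']) / 2 - 10).clip(lower=0)
df['gdd_cumulative'] = df.groupby(['season', 'field_id'])['growing_degree_days'].cumsum()
# 7-day rainfall total per field; transform() keeps results aligned with df's index
df['rainfall_7d_sum'] = df.groupby('field_id')['total_rainfall'].transform(
    lambda s: s.rolling(7, min_periods=1).sum())
# Vapour pressure deficit (kPa): saturation vapour pressure (Tetens equation) * (1 - RH)
svp_kpa = 0.6108 * np.exp(17.27 * df['avg_temp'] / (df['avg_temp'] + 237.3))
df['vpd'] = svp_kpa * (1 - df['avg_humidity'] / 100)

features = ['gdd_cumulative', 'avg_soil_moisture', 'rainfall_7d_sum',
            'avg_humidity', 'avg_light_lux', 'vpd', 'days_since_sowing']
target = 'actual_yield_kg_per_acre'

model_df = df[features + [target, 'field_id', 'season']].dropna()
X, y = model_df[features], model_df[target]

# All daily rows from one field-season share the same yield, so split by
# field-season group; a random row split would leak test seasons into training
groups = model_df['field_id'].astype(str) + '_' + model_df['season'].astype(str)
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(X, y, groups=groups))
X_train, X_test = X.iloc[train_idx], X.iloc[test_idx]
y_train, y_test = y.iloc[train_idx], y.iloc[test_idx]

# Tree ensembles are insensitive to feature scaling, so no StandardScaler is needed
rf_model = RandomForestRegressor(n_estimators=200, max_depth=10,
                                 min_samples_leaf=5, random_state=42, n_jobs=-1)
rf_model.fit(X_train, y_train)

y_pred = rf_model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MAE: {mae:.0f} kg/acre ({mae / y_test.mean() * 100:.1f}% error)")
print(f"R2 score: {r2:.3f}")

importances = pd.Series(rf_model.feature_importances_, index=features).sort_values(ascending=False)
print("Feature Importances:")
print(importances)

joblib.dump(rf_model, 'yield_model.pkl')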
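
The two engineered weather features, growing degree days and vapour pressure deficit, are easy to sanity-check by hand. A small sketch of the per-day calculations, assuming a 10 °C base temperature for GDD and the Tetens saturation vapour pressure equation (as used in FAO-56) for VPD; the example day is illustrative, not field data.

```python
import math


def growing_degree_days(t_max: float, t_min: float, t_base: float = 10.0) -> float:
    """Daily GDD from min/max air temperature (deg C), floored at zero."""
    return max(0.0, (t_max + t_min) / 2 - t_base)


def saturation_vapour_pressure_kpa(t_c: float) -> float:
    """Tetens equation (FAO-56 form): saturation vapour pressure in kPa."""
    return 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))


def vpd_kpa(t_c: float, rh_pct: float) -> float:
    """Vapour pressure deficit: how far the air is from saturation, in kPa."""
    return saturation_vapour_pressure_kpa(t_c) * (1 - rh_pct / 100)


# Illustrative kharif-season day: 34/24 deg C, 70% mean relative humidity
t_max, t_min, rh = 34.0, 24.0, 70.0
print(f"GDD: {growing_degree_days(t_max, t_min):.1f}")
print(f"VPD: {vpd_kpa((t_max + t_min) / 2, rh):.2f} kPa")
```

For this day the sketch gives 19.0 GDD and a VPD of roughly 1.2 kPa. Evaluating saturation vapour pressure at the daily mean temperature is a common simplification; FAO-56 recommends averaging it at the daily maximum and minimum instead, which gives a slightly higher VPD.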

Field Deployment

Integration steps for a complete system:

ESP32 nodes: Deploy 2-3 per 10 acres for representative sampling
Gateway: Raspberry Pi 4 at farmhouse edge with MQTT broker, InfluxDB, and Grafana
Weekly model run: A cron job updates predictions every Monday morning
Farmer interface: WhatsApp bot sends weekly yield forecast in local language (Hindi, Marathi, Telugu)
Extension integration: Share prediction data with local Krishi Vigyan Kendra (KVK)
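
The weekly run can be sketched as three steps: average each feature across the nodes in a field, call the trained model, and format a short message for the WhatsApp bot. A minimal sketch, assuming `predict` is any callable that maps a feature dict to kg/acre (with the Random Forest above you might wrap `rf_model.predict` around a one-row DataFrame). The range uses the model's validation MAE as a rough heuristic band, not a statistical confidence interval, and translation into Hindi, Marathi or Telugu is left to the messaging layer. A crontab entry such as `0 6 * * 1` would run such a script at 06:00 every Monday.

```python
from statistics import mean
from typing import Callable, Mapping, Sequence


def field_features(node_readings: Sequence[Mapping[str, float]]) -> dict[str, float]:
    """Average each feature across the 2-3 nodes deployed in one field."""
    if not node_readings:
        raise ValueError("no node readings for this field")
    keys = node_readings[0].keys()
    return {k: mean(r[k] for r in node_readings) for k in keys}


def weekly_forecast(field_id: str, node_readings: Sequence[Mapping[str, float]],
                    predict: Callable[[dict], float], mae_pct: float) -> str:
    """Build a one-line yield forecast with a rough +/- MAE band."""
    feats = field_features(node_readings)
    y = predict(feats)
    low, high = y * (1 - mae_pct / 100), y * (1 + mae_pct / 100)
    return (f"Field {field_id}: expected yield {y:.0f} kg/acre "
            f"(rough range {low:.0f}-{high:.0f} kg/acre)")


# Hypothetical readings from two nodes and a stand-in model
readings = [{"gdd_cumulative": 1000.0, "avg_soil_moisture": 30.0},
            {"gdd_cumulative": 1020.0, "avg_soil_moisture": 34.0}]
print(weekly_forecast("F3", readings, predict=lambda f: 2000.0, mae_pct=10.0))
```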

Typical accuracy with 2 years of training data: wheat (Punjab) MAE of about 8% of mean yield, paddy (AP/Karnataka) about 12%, and polyhouse tomato about 6%.

Related Sensing Products

GY-BME280 5V variant for 5V microcontroller systems
Capacitive Soil Moisture Sensor for root zone monitoring

Frequently Asked Questions

How much historical data do I need to train a reliable yield model?

Minimum 2 complete crop seasons (same crop, same field). With 3-5 seasons, accuracy improves significantly. You can augment with IMD weather data (available free from data.gov.in) and published agronomic yield tables for your region.

Can I use this system for multiple crops?

Train a separate model for each crop. Crop-specific factors (flowering date, critical irrigation stages) differ significantly, and using a single model across crops can degrade accuracy by 15-25%.
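
Keeping one model per crop mostly comes down to grouping the training rows and routing each prediction to the right model. A minimal sketch, assuming each training record carries a `crop` field and that `fit` is any function returning a predictor; `fit_mean_yield` is a hypothetical placeholder standing in for a real regressor. Unknown crops raise an error rather than silently borrowing another crop's model.

```python
from collections import defaultdict
from typing import Callable, Iterable

Predictor = Callable[[dict], float]


def train_per_crop(records: Iterable[dict],
                   fit: Callable[[list[dict]], Predictor]) -> dict[str, Predictor]:
    """Group training rows by their 'crop' field and fit one model per crop."""
    by_crop: dict[str, list[dict]] = defaultdict(list)
    for rec in records:
        by_crop[rec["crop"]].append(rec)
    return {crop: fit(rows) for crop, rows in by_crop.items()}


def predict_yield(models: dict[str, Predictor], crop: str, features: dict) -> float:
    """Route a prediction to the crop's own model; never fall back to another crop."""
    if crop not in models:
        raise KeyError(f"no model trained for crop {crop!r}")
    return models[crop](features)


def fit_mean_yield(rows: list[dict]) -> Predictor:
    """Placeholder 'model' for illustration: predicts the crop's mean training yield."""
    avg = sum(r["yield_kg_per_acre"] for r in rows) / len(rows)
    return lambda features: avg


recs = [{"crop": "wheat", "yield_kg_per_acre": 1800},
        {"crop": "wheat", "yield_kg_per_acre": 2000},
        {"crop": "paddy", "yield_kg_per_acre": 2400}]
models = train_per_crop(recs, fit_mean_yield)
print(predict_yield(models, "wheat", {}))
```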

Is the ML model retraining automatic?

Not by default. Add an automated retraining pipeline: after each harvest, add the actual yield data to the dataset and retrain. Validate the new model against the most recent season, held out from training, and deploy it automatically only if its R2 beats the current model's.
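
The deployment rule reduces to a small comparison. A sketch, assuming you already have predictions from the current and the retrained model on the held-out most recent season; `min_gain` is a hypothetical margin that stops a redeployment on a noise-level R2 change, which matters when the held-out season covers only a few fields.

```python
def r_squared(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all held-out yields are equal")
    return 1 - ss_res / ss_tot


def should_deploy(y_holdout: list[float], current_pred: list[float],
                  candidate_pred: list[float], min_gain: float = 0.01) -> bool:
    """Deploy the retrained model only if it beats the current one on the held-out season."""
    return r_squared(y_holdout, candidate_pred) > r_squared(y_holdout, current_pred) + min_gain


# Hypothetical held-out season: three fields' actual yields (kg/acre)
actual = [1800, 2000, 2200]
print(should_deploy(actual, current_pred=[1900, 1900, 2100], candidate_pred=[1820, 1990, 2150]))
```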

What government resources support IoT-based precision farming?

ICAR provides free agronomic data. NABARD funds precision farming pilots under the Agricultural Infrastructure Fund. The Digital Agriculture Mission 2021-25 actively promotes IoT and ML-based advisory systems.

Shop Smart Farming Sensors at Zbotic

