Python Excel Pandas Matplotlib Data Pipeline

Climate-Smart Agriculture Analytics

Problem Statement

Smallholder farmers in East Africa face increasing climate uncertainty: erratic rainfall, rising temperatures, and shifting seasons make traditional farming knowledge less reliable. This project analysed crop yield data against climate variables to identify patterns that could help farmers and agricultural planners make better decisions about what to grow, when to plant, and where to invest.

Approach

Starting from a raw Jupyter notebook, I built a structured Python pipeline that takes agricultural data from ingestion through to final insights. The pipeline was designed to be reproducible and transparent, with each stage clearly documented so that results can be verified and trusted.

  • Loaded and validated raw agricultural datasets using Pandas
  • Cleaned missing values, corrected data types, and removed outliers
  • Performed statistical analysis to identify correlations between climate variables and crop yields
  • Built visualisations with Matplotlib to communicate findings clearly
  • Exported a summarised Excel report for non-technical stakeholders
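The validation and cleaning steps listed above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names are borrowed from the pipeline snippet on this page, while the function name `validate_and_clean` and the 1.5 × IQR outlier rule are assumptions made for the example.

```python
import pandas as pd

# Column names taken from the pipeline snippet; treating all four as
# required is an assumption for this sketch.
REQUIRED_COLUMNS = ["date", "yield_kg", "rainfall_mm", "temp_avg"]


def validate_and_clean(raw: pd.DataFrame) -> pd.DataFrame:
    """Check required columns, coerce types, drop incomplete rows and yield outliers."""
    missing = [col for col in REQUIRED_COLUMNS if col not in raw.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")

    df = raw.copy()
    # Coerce types; values that cannot be parsed become NaT/NaN instead of raising.
    df["date"] = pd.to_datetime(df["date"], errors="coerce")
    for col in ["yield_kg", "rainfall_mm", "temp_avg"]:
        df[col] = pd.to_numeric(df[col], errors="coerce")

    # Drop rows where any required value is missing or failed to parse.
    df = df.dropna(subset=REQUIRED_COLUMNS)

    # Assumed outlier rule: keep yields within 1.5 * IQR of the quartiles.
    q1, q3 = df["yield_kg"].quantile([0.25, 0.75])
    iqr = q3 - q1
    in_range = df["yield_kg"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return df[in_range].reset_index(drop=True)
```

Raising on missing columns makes a malformed input file fail early, before any analysis runs on partial data.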

Pipeline Architecture

The pipeline follows a clean 4-stage structure:

    # Stage 1: Data Ingestion
    import pandas as pd
    # parse_dates converts 'date' to datetime so the .dt accessor works in Stage 2
    df = pd.read_csv('agriculture_data.csv', parse_dates=['date'])

    # Stage 2: Data Cleaning
    df.dropna(subset=['yield_kg', 'rainfall_mm'], inplace=True)
    df['season'] = df['date'].dt.quarter  # calendar quarter used as a proxy for season

    # Stage 3: Statistical Analysis
    correlation = df[['yield_kg', 'rainfall_mm', 'temp_avg']].corr()

    # Stage 4: Export Summary
    df.to_excel('agriculture_summary.xlsx', index=False)
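Stage 4 above writes the cleaned DataFrame to Excel as it stands. If a condensed table is wanted for stakeholders, one possible shape is a per-season summary. The sketch below assumes the `season`, `yield_kg`, and `rainfall_mm` columns from the snippet; the function names `seasonal_summary` and `export_summary` and the output column names are hypothetical.

```python
import pandas as pd


def seasonal_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per-season mean yield, mean rainfall, record count, and rainfall-yield Pearson r."""
    grouped = df.groupby("season")
    summary = grouped.agg(
        mean_yield_kg=("yield_kg", "mean"),
        mean_rainfall_mm=("rainfall_mm", "mean"),
        n_records=("yield_kg", "size"),
    )
    # Pearson correlation computed separately within each season.
    summary["rainfall_yield_r"] = grouped[["rainfall_mm", "yield_kg"]].apply(
        lambda g: g["rainfall_mm"].corr(g["yield_kg"])
    )
    return summary.reset_index()


def export_summary(summary: pd.DataFrame, path: str) -> None:
    """Write the summary table to an Excel file."""
    # Requires an Excel writer engine such as openpyxl to be installed.
    summary.to_excel(path, index=False)
```

Calling `export_summary(seasonal_summary(df), 'agriculture_summary.xlsx')` would then produce one row per season rather than one row per record.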

Screenshots

Excel summary report
Excel Summary: Crop yield analysis for non-technical stakeholders
Python data analysis visualisations
Python: Statistical analysis and visualisations (Matplotlib)
Python data pipeline
Python: Data pipeline in Jupyter Notebook

Results

  • Identified a strong correlation between seasonal rainfall and maize yields across regions
  • Flagged temperature anomaly years where yields dropped significantly
  • Produced a clean Excel summary accessible to agricultural extension officers
  • Built a reproducible pipeline that can be rerun as new data becomes available
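
The page does not say how temperature anomaly years were flagged. One common approach, shown here purely as an illustration, is to average by year and flag years whose mean temperature sits far from the long-run mean. The function name `flag_temperature_anomalies` and the z-score threshold of 1.5 are assumptions, and `date` is assumed to already be a datetime column.

```python
import pandas as pd


def flag_temperature_anomalies(df: pd.DataFrame, z_threshold: float = 1.5) -> pd.DataFrame:
    """Aggregate to yearly means and flag years with an unusual mean temperature."""
    yearly = (
        df.assign(year=df["date"].dt.year)
        .groupby("year")[["temp_avg", "yield_kg"]]
        .mean()
    )
    # Population standard deviation across years; if every year has the same
    # mean temperature this is zero, temp_z becomes NaN, and nothing is flagged.
    temp_std = yearly["temp_avg"].std(ddof=0)
    yearly["temp_z"] = (yearly["temp_avg"] - yearly["temp_avg"].mean()) / temp_std
    yearly["is_anomaly"] = yearly["temp_z"].abs() > z_threshold
    return yearly.reset_index()
```

The returned table keeps mean yield alongside the flag, so flagged years can be compared directly against the others.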

Skills Demonstrated

  • Python data pipeline design (Pandas, Matplotlib)
  • Data cleaning and validation at scale
  • Statistical correlation analysis
  • Data visualisation for storytelling
  • Excel reporting for non-technical audiences
  • Jupyter Notebook documentation