← Back to Projects
Python Excel Pandas Matplotlib Data Pipeline Maji Ndogo

Climate-Smart Agriculture Analytics

Problem Statement

Maji Ndogo is a fictional region used as a simulated dataset by ExploreAI Academy, modelled on real agricultural challenges across Sub-Saharan Africa. The dataset draws place names from across the continent, including areas in Nigeria, Kenya, and South Africa, to represent diverse farming landscapes and climate conditions.

Smallholder farmers across Sub-Saharan Africa face increasing climate uncertainty. Erratic rainfall, rising temperatures, and shifting seasons make traditional farming knowledge less reliable. This project analysed crop yield data against climate variables within the Maji Ndogo dataset to identify patterns that could help farmers and agricultural planners make better decisions about what to grow, when to plant, and where to invest.

Approach

Starting from raw agricultural data in a SQLite database, I built a structured Python pipeline that takes data from ingestion through to final insights — designed to be reproducible and transparent at every stage.

  • Loaded and validated raw agricultural datasets using Pandas and SQLAlchemy
  • Cleaned missing values, corrected data types, and fixed spelling errors
  • Performed statistical analysis to identify correlations between climate variables and crop yields
  • Built visualisations with Matplotlib to communicate findings clearly
  • Exported a summarised Excel report for non-technical stakeholders

Pipeline Architecture

# Data Ingestion & Pipeline Architecture def create_db_engine(db_path): """Creates and returns a SQLAlchemy database engine.""" engine = create_engine(db_path) return engine def query_data(engine, sql_query): """Queries the database and returns a DataFrame.""" with engine.connect() as connection: df = pd.read_sql_query(text(sql_query), connection) return df def read_from_web_CSV(URL): """Reads a CSV file from a web URL into a DataFrame.""" df = pd.read_csv(URL) return df # Run the full pipeline field_df = query_data(create_db_engine(config_params['db_path']), config_params['sql_query']) weather_df = read_from_web_CSV(config_params['weather_csv_path'])

Screenshots

Python data pipeline in Jupyter Notebook
Jupyter Notebook — Modular data ingestion pipeline functions
Python statistical analysis and visualisations
Python — Statistical analysis and Matplotlib visualisations
Excel summary report for agriculture project
Excel — Crop yield and climate summary exported from Python

Results

  • Identified strong correlation between seasonal rainfall and crop yields across regions
  • Flagged temperature anomaly years where yields dropped significantly
  • Produced a clean Excel summary accessible to agricultural extension officers
  • Built a reproducible pipeline that can be rerun as new data becomes available

Skills Demonstrated

  • Python data pipeline design (Pandas, SQLAlchemy, Matplotlib)
  • Data cleaning and validation at scale
  • Statistical correlation analysis
  • Data visualisation for storytelling
  • Excel reporting for non-technical audiences
  • Jupyter Notebook documentation