PythonExcelPandasMatplotlibData PipelineMaji Ndogo
Climate-Smart Agriculture Analytics
Problem Statement
Maji Ndogo is a fictional region used as a simulated dataset by ExploreAI Academy, modelled on real agricultural challenges across Sub-Saharan Africa. The dataset draws place names from across the continent, including areas in Nigeria, Kenya, and South Africa, to represent diverse farming landscapes and climate conditions.
Smallholder farmers across Sub-Saharan Africa face increasing climate uncertainty. Erratic rainfall, rising temperatures, and shifting seasons make traditional farming knowledge less reliable. This project analysed crop yield data against climate variables within the Maji Ndogo dataset to identify patterns that could help farmers and agricultural planners make better decisions about what to grow, when to plant, and where to invest.
Approach
Starting from raw agricultural data in a SQLite database, I built a structured Python pipeline that takes data from ingestion through to final insights — designed to be reproducible and transparent at every stage.
Loaded and validated raw agricultural datasets using Pandas and SQLAlchemy
Cleaned missing values, corrected data types, and fixed spelling errors
Performed statistical analysis to identify correlations between climate variables and crop yields
Built visualisations with Matplotlib to communicate findings clearly
Exported a summarised Excel report for non-technical stakeholders
Pipeline Architecture
# Data Ingestion & Pipeline Architecturedefcreate_db_engine(db_path):
"""Creates and returns a SQLAlchemy database engine."""
engine = create_engine(db_path)
return engine
defquery_data(engine, sql_query):
"""Queries the database and returns a DataFrame."""with engine.connect() as connection:
df = pd.read_sql_query(text(sql_query), connection)
return df
defread_from_web_CSV(URL):
"""Reads a CSV file from a web URL into a DataFrame."""
df = pd.read_csv(URL)
return df
# Run the full pipeline
field_df = query_data(create_db_engine(config_params['db_path']),
config_params['sql_query'])
weather_df = read_from_web_CSV(config_params['weather_csv_path'])
Screenshots
Jupyter Notebook — Modular data ingestion pipeline functionsPython — Statistical analysis and Matplotlib visualisationsExcel — Crop yield and climate summary exported from Python
Results
Identified strong correlation between seasonal rainfall and crop yields across regions
Flagged temperature anomaly years where yields dropped significantly
Produced a clean Excel summary accessible to agricultural extension officers
Built a reproducible pipeline that can be rerun as new data becomes available
Skills Demonstrated
Python data pipeline design (Pandas, SQLAlchemy, Matplotlib)