Nutrition & Obesity Trends Analysis

A data analysis project examining global nutrition patterns, dietary trends, and their relationship with obesity rates across countries.

📋 Quick Start

1. Setup Environment

# Activate virtual environment
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies (if not already installed)
pip install -r requirements.txt

2. Run Data Processing Pipeline

# Run the complete pipeline (preprocesses all data and creates master panel)
python run_pipeline.py

This will:

✅ Preprocess FAO nutrition data
✅ Preprocess obesity data
✅ Create food group mappings
✅ Create panel datasets
✅ Create final master panel

Output: data/processed/final/master_panel_final.csv

3. Run Analysis

# Exploratory Data Analysis
python scripts/analysis/perform_eda.py

# Create interactive visualizations
python scripts/analysis/interactive_plot.py

4. Explore in Jupyter

Step 0: Understand raw data (recommended first):

jupyter notebook notebooks/00_raw_data_exploration.ipynb

Main analysis notebook:

jupyter notebook notebooks/01_eda_visualization.ipynb

Note: See notebooks/README.md for the complete notebook guide.

📁 Project Structure

Bell labs/
│
├── run_pipeline.py              # Main pipeline script - run this first!
│
├── data/
│   ├── raw/                     # Raw data files (FAO, WHO datasets)
│   │   ├── FoodBalanceSheet_data/
│   │   ├── Population_data/
│   │   └── data.csv            # Obesity data
│   │
│   └── processed/              # Cleaned and processed data (organized)
│       ├── cleaned/            # Step 1-2: Cleaned raw data
│       │   ├── Cleaned_FAO_Nutrients.csv
│       │   ├── Cleaned_FAO_Population.csv
│       │   └── Cleaned_Obesity.csv
│       ├── mappings/           # Step 3: Mapping files
│       │   └── Item_to_FoodGroup.csv
│       ├── panels/             # Step 4: Intermediate panels
│       └── final/              # Step 5: Final dataset ⭐
│           └── master_panel_final.csv
│
├── scripts/                     # Processing scripts (organized by purpose)
│   ├── preprocessing/          # Step 1-3: Data preprocessing
│   │   ├── preprocess_fao_data.py
│   │   ├── preprocess_obesity_data.py
│   │   └── preprocess_food_group_mapping.py
│   ├── panels/                 # Step 4-5: Panel dataset creation
│   │   ├── create_panel_datasets.py
│   │   └── create_master_panel.py
│   └── analysis/               # Step 6+: Analysis and visualization
│       ├── perform_eda.py
│       ├── extended_eda.py
│       └── interactive_plot.py
│
├── notebooks/                   # Jupyter notebooks for exploration
│   └── 02_eda_visualization.ipynb      # Main analysis notebook
│
├── doc/                         # Documentation (organized)
│   ├── README.md               # Documentation guide
│   ├── guides/                 # How-to guides
│   │   └── methodology.md     # Detailed methodology
│   ├── reference/              # Reference docs
│   │   ├── data_dictionary.md  # Variable descriptions
│   │   ├── data_analysis.md    # Dataset analysis
│   │   └── dataset_analysis.md # Alternative analysis
│   └── notes/                  # Research notes
│       └── research_notes.md   # Research findings
│
└── requirements.txt             # Python dependencies

🔄 Data Processing Workflow

Pipeline Steps

1. Preprocess FAO Data (preprocess_fao_data.py)
   - Cleans Food Balance Sheet data
   - Extracts nutrients (energy, protein, fat)
   - Extracts population data
   - Output: Cleaned_FAO_Nutrients.csv, Cleaned_FAO_Population.csv
2. Preprocess Obesity Data (preprocess_obesity_data.py)
   - Cleans WHO obesity dataset
   - Standardizes country names
   - Output: Cleaned_Obesity.csv
3. Create Food Group Mapping (preprocess_food_group_mapping.py)
   - Maps FAO items to food groups (Cereals, Meat, Dairy, etc.)
   - Output: Item_to_FoodGroup.csv
4. Create Panel Datasets (create_panel_datasets.py)
   - Creates country-year panels for nutrients
   - Aggregates food groups by country-year
   - Output: nutrient_panel.csv, foodgroup_energy_panel.csv, etc.
5. Create Master Panel (create_master_panel.py)
   - Merges all datasets into final panel
   - Handles missing data
   - Output: master_panel_final.csv ⭐
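
As a rough illustration of the final merge step, the sketch below joins a nutrient panel to an obesity panel on country and year. The toy frames, the helper name merge_panels, and the choice of an inner join are assumptions made for the example; create_master_panel.py may merge differently.

```python
import pandas as pd


def merge_panels(nutrients: pd.DataFrame, obesity: pd.DataFrame) -> pd.DataFrame:
    """Join two country-year panels on their shared keys.

    Assumption: an inner join, so only country-years present in both
    inputs survive. validate="one_to_one" raises if either input has
    duplicate (country, year) rows.
    """
    merged = nutrients.merge(
        obesity, on=["country", "year"], how="inner", validate="one_to_one"
    )
    return merged.sort_values(["country", "year"]).reset_index(drop=True)


# Toy inputs (hypothetical values, not taken from the real datasets).
nutrients = pd.DataFrame({
    "country": ["A", "A", "B"],
    "year": [2010, 2011, 2010],
    "energy_kcal_day": [2500.0, 2550.0, 3100.0],
})
obesity = pd.DataFrame({
    "country": ["A", "B", "B"],
    "year": [2010, 2010, 2011],
    "obesity_pct": [10.0, 25.0, 26.0],
})

panel = merge_panels(nutrients, obesity)
```

With an inner join, a country-year missing from either source is dropped, which is one way a "common years" panel can arise.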

Running Individual Steps

If you need to run steps individually:

python scripts/preprocessing/preprocess_fao_data.py
python scripts/preprocessing/preprocess_obesity_data.py
python scripts/preprocessing/preprocess_food_group_mapping.py
python scripts/panels/create_panel_datasets.py
python scripts/panels/create_master_panel.py
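
For re-running from a particular step, a small driver along these lines may help. It is only a sketch: the script paths follow the project tree, while the run_steps helper and its start and runner parameters are hypothetical and not part of run_pipeline.py.

```python
import subprocess
import sys

# Step order follows the "Pipeline Steps" list; paths follow the project tree.
STEPS = [
    "scripts/preprocessing/preprocess_fao_data.py",
    "scripts/preprocessing/preprocess_obesity_data.py",
    "scripts/preprocessing/preprocess_food_group_mapping.py",
    "scripts/panels/create_panel_datasets.py",
    "scripts/panels/create_master_panel.py",
]


def run_steps(steps, start=0, runner=subprocess.run):
    """Run each script in order, beginning at index `start`.

    Stops at the first script that exits with a non-zero code and returns
    its index; returns None when every step succeeds. `runner` is
    injectable so the loop can be exercised without real scripts.
    """
    for index in range(start, len(steps)):
        result = runner([sys.executable, steps[index]])
        if result.returncode != 0:
            return index
    return None
```

Calling run_steps(STEPS) runs everything; if it returns 3, for example, fix the problem and call run_steps(STEPS, start=3) to resume at create_panel_datasets.py.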

📊 Final Dataset

File: data/processed/final/master_panel_final.csv

Structure: Country-year panel (171 countries, 2010-2022)

Key Variables:

country, year: Identifiers
energy_kcal_day, protein_g_day, fat_g_day: Nutrients (per capita/day)
Cereals, Meat, Dairy & Eggs, etc.: Food group energy (kcal/capita/day)
Cereals_share, Meat_share, etc.: Food group shares (%)
population: Total population
obesity_pct: Obesity prevalence (%)

See data/processed/README.md for detailed variable descriptions.
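
To make the relationship between the food group energy columns and the *_share columns concrete, here is a minimal sketch. It assumes each share is the group's energy divided by energy_kcal_day; the pipeline might instead divide by the sum of the food group columns, so treat the denominator as an assumption. The helper name and the toy values are hypothetical.

```python
import pandas as pd


def add_food_group_shares(panel, groups, total_col="energy_kcal_day"):
    """Return a copy of `panel` with a `<group>_share` column (%) per food group.

    Assumption: shares are relative to total energy in `total_col`.
    """
    out = panel.copy()
    for group in groups:
        out[f"{group}_share"] = 100 * out[group] / out[total_col]
    return out


# Toy row (hypothetical values).
toy = pd.DataFrame({
    "country": ["A"],
    "year": [2010],
    "energy_kcal_day": [2400.0],
    "Cereals": [1200.0],
    "Meat": [300.0],
})

with_shares = add_food_group_shares(toy, ["Cereals", "Meat"])
```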

🔬 Analysis

Exploratory Data Analysis

python scripts/analysis/perform_eda.py

Generates:

Summary statistics
Correlation matrices
Trend visualizations
Outputs saved to data/outputs/
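
A correlation matrix of the kind listed above can be produced with pandas in a few lines. This is a sketch on toy data, not the code in perform_eda.py; the helper name and the values are hypothetical, while the column names come from the Key Variables list.

```python
import pandas as pd


def correlation_matrix(panel, columns):
    """Pearson correlations between the chosen numeric columns.

    pandas computes each pairwise correlation over the rows where both
    columns are present, so scattered missing values do not break it.
    """
    return panel[columns].corr(method="pearson")


# Toy data (hypothetical values).
toy = pd.DataFrame({
    "energy_kcal_day": [2000.0, 2500.0, 3000.0, 3500.0],
    "fat_g_day": [60.0, 80.0, 100.0, 120.0],
    "obesity_pct": [8.0, 14.0, 19.0, 27.0],
})

corr = correlation_matrix(toy, ["energy_kcal_day", "fat_g_day", "obesity_pct"])
```

On the real panel, pooled correlations mix differences between countries with changes over time, so they are best read as descriptive only.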

Interactive Visualizations

python scripts/analysis/interactive_plot.py

Creates interactive Plotly charts for:

Energy vs Obesity trends
Food group shares over time
Country comparisons

Jupyter Notebooks

Open notebooks/02_eda_visualization.ipynb for interactive exploration.

📚 Documentation

Documentation Guide: doc/README.md - Overview of all documentation
Methodology: doc/guides/methodology.md - Detailed methodology
Data Dictionary: doc/reference/data_dictionary.md - Variable descriptions
Research Notes: doc/notes/research_notes.md - Research findings
Processed Data README: data/processed/README.md - Dataset documentation

🛠️ Requirements

Python 3.8+
See requirements.txt for package list

Main dependencies:

pandas, numpy
matplotlib, seaborn, plotly
jupyter
scikit-learn

📝 Notes

Data Sources: FAO Food Balance Sheets, WHO Global Health Observatory
Year Coverage: 2010-2022 (common years across datasets)
Country Coverage: 171 countries
Missing Data: Handled via interpolation (max 2-year gaps)
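
One way to read "interpolation (max 2-year gaps)" is: fill a run of missing years linearly only when the run is at most two years long and has observed values on both sides. The sketch below implements that reading. It is an assumption about the rule, not the project's actual code, and it expects one row per year, sorted by year within each country. Note that pandas' own `limit=2` argument would partially fill longer gaps, which is why the run length is checked explicitly.

```python
import pandas as pd


def interpolate_short_gaps(series: pd.Series, max_gap: int = 2) -> pd.Series:
    """Linearly fill interior NaN runs of length <= max_gap; leave the rest as NaN."""
    is_na = series.isna()
    # Give each run of consecutive identical mask values its own id,
    # then count the NaNs in the run (0 for runs of observed values).
    run_id = (is_na != is_na.shift()).cumsum()
    run_len = is_na.groupby(run_id).transform("sum")
    # Interpolate interior gaps only; leading/trailing NaNs stay NaN.
    filled = series.interpolate(method="linear", limit_area="inside")
    # Keep the original value where it was observed or the gap is too long.
    return series.where(~is_na | (run_len > max_gap), filled)


# Toy panel (hypothetical values): A has a 2-year gap, B a 3-year gap.
panel = pd.DataFrame({
    "country": ["A"] * 4 + ["B"] * 5,
    "year": [2010, 2011, 2012, 2013, 2010, 2011, 2012, 2013, 2014],
    "obesity_pct": [10.0, None, None, 16.0, 20.0, None, None, None, 28.0],
})

panel["obesity_pct"] = (
    panel.groupby("country")["obesity_pct"].transform(interpolate_short_gaps)
)
```

In the toy panel, country A's two missing years are filled while country B's three-year gap is left missing.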

🚀 Next Steps

After running the pipeline:

Explore the data: Open notebooks/01_eda_visualization.ipynb
Run EDA: python scripts/analysis/perform_eda.py
Create visualizations: python scripts/analysis/interactive_plot.py
Build models: Use data/processed/final/master_panel_final.csv for regression/ML analysis

❓ Troubleshooting

Issue: ModuleNotFoundError

Solution: Activate the virtual environment (source venv/bin/activate), then make sure dependencies are installed (pip install -r requirements.txt)

Issue: Missing raw data files

Solution: Ensure data files are in data/raw/ directory

Issue: Pipeline fails at a step

Solution: Read the error message, fix the cause, and re-run from the failed step (see Running Individual Steps)
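
For the missing raw data issue, a quick check before running the pipeline can save a failed run. The sketch below looks for the three entries shown under data/raw/ in the project tree; the helper name missing_raw_inputs is hypothetical, and the pipeline may need additional files inside those folders that this check does not cover.

```python
from pathlib import Path

# Expected entries under data/raw/, taken from the project tree.
EXPECTED_RAW = ["FoodBalanceSheet_data", "Population_data", "data.csv"]


def missing_raw_inputs(raw_dir="data/raw"):
    """Return the expected raw inputs that do not exist under `raw_dir`."""
    root = Path(raw_dir)
    return [name for name in EXPECTED_RAW if not (root / name).exists()]
```

An empty list from missing_raw_inputs() means all three entries were found; otherwise the returned names are the ones to add.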

Last Updated: 2025-01-20
