End-to-end space weather ML pipeline for data ingestion, preprocessing, label generation, model training (regression and classification), and inference.
The project is structured into distinct pipeline stages:
- `data_sources/`: Scripts to download and collect raw space weather data from various sources (ACE, DSCOVR, SWPC, etc.).
- `database_builder/`: SQLite utilities for raw data warehousing and table construction.
- `preprocessing_pipeline/`: Feature engineering, aggregation, splits, normalization, label/target generation, and merging into unified datasets.
- `regression_pipeline/` & `classification_pipeline/`: Training, evaluation, and modeling scripts for multi-horizon forecasting and probability calculation.
- `inference/`: End-to-end inference execution — takes live/recent data, formats it, and runs the pre-trained models.
- `common/`: Shared utilities (e.g., logging, HTTP requests).
- `tests/`: Project tests.
The project requires several scientific and machine learning libraries (e.g., NumPy, Pandas, Astropy, SunPy, Scikit-Learn, XGBoost).
You can set up the environment using Conda with the provided `environment.yml`, or pip with `requirements.txt`.
Example using Conda:
```bash
conda env create -f environment.yml
conda activate projectbzarre
```

Most major stages of the pipeline have top-level runner scripts that execute the components in the correct order. Example entry points:
- Preprocessing: `python3 preprocessing_pipeline/run_full_preprocessing_pipeline.py`
- Regression modeling: `python3 regression_pipeline/run_full_regression.py`
- Inference: `python3 inference/run_full_inference.py`
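The runners above can be chained for a full end-to-end run. A minimal sketch (assuming each runner script exits non-zero on failure, which is the usual convention):

```python
"""Chain the pipeline runner scripts in order (sketch, not part of the repo)."""
import subprocess
import sys

# Runner scripts listed in the README, in execution order.
STAGES = [
    "preprocessing_pipeline/run_full_preprocessing_pipeline.py",
    "regression_pipeline/run_full_regression.py",
    "inference/run_full_inference.py",
]

def run_stage(script: str) -> None:
    # check=True raises CalledProcessError when the stage exits non-zero,
    # stopping the chain at the first failing stage.
    subprocess.run([sys.executable, script], check=True)

# Usage (from the repo root):
#     for stage in STAGES:
#         run_stage(stage)
```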
- Databases are primarily local SQLite (`.db`) files and reside under their respective pipeline directories.
- Many stages rely on environment variables for configuration, such as defining split windows and aggregation cadences (e.g., `PREPROC_SPLIT_TRAIN_START`, `PREPROC_AGG_FREQ`).
- Horizon selection for training is generally handled by constants within the specific modeling scripts.
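A sketch of how a stage might read this environment-variable configuration; the variable names come from the README, but the default values and expected formats here are illustrative assumptions:

```python
import os

# Start of the training split window. Assumed to be an ISO-8601 date
# string; the default below is purely illustrative.
train_start = os.environ.get("PREPROC_SPLIT_TRAIN_START", "2010-01-01")

# Aggregation cadence. Assumed to be a pandas offset alias such as
# "5min" or "1h"; the default below is purely illustrative.
agg_freq = os.environ.get("PREPROC_AGG_FREQ", "5min")
```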
