Deep Neural Network Learning

A hands-on deep learning implementation project of 30 tasks, spanning neural network fundamentals through production deployment.

Project Overview

This project implements deep learning components from scratch with NumPy, verified against PyTorch autograd. All implementations pass gradient checks (<1e-6 error) and include comprehensive documentation.

Tech Stack

Component	Version
Python	3.10+
PyTorch	2.4.1+cu124
CUDA	12.4
NumPy	2.x
pytest	Latest

Hardware Target

GPU: NVIDIA RTX 3050 Ti (4GB VRAM)
Optimized for limited VRAM with mixed precision, gradient checkpointing, and gradient accumulation

Project Structure

deep-neural-network-learning/
├── phase1_basics/           # Neural Network Fundamentals
│   ├── activations.py       # 6 activation functions (Sigmoid, Tanh, ReLU, LeakyReLU, GELU, Swish)
│   ├── loss.py              # 5 loss functions (MSE, CrossEntropy, Focal, LabelSmoothing, Triplet)
│   ├── mlp.py               # Forward/backward propagation with computational graph
│   ├── optimizer.py         # 6 optimizers (SGD, Momentum, Nesterov, AdaGrad, RMSProp, Adam)
│   └── weight_init.py       # 5 initialization methods (Xavier, He, Kaiming, LSUV, Zero)
│
├── phase2_architectures/    # Neural Network Architectures
│   ├── cnn_layers.py        # Conv2d, MaxPool2d, AvgPool2d with im2col/col2im
│   ├── simple_cnn.py        # ResNet-style CNN with BatchNorm, ResidualBlock
│   ├── rnn_cells.py         # RNN, LSTM, GRU with BPTT gradient verification
│   └── attention.py         # Self-Attention, Multi-Head Attention, Position Encoding
│
├── phase3_training/         # Training Techniques
│   ├── normalization.py     # BatchNorm, LayerNorm, InstanceNorm, GroupNorm
│   ├── dropout.py           # Standard, Variational, MC, Alpha Dropout
│   ├── regularization.py    # L1, L2, ElasticNet, MaxNorm, SpectralNorm
│   ├── lr_scheduler.py      # 11 schedulers (Step, Cosine, Warmup, OneCycle, etc.)
│   ├── image_augmentation.py # RandomCrop, Flip, Rotation, ColorJitter, Mixup, CutMix
│   ├── text_augmentation.py  # Token masking, synonym replacement
│   └── transfer_learning.py  # Pretrained models, freeze strategies, discriminative LR
│
├── phase4_advanced/         # Advanced Training
│   ├── mixed_precision.py   # FP16/BF16/TF32 with GradScaler
│   ├── gradient_stability.py # Gradient clipping, residual connections
│   ├── early_stopping.py    # Patience counter, best weights restoration
│   ├── training_monitor.py  # Gradient flow, activation distribution
│   ├── tensorboard_debug.py # TensorBoard integration
│   └── nan_debugger.py      # NaN detection, LR auto-adjustment
│
├── phase5_deployment/       # Optimization & Deployment
│   ├── ddp_training.py      # Distributed Data Parallel with NCCL
│   ├── multi_gpu.py         # Multi-GPU configuration
│   ├── gradient_accumulation.py # Memory-efficient large batch training
│   ├── memory_optimizer.py  # Gradient checkpointing, CPU offloading
│   ├── checkpoint_manager.py # Save/resume training state
│   ├── pruning.py           # Magnitude, Gradient, Channel pruning
│   ├── quantization.py      # PTQ, QAT, INT8/INT4 quantization
│   ├── onnx_export.py       # PyTorch to ONNX with dynamic axes
│   ├── onnx_inference.py    # ONNX Runtime inference
│   ├── distillation.py      # Knowledge distillation with temperature scaling
│   ├── tensorrt_inference.py # TensorRT FP16/INT8 acceleration
│   └── mobile_deployment.py  # NCNN, Core ML export
│
├── tests/                   # Comprehensive Test Suite (500+ tests)
│   └── test_*.py            # All modules tested with gradient verification
│
└── experiments/             # Benchmarks & Comparisons
    ├── optimizer_comparison.py
    ├── architecture_comparison.py
    └── amp_benchmark.py

Completed Features (27/30 Tasks)

Phase 1: Neural Network Fundamentals (5/5)

Task	Description	File
Activation Functions	6 functions with gradient verification	`phase1_basics/activations.py`
Forward/Backward Propagation	MLP with computational graph	`phase1_basics/mlp.py`
Loss Functions	MSE, CrossEntropy, Focal, LabelSmoothing, Triplet	`phase1_basics/loss.py`
Optimizers	SGD, Momentum, Nesterov, AdaGrad, RMSProp, Adam	`phase1_basics/optimizer.py`
Weight Initialization	Xavier, He, Kaiming, LSUV	`phase1_basics/weight_init.py`
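
To make the optimizer task concrete, here is a minimal sketch of a single Adam update in plain Python. The name `adam_step`, its scalar signature, and the toy quadratic are assumptions made for this illustration; they are not taken from `phase1_basics/optimizer.py`, which works on NumPy arrays.

```python
import math

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter; returns (param, m, v)."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction, t starts at 1
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

def minimize_quadratic(steps=500, lr=0.1):
    """Hypothetical toy problem: minimize f(w) = (w - 3)^2 starting from w = 0."""
    w, m, v = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        grad = 2.0 * (w - 3.0)
        w, m, v = adam_step(w, grad, m, v, t, lr=lr)
    return w
```

Because of the bias correction, the very first step has magnitude close to `lr` regardless of the gradient scale, which is one reason Adam is less sensitive to gradient magnitude than plain SGD.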

Phase 2: Architectures (3/3)

Task	Description	File
CNN Architecture	Conv2d, Pooling, ResNet-style blocks	`phase2_architectures/cnn_layers.py`
RNN/LSTM/GRU	Sequential models with BPTT	`phase2_architectures/rnn_cells.py`
Self-Attention	Multi-head attention, position encoding	`phase2_architectures/attention.py`
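
The core of the self-attention task is scaled dot-product attention, softmax(QK^T / sqrt(d)) V. The sketch below writes it out with nested lists so it runs without dependencies. The function names are chosen for this example and are not the API of `phase2_architectures/attention.py`.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Q: (n, d), K: (m, d), V: (m, dv) as nested lists. Returns (output, weights)."""
    d = len(Q[0])
    scale = 1.0 / math.sqrt(d)
    weights = []
    for q in Q:
        # Similarity of this query to every key, scaled by 1/sqrt(d).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K]
        weights.append(softmax(scores))
    # Each output row is a weighted average of the value rows.
    output = [
        [sum(w * v[j] for w, v in zip(row, V)) for j in range(len(V[0]))]
        for row in weights
    ]
    return output, weights
```

When all keys are identical the weights are uniform and the output is the plain mean of the value rows, which makes a convenient sanity check.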

Phase 3: Training Techniques (5/5)

Task	Description	File
Normalization	BatchNorm, LayerNorm, InstanceNorm, GroupNorm	`phase3_training/normalization.py`
Dropout & Regularization	4 dropout variants, L1/L2/ElasticNet	`phase3_training/dropout.py`, `phase3_training/regularization.py`
LR Schedulers	11 scheduling strategies	`phase3_training/lr_scheduler.py`
Data Augmentation	Image (Mixup, CutMix) + Text (Token masking)	`phase3_training/image_augmentation.py`, `phase3_training/text_augmentation.py`
Transfer Learning	Freeze strategies, discriminative LR	`phase3_training/transfer_learning.py`
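
As an example of the scheduling strategies listed above, the following sketch combines linear warmup with cosine decay. The function name and parameters are hypothetical and do not mirror `phase3_training/lr_scheduler.py`.

```python
import math

def warmup_cosine_lr(step, total_steps, base_lr, warmup_steps=0, min_lr=0.0):
    """Learning rate at a zero-based step: linear warmup, then cosine decay."""
    if step < warmup_steps:
        # Ramp linearly so the last warmup step reaches base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Usage: print a few points of a 100-step schedule with 10 warmup steps.
if __name__ == "__main__":
    for s in (0, 9, 10, 55, 100):
        print(s, warmup_cosine_lr(s, total_steps=100, base_lr=0.1, warmup_steps=10))
```

The warmup phase avoids large early updates while optimizer statistics are still noisy; the cosine phase then lowers the rate smoothly to `min_lr`.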

Phase 4: Advanced Training (4/4)

Task	Description	File
Mixed Precision	FP16/BF16 with GradScaler	`phase4_advanced/mixed_precision.py`
Gradient Stability	Clipping, residual connections	`phase4_advanced/gradient_stability.py`
Training Debugging	TensorBoard, gradient visualization	`phase4_advanced/training_monitor.py`, `phase4_advanced/tensorboard_debug.py`
Early Stopping	Patience counter, best weights	`phase4_advanced/early_stopping.py`
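
The early stopping entry (patience counter plus best-weights restoration) can be sketched as follows. This `EarlyStopping` class is an illustrative assumption, not the class defined in `phase4_advanced/early_stopping.py`; `state` stands in for whatever object holds the model weights.

```python
import copy

class EarlyStopping:
    """Stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_state = None
        self.bad_epochs = 0

    def step(self, val_loss, state):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            # Copy so later training does not mutate the saved best weights.
            self.best_state = copy.deepcopy(state)
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

After `step` returns True, the caller would load `best_state` back into the model so that the returned weights correspond to the lowest validation loss rather than the last epoch.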

Phase 5: Deployment (10/13)

Task	Description	File
DDP Training	Distributed data parallel	`phase5_deployment/ddp_training.py`
Gradient Accumulation	Memory-efficient training	`phase5_deployment/gradient_accumulation.py`
Memory Optimization	Gradient checkpointing, CPU offloading	`phase5_deployment/memory_optimizer.py`
Checkpoint/Resume	Save and restore training state	`phase5_deployment/checkpoint_manager.py`
Model Pruning	Magnitude, gradient, channel pruning	`phase5_deployment/pruning.py`
Model Quantization	PTQ, QAT, INT8/INT4	`phase5_deployment/quantization.py`
ONNX Export	PyTorch to ONNX conversion	`phase5_deployment/onnx_export.py`
Knowledge Distillation	Teacher-student training	`phase5_deployment/distillation.py`
Architecture Comparison	CNN vs Transformer vs RNN	`experiments/architecture_comparison.py`
NaN Debugging	Loss instability diagnosis	`phase4_advanced/nan_debugger.py`
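
To illustrate the INT8 quantization entry, here is a sketch of symmetric per-tensor quantization, one common scheme among several. Whether `phase5_deployment/quantization.py` uses this exact scheme is not stated here, so treat the function names and the choice of a symmetric [-127, 127] range as assumptions.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # One scale for the whole tensor; guard against an all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Map integers back to approximate floating-point values."""
    return [qi * scale for qi in q]
```

Rounding to the nearest integer bounds the reconstruction error of each value by half a quantization step, that is `scale / 2`.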

In Progress (1/30)

Task	Description	Status
Edge Deployment	TensorRT, NCNN, Core ML	Deliverables complete, hardware validation pending

Planned Features (2/30)

Task	Description	Blocker
E2E Image Classification	Full pipeline: data -> training -> deployment	Edge deployment
Hyperparameter Tuning	Grid, Random, Bayesian, Hyperband	E2E pipeline

Quick Start

# Clone repository
git clone https://github.com/StarChen-Cycler/deep-neural-network-learning.git
cd deep-neural-network-learning

# Create conda environment
conda create -n dnn python=3.10
conda activate dnn

# Install dependencies
pip install torch torchvision numpy pytest

# Run tests
pytest tests/ -v

# Run specific module
python -m phase1_basics.activations

Key Implementation Highlights

Gradient Verification

All implementations pass numerical gradient checking:

# Example: Activation gradient verification
import numpy as np

from phase1_basics.activations import sigmoid, sigmoid_grad
from tests.test_activations import gradient_check

x = np.random.randn(10, 5)
analytical = sigmoid_grad(x)
numerical = gradient_check(lambda x: np.sum(sigmoid(x)), x)
assert np.allclose(analytical, numerical, atol=1e-6)
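
For readers without the repository at hand, the idea behind a numerical gradient check can be sketched with a central difference. The version below is dependency-free plain Python; `numerical_grad` and `max_relative_error` are names invented for this example, and the project's own `gradient_check` helper may differ in signature and details.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Analytical derivative: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

def numerical_grad(f, xs, eps=1e-5):
    """Central-difference gradient of a scalar function f(list) per entry."""
    grads = []
    for i in range(len(xs)):
        plus = list(xs)
        plus[i] += eps
        minus = list(xs)
        minus[i] -= eps
        grads.append((f(plus) - f(minus)) / (2.0 * eps))
    return grads

def max_relative_error(analytical, numerical):
    """Largest |a - n| / (|a| + |n|) over all entries."""
    return max(
        abs(a - n) / max(1e-12, abs(a) + abs(n))
        for a, n in zip(analytical, numerical)
    )

# Usage: compare analytical and numerical gradients of sum(sigmoid(x)).
if __name__ == "__main__":
    xs = [-2.0, -0.5, 0.0, 0.7, 3.0]
    numeric = numerical_grad(lambda v: sum(sigmoid(x) for x in v), xs)
    analytic = [sigmoid_grad(x) for x in xs]
    print(max_relative_error(analytic, numeric))
```

The central difference has truncation error proportional to `eps ** 2`, which is why it is generally preferred over a one-sided difference for this kind of check.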

Memory Optimization for 4GB VRAM

from phase5_deployment.gradient_accumulation import GradientAccumulationTrainer
from phase5_deployment.memory_optimizer import enable_gradient_checkpointing

# Effective batch size = 32: 8 accumulation steps x 4 samples held in VRAM at a time
trainer = GradientAccumulationTrainer(model, accumulation_steps=8)

# Enable gradient checkpointing: recompute activations during backward to save memory
# (about 50% here; the actual saving depends on the model)
enable_gradient_checkpointing(model)
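
Why accumulation reproduces a large batch can be checked numerically: with equal-sized micro-batches, averaging the micro-batch gradients gives the full-batch gradient of a mean loss. The sketch below uses a hypothetical one-parameter linear model and does not use `GradientAccumulationTrainer`.

```python
def mse_grad(w, batch):
    """Gradient of mean((w * x - y)^2) over (x, y) pairs with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def accumulated_grad(w, data, micro_batch_size):
    """Average per-micro-batch gradients, as dividing each loss by the step count would.

    Assumes len(data) is a multiple of micro_batch_size; with unequal chunks
    the plain average would no longer match the full-batch gradient.
    """
    chunks = [data[i:i + micro_batch_size] for i in range(0, len(data), micro_batch_size)]
    steps = len(chunks)
    total = 0.0
    for chunk in chunks:
        total += mse_grad(w, chunk) / steps   # scale each contribution
    return total, steps

# Usage: 32 samples processed as 8 micro-batches of 4.
if __name__ == "__main__":
    data = [(float(i), 2.0 * i + 1.0) for i in range(32)]
    print(accumulated_grad(0.5, data, micro_batch_size=4))
    print(mse_grad(0.5, data))
```

The equivalence holds for the gradient itself. Layers whose behavior depends on batch statistics, such as BatchNorm, still see only the micro-batch, so training is not always identical to true large-batch training.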

Mixed Precision Training

from phase4_advanced.mixed_precision import MixedPrecisionTrainer

trainer = MixedPrecisionTrainer(model, optimizer, precision="fp16")
trainer.train(dataloader, epochs=10)
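
The reason FP16 training needs loss scaling can be shown with the standard library alone: `struct` supports the IEEE 754 half-precision format via the `"e"` code. In this sketch a gradient of 1e-8 underflows to zero in FP16 but survives when multiplied by a scale factor first. The helper names and the scale of 65536 are illustrative assumptions, not values read from `phase4_advanced/mixed_precision.py`.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def scaled_fp16_grad(grad, loss_scale):
    """Store grad * loss_scale in FP16, then unscale in full precision."""
    return to_fp16(grad * loss_scale) / loss_scale

# Usage: compare the unscaled and scaled round trips for a tiny gradient.
if __name__ == "__main__":
    tiny = 1e-8
    print(to_fp16(tiny))                     # lost to underflow
    print(scaled_fp16_grad(tiny, 65536.0))   # recovered approximately
```

The smallest positive FP16 value is about 6e-8, so anything much smaller is flushed to zero. Scaling the loss scales every gradient by the same factor through the chain rule, and dividing by that factor before the optimizer step restores the original magnitude.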

Model Compression Pipeline

from phase5_deployment.pruning import MagnitudePruner
from phase5_deployment.quantization import StaticQuantizer
from phase5_deployment.distillation import KnowledgeDistiller

# Prune 50% of the weights
pruner = MagnitudePruner(sparsity=0.5)
pruner.prune(model)

# Quantize to INT8
quantizer = StaticQuantizer()
quantized_model = quantizer.quantize(model)

# Distill to smaller model
distiller = KnowledgeDistiller(teacher=large_model)
distiller.train(student_model, dataloader)
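
The first stage of this pipeline, magnitude pruning, can be sketched as follows: rank weights by absolute value and zero out the smallest fraction. This is a simplified unstructured, single-tensor version written for illustration; `magnitude_prune` is a hypothetical helper, not the `MagnitudePruner` class.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights; returns (pruned, mask)."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices sorted by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:n_prune])
    # Mask is 0 for pruned positions and 1 for kept positions.
    mask = [0 if i in pruned_idx else 1 for i in range(len(weights))]
    pruned = [w * m for w, m in zip(weights, mask)]
    return pruned, mask
```

Keeping the mask around matters in practice: if the model is fine-tuned after pruning, the mask is reapplied after each update so that pruned weights stay at zero.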

Test Coverage

Module	Tests	Status
Activations	50	Passing
MLP	31	Passing
Loss	44	Passing
Optimizer	25	Passing
CNN	41	Passing
RNN/LSTM/GRU	28	Passing
Attention	22	Passing
Normalization	30	Passing
Dropout	24	Passing
LR Scheduler	35	Passing
Mixed Precision	27	Passing
DDP Training	33	Passing
Pruning	43	Passing
Quantization	30	Passing
ONNX	32	Passing
Distillation	35	Passing
Checkpoint	22	Passing
Gradient Accumulation	37	Passing
Early Stopping	31	Passing

License

MIT License
