Vision Transformer Fine-tuning Repository

This repository contains code for fine-tuning Vision Transformers using two different approaches, each targeting a different computer vision task.

📁 Repository Structure

vision-transformer-finetuning/
├── lora-vit-classification/     # LoRA fine-tuning for image classification
│   ├── lora.ipynb             # Jupyter notebook with LoRA implementation
│   └── README.md              # Detailed documentation
├── dino2-object-detection/     # DINO v2 fine-tuning for object detection
│   ├── finetune_dino2_simple.py      # Simple DINO detector
│   ├── improved_dino_detector.py     # Enhanced DINO detector
│   ├── comprehensive_validation.py   # Complete evaluation
│   ├── plot_results.py              # Performance visualization
│   └── README.md                    # Detailed documentation
└── README.md                   # This file

🎯 Two Approaches

1. LoRA Fine-tuning for Image Classification

Location: lora-vit-classification/

  • Task: Image Classification
  • Dataset: CIFAR-10, ImageNet
  • Model: Vision Transformer (ViT) with LoRA adaptation
  • Key Benefit: Parameter-efficient fine-tuning (1-5% trainable parameters)

Quick Start:

cd lora-vit-classification
# Open lora.ipynb in Jupyter notebook
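
The notebook follows the standard LoRA recipe: freeze the pretrained ViT weights and train small low-rank adapter matrices injected into the attention projections. Below is a minimal sketch of that setup, assuming the Hugging Face peft library and the google/vit-base-patch16-224-in21k checkpoint (both are assumptions; the notebook may use different choices and hyperparameters):

from transformers import ViTForImageClassification
from peft import LoraConfig, get_peft_model

# Base ViT with a fresh 10-way classification head (e.g. CIFAR-10).
model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k",
    num_labels=10,
)

# LoRA configuration: inject rank-16 adapters into the attention query/value
# projections and keep the new classification head trainable.
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["query", "value"],
    lora_dropout=0.1,
    modules_to_save=["classifier"],
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically reports ~1-5% trainable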

2. DINO v2 Fine-tuning for Object Detection

Location: dino2-object-detection/

  • Task: Object Detection
  • Dataset: COCO Detection Dataset (91 classes)
  • Model: DINO v2 backbone with detection heads
  • Key Benefit: Strong detection performance; outperforms the YOLOv8 baseline in this repository's evaluation

Quick Start:

cd dino2-object-detection
python improved_dino_detector.py  # Train enhanced model
python comprehensive_validation.py  # Evaluate performance
python plot_results.py  # Generate comparison charts
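
Conceptually, both scripts attach a lightweight detection head to DINO v2 image features. A minimal, hypothetical sketch of that pattern is shown below (single-box prediction from the CLS token; the actual scripts may use a different head, box parameterization, and training setup):

import torch.nn as nn
from transformers import AutoModel

class MinimalDinoDetector(nn.Module):
    """Hypothetical single-box detector on top of DINO v2 features."""

    def __init__(self, num_classes=91, backbone_name="facebook/dinov2-base"):
        super().__init__()
        self.backbone = AutoModel.from_pretrained(backbone_name)
        hidden = self.backbone.config.hidden_size
        self.class_head = nn.Linear(hidden, num_classes)   # class logits
        self.box_head = nn.Sequential(                     # (cx, cy, w, h)
            nn.Linear(hidden, 256), nn.ReLU(), nn.Linear(256, 4)
        )

    def forward(self, pixel_values):
        feats = self.backbone(pixel_values=pixel_values).last_hidden_state
        pooled = feats[:, 0]  # CLS token as a global image descriptor
        return self.box_head(pooled).sigmoid(), self.class_head(pooled)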

🏆 Key Results

LoRA Classification

  • Parameter Efficiency: 95%+ accuracy with only 1-5% trainable parameters
  • Memory Reduction: 50-70% less GPU memory usage
  • Speed: Faster training and inference
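
The trainable-parameter fraction behind these numbers can be verified directly on a LoRA-wrapped model. A minimal check, assuming model is the peft-wrapped ViT from the sketch above:

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")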

DINO v2 Object Detection

  • Best Performance: 64.2% accuracy (vs YOLOv8x: 56.0%)
  • Superior IoU: 0.322 (vs YOLO: ~0.13)
  • Dramatic Improvement: 197% better than the simple baseline approach
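
The IoU figures above are standard intersection-over-union between predicted and ground-truth boxes. For reference, a minimal implementation for boxes in (x1, y1, x2, y2) corner format (the evaluation scripts may compute it differently, e.g. in batched tensor form):

def box_iou(box_a, box_b):
    # Intersection rectangle.
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    # Union = sum of areas minus intersection.
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0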

🛠️ Installation

# Clone repository
git clone <repository-url>
cd vision-transformer-finetuning

# Install core dependencies
pip install torch torchvision transformers datasets pillow accelerate

# Optional: For YOLO comparison and visualization
pip install ultralytics matplotlib seaborn

📊 Performance Comparison

| Approach | Task             | Dataset           | Best Model       | Key Metric                               |
|----------|------------------|-------------------|------------------|------------------------------------------|
| LoRA ViT | Classification   | CIFAR-10/ImageNet | LoRA-adapted ViT | 95%+ accuracy, 1-5% trainable parameters |
| DINO v2  | Object Detection | COCO              | Improved DINO    | 64.2% accuracy, 0.322 IoU                |

🚀 Getting Started

  1. Choose your task:

    • Image Classification → lora-vit-classification/
    • Object Detection → dino2-object-detection/

  2. Follow the folder-specific README for detailed instructions.

  3. Run the code and compare against the baselines.

📖 Documentation

Each folder contains detailed documentation:

  • Technical implementation details
  • Training procedures
  • Evaluation metrics
  • Performance analysis
  • Usage instructions

🎓 Learning Outcomes

This repository demonstrates:

  • Parameter-efficient fine-tuning with LoRA
  • Self-supervised model adaptation with DINO v2
  • Advanced loss functions for object detection (sketched after this list)
  • Comprehensive evaluation and comparison methodologies
  • Performance optimization techniques
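
As an illustration of the loss-function point above: a common recipe for box-regression training combines classification cross-entropy with L1 and GIoU box terms. The sketch below is hypothetical (the weights are illustrative, DETR-style values; the repository's detectors may weight or compose the terms differently):

import torch
import torch.nn.functional as F
from torchvision.ops import generalized_box_iou

def detection_loss(pred_logits, pred_boxes, target_labels, target_boxes,
                   l1_weight=5.0, giou_weight=2.0):
    # Boxes are matched one-to-one and given in (x1, y1, x2, y2) format.
    cls_loss = F.cross_entropy(pred_logits, target_labels)
    l1_loss = F.l1_loss(pred_boxes, target_boxes)
    giou = torch.diag(generalized_box_iou(pred_boxes, target_boxes))
    return cls_loss + l1_weight * l1_loss + giou_weight * (1.0 - giou).mean()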

📄 License

[Add your license information here]

🤝 Contributing

[Add contribution guidelines here]
