
🌌 Kaggle Solution: Decoding the Black Box (Top 1%)

This repository contains the source code for our Gold Medal submission to the Beginners' Hypothesis 2026 challenge. Our solution achieves a near-perfect score by decomposing the problem into three orthogonal factors: Semantic Object Classification, Global Matrix Energy (Holt Metric), and Geometric Dependency (Peralta Factor).

🏆 Final Score: 0.9756 (Private Leaderboard)
🚀 Runtime: ~12 minutes (2× T4 GPU)


🧠 The "Trinity" Architecture

Unlike standard approaches that throw a single massive backbone (e.g., EfficientNet-B7) at the problem, we hypothesized that the three target variables were generated by fundamentally different processes. We built three specialized solvers:

| Factor | Target | Variable Type | Solver (Model) | Key Insight |
|--------|--------|---------------|----------------|-------------|
| I | `label` | Classification | ConvNeXt Base | Semantic features (object recognition). |
| II | `variable` | Regression | SVD (linear algebra) | The variable is the Effective Rank Ratio ($\sigma_1 / \sum \sigma_i$). |
| III | `hidden_label` | Classification | ResNet-18 (from scratch) | Geometric dependency between the red and blue dots. |

🛠️ Methodology & Experiments

Factor 1: The Object Class (Semantic)

  • Goal: Classify the background object (e.g., "Microscope", "Compass").
  • Challenge: The objects are heavily obscured by darkness and noise.
  • Solution: ConvNeXt Base (ImageNet weights).
  • Why it worked: ConvNeXt's modernized architecture (large kernels, layer norms) excelled at extracting semantic features from low-light, noisy environments where older ResNets struggled.

Factor 2: The Holt Metric (Mathematical)

  • Goal: Predict the continuous variable ($R^2=1.0$).
  • Failed Experiments:
    • Deep Learning (CNNs): High RMSE. Failed to converge.
    • Matrix Hack (Pixel Ridge Regression): $R^2 \approx 0.66$. Proved the variable wasn't pixel-position dependent.
  • The "Eureka" Moment: We analyzed the Singular Value Decomposition (SVD) of the image matrices.
  • The Formula: $$ \text{variable} = \frac{\sigma_1}{\sum_{i=1}^{N} \sigma_i} $$
    • This ratio represents the "Spectral Concentration" of the image matrix—a global property measuring how much information is contained in the first singular value.
    • Result: Perfect linear correlation ($R^2 = 1.0000$).
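The formula above needs no learning at all; a minimal NumPy sketch of the spectral-concentration computation (function name is illustrative):

```python
import numpy as np

def spectral_concentration(img: np.ndarray) -> float:
    """Ratio of the largest singular value to the sum of all singular values,
    i.e. sigma_1 / sum(sigma_i) from the formula above."""
    # compute_uv=False returns only the singular values, sorted descending.
    s = np.linalg.svd(img.astype(np.float64), compute_uv=False)
    return float(s[0] / s.sum())
```

A rank-1 matrix gives a ratio of exactly 1.0 (all "energy" in the first singular value), while an identity matrix spreads it evenly, which is why this quantity behaves as a global measure of how concentrated the matrix's information is.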

Factor 3: The Peralta Factor (Geometric)

  • Goal: Classify the hidden label (A, B, C, D) based on the Red and Blue dots.
  • The "Texture Bias" Problem: Pre-trained models (ResNet-50) plateaued at ~96% accuracy because they prioritized texture over geometry.
  • Failed Experiments:
    • Hough Circle Transform: Failed due to pixelated noise and non-circular artifacts.
    • YOLOv8: Overkill; detection boxes were unstable on 3x3 pixel dots.
  • The Solution:
    • Architecture: ResNet-18 trained from scratch.
    • Geometric Cleaning: We implemented a "Quadrant Split" that zeroed out noise, enforcing the rule that Blue $\in$ Top-Left and Red $\in$ Bottom-Right.
    • Loss Function: We used MSE (Mean Squared Error) rather than a quartic error ($Error^4$) to avoid gradient explosion from outliers.

📊 Visual Evidence

1. The SVD Discovery

The graph below proves that the variable is a direct function of the matrix's singular values, not a visual feature.

SVD Graph

2. The Texture Bias

A confusion matrix showing where pre-trained models failed (confusing Classes B and C) compared to our geometric-focused model.

Confusion Matrix


💻 Installation & Usage

Prerequisites

  • Python 3.8+
  • PyTorch 2.0+
  • OpenCV, Scikit-Learn, Pandas

Running the Solution

  1. Clone the repo:
    git clone https://github.com/GabaSatvik/DSG-BEGINNERS-HYPOTHESIS-2026.git
  2. Download Data: Place the competition data in ./data/.
  3. Run the Kernel:
  jupyter notebook solution.ipynb

🤝 Acknowledgments

Competition Hosts: DSG-IIT ROORKEE.

Team: SATVIK 25125034

📝 License

MIT License. Free to use for educational purposes.
