This repository contains the source code for our Gold Medal submission to the Beginners' Hypothesis 2026 challenge. Our solution achieves a near-perfect score by decomposing the problem into three orthogonal factors: Semantic Object Classification, Global Matrix Energy (Holt Metric), and Geometric Dependency (Peralta Factor).
🏆 Final Score: 0.9756 (Private Leaderboard)
🚀 Runtime: ~12 Minutes (GPU T4 x2)
Unlike standard approaches that throw a single massive backbone (e.g., EfficientNet-B7) at the problem, we hypothesized that the three target variables were generated by fundamentally different processes. We built three specialized solvers:
| Factor | Target Variable | Type | The Solver (Model) | Key Insight |
|---|---|---|---|---|
| I | `label` | Classification | ConvNeXt Base | Semantic features (object recognition). |
| II | `variable` | Regression | SVD (Linear Algebra) | The variable is the Effective Rank Ratio ($\sigma_1 / \sum_i \sigma_i$). |
| III | `hidden_label` | Classification | ResNet-18 (from scratch) | Geometric dependency between red/blue dots. |
## Factor I: Semantic Object Classification

- Goal: Classify the background object (e.g., "Microscope", "Compass").
- Challenge: The objects are heavily obscured by darkness and noise.
- Solution: ConvNeXt Base (ImageNet weights).
- Why it worked: ConvNeXt's modernized architecture (large kernels, layer norms) excelled at extracting semantic features from low-light, noisy environments where older ResNets struggled.
## Factor II: Global Matrix Energy (Holt Metric)

- Goal: Predict the continuous `variable` ($R^2 = 1.0$).
- Failed Experiments:
  - Deep Learning (CNNs): High RMSE; failed to converge.
  - Matrix Hack (Pixel Ridge Regression): $R^2 \approx 0.66$. Proved the variable wasn't pixel-position dependent.
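For reference, the pixel ridge regression baseline amounts to flattening each image into a pixel vector and fitting a linear model. A minimal sketch with synthetic stand-in data (shapes and values are ours, not the competition's):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

# Synthetic stand-in data: 200 flattened 32x32 "images" and a random target.
rng = np.random.default_rng(0)
X = rng.random((200, 32 * 32))
y = rng.random(200)

X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

# Ridge over raw pixel positions: on random data this explains nothing,
# mirroring why a pixel-position model could not reach R^2 = 1.0.
model = Ridge(alpha=1.0).fit(X_train, y_train)
r2 = r2_score(y_test, model.predict(X_test))
print(r2 <= 1.0)
```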
- The "Eureka" Moment: We analyzed the Singular Value Decomposition (SVD) of the image matrices.
- The Formula:
$$
\text{variable} = \frac{\sigma_1}{\sum_{i=1}^{N} \sigma_i}
$$
- This ratio represents the "Spectral Concentration" of the image matrix—a global property measuring how much information is contained in the first singular value.
- Result: Perfect linear correlation ($R^2 = 1.0000$).
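The Effective Rank Ratio above is a one-liner with NumPy; a minimal sketch (function name is ours):

```python
import numpy as np

def effective_rank_ratio(img: np.ndarray) -> float:
    """Spectral concentration: share of singular mass in sigma_1."""
    s = np.linalg.svd(img.astype(np.float64), compute_uv=False)
    return float(s[0] / s.sum())

# A rank-1 matrix puts all its energy in sigma_1, so the ratio is 1.0;
# the 4x4 identity spreads it evenly over 4 singular values, giving 0.25.
rank1 = np.outer(np.arange(1, 5), np.arange(1, 5)).astype(float)
print(effective_rank_ratio(rank1))        # → 1.0
print(effective_rank_ratio(np.eye(4)))    # → 0.25
```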
## Factor III: Geometric Dependency (Peralta Factor)

- Goal: Classify the hidden label (A, B, C, D) based on the red and blue dots.
- The "Texture Bias" Problem: Pre-trained models (ResNet-50) plateaued at ~96% accuracy because they prioritized texture over geometry.
- Failed Experiments:
  - Hough Circle Transform: Failed due to pixelated noise and non-circular artifacts.
  - YOLOv8: Overkill; detection boxes were unstable on 3x3-pixel dots.
- The Solution:
  - Architecture: ResNet-18 trained from scratch.
  - Geometric Cleaning: We implemented a "Quadrant Split" that zeroed out noise, enforcing the rule that blue dots lie in the top-left quadrant ($\text{Blue} \in$ Top-Left) and red dots in the bottom-right ($\text{Red} \in$ Bottom-Right).
  - Loss Function: We stuck with MSE (Mean Squared Error) over Quartic Error ($\mathrm{Error}^4$) to avoid gradient explosion from outliers.
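One way the Quadrant Split cleaning could be sketched, assuming HxWx3 RGB arrays (the exact masking in the notebook may differ):

```python
import numpy as np

def quadrant_split(img: np.ndarray) -> np.ndarray:
    """Zero out pixels that violate the dot-placement rule.

    Assumes an HxWx3 RGB array: blue evidence (channel 2) is kept only
    in the top-left quadrant, red evidence (channel 0) only in the
    bottom-right quadrant; everything else is treated as noise.
    """
    h, w = img.shape[:2]
    out = np.zeros_like(img)
    out[: h // 2, : w // 2, 2] = img[: h // 2, : w // 2, 2]  # blue: top-left
    out[h // 2 :, w // 2 :, 0] = img[h // 2 :, w // 2 :, 0]  # red: bottom-right
    return out
```

Zeroing out everything outside the legal quadrants removes distractor pixels before the ResNet-18 ever sees them, which is what forces the network to rely on geometry rather than texture.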
*Figure: Correlation plot demonstrating that the variable is a direct function of the matrix's singular values, not a visual feature.*

*Figure: Confusion matrix showing where pre-trained models failed (confusing classes B and C) compared to our geometric-focused model.*
## Requirements

- Python 3.8+
- PyTorch 2.0+
- OpenCV, Scikit-Learn, Pandas
## Quick Start

- Clone the repo:

```bash
git clone https://github.com/GabaSatvik/DSG-BEGINNERS-HYPOTHESIS-2026.git
```

- Download Data: Place the competition data in `./data/`.
- Run the Kernel:

```bash
jupyter notebook solution.ipynb
```
