[GSoC 2026] POC: Breeze-aware agent skill with context detection and drift enforcement by HARDIK-WEB-OSS · Pull Request #63661 · apache/airflow

HARDIK-WEB-OSS · 2026-03-15T15:06:50Z

POC for #62500 — Airflow Contribution & Verification Agent Skills (GSoC 2026)

What's included

scripts/ci/prek/breeze_context_detect.py — stdlib-only context detection. Priority chain: AIRFLOW_BREEZE_CONTAINER → /.containerenv → /.dockerenv → /opt/airflow. Calling get_command('run-tests') returns the right command for the detected context.
.github/skills/breeze-contribution/SKILL.md — skill file at the existing .github/skills/ path. Command taxonomy sourced directly from developer_commands_config.py.
contributing-docs/03_contributors_quick_start.rst — .. agent-skill:: directives embedded inline with the human docs, making the contributing docs the source of truth, per the discussion in [GSOC 2026] Airflow Contribution & Verification Agent Skills #62500.
scripts/ci/prek/extract_agent_skills.py — parses RST directives, writes skills.json, exits 1 on drift (--check mode). Mirrors the update-breeze-cmd-output pattern at line 917 of .pre-commit-config.yaml.
scripts/ci/prek/test_breeze_agent_skills.py — 29 tests including RST extraction, context detection, and full E2E pipeline.
.pre-commit-config.yaml — check-agent-skills-drift hook wired.
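A minimal sketch of the priority chain from the first item, assuming that any non-empty AIRFLOW_BREEZE_CONTAINER value, or any existing marker path, means "inside Breeze". The names detect_marker and get_context, the env/exists parameters (added so the chain can be exercised without a container), and the WORKFLOWS table are hypothetical; the real breeze_context_detect.py may be structured differently. The two command templates are the :local: and :breeze: values from the RST directive quoted later in this PR.

```python
from __future__ import annotations

import os
from typing import Callable, Mapping

# Filesystem markers checked after the env var, in priority order:
# Podman, Docker, then the Breeze source mount.
MARKER_PATHS = ("/.containerenv", "/.dockerenv", "/opt/airflow")

# Hypothetical workflow table; templates copied from the agent-skill directive
# in this PR (":local:" for the host, ":breeze:" for the container).
WORKFLOWS = {
    "run-tests": {
        "host": "uv run --project {distribution_folder} pytest {test_path} -xvs",
        "breeze": "pytest {test_path} -xvs",
    },
}


def detect_marker(
    env: Mapping[str, str] | None = None,
    exists: Callable[[str], bool] | None = None,
) -> str | None:
    """Return the first marker indicating Breeze, or None on the host."""
    env = os.environ if env is None else env
    exists = os.path.exists if exists is None else exists
    # 1. An explicit environment variable wins over filesystem heuristics.
    if env.get("AIRFLOW_BREEZE_CONTAINER"):
        return "AIRFLOW_BREEZE_CONTAINER"
    # 2-4. Fall back to marker files, stopping at the first one present.
    for path in MARKER_PATHS:
        if exists(path):
            return path
    return None


def get_context(
    env: Mapping[str, str] | None = None,
    exists: Callable[[str], bool] | None = None,
) -> str:
    return "breeze" if detect_marker(env, exists) else "host"


def get_command(
    workflow: str,
    test_path: str = "{test_path}",
    distribution_folder: str = "{distribution_folder}",
    env: Mapping[str, str] | None = None,
    exists: Callable[[str], bool] | None = None,
) -> dict[str, str]:
    """Render the command template for the detected context."""
    context = get_context(env, exists)
    template = WORKFLOWS[workflow][context]
    # str.format ignores keyword arguments the template does not use.
    command = template.format(test_path=test_path, distribution_folder=distribution_folder)
    return {"workflow": workflow, "context": context, "command": command}
```

For example, get_command("run-tests", "tests/unit", "airflow-core", env={}, exists=lambda p: False) renders the uv run command for the host, while passing exists=lambda p: p == "/.dockerenv" switches it to pytest tests/unit -xvs. Returning the matched marker rather than a bare boolean is what makes the ordering of the chain observable.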

Output

$ python3 scripts/ci/prek/extract_agent_skills.py --check
OK: skills.json is in sync with SKILL.md

$ pytest scripts/ci/prek/test_breeze_agent_skills.py
29 passed in 0.09s

Part of GSoC 2026 application for #62500

Was generative AI tooling used to co-author this PR?

Yes, Claude (Anthropic) was used to help structure and review code during development.
All logic, design decisions, and testing were done by me.

Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
When adding dependency, check compliance with the ASF 3rd Party License Policy.
For significant user-facing changes create newsfragment: {pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.

potiuk · 2026-03-15T23:02:59Z

@HARDIK-WEB-OSS Converting to draft — this PR doesn't yet meet our Pull Request quality criteria.

❌ Pre-commit / static checks: Failing: CI image checks / Static checks. Run prek run --from-ref main locally to find and fix issues. See Pre-commit / static checks docs.

Note: Your branch is 38 commits behind main. Please rebase and push again to get up-to-date CI results.

See the linked criteria for how to fix each item, then mark the PR "Ready for review". This is not a rejection — just an invitation to bring the PR up to standard. No rush.

Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer — a real person — will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you.

HARDIK-WEB-OSS · 2026-03-16T16:12:39Z

Rebased on main. If there are still any failures showing, let me know which ones to fix — ready for feedback.

- Add .github/skills/breeze-contribution/SKILL.md with structured workflow definitions for host vs container command selection
- Add scripts/ci/prek/breeze_context_detect.py implementing the context detection API using filesystem markers derived from /opt/airflow and /.dockerenv (Breeze canonical mount points)
- Command taxonomy sourced directly from developer_commands_config.py
- Follows existing .github/skills/ path pattern from translation skill
- Includes agent-skill-sync markers for future prek drift-detection hook

Addresses Task 1 (environment detection) and Task 2 (workflow modeling) from apache#62500

- Add scripts/ci/prek/extract_agent_skills.py that parses agent-skill-sync markers from SKILL.md and writes skills.json
- Add .github/skills/breeze-contribution/skills.json (generated)
- Add scripts/ci/prek/test_breeze_agent_skills.py with 20 tests for marker parsing, extraction, drift detection, and context detection
- extract_agent_skills.py --check mode exits 1 on drift, enabling prek hook enforcement (same pattern as update-breeze-cmd-output)

Part of apache#62500

Add DX_REPORT.md documenting 4 concrete failure modes that the Breeze agent skill prevents, with exact commands and expected output.

Wire extract_agent_skills.py --check into .pre-commit-config.yaml as check-agent-skills-drift hook, triggered when SKILL.md or skills.json changes — same enforcement pattern as update-breeze-cmd-output.

Part of apache#62500

- Add agent-skill directives to contributing-docs/03_contributors_quick_start.rst
- Update extract_agent_skills.py to parse RST contributing-docs
- contributing-docs is now source of truth per Jarek/Jason guidance
- Fix prek hook entry to use python3

Part of apache#62500

potiuk · 2026-04-01T11:04:42Z

@HARDIK-WEB-OSS Converting to draft — this PR doesn't yet meet our Pull Request quality criteria.

❌ mypy (type checking): Failing: CI image checks / MyPy checks (mypy-providers). Run prek --stage manual mypy-providers --all-files locally to reproduce. You need breeze ci-image build --python 3.10 for Docker-based mypy. See mypy (type checking) docs.
❌ Provider tests: Failing: provider distributions tests / Compat 2.11.1:P3.10:, provider distributions tests / Compat 3.0.6:P3.10:, provider distributions tests / Compat 3.1.8:P3.10:, Non-DB tests: providers / Non-DB-prov::3.10:-amazon,celer...standard, Special tests / Latest Boto test: providers / All-prov:LatestBoto-Postgres:14:3.10:-amazon,celer...standard (+1 more). Run provider tests with breeze run pytest <provider-test-path> -xvs. See Provider tests docs.

See the linked criteria for how to fix each item, then mark the PR "Ready for review". This is not a rejection — just an invitation to bring the PR up to standard. No rush.

Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer — a real person — will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you.


Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 12 comments.


+        files: >
+          (?x)
+          ^\.github/skills/breeze-contribution/SKILL\.md$|
+          ^\.github/skills/breeze-contribution/skills\.json$
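The files: pattern above lists only SKILL.md and skills.json, so in this revision an edit to the agent-skill directives in contributing-docs/03_contributors_quick_start.rst would not trigger the hook; the later "RST path in hook trigger" fix addresses that. A quick way to check a pattern is to apply it the way pre-commit does, assumed here to be re.search on each repository-relative path. EXTENDED_FILES and triggers are hypothetical illustrations, not the regex that was actually committed.

```python
import re

# The hook's `files:` value. YAML's folded `>` joins the lines with spaces,
# which verbose mode (?x) ignores, so the newlines here are equivalent.
HOOK_FILES = r"""(?x)
^\.github/skills/breeze-contribution/SKILL\.md$|
^\.github/skills/breeze-contribution/skills\.json$
"""

# Hypothetical extension that also watches the RST source of truth.
EXTENDED_FILES = HOOK_FILES.rstrip("\n") + r"""|
^contributing-docs/03_contributors_quick_start\.rst$
"""


def triggers(pattern: str, path: str) -> bool:
    """Assumed pre-commit semantics: the hook runs if re.search finds a match."""
    return re.search(pattern, path) is not None


if __name__ == "__main__":
    for path in (
        ".github/skills/breeze-contribution/SKILL.md",
        ".github/skills/breeze-contribution/skills.json",
        "contributing-docs/03_contributors_quick_start.rst",
    ):
        print(f"{path:<52} original={triggers(HOOK_FILES, path)} extended={triggers(EXTENDED_FILES, path)}")
```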


+def _main() -> None:
+    import argparse
+
+    parser = argparse.ArgumentParser(
+        description="Detect Breeze context and get recommended agent skill commands"
+    )
+    parser.add_argument("--workflow", choices=sorted(WORKFLOWS.keys()))
+    parser.add_argument("--test-path", default="{test_path}")
+    parser.add_argument("--distribution-folder", default="{distribution_folder}")
+    args = parser.parse_args()
+
+    context = get_context()
+    print(f"Context: {context.upper()}")
+    print()
+
+    if args.workflow:
+        result = get_command(
+            args.workflow,
+            test_path=args.test_path,
+            distribution_folder=args.distribution_folder,
+        )
+        print(f"Workflow : {result['workflow']}")
+        print(f"Command  : {result['command']}")
+        print(f"Note     : {result['note']}")
+    else:
+        print("All workflows for this context:")
+        print("-" * 60)
+        for wf_name in sorted(WORKFLOWS.keys()):
+            result = get_command(wf_name)
+            print(f"  {wf_name:<20} {result['command']}")


+   :context: host
+   :local: uv run --project {distribution_folder} pytest {test_path} -xvs
+   :breeze: pytest {test_path} -xvs
+   :prereqs: run-static-checks
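The directive above stores the host command under :local:, whereas build_skills_json in this revision reads s.get("host", ""). That explains the "host": "" entries in the generated skills.json excerpt, and it is the "local->host mapping" a later commit fixed. The sketch below reproduces the mismatch with a plain dict. The workflow value is assumed (the excerpt does not show the directive's :id: line), the to_skill_* helpers are hypothetical, and falling back to "local" is one possible fix, not necessarily the committed one.

```python
from __future__ import annotations

# Fields as the RST extractor would return them for the directive above;
# "workflow" is assumed, since the quoted excerpt does not show the :id: line.
fields = {
    "workflow": "run-tests",
    "context": "host",
    "local": "uv run --project {distribution_folder} pytest {test_path} -xvs",
    "breeze": "pytest {test_path} -xvs",
    "prereqs": "run-static-checks",
}


def to_skill_as_reviewed(s: dict[str, str]) -> dict[str, str]:
    """Same lookups as build_skills_json in the reviewed diff."""
    return {
        "workflow": s["workflow"],
        "host": s.get("host", ""),  # the RST directive never sets "host"
        "breeze": s.get("breeze", ""),
        "fallback_condition": s.get("fallback", s.get("fallback_condition", "never")),
    }


def to_skill_with_alias(s: dict[str, str]) -> dict[str, str]:
    """Hypothetical fix: use :local: as the host command when :host: is absent."""
    skill = to_skill_as_reviewed(s)
    skill["host"] = s.get("host", s.get("local", ""))
    return skill


print(repr(to_skill_as_reviewed(fields)["host"]))  # ''
print(to_skill_with_alias(fields)["host"])  # the :local: uv run command
```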


+      "host": "",
+      "breeze": "prek",
+      "fallback_condition": "never"
+    },
+    {
+      "workflow": "run-tests",
+      "host": "",


+# Make scripts importable when running from repo root
+sys.path.insert(0, str(Path(__file__).parent))
+
+from breeze_context_detect import get_command, is_inside_breeze
+from extract_agent_skills import build_skills_json, check_drift, extract_skills, parse_marker


+      - id: check-agent-skills-drift
+        name: Check agent skills are in sync with SKILL.md
+        description: Fails if skills.json has drifted from agent-skill-sync markers in SKILL.md
+        entry: python3 scripts/ci/prek/extract_agent_skills.py --check


+def build_skills_json(skills: list[dict[str, str]]) -> dict:
+    """
+    Build the full skills.json structure from extracted skill dicts.
+    """
+    return {
+        "$schema": "breeze-agent-skills/v1",
+        "source": str(SKILL_MD),
+        "description": (
+            "Auto-generated from agent-skill-sync markers in SKILL.md. "
+            "Do not edit manually — update SKILL.md markers instead."
+        ),
+        "skills": [
+            {
+                "workflow": s["workflow"],
+                "host": s.get("host", ""),
+                "breeze": s.get("breeze", ""),
+                "fallback_condition": s.get("fallback", s.get("fallback_condition", "never")),
+            }
+            for s in skills
+        ],
+    }
+
+
+def write_skills_json(data: dict, output_path: Path) -> None:
+    """Write skills dict to JSON file with stable formatting."""
+    output_path.parent.mkdir(parents=True, exist_ok=True)
+    output_path.write_text(
+        json.dumps(data, indent=2, ensure_ascii=False) + "\n",
+        encoding="utf-8",
+    )
+
+
+def check_drift(generated: dict, existing_path: Path) -> bool:
+    """
+    Returns True if drift is detected (committed file differs from generated).
+    Returns False if they match.
+    """
+    if not existing_path.exists():
+        print(f"DRIFT: {existing_path} does not exist but should be generated.", file=sys.stderr)
+        return True
+
+    committed = json.loads(existing_path.read_text(encoding="utf-8"))
+
+    # Compare only the skills list — ignore metadata fields like description
+    if committed.get("skills") != generated.get("skills"):
+        print("DRIFT DETECTED: committed skills.json does not match SKILL.md markers.", file=sys.stderr)
+        print("Run: python3 scripts/ci/prek/extract_agent_skills.py", file=sys.stderr)
+        print("Then commit the updated skills.json.", file=sys.stderr)
+        return True
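The check_drift excerpt above is cut off before its last line; going by its docstring, it should end with return False. Below is a self-contained sketch of the --check flow under that assumption, plus a hypothetical run_check helper that maps the result to the exit status the hook depends on (1 on drift, 0 otherwise). It also shows a consequence of comparing only the "skills" list: a changed description is not treated as drift.

```python
import json
import sys
import tempfile
from pathlib import Path


def check_drift(generated: dict, existing_path: Path) -> bool:
    """Return True if the committed skills.json differs from the generated data."""
    if not existing_path.exists():
        print(f"DRIFT: {existing_path} does not exist but should be generated.", file=sys.stderr)
        return True
    committed = json.loads(existing_path.read_text(encoding="utf-8"))
    # Compare only the skills list; metadata such as "description" is ignored.
    if committed.get("skills") != generated.get("skills"):
        print("DRIFT DETECTED: committed skills.json does not match SKILL.md markers.", file=sys.stderr)
        return True
    return False  # assumed final line, per the docstring in the diff


def run_check(generated: dict, existing_path: Path) -> int:
    """Hypothetical helper: exit status for --check mode."""
    return 1 if check_drift(generated, existing_path) else 0


if __name__ == "__main__":
    generated = {"skills": [{"workflow": "static-checks", "host": "", "breeze": "prek", "fallback_condition": "never"}]}
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "skills.json"
        print(run_check(generated, path))  # 1: file missing
        path.write_text(json.dumps({"description": "stale text", **generated}), encoding="utf-8")
        print(run_check(generated, path))  # 0: only metadata differs
        changed = {"skills": [{**generated["skills"][0], "breeze": "prek run"}]}
        print(run_check(changed, path))  # 1: skills list differs
```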


+RST_SKILL_RE = re.compile(r"[.][.] agent-skill::[ ]*\n(?P<fields>(?:[ ]{3}:[^:]+: .+\n)+)", re.MULTILINE)
+RST_FIELD_RE = re.compile(r"[ ]{3}:([^:]+): (.+)")
+
+
+def extract_skills_from_rst(rst_path: Path) -> list[dict[str, str]]:
+    """Extract agent-skill directives from RST contributing docs."""
+    if not rst_path.exists():
+        return []
+    skills = []
+    for match in RST_SKILL_RE.finditer(rst_path.read_text(encoding="utf-8")):
+        fields = dict(RST_FIELD_RE.findall(match.group("fields")))
+        if "id" in fields:
+            fields["workflow"] = fields.pop("id")
+            skills.append(fields)
+    return skills
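Both regexes above require at least one character after ": ", so a field left empty (for example :fallback: with no value) does not match. The field repetition then stops at that line and every later field in the directive is silently dropped; if the empty field comes first, the whole directive is skipped. This is the "regex empty fields" issue addressed in a later commit. The sketch compares the reviewed patterns with one possible relaxed pair; RELAXED_SKILL_RE, RELAXED_FIELD_RE and the string-based extract helper are hypothetical, not the committed fix.

```python
import re

# Patterns as reviewed: each field needs ": " followed by at least one character.
RST_SKILL_RE = re.compile(r"[.][.] agent-skill::[ ]*\n(?P<fields>(?:[ ]{3}:[^:]+: .+\n)+)", re.MULTILINE)
RST_FIELD_RE = re.compile(r"[ ]{3}:([^:]+): (.+)")

# Hypothetical relaxed patterns: the value (and the space before it) is optional,
# and field names may not span lines.
RELAXED_SKILL_RE = re.compile(r"[.][.] agent-skill::[ ]*\n(?P<fields>(?:[ ]{3}:[^:\n]+:(?:[ ].*)?\n)+)")
RELAXED_FIELD_RE = re.compile(r"[ ]{3}:([^:\n]+):[ ]?(.*)")

SAMPLE = (
    "Run the unit tests:\n"
    "\n"
    ".. agent-skill::\n"
    "   :id: run-tests\n"
    "   :fallback:\n"
    "   :breeze: pytest {test_path} -xvs\n"
    "\n"
)


def extract(text: str, skill_re: re.Pattern, field_re: re.Pattern) -> list[dict[str, str]]:
    """Same loop as extract_skills_from_rst, but over a string instead of a path."""
    skills = []
    for match in skill_re.finditer(text):
        fields = dict(field_re.findall(match.group("fields")))
        if "id" in fields:
            fields["workflow"] = fields.pop("id")
            skills.append(fields)
    return skills


print(extract(SAMPLE, RST_SKILL_RE, RST_FIELD_RE))
# [{'workflow': 'run-tests'}]  (the :breeze: command is lost)
print(extract(SAMPLE, RELAXED_SKILL_RE, RELAXED_FIELD_RE))
# [{'fallback': '', 'breeze': 'pytest {test_path} -xvs', 'workflow': 'run-tests'}]
```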


+def _main() -> None:
+    import argparse
+



HARDIK-WEB-OSS · 2026-04-20T19:09:32Z

Addressed all Copilot review feedback — fixed local→host mapping, RST regex for empty fields, prereqs ID consistency, RST path in hook trigger, and python3→python for portability.

HARDIK-WEB-OSS requested review from amoghrajesh, ashb, bugraoz93, choo121600, gopidesupavan, jason810496, jscheffl, kaxil, potiuk and shahar1 as code owners March 15, 2026 15:06

boring-cyborg Bot added area:dev-tools backport-to-v3-1-test labels Mar 15, 2026

potiuk marked this pull request as draft March 15, 2026 23:02

HARDIK-WEB-OSS force-pushed the feat/breeze-agent-skill-poc branch 2 times, most recently from bd4b85e to 92ba099 March 16, 2026 13:30

HARDIK-WEB-OSS marked this pull request as ready for review March 16, 2026 16:12

HARDIK-WEB-OSS mentioned this pull request Mar 16, 2026

[GSOC 2026] Airflow Contribution & Verification Agent Skills #62500 (Open, 1 task)

HARDIK-WEB-OSS force-pushed the feat/breeze-agent-skill-poc branch 5 times, most recently from c63d8fd to b349874 March 21, 2026 08:37

HARDIK-WEB-OSS force-pushed the feat/breeze-agent-skill-poc branch from b349874 to 666ea9a March 30, 2026 23:02

HARDIK-WEB-OSS added 5 commits March 31, 2026 11:29

test: add RST extraction tests and E2E pipeline test, now 24 passing

979a534

HARDIK-WEB-OSS force-pushed the feat/breeze-agent-skill-poc branch from 8a429a7 to b3ea591 March 31, 2026 11:30

potiuk marked this pull request as draft April 1, 2026 11:04

tests: extend agent skills coverage to 29 tests (Podman, new workflows, RST pipeline)

037e14a

HARDIK-WEB-OSS force-pushed the feat/breeze-agent-skill-poc branch from b3ea591 to 037e14a April 2, 2026 10:07

pierrejeambrun removed the backport-to-v3-1-test label Apr 8, 2026

kaxil requested a review from Copilot April 10, 2026 19:55

Copilot AI reviewed Apr 10, 2026

HARDIK-WEB-OSS marked this pull request as ready for review April 12, 2026 13:55

HARDIK-WEB-OSS requested a review from Copilot April 12, 2026 16:17

Copilot started reviewing on behalf of HARDIK-WEB-OSS April 12, 2026 16:18

Copilot AI reviewed Apr 12, 2026

fix: address Copilot review feedback

20317f9

- Fix prereqs field: run-static-checks -> static-checks
- Fix local->host mapping in build_skills_json
- Update DX_REPORT test count to avoid hardcoding
- Regenerate skills.json with correct host commands

HARDIK-WEB-OSS force-pushed the feat/breeze-agent-skill-poc branch from 0cebbe1 to 20317f9 April 12, 2026 19:04

fix: remaining Copilot feedback - RST path in hook, regex empty fields, python3->python

74f74bd

jason810496 marked this pull request as draft April 21, 2026 06:10


HARDIK-WEB-OSS wants to merge 8 commits into apache:main from HARDIK-WEB-OSS:feat/breeze-agent-skill-poc

4 participants

Conversation

HARDIK-WEB-OSS commented Mar 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

POC for #62500 — Airflow Contribution & Verification Agent Skills (GSoC 2026)

What's included

Output

Part of GSoC 2026 application for #62500

Was generative AI tooling used to co-author this PR?

Uh oh!

potiuk commented Mar 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

HARDIK-WEB-OSS commented Mar 16, 2026

Uh oh!

potiuk commented Apr 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

HARDIK-WEB-OSS commented Apr 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

HARDIK-WEB-OSS commented Mar 15, 2026 •

edited

Loading

potiuk commented Mar 15, 2026 •

edited

Loading

potiuk commented Apr 1, 2026 •

edited

Loading