
[GSoC 2026] POC: Breeze-aware agent skill with context detection and drift enforcement #63661

Draft
HARDIK-WEB-OSS wants to merge 8 commits into apache:main from HARDIK-WEB-OSS:feat/breeze-agent-skill-poc


@HARDIK-WEB-OSS
Contributor

HARDIK-WEB-OSS commented Mar 15, 2026

POC for #62500 — Airflow Contribution & Verification Agent Skills (GSoC 2026)

What's included

  • scripts/ci/prek/breeze_context_detect.py: stdlib-only context detection. Priority chain: AIRFLOW_BREEZE_CONTAINER, then /.containerenv, then /.dockerenv, then /opt/airflow. Call get_command('run-tests') to get the right command for the current context.
  • .github/skills/breeze-contribution/SKILL.md: skill file at the existing .github/skills/ path. Command taxonomy sourced directly from developer_commands_config.py.
  • contributing-docs/03_contributors_quick_start.rst: .. agent-skill:: directives embedded inline with the human docs. contributing-docs is the source of truth, per the discussion in #62500 ([GSOC 2026] Airflow Contribution & Verification Agent Skills).
  • scripts/ci/prek/extract_agent_skills.py: parses RST directives, writes skills.json, and exits 1 on drift (--check mode). Mirrors the update-breeze-cmd-output pattern at line 917 of .pre-commit-config.yaml.
  • scripts/ci/prek/test_breeze_agent_skills.py: 29 tests covering RST extraction, context detection, and the full E2E pipeline.
  • .pre-commit-config.yaml: check-agent-skills-drift hook wired.
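
For readers who want to see the priority chain in code form, here is a minimal sketch. It is not the code from breeze_context_detect.py: the function name detect_context_sketch, the injectable env and path_exists parameters, the "breeze"/"host" return values, and the rule that any non-empty AIRFLOW_BREEZE_CONTAINER value counts are all assumptions made for illustration. Only the marker names and their order come from the list above.

```python
import os
from pathlib import Path
from typing import Callable, Mapping

# Filesystem markers probed after the environment variable, in the order
# given in the PR description.
MARKER_PATHS = ("/.containerenv", "/.dockerenv", "/opt/airflow")


def detect_context_sketch(
    env: Mapping[str, str] = os.environ,
    path_exists: Callable[[str], bool] = lambda p: Path(p).exists(),
) -> str:
    """Return "breeze" inside the container, otherwise "host" (hypothetical helper)."""
    # 1. An explicit environment variable takes priority over filesystem probing.
    #    Assumption: any non-empty value means "inside Breeze".
    if env.get("AIRFLOW_BREEZE_CONTAINER"):
        return "breeze"
    # 2-4. Fall back to the marker paths; the first one that exists decides.
    for marker in MARKER_PATHS:
        if path_exists(marker):
            return "breeze"
    return "host"
```

Passing env and path_exists as parameters keeps the sketch testable without touching the real filesystem, which is one way a stdlib-only detector could be unit tested.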

Output

$ python3 scripts/ci/prek/extract_agent_skills.py --check
OK: skills.json is in sync with SKILL.md

$ pytest scripts/ci/prek/test_breeze_agent_skills.py
29 passed in 0.09s

Part of GSoC 2026 application for #62500

Was generative AI tooling used to co-author this PR?
  • Yes, Claude (Anthropic) was used to help structure and review code during development.
    All logic, design decisions, and testing were done by me.

  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.

@potiuk
Member

potiuk commented Mar 15, 2026

@HARDIK-WEB-OSS Converting to draft — this PR doesn't yet meet our Pull Request quality criteria.

  • Pre-commit / static checks: Failing: CI image checks / Static checks. Run prek run --from-ref main locally to find and fix issues. See Pre-commit / static checks docs.

Note: Your branch is 38 commits behind main. Please rebase and push again to get up-to-date CI results.

See the linked criteria for how to fix each item, then mark the PR "Ready for review". This is not a rejection — just an invitation to bring the PR up to standard. No rush.


Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer — a real person — will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you.

potiuk marked this pull request as draft March 15, 2026 23:02
HARDIK-WEB-OSS force-pushed the feat/breeze-agent-skill-poc branch 2 times, most recently from bd4b85e to 92ba099 on March 16, 2026 13:30
HARDIK-WEB-OSS marked this pull request as ready for review March 16, 2026 16:12
@HARDIK-WEB-OSS
Contributor Author

Rebased on main. If there are still any failures showing, let me know which ones to fix — ready for feedback.

HARDIK-WEB-OSS force-pushed the feat/breeze-agent-skill-poc branch 5 times, most recently from c63d8fd to b349874 on March 21, 2026 08:37
HARDIK-WEB-OSS force-pushed the feat/breeze-agent-skill-poc branch from b349874 to 666ea9a on March 30, 2026 23:02
- Add .github/skills/breeze-contribution/SKILL.md with structured
  workflow definitions for host vs container command selection
- Add scripts/ci/prek/breeze_context_detect.py implementing the
  context detection API using filesystem markers derived from
  /opt/airflow and /.dockerenv (Breeze canonical mount points)
- Command taxonomy sourced directly from developer_commands_config.py
- Follows existing .github/skills/ path pattern from translation skill
- Includes agent-skill-sync markers for future prek drift-detection hook

Addresses Task 1 (environment detection) and Task 2 (workflow modeling)
from apache#62500

- Add scripts/ci/prek/extract_agent_skills.py that parses
  agent-skill-sync markers from SKILL.md and writes skills.json
- Add .github/skills/breeze-contribution/skills.json (generated)
- Add scripts/ci/prek/test_breeze_agent_skills.py with 20 tests for
  marker parsing, extraction, drift detection, and context detection
- extract_agent_skills.py --check mode exits 1 on drift, enabling
  prek hook enforcement (same pattern as update-breeze-cmd-output)

Part of apache#62500

Add DX_REPORT.md documenting 4 concrete failure modes that the
  Breeze agent skill prevents, with exact commands and expected output
Wire extract_agent_skills.py --check into .pre-commit-config.yaml
  as check-agent-skills-drift hook, triggered when SKILL.md or
  skills.json changes — same enforcement pattern as update-breeze-cmd-output
Part of apache#62500

- Add agent-skill directives to contributing-docs/03_contributors_quick_start.rst
- Update extract_agent_skills.py to parse RST contributing-docs
- contributing-docs is now source of truth per Jarek/Jason guidance
- Fix prek hook entry to use python3

Part of apache#62500
HARDIK-WEB-OSS force-pushed the feat/breeze-agent-skill-poc branch from 8a429a7 to b3ea591 on March 31, 2026 11:30
@potiuk
Member

potiuk commented Apr 1, 2026

@HARDIK-WEB-OSS Converting to draft — this PR doesn't yet meet our Pull Request quality criteria.

  • mypy (type checking): Failing: CI image checks / MyPy checks (mypy-providers). Run prek --stage manual mypy-providers --all-files locally to reproduce. You need breeze ci-image build --python 3.10 for Docker-based mypy. See mypy (type checking) docs.
  • Provider tests: Failing: provider distributions tests / Compat 2.11.1:P3.10:, provider distributions tests / Compat 3.0.6:P3.10:, provider distributions tests / Compat 3.1.8:P3.10:, Non-DB tests: providers / Non-DB-prov::3.10:-amazon,celer...standard, Special tests / Latest Boto test: providers / All-prov:LatestBoto-Postgres:14:3.10:-amazon,celer...standard (+1 more). Run provider tests with breeze run pytest <provider-test-path> -xvs. See Provider tests docs.

See the linked criteria for how to fix each item, then mark the PR "Ready for review". This is not a rejection — just an invitation to bring the PR up to standard. No rush.


Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer — a real person — will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you.

potiuk marked this pull request as draft April 1, 2026 11:04
Contributor

Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

HARDIK-WEB-OSS marked this pull request as ready for review April 12, 2026 13:55
HARDIK-WEB-OSS requested a review from Copilot April 12, 2026 16:17
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 12 comments.

Comment on lines +122 to +141
def build_skills_json(skills: list[dict[str, str]]) -> dict:
    """
    Build the full skills.json structure from extracted skill dicts.
    """
    return {
        "$schema": "breeze-agent-skills/v1",
        "source": str(SKILL_MD),
        "description": (
            "Auto-generated from agent-skill-sync markers in SKILL.md. "
            "Do not edit manually — update SKILL.md markers instead."
        ),
        "skills": [
            {
                "workflow": s["workflow"],
                "host": s.get("host", ""),
                "breeze": s.get("breeze", ""),
                "fallback_condition": s.get("fallback", s.get("fallback_condition", "never")),
            }
            for s in skills
        ],
Comment thread .pre-commit-config.yaml Outdated
files: >
  (?x)
  ^\.github/skills/breeze-contribution/SKILL\.md$|
  ^\.github/skills/breeze-contribution/skills\.json$
Comment on lines +127 to +156
def _main() -> None:
    import argparse

    parser = argparse.ArgumentParser(
        description="Detect Breeze context and get recommended agent skill commands"
    )
    parser.add_argument("--workflow", choices=sorted(WORKFLOWS.keys()))
    parser.add_argument("--test-path", default="{test_path}")
    parser.add_argument("--distribution-folder", default="{distribution_folder}")
    args = parser.parse_args()

    context = get_context()
    print(f"Context: {context.upper()}")
    print()

    if args.workflow:
        result = get_command(
            args.workflow,
            test_path=args.test_path,
            distribution_folder=args.distribution_folder,
        )
        print(f"Workflow : {result['workflow']}")
        print(f"Command : {result['command']}")
        print(f"Note : {result['note']}")
    else:
        print("All workflows for this context:")
        print("-" * 60)
        for wf_name in sorted(WORKFLOWS.keys()):
            result = get_command(wf_name)
            print(f" {wf_name:<20} {result['command']}")
:context: host
:local: uv run --project {distribution_folder} pytest {test_path} -xvs
:breeze: pytest {test_path} -xvs
:prereqs: run-static-checks
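
To make the host-versus-Breeze choice in the directive above concrete, the following sketch shows one way a context-aware lookup could fill in the placeholders. It is an illustration under assumptions, not the PR's get_command: the names WORKFLOWS_SKETCH and get_command_sketch, the explicit context argument, and the shape of the returned dict are hypothetical. The two command templates are copied from the :local: and :breeze: fields.

```python
# Hypothetical workflow table; the two templates are copied from the
# :local: (host) and :breeze: fields of the directive shown above.
WORKFLOWS_SKETCH = {
    "run-tests": {
        "host": "uv run --project {distribution_folder} pytest {test_path} -xvs",
        "breeze": "pytest {test_path} -xvs",
    },
}


def get_command_sketch(
    workflow: str,
    context: str,
    test_path: str = "{test_path}",
    distribution_folder: str = "{distribution_folder}",
) -> dict[str, str]:
    """Pick the template for the given context and substitute the placeholders."""
    template = WORKFLOWS_SKETCH[workflow][context]
    # The defaults re-insert the literal placeholder text, so an unfilled
    # template is returned unchanged rather than raising KeyError.
    command = template.format(test_path=test_path, distribution_folder=distribution_folder)
    return {"workflow": workflow, "context": context, "command": command}
```

Defaulting each argument to its own placeholder mirrors the argparse defaults in the _main snippet earlier in this review, so a caller that supplies nothing still gets a readable template back.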
Comment on lines +8 to +14
      "host": "",
      "breeze": "prek",
      "fallback_condition": "never"
    },
    {
      "workflow": "run-tests",
      "host": "",
Comment on lines +36 to +40
# Make scripts importable when running from repo root
sys.path.insert(0, str(Path(__file__).parent))

from breeze_context_detect import get_command, is_inside_breeze
from extract_agent_skills import build_skills_json, check_drift, extract_skills, parse_marker
Comment thread .pre-commit-config.yaml Outdated
- id: check-agent-skills-drift
  name: Check agent skills are in sync with SKILL.md
  description: Fails if skills.json has drifted from agent-skill-sync markers in SKILL.md
  entry: python3 scripts/ci/prek/extract_agent_skills.py --check
Comment on lines +122 to +170
def build_skills_json(skills: list[dict[str, str]]) -> dict:
    """
    Build the full skills.json structure from extracted skill dicts.
    """
    return {
        "$schema": "breeze-agent-skills/v1",
        "source": str(SKILL_MD),
        "description": (
            "Auto-generated from agent-skill-sync markers in SKILL.md. "
            "Do not edit manually — update SKILL.md markers instead."
        ),
        "skills": [
            {
                "workflow": s["workflow"],
                "host": s.get("host", ""),
                "breeze": s.get("breeze", ""),
                "fallback_condition": s.get("fallback", s.get("fallback_condition", "never")),
            }
            for s in skills
        ],
    }


def write_skills_json(data: dict, output_path: Path) -> None:
    """Write skills dict to JSON file with stable formatting."""
    output_path.parent.mkdir(parents=True, exist_ok=True)
    output_path.write_text(
        json.dumps(data, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )


def check_drift(generated: dict, existing_path: Path) -> bool:
    """
    Returns True if drift is detected (committed file differs from generated).
    Returns False if they match.
    """
    if not existing_path.exists():
        print(f"DRIFT: {existing_path} does not exist but should be generated.", file=sys.stderr)
        return True

    committed = json.loads(existing_path.read_text(encoding="utf-8"))

    # Compare only the skills list — ignore metadata fields like description
    if committed.get("skills") != generated.get("skills"):
        print("DRIFT DETECTED: committed skills.json does not match SKILL.md markers.", file=sys.stderr)
        print("Run: python3 scripts/ci/prek/extract_agent_skills.py", file=sys.stderr)
        print("Then commit the updated skills.json.", file=sys.stderr)
        return True
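
The quoted range stops before the end of check_drift, so the wiring from the drift result to the hook's exit status is not visible here. As a rough sketch of that compare-and-exit pattern, assuming a --check mode that maps "drift" to exit code 1 as the PR description states, the logic could look like the following. The helper names skills_drifted and check_mode_exit_code are hypothetical and are not taken from extract_agent_skills.py.

```python
import json
from pathlib import Path


def skills_drifted(generated: dict, committed_path: Path) -> bool:
    """True when the committed file is missing or its "skills" list differs."""
    if not committed_path.exists():
        return True
    committed = json.loads(committed_path.read_text(encoding="utf-8"))
    # Only the "skills" list is compared, so metadata such as "description"
    # can change without being reported as drift.
    return committed.get("skills") != generated.get("skills")


def check_mode_exit_code(generated: dict, committed_path: Path) -> int:
    """Map the drift result to a process exit code: 1 on drift, 0 when in sync."""
    return 1 if skills_drifted(generated, committed_path) else 0
```

A command-line entry point would then pass the result to sys.exit, which is what lets a pre-commit style hook fail the run when the generated and committed files disagree.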
Comment thread scripts/ci/prek/extract_agent_skills.py Outdated
Comment on lines +86 to +100
RST_SKILL_RE = re.compile(r"[.][.] agent-skill::[ ]*\n(?P<fields>(?:[ ]{3}:[^:]+: .+\n)+)", re.MULTILINE)
RST_FIELD_RE = re.compile(r"[ ]{3}:([^:]+): (.+)")


def extract_skills_from_rst(rst_path: Path) -> list[dict[str, str]]:
    """Extract agent-skill directives from RST contributing docs."""
    if not rst_path.exists():
        return []
    skills = []
    for match in RST_SKILL_RE.finditer(rst_path.read_text(encoding="utf-8")):
        fields = dict(RST_FIELD_RE.findall(match.group("fields")))
        if "id" in fields:
            fields["workflow"] = fields.pop("id")
        skills.append(fields)
    return skills
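
To show what these two regular expressions produce, here is a small self-contained sketch that runs them against an in-memory directive instead of a file. The regexes are copied from the snippet above. The function name extract_skills_from_text and the SAMPLE_RST text are illustrative assumptions; in particular the :id: line is made up to exercise the id-to-workflow rename and is not quoted from the contributing docs.

```python
import re

# Copied from the reviewed snippet: a directive line followed by one or more
# three-space-indented ":name: value" field lines.
RST_SKILL_RE = re.compile(r"[.][.] agent-skill::[ ]*\n(?P<fields>(?:[ ]{3}:[^:]+: .+\n)+)", re.MULTILINE)
RST_FIELD_RE = re.compile(r"[ ]{3}:([^:]+): (.+)")

# Illustrative input only; field values reuse the commands shown earlier.
SAMPLE_RST = (
    "Some human-facing prose.\n"
    "\n"
    ".. agent-skill::\n"
    "   :id: run-tests\n"
    "   :context: host\n"
    "   :local: uv run --project {distribution_folder} pytest {test_path} -xvs\n"
    "   :breeze: pytest {test_path} -xvs\n"
    "   :prereqs: static-checks\n"
)


def extract_skills_from_text(text: str) -> list[dict[str, str]]:
    """Same parsing steps as the file-based function, applied to a string."""
    skills = []
    for match in RST_SKILL_RE.finditer(text):
        fields = dict(RST_FIELD_RE.findall(match.group("fields")))
        # The directive's "id" field becomes the "workflow" key.
        if "id" in fields:
            fields["workflow"] = fields.pop("id")
        skills.append(fields)
    return skills
```

Because each field line must match ": " followed by at least one character, a field with an empty value ends the matched block early, which appears to be the empty-field case the later follow-up comment refers to.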
Comment on lines +127 to +129
def _main() -> None:
    import argparse

- Fix prereqs field: run-static-checks -> static-checks
- Fix local->host mapping in build_skills_json
- Update DX_REPORT test count to avoid hardcoding
- Regenerate skills.json with correct host commands
HARDIK-WEB-OSS force-pushed the feat/breeze-agent-skill-poc branch from 0cebbe1 to 20317f9 on April 12, 2026 19:04
@HARDIK-WEB-OSS
Contributor Author

Addressed all Copilot review feedback — fixed local→host mapping, RST regex for empty fields, prereqs ID consistency, RST path in hook trigger, and python3→python for portability.

jason810496 marked this pull request as draft April 21, 2026 06:10

4 participants