[GSoC 2026] POC: Breeze-aware agent skill with context detection and drift enforcement #63661
HARDIK-WEB-OSS wants to merge 8 commits into apache:main
@HARDIK-WEB-OSS Converting to draft — this PR doesn't yet meet our Pull Request quality criteria.
See the linked criteria for how to fix each item, then mark the PR "Ready for review". This is not a rejection, just an invitation to bring the PR up to standard. No rush.

Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer (a real person) will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you.
Force-pushed from bd4b85e to 92ba099.
Rebased on main. If any failures are still showing, let me know which ones to fix. Ready for feedback.
Force-pushed from c63d8fd to b349874.
Force-pushed from b349874 to 666ea9a.
- Add .github/skills/breeze-contribution/SKILL.md with structured workflow definitions for host vs container command selection
- Add scripts/ci/prek/breeze_context_detect.py implementing the context detection API using filesystem markers derived from /opt/airflow and /.dockerenv (Breeze canonical mount points)
- Command taxonomy sourced directly from developer_commands_config.py
- Follows existing .github/skills/ path pattern from translation skill
- Includes agent-skill-sync markers for future prek drift-detection hook

Addresses Task 1 (environment detection) and Task 2 (workflow modeling) from apache#62500
- Add scripts/ci/prek/extract_agent_skills.py that parses agent-skill-sync markers from SKILL.md and writes skills.json
- Add .github/skills/breeze-contribution/skills.json (generated)
- Add scripts/ci/prek/test_breeze_agent_skills.py with 20 tests for marker parsing, extraction, drift detection, and context detection
- extract_agent_skills.py --check mode exits 1 on drift, enabling prek hook enforcement (same pattern as update-breeze-cmd-output)

Part of apache#62500
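The `--check` behaviour described in this commit can be sketched roughly as below. This is an illustration under stated assumptions, not the PR's code: `generate_skills` and `run` are hypothetical names, the generator is stubbed with a fixed placeholder list instead of parsing real markers, and only the "exit 1 on drift" contract is taken from the commit message.

```python
import json
import tempfile
from pathlib import Path


def generate_skills() -> dict:
    """Hypothetical stand-in for the real extraction step (placeholder data)."""
    return {"skills": [{"workflow": "run-tests", "host": "placeholder-host-cmd", "breeze": "pytest"}]}


def run(output_path: Path, check: bool) -> int:
    """Return a process exit code: 1 on drift in check mode, otherwise 0."""
    generated = generate_skills()
    if check:
        if not output_path.exists():
            return 1  # a missing generated file counts as drift
        committed = json.loads(output_path.read_text(encoding="utf-8"))
        return 1 if committed.get("skills") != generated["skills"] else 0
    # Default mode: regenerate the file with stable formatting.
    output_path.write_text(json.dumps(generated, indent=2) + "\n", encoding="utf-8")
    return 0


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "skills.json"
    missing = run(path, check=True)    # nothing committed yet
    written = run(path, check=False)   # regenerate
    in_sync = run(path, check=True)    # committed file matches
    path.write_text('{"skills": []}\n', encoding="utf-8")
    drifted = run(path, check=True)    # committed file was edited by hand

print(missing, written, in_sync, drifted)
```

A hook entry would pass the returned code to `sys.exit`, so a non-zero result fails the commit until the file is regenerated.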
Add DX_REPORT.md documenting 4 concrete failure modes that the Breeze agent skill prevents, with exact commands and expected output

Wire extract_agent_skills.py --check into .pre-commit-config.yaml as check-agent-skills-drift hook, triggered when SKILL.md or skills.json changes (same enforcement pattern as update-breeze-cmd-output)

Part of apache#62500
- Add agent-skill directives to contributing-docs/03_contributors_quick_start.rst
- Update extract_agent_skills.py to parse RST contributing-docs
- contributing-docs is now source of truth per Jarek/Jason guidance
- Fix prek hook entry to use python3

Part of apache#62500
Force-pushed from 8a429a7 to b3ea591.
@HARDIK-WEB-OSS Converting to draft — this PR doesn't yet meet our Pull Request quality criteria.
See the linked criteria for how to fix each item, then mark the PR "Ready for review". This is not a rejection, just an invitation to bring the PR up to standard. No rush.

Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer (a real person) will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you.
Force-pushed from b3ea591 to 037e14a.
files: >
  (?x)
  ^\.github/skills/breeze-contribution/SKILL\.md$|
  ^\.github/skills/breeze-contribution/skills\.json$
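To see which paths this trigger pattern covers, it can be evaluated directly in Python. The sketch below assumes the pattern is applied as a Python regular-expression search against each changed path; `FILES_PATTERN` and `triggers_hook` are hypothetical names used only for illustration.

```python
import re

# The folded YAML scalar (`>`) joins the lines with spaces; the (?x) verbose
# flag makes the regex engine ignore that whitespace.
FILES_PATTERN = (
    r"(?x) "
    r"^\.github/skills/breeze-contribution/SKILL\.md$| "
    r"^\.github/skills/breeze-contribution/skills\.json$"
)


def triggers_hook(path: str) -> bool:
    """Return True if a changed file at `path` would match the pattern."""
    return re.search(FILES_PATTERN, path) is not None


skill_md = triggers_hook(".github/skills/breeze-contribution/SKILL.md")
skills_json = triggers_hook(".github/skills/breeze-contribution/skills.json")
quick_start = triggers_hook("contributing-docs/03_contributors_quick_start.rst")
print(skill_md, skills_json, quick_start)
```

Under that assumption the pattern as shown here does not match the RST quick-start file, which is consistent with the author's later note about fixing the RST path in the hook trigger.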
def _main() -> None:
    import argparse

    parser = argparse.ArgumentParser(
        description="Detect Breeze context and get recommended agent skill commands"
    )
    parser.add_argument("--workflow", choices=sorted(WORKFLOWS.keys()))
    parser.add_argument("--test-path", default="{test_path}")
    parser.add_argument("--distribution-folder", default="{distribution_folder}")
    args = parser.parse_args()

    context = get_context()
    print(f"Context: {context.upper()}")
    print()

    if args.workflow:
        result = get_command(
            args.workflow,
            test_path=args.test_path,
            distribution_folder=args.distribution_folder,
        )
        print(f"Workflow : {result['workflow']}")
        print(f"Command : {result['command']}")
        print(f"Note : {result['note']}")
    else:
        print("All workflows for this context:")
        print("-" * 60)
        for wf_name in sorted(WORKFLOWS.keys()):
            result = get_command(wf_name)
            print(f" {wf_name:<20} {result['command']}")
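The CLI above defaults `--test-path` and `--distribution-folder` to the literal placeholders `{test_path}` and `{distribution_folder}`. One plausible reason, sketched below, is that formatting a template with its own placeholder text leaves the template unchanged, so unfilled commands still print in a readable form. The `WORKFLOWS` table and this `get_command` signature are assumptions (the excerpt does not show the real ones, and the real function appears to detect the context itself); only the two command templates come from the `run-tests` directive in this PR.

```python
# Hypothetical workflow table; the real WORKFLOWS mapping is not shown in this excerpt.
WORKFLOWS = {
    "run-tests": {
        "host": "uv run --project {distribution_folder} pytest {test_path} -xvs",
        "breeze": "pytest {test_path} -xvs",
    },
}


def get_command(
    workflow: str,
    context: str,
    test_path: str = "{test_path}",
    distribution_folder: str = "{distribution_folder}",
) -> dict:
    """Pick the template for the given context and fill in its placeholders."""
    template = WORKFLOWS[workflow][context]
    # str.format inserts values verbatim, so a default of "{test_path}"
    # simply reproduces the placeholder in the output.
    command = template.format(test_path=test_path, distribution_folder=distribution_folder)
    return {"workflow": workflow, "command": command}


unfilled = get_command("run-tests", "breeze")
filled = get_command(
    "run-tests", "host", test_path="tests/test_example.py", distribution_folder="airflow-core"
)
print(unfilled["command"])
print(filled["command"])
```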
   :context: host
   :local: uv run --project {distribution_folder} pytest {test_path} -xvs
   :breeze: pytest {test_path} -xvs
   :prereqs: run-static-checks
      "host": "",
      "breeze": "prek",
      "fallback_condition": "never"
    },
    {
      "workflow": "run-tests",
      "host": "",
# Make scripts importable when running from repo root
sys.path.insert(0, str(Path(__file__).parent))

from breeze_context_detect import get_command, is_inside_breeze
from extract_agent_skills import build_skills_json, check_drift, extract_skills, parse_marker
- id: check-agent-skills-drift
  name: Check agent skills are in sync with SKILL.md
  description: Fails if skills.json has drifted from agent-skill-sync markers in SKILL.md
  entry: python3 scripts/ci/prek/extract_agent_skills.py --check
def build_skills_json(skills: list[dict[str, str]]) -> dict:
    """
    Build the full skills.json structure from extracted skill dicts.
    """
    return {
        "$schema": "breeze-agent-skills/v1",
        "source": str(SKILL_MD),
        "description": (
            "Auto-generated from agent-skill-sync markers in SKILL.md. "
            "Do not edit manually — update SKILL.md markers instead."
        ),
        "skills": [
            {
                "workflow": s["workflow"],
                "host": s.get("host", ""),
                "breeze": s.get("breeze", ""),
                "fallback_condition": s.get("fallback", s.get("fallback_condition", "never")),
            }
            for s in skills
        ],
    }


def write_skills_json(data: dict, output_path: Path) -> None:
    """Write skills dict to JSON file with stable formatting."""
    output_path.parent.mkdir(parents=True, exist_ok=True)
    output_path.write_text(
        json.dumps(data, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )


def check_drift(generated: dict, existing_path: Path) -> bool:
    """
    Returns True if drift is detected (committed file differs from generated).
    Returns False if they match.
    """
    if not existing_path.exists():
        print(f"DRIFT: {existing_path} does not exist but should be generated.", file=sys.stderr)
        return True

    committed = json.loads(existing_path.read_text(encoding="utf-8"))

    # Compare only the skills list — ignore metadata fields like description
    if committed.get("skills") != generated.get("skills"):
        print("DRIFT DETECTED: committed skills.json does not match SKILL.md markers.", file=sys.stderr)
        print("Run: python3 scripts/ci/prek/extract_agent_skills.py", file=sys.stderr)
        print("Then commit the updated skills.json.", file=sys.stderr)
        return True
RST_SKILL_RE = re.compile(r"[.][.] agent-skill::[ ]*\n(?P<fields>(?:[ ]{3}:[^:]+: .+\n)+)", re.MULTILINE)
RST_FIELD_RE = re.compile(r"[ ]{3}:([^:]+): (.+)")


def extract_skills_from_rst(rst_path: Path) -> list[dict[str, str]]:
    """Extract agent-skill directives from RST contributing docs."""
    if not rst_path.exists():
        return []
    skills = []
    for match in RST_SKILL_RE.finditer(rst_path.read_text(encoding="utf-8")):
        fields = dict(RST_FIELD_RE.findall(match.group("fields")))
        if "id" in fields:
            fields["workflow"] = fields.pop("id")
        skills.append(fields)
    return skills
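As a quick check of what these two regexes capture, the sketch below runs the same matching logic over an in-memory string instead of a file. The sample document is an assumption about how a directive might look in the quick-start guide (the field names are taken from the snippets in this PR); `extract_skills_from_text` is a hypothetical helper that mirrors `extract_skills_from_rst` without touching the filesystem.

```python
import re

RST_SKILL_RE = re.compile(r"[.][.] agent-skill::[ ]*\n(?P<fields>(?:[ ]{3}:[^:]+: .+\n)+)", re.MULTILINE)
RST_FIELD_RE = re.compile(r"[ ]{3}:([^:]+): (.+)")

# Assumed sample; the real RST content is not shown in this excerpt.
SAMPLE_RST = """\
Running tests
-------------

.. agent-skill::
   :id: run-tests
   :context: host
   :local: uv run --project {distribution_folder} pytest {test_path} -xvs
   :breeze: pytest {test_path} -xvs
   :prereqs: static-checks

Some prose that follows the directive.
"""


def extract_skills_from_text(text: str) -> list:
    """Same matching logic as extract_skills_from_rst, applied to a string."""
    skills = []
    for match in RST_SKILL_RE.finditer(text):
        fields = dict(RST_FIELD_RE.findall(match.group("fields")))
        if "id" in fields:
            fields["workflow"] = fields.pop("id")  # normalise the key name
        skills.append(fields)
    return skills


skills = extract_skills_from_text(SAMPLE_RST)
print(skills)
```

Because each field line must match `: .+`, a field with an empty value ends the captured block early and later fields are dropped. That appears to be the empty-field case the author later mentions fixing.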
def _main() -> None:
    import argparse
- Fix prereqs field: run-static-checks -> static-checks
- Fix local->host mapping in build_skills_json
- Update DX_REPORT test count to avoid hardcoding
- Regenerate skills.json with correct host commands
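The local->host item appears to refer to a key mismatch visible in the snippets above: the RST directive stores the host command under `:local:`, while `build_skills_json` reads `s.get("host", "")`, which would explain the empty `"host": ""` entries in the generated JSON. The actual fix is not shown in this excerpt; one possible shape, with the hypothetical helper name `to_skill_entry`, is sketched here.

```python
def to_skill_entry(fields: dict) -> dict:
    """Build one skills.json entry, accepting either 'host' or 'local' as the host command."""
    return {
        "workflow": fields["workflow"],
        # Prefer an explicit 'host' key, then fall back to the RST ':local:' field.
        "host": fields.get("host", fields.get("local", "")),
        "breeze": fields.get("breeze", ""),
        "fallback_condition": fields.get("fallback", fields.get("fallback_condition", "never")),
    }


entry = to_skill_entry(
    {
        "workflow": "run-tests",
        "local": "uv run --project {distribution_folder} pytest {test_path} -xvs",
        "breeze": "pytest {test_path} -xvs",
    }
)
print(entry["host"])
```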
Force-pushed from 0cebbe1 to 20317f9.
…s, python3->python
Addressed all Copilot review feedback: fixed local→host mapping, RST regex for empty fields, prereqs ID consistency, RST path in hook trigger, and python3→python for portability.
POC for #62500 — Airflow Contribution & Verification Agent Skills (GSoC 2026)
What's included
- scripts/ci/prek/breeze_context_detect.py: stdlib-only context detection. Priority chain: AIRFLOW_BREEZE_CONTAINER → /.containerenv → /.dockerenv → /opt/airflow. Call get_command('run-tests') and get the right command for your context.
- .github/skills/breeze-contribution/SKILL.md: skill file at the existing .github/skills/ path. Command taxonomy sourced directly from developer_commands_config.py.
- contributing-docs/03_contributors_quick_start.rst: .. agent-skill:: directives embedded inline with the human docs. Contributing-docs as source of truth, per the discussion in [GSOC 2026] Airflow Contribution & Verification Agent Skills #62500.
- scripts/ci/prek/extract_agent_skills.py: parses RST directives, writes skills.json, exits 1 on drift (--check mode). Mirrors the update-breeze-cmd-output pattern at line 917 of .pre-commit-config.yaml.
- scripts/ci/prek/test_breeze_agent_skills.py: 29 tests including RST extraction, context detection, and full E2E pipeline.
- .pre-commit-config.yaml: check-agent-skills-drift hook wired.

Output
Part of GSoC 2026 application for #62500
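The priority chain named in the description (AIRFLOW_BREEZE_CONTAINER, then /.containerenv, /.dockerenv, /opt/airflow) could be implemented along the lines below. This is a minimal stdlib-only sketch, not the PR's code: the function names match those imported elsewhere in the PR, but the injectable `environ` and `exists` parameters, the "any non-empty value" treatment of the environment variable, and the `"breeze"`/`"host"` return values are assumptions made so the logic can be exercised without a container.

```python
import os
from pathlib import Path
from typing import Callable, Mapping, Optional

BREEZE_ENV_VAR = "AIRFLOW_BREEZE_CONTAINER"
# Checked in order after the environment variable.
MARKER_PATHS = ("/.containerenv", "/.dockerenv", "/opt/airflow")


def is_inside_breeze(
    environ: Optional[Mapping[str, str]] = None,
    exists: Optional[Callable[[str], bool]] = None,
) -> bool:
    """Return True if the env var is set or any marker path exists."""
    env = os.environ if environ is None else environ
    path_exists = exists if exists is not None else (lambda p: Path(p).exists())
    if env.get(BREEZE_ENV_VAR):  # assumption: any non-empty value counts
        return True
    return any(path_exists(p) for p in MARKER_PATHS)


def get_context(
    environ: Optional[Mapping[str, str]] = None,
    exists: Optional[Callable[[str], bool]] = None,
) -> str:
    """Map the detection result to a context label."""
    return "breeze" if is_inside_breeze(environ, exists) else "host"


on_host = get_context(environ={}, exists=lambda p: False)
via_env = get_context(environ={BREEZE_ENV_VAR: "true"}, exists=lambda p: False)
via_marker = get_context(environ={}, exists=lambda p: p == "/.dockerenv")
print(on_host, via_env, via_marker)
```

Note that /.dockerenv and /.containerenv exist in any Docker or Podman container, so a chain like this identifies "some container" rather than Breeze specifically unless the environment variable is present.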
Was generative AI tooling used to co-author this PR?
All logic, design decisions, and testing were done by me.