Commit cadcf65

SY-3690: Refactor Integration Test Infrastructure (#2099)
Large-scale refactor of the integration test framework and CI pipeline. The monolithic `test_conductor.py` (~1400 lines) has been decomposed into focused, single-responsibility modules:

- **config_client.py**: Discovers, loads, filters, and expands test configurations from `*_tests.json` files, including parameter matrix expansion and test class discovery
- **execution_client.py**: Manages the test lifecycle (setup, run, teardown) with sequencing modes (sequential, random, asynchronous) and thread pool management
- **telemetry_client.py**: Streams real-time test metrics (pass/fail counts, timing) to Synnax channels
- **report_client.py**: Generates JSON and human-readable test result reports
- **log_client.py**: Structured logging with optional Synnax channel output
- **models.py**: Shared data models (TestResult, TestStatus, etc.)
- **target_filter.py**: Flexible test filtering with substring matching at the file, sequence, and case levels. Supports comma-separated file targets (e.g., `driver,control,latency`)

The `TestCase` base class has been simplified: Synnax client connection management moved into the framework, and dead code was removed. All ~50 test case files were updated with minor import changes.
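To make the comma-separated, substring-based file targeting described for `target_filter.py` more concrete, here is a minimal sketch of how such a filter could behave. The helper name `select_test_files` and the example file names are hypothetical and are not taken from the commit. The real module also filters at the sequence and case levels, which this sketch leaves out.

```python
from pathlib import PurePath


def select_test_files(target: str, files: list[str]) -> list[str]:
    """Return the files whose stem contains any comma-separated token.

    Hypothetical helper for illustration; not the commit's actual API.
    """
    # Split "driver,control,latency" into lowercase tokens, dropping empties.
    tokens = [t.strip().lower() for t in target.split(",") if t.strip()]
    if not tokens:
        # Assumption: an empty target selects every file.
        return list(files)
    selected = []
    for name in files:
        stem = PurePath(name).stem.lower()
        # Substring match: "driver" matches "driver_tests".
        if any(token in stem for token in tokens):
            selected.append(name)
    return selected


# Example file names are made up for this sketch.
files = [
    "driver_tests.json",
    "control_tests.json",
    "console_tests.json",
    "latency_tests.json",
]
print(select_test_files("driver,control,latency", files))
```

Substring matching keeps command-line targets short, at the cost that a short token such as `con` would select both the control and console files in this example.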
Integration tests now run concurrently across multiple runners instead of sequentially on a single runner per OS:

- **New reusable workflow** (`test.integration.worker.yaml`): Handles the full test lifecycle for a single matrix entry: artifact download, process cleanup, core startup, test execution, and artifact upload
- **Matrix strategy**: Tests are split into 3 groups (arc, console, driver) running in parallel on both Windows and Ubuntu (6 concurrent runners)
- **Single-source-of-truth matrix** (`validate_test_coverage.sh`): Defines the test groups, validates that all `*_tests.json` files are covered, and outputs the matrix JSON for GitHub Actions
- **Custom flags support**: When the `FLAGS` workflow input is provided, the matrix collapses to a single worker per OS, passing flags directly to test-conductor (e.g., `uv run tc console/lifecycle` or `uv run tc -f channel_life`)
- Added retry mechanism for flaky tests
- Integrated structured logging to Synnax channels
- Improved failure output artifacts
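The runner arithmetic above (3 groups on 2 operating systems giving 6 workers, collapsing to one worker per OS when flags are supplied) can be sketched as follows. The `{"include": [...]}` shape with `name` and `target` keys matches the `CUSTOM_MATRIX` value written in `test.integration.yaml`. Setting each group's `target` to its own name, the placeholder name `custom`, and the helper names are assumptions made for illustration, since the actual output of `validate_test_coverage.sh` is not shown on this page.

```python
import itertools
import json

GROUPS = ["arc", "console", "driver"]          # group names from the commit message
OSES = ["windows-latest", "ubuntu-latest"]     # OS labels passed by the caller jobs


def build_matrix(flags: str = "") -> dict:
    """Build a GitHub Actions matrix dict (hypothetical helper)."""
    if flags:
        # With custom flags the matrix collapses to one entry; the empty
        # target lets the worker fall back to the flags input.
        return {"include": [{"name": "custom", "target": ""}]}
    # Assumption: each group's target is simply its own name.
    return {"include": [{"name": g, "target": g} for g in GROUPS]}


def planned_jobs(matrix: dict) -> list[tuple[str, str]]:
    """Every (os, entry name) pair that would get its own runner."""
    names = [entry["name"] for entry in matrix["include"]]
    return list(itertools.product(OSES, names))


print(json.dumps(build_matrix()))
print(len(planned_jobs(build_matrix())))                   # default matrix
print(len(planned_jobs(build_matrix("-f channel_life"))))  # collapsed matrix
```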
1 parent de91a24 commit cadcf65

78 files changed

Lines changed: 2729 additions & 2133 deletions

.github/workflows/test.integration.worker.yaml

Lines changed: 137 additions & 0 deletions
@@ -0,0 +1,137 @@
+name: Integration Test Run
+
+on:
+  workflow_call:
+    inputs:
+      test_matrix:
+        description: "JSON test matrix (e.g., from validate_test_coverage.sh)"
+        type: string
+        required: true
+      ref_run_id:
+        description: "Reference run ID for downloading build artifacts"
+        type: string
+        required: true
+      os:
+        description: "Runner OS label (e.g., 'windows-latest', 'ubuntu-latest')"
+        type: string
+        required: true
+      flags:
+        description: "Additional flags to pass to test-conductor"
+        type: string
+        required: false
+        default: ""
+
+jobs:
+  run:
+    name: ${{ matrix.name }}
+    timeout-minutes: 30
+    runs-on:
+      - self-hosted
+      - ${{ inputs.os }}
+    strategy:
+      fail-fast: false
+      matrix: ${{ fromJSON(inputs.test_matrix) }}
+    steps:
+      - name: Checkout Repository
+        uses: actions/checkout@v6
+        with:
+          clean: false
+
+      - name: Force Quit Existing Synnax Processes (Windows)
+        if: runner.os == 'Windows'
+        shell: cmd
+        run: integration/scripts/KillSynnaxProcessesWindows.cmd
+
+      - name: Force Quit Existing Synnax Processes (Unix)
+        if: runner.os == 'Linux'
+        continue-on-error: true
+        run: integration/scripts/kill_synnax_processes_unix.sh
+
+      - name: Download and Setup Artifacts (Windows)
+        if: runner.os == 'Windows'
+        shell: cmd
+        env:
+          GH_TOKEN: ${{ github.token }}
+          REF_RUN_ID: ${{ inputs.ref_run_id }}
+        run: integration/scripts/DownloadArtifactsWindows.cmd
+
+      - name: Download and Setup Artifacts (Unix)
+        if: runner.os == 'Linux'
+        env:
+          GH_TOKEN: ${{ github.token }}
+          REF_RUN_ID: ${{ inputs.ref_run_id }}
+        run: integration/scripts/download_artifacts_unix.sh
+
+      - name: Update Submodules
+        run: git submodule update --init --recursive
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v7
+
+      - name: Set up Python
+        uses: actions/setup-python@v6
+        with:
+          python-version-file: integration/pyproject.toml
+
+      - name: Install Playwright (Windows)
+        if: runner.os == 'Windows'
+        shell: cmd
+        working-directory: integration
+        run: uv run playwright install --with-deps
+
+      - name: Install Playwright (Unix)
+        if: runner.os == 'Linux'
+        working-directory: integration
+        run: uv run playwright install --with-deps
+
+      - name: Start Core (Windows)
+        if: runner.os == 'Windows'
+        timeout-minutes: 1
+        shell: powershell
+        env:
+          SYNNAX_LICENSE_KEY: ${{ secrets.SYNNAX_LICENSE_KEY }}
+        run: integration/scripts/StartCoreWindows.ps1
+
+      - name: Start Core (Unix)
+        if: runner.os == 'Linux'
+        timeout-minutes: 1
+        env:
+          SYNNAX_LICENSE_KEY: ${{ secrets.SYNNAX_LICENSE_KEY }}
+        run: integration/scripts/start_core_unix.sh
+
+      - name: Test Conductor (Windows)
+        if: runner.os == 'Windows'
+        id: test-conductor-win
+        timeout-minutes: 30
+        shell: cmd
+        working-directory: integration
+        env:
+          PYTHONIOENCODING: utf-8
+        run: >-
+          uv run test-conductor --name tc_win ${{ matrix.target || inputs.flags }}
+
+      - name: Test Conductor (Unix)
+        if: runner.os == 'Linux'
+        id: test-conductor-unix
+        timeout-minutes: 30
+        working-directory: integration
+        run: >-
+          uv run test-conductor --name tc_unix ${{ matrix.target || inputs.flags }}
+
+      - name: Upload Test Results
+        uses: actions/upload-artifact@v6
+        if: always()
+        with:
+          name: test-results-${{ inputs.os }}-${{ matrix.name }}
+          path: integration/tests/results/*
+          retention-days: 7
+
+      - name: Upload Core Logs
+        uses: actions/upload-artifact@v6
+        if: ${{ failure() || cancelled() }}
+        with:
+          name:
+            core-logs-${{ inputs.os }}-${{ matrix.name }}-attempt-${{ github.run_attempt
+            }}
+          path: ~/synnax-data/synnax-core.log
+          retention-days: 7
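The `${{ matrix.target || inputs.flags }}` expression in the two Test Conductor steps relies on GitHub Actions treating an empty string as falsy, so `||` yields `matrix.target` when it is non-empty and `inputs.flags` otherwise. Python's `or` behaves the same way for strings, which allows a small model of the resulting command line. The function name `conductor_command` and the target value `console` are assumptions for illustration, and `shlex.split` only approximates how a POSIX shell would split the arguments (the Windows step runs under `cmd`, which splits differently).

```python
import shlex


def conductor_command(name: str, target: str, flags: str) -> list[str]:
    """Model the argv built by the Test Conductor steps (hypothetical helper)."""
    # Mirrors `matrix.target || inputs.flags`: an empty target falls back to flags.
    extra = target or flags
    return ["uv", "run", "test-conductor", "--name", name, *shlex.split(extra)]


# Default matrix entry: the target wins and flags are empty.
print(conductor_command("tc_unix", "console", ""))
# Custom-flags run: the matrix entry has an empty target, so flags are used.
print(conductor_command("tc_unix", "", "-f channel_life"))
```

One consequence of this fallback is that a matrix entry with a non-empty target ignores `inputs.flags` entirely, which is consistent with the setup job only emitting an empty-target entry when flags are supplied.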

.github/workflows/test.integration.yaml

Lines changed: 32 additions & 135 deletions
@@ -12,8 +12,9 @@ on:
     paths:
       - .bazeliskrc
       - .bazelrc
-      - .github/workflows/test.integration.yaml
       - .github/workflows/build.synnax.yaml
+      - .github/workflows/test.integration.yaml
+      - .github/workflows/test.integration.worker.yaml
       - alamos/go/**
       - alamos/py/**
       - alamos/ts/**
@@ -90,6 +91,9 @@ jobs:
         ${{ steps.update-outputs.outputs.REF_RUN_ID ||
         steps.set-outputs.outputs.REF_RUN_ID }}
       FLAGS: ${{ steps.set-outputs.outputs.FLAGS }}
+      TEST_MATRIX:
+        ${{ steps.set-outputs.outputs.CUSTOM_MATRIX ||
+        steps.test-matrix.outputs.TEST_MATRIX }}
     steps:
       - name: Checkout Repository
         uses: actions/checkout@v6
@@ -116,6 +120,10 @@ jobs:
         run: |
           echo "FLAGS=${FLAGS}" >> $GITHUB_OUTPUT
           echo "SKIP_BUILD=${SKIP_BUILD}" >> $GITHUB_OUTPUT
+          if [ -n "${FLAGS}" ]; then
+            SAFE_NAME=$(echo "${FLAGS}" | sed 's/[^a-zA-Z0-9_-]/_/g')
+            echo "CUSTOM_MATRIX={\"include\":[{\"name\":\"${SAFE_NAME}\",\"target\":\"\"}]}" >> $GITHUB_OUTPUT
+          fi
           if [ "${SKIP_BUILD}" = "true" ]; then
             if [ -z "${REF_RUN_ID}" ]; then
               echo "Empty REF_RUN_ID. Setting SKIP_BUILD to false."
@@ -145,6 +153,11 @@ jobs:
             echo "REF_RUN_ID=${{ github.run_id }}" >> $GITHUB_OUTPUT
           fi

+      - name: Validate Test Coverage
+        if: ${{ steps.set-outputs.outputs.FLAGS == '' }}
+        id: test-matrix
+        run: integration/scripts/validate_test_coverage.sh
+
   build:
     needs: setup
     if: needs.setup.outputs.SKIP_BUILD == 'false'
@@ -163,146 +176,30 @@ jobs:
       version: ${{ needs.setup.outputs.VERSION }}
     secrets: inherit

-  deploy-and-test-windows:
-    name: Deploy and Test (windows)
-    timeout-minutes: 30
-    runs-on:
-      - self-hosted
-      - windows-latest
+  test-windows:
+    name: Test (windows)
     needs: [setup, build]
     if:
       ${{ !cancelled() && needs.setup.result == 'success' && (needs.build.result ==
       'success' || needs.build.result == 'skipped') }}
-    steps:
-      - name: Checkout Repository
-        uses: actions/checkout@v6
-        with:
-          clean: false
-
-      - name: Force Quit Existing Synnax Processes (Windows)
-        shell: cmd
-        run: integration/scripts/KillSynnaxProcessesWindows.cmd
-
-      - name: Download and Setup Windows Artifacts
-        shell: cmd
-        env:
-          GH_TOKEN: ${{ github.token }}
-          REF_RUN_ID: ${{ needs.setup.outputs.REF_RUN_ID }}
-        run: integration/scripts/DownloadArtifactsWindows.cmd
-
-      - name: Update Submodules
-        run: git submodule update --init --recursive
-
-      - name: Install uv
-        uses: astral-sh/setup-uv@v7
-
-      - name: Set up Python
-        uses: actions/setup-python@v6
-        with:
-          python-version-file: integration/pyproject.toml
-
-      - name: Install Playwright
-        shell: cmd
-        working-directory: integration
-        run: uv run playwright install --with-deps
-
-      - name: Start Core
-        timeout-minutes: 1
-        shell: powershell
-        env:
-          SYNNAX_LICENSE_KEY: ${{ secrets.SYNNAX_LICENSE_KEY }}
-        run: integration/scripts/StartCoreWindows.ps1
-
-      - name: Test Conductor
-        id: test-conductor
-        timeout-minutes: 30
-        shell: cmd
-        working-directory: integration
-        env:
-          PYTHONIOENCODING: utf-8
-        run: uv run test-conductor --name tc_win ${{ needs.setup.outputs.FLAGS }}
-
-      - name: Upload Test Results
-        uses: actions/upload-artifact@v6
-        if: always()
-        with:
-          name: test-results-windows
-          path: integration/tests/results/*
-          retention-days: 7
-
-      - name: Upload Core Logs
-        uses: actions/upload-artifact@v6
-        if: ${{ failure() || cancelled() }}
-        with:
-          name: core-logs-windows-attempt-${{ github.run_attempt }}
-          path: ~/synnax-data/synnax-core.log
-          retention-days: 7
+    uses: ./.github/workflows/test.integration.worker.yaml
+    with:
+      test_matrix: ${{ needs.setup.outputs.TEST_MATRIX }}
+      ref_run_id: ${{ needs.setup.outputs.REF_RUN_ID }}
+      os: windows-latest
+      flags: ${{ needs.setup.outputs.FLAGS }}
+    secrets: inherit

-  deploy-and-test-ubuntu:
-    name: Deploy and Test (ubuntu)
-    timeout-minutes: 30
-    runs-on:
-      - self-hosted
-      - ubuntu-latest
+  test-ubuntu:
+    name: Test (ubuntu)
     needs: [setup, build]
     if:
       ${{ !cancelled() && needs.setup.result == 'success' && (needs.build.result ==
       'success' || needs.build.result == 'skipped') }}
-    steps:
-      - name: Checkout Repository
-        uses: actions/checkout@v6
-        with:
-          clean: false
-
-      - name: Force Quit Existing Synnax Processes
-        continue-on-error: true
-        run: integration/scripts/kill_synnax_processes_unix.sh
-
-      - name: Download and Setup Linux Artifacts
-        env:
-          GH_TOKEN: ${{ github.token }}
-          REF_RUN_ID: ${{ needs.setup.outputs.REF_RUN_ID }}
-        run: integration/scripts/download_artifacts_unix.sh
-
-      - name: Update Submodules
-        run: git submodule update --init --recursive
-
-      - name: Install uv
-        uses: astral-sh/setup-uv@v7
-
-      - name: Set up Python
-        uses: actions/setup-python@v6
-        with:
-          python-version-file: integration/pyproject.toml
-
-      - name: Install Playwright
-        working-directory: integration
-        run: uv run playwright install --with-deps
-
-      - name: Start Core
-        timeout-minutes: 1
-        env:
-          SYNNAX_LICENSE_KEY: ${{ secrets.SYNNAX_LICENSE_KEY }}
-        run: integration/scripts/start_core_unix.sh
-
-      - name: Test Conductor
-        id: test-conductor
-        timeout-minutes: 30
-        working-directory: integration
-        run: uv run test-conductor --name tc_unix ${{ needs.setup.outputs.FLAGS }}
-
-      - name: Upload Test Results
-        uses: actions/upload-artifact@v6
-        if: always()
-        with:
-          name: test-results-linux
-          path: integration/tests/results/*
-          retention-days: 7
-
-      - name: Upload Core Logs
-        uses: actions/upload-artifact@v6
-        if: ${{ failure() || cancelled() }}
-        with:
-          name: core-logs-linux-attempt-${{ github.run_attempt }}
-          path: ~/synnax-data/synnax-core.log
-          retention-days: 7
+    uses: ./.github/workflows/test.integration.worker.yaml
+    with:
+      test_matrix: ${{ needs.setup.outputs.TEST_MATRIX }}
+      ref_run_id: ${{ needs.setup.outputs.REF_RUN_ID }}
+      os: ubuntu-latest
+      flags: ${{ needs.setup.outputs.FLAGS }}
+    secrets: inherit
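The `CUSTOM_MATRIX` logic added to the setup job replaces every character outside `[a-zA-Z0-9_-]` in `FLAGS` with `_` and then interpolates the result into a JSON string by hand. A rough Python equivalent is sketched below to show what the single-entry matrix looks like for the example flags in the commit message. The function name `custom_matrix` is hypothetical, and using `json.dumps` instead of string interpolation is a choice made for this sketch, not what the workflow does.

```python
import json
import re
from typing import Optional


def custom_matrix(flags: str) -> Optional[str]:
    """Approximate the shell step that writes CUSTOM_MATRIX (hypothetical helper)."""
    if not flags:
        # Mirrors `if [ -n "${FLAGS}" ]`: nothing is emitted without flags.
        return None
    # Same character class as the sed expression s/[^a-zA-Z0-9_-]/_/g.
    safe_name = re.sub(r"[^a-zA-Z0-9_-]", "_", flags)
    # The empty target makes the worker fall back to its flags input.
    matrix = {"include": [{"name": safe_name, "target": ""}]}
    return json.dumps(matrix, separators=(",", ":"))


print(custom_matrix("console/lifecycle"))
print(custom_matrix("-f channel_life"))
```

Because the sanitized name cannot contain quotes or backslashes, the hand-built JSON in the shell step stays well-formed. The sanitized name also feeds the job name and the artifact names in the worker workflow.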
