
Add Sysbench metrics #9

Open

aliel wants to merge 1 commit into dev from aliel-sysbench

Conversation

@aliel (Member) commented Apr 4, 2024

No description provided.

@aliel aliel requested a review from hoh April 4, 2024 12:48

@foxpatch-aleph left a comment

The PR successfully refactors shared utilities into utils.py and adds a new benchmarks module that collects sysbench CPU/RAM/disk metrics from CRNs. The code structure is reasonable, but there are a few real bugs: TimeoutGenerator is duplicated instead of imported from utils.py; asyncio.gather is missing return_exceptions=True despite the return type claiming Union[B, BaseException]; and save_as_json is called with a misnamed keyword arg. There is also no test coverage for the new benchmark module.

aleph_scoring/benchmarks/__init__.py (line 33): TimeoutGenerator is re-defined here with an identical NewType declaration, but it was already moved to aleph_scoring/utils.py. This duplication will cause the two types to be distinct (same structure, different identity). Import it from utils instead:

from aleph_scoring.utils import NodeInfo, TimeoutGenerator, ...
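A minimal, self-contained illustration of why the duplication matters (the names below are placeholders, and `float` stands in for whatever base type the real `TimeoutGenerator` wraps): two `NewType` declarations are distinct objects even when their name and base type match, so type checkers treat values of one as incompatible with the other.

```python
from typing import NewType

# Two NewType declarations with the same name and base type are still
# distinct objects; a type checker treats them as unrelated types.
TimeoutGeneratorA = NewType("TimeoutGenerator", float)
TimeoutGeneratorB = NewType("TimeoutGenerator", float)

print(TimeoutGeneratorA is TimeoutGeneratorB)  # prints: False
```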

aleph_scoring/benchmarks/__init__.py (line 115): asyncio.gather(...) is called without return_exceptions=True, but the return type annotation is Sequence[Union[B, BaseException]], which only makes sense with that flag set. Without it, any exception that escapes get_crn_benchmarks (e.g. an unexpected error not covered by the three caught types) will cancel the entire gather and propagate to the caller. Either add return_exceptions=True or fix the return type to Sequence[B].
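To make the difference concrete, here is a standalone sketch (the coroutine names are made up) showing that with `return_exceptions=True` an escaped exception is returned in the result list instead of cancelling the whole gather:

```python
import asyncio


async def ok() -> int:
    return 1


async def boom() -> int:
    raise RuntimeError("unexpected")


async def main() -> None:
    # With return_exceptions=True the exception is placed in the results
    # list; without it, gather would re-raise and cancel the other tasks.
    results = await asyncio.gather(ok(), boom(), return_exceptions=True)
    assert results[0] == 1
    assert isinstance(results[1], RuntimeError)


asyncio.run(main())
```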

aleph_scoring/benchmarks/__init__.py (line 55): The version-fetching retry loop is misleading: get_crn_version internally swallows all exceptions and returns None on failure, so the for loop with range(3) just calls it up to 3 times silently regardless of the failure mode. The variable attempt is also unused. Either remove the loop (single call is sufficient) or add a short sleep between retries to make the retry intent clear.
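If the retry loop is kept, it could look like this sketch (the injected `fetch` callable stands in for `get_crn_version`, whose exact signature is assumed here; only its "returns `None` on failure" contract is taken from the review):

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def get_version_with_retries(
    fetch: Callable[[], Awaitable[Optional[str]]],
    attempts: int = 3,
    delay: float = 1.0,
) -> Optional[str]:
    # fetch is assumed to swallow its own exceptions and return None on
    # failure, so failure is detected by the return value, not by except.
    for attempt in range(attempts):
        version = await fetch()
        if version is not None:
            return version
        if attempt < attempts - 1:
            await asyncio.sleep(delay)  # make the retry intent explicit
    return None
```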

aleph_scoring/__main__.py (line 146): save_as_json is called with node_metrics=node_benchmarks, but the variable being passed is a NodeBenchmarks object, not node metrics. If save_as_json accepts a keyword arg named node_metrics, this will silently work but is confusing. Check the signature of save_as_json and rename the parameter or the argument to avoid the mismatch.
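One way to resolve the mismatch is to rename the parameter so the call site reads correctly; this is a hypothetical sketch of the signature, not the project's actual `save_as_json`:

```python
import json
from pathlib import Path


def save_as_json(node_benchmarks, path: Path) -> None:
    # Parameter renamed from node_metrics to node_benchmarks so the
    # keyword argument at the call site matches what is being passed.
    path.write_text(json.dumps(node_benchmarks))


# Call site now reads unambiguously:
# save_as_json(node_benchmarks=node_benchmarks, path=...)
```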

aleph_scoring/__main__.py (line 133): run_benchmarks accepts a stdout parameter (defaulting to False) but the benchmark CLI command (line 339) never passes stdout, so the stdout print path is permanently dead code from the CLI. Either expose it as a CLI option (as run_measurements does) or remove the parameter.
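If the flag is worth keeping, exposing it as a CLI option is straightforward; this sketch uses stdlib argparse purely for illustration, since the project's actual CLI framework is not shown here:

```python
import argparse

# Hypothetical sketch: expose stdout as an explicit CLI flag instead of
# leaving the run_benchmarks parameter permanently unset.
parser = argparse.ArgumentParser(prog="benchmark")
parser.add_argument(
    "--stdout",
    action="store_true",
    help="Also print benchmark results to stdout",
)

args = parser.parse_args(["--stdout"])
print(args.stdout)  # prints: True
```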

aleph_scoring/benchmarks/__init__.py (line 1): No tests exist for the new benchmark module. At minimum, consider adding unit tests for: get_crn_benchmarks with mocked HTTP responses (200, non-200, connection error, timeout); collect_node_benchmarks with an empty node list; and the CrnBenchmarks/NodeBenchmarks model validators.
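A sketch of what such tests could look like; the real `get_crn_benchmarks` signature is assumed here, with a fetch callable injected so the HTTP failure modes (success, non-200, connection error) can be simulated without a network:

```python
import asyncio
from unittest.mock import AsyncMock


# Simplified stand-in for get_crn_benchmarks: the real function's
# signature and caught exception types are assumptions.
async def get_crn_benchmarks(fetch):
    try:
        status, body = await fetch()
    except (ConnectionError, asyncio.TimeoutError):
        return None
    return body if status == 200 else None


async def run_tests() -> None:
    ok = AsyncMock(return_value=(200, {"cpu": 1.0}))
    assert await get_crn_benchmarks(ok) == {"cpu": 1.0}

    non_200 = AsyncMock(return_value=(500, None))
    assert await get_crn_benchmarks(non_200) is None

    conn_err = AsyncMock(side_effect=ConnectionError)
    assert await get_crn_benchmarks(conn_err) is None


asyncio.run(run_tests())
```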

aleph_scoring/config.py (line 40): BENCHMARK_VM_HASH is hardcoded to what appears to be a devnet VM hash. Add a comment explaining what this VM is expected to expose (the /vm/{hash}/sysbench/* endpoints), so future maintainers understand why this specific hash is the default and when it would need to change.
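The suggested comment could look like this; the hash value below is a placeholder, not the real `BENCHMARK_VM_HASH`:

```python
# This VM must expose the /vm/{hash}/sysbench/* endpoints queried by the
# benchmarks module. The default points at a devnet VM; update it if the
# benchmark VM is redeployed under a new hash.
BENCHMARK_VM_HASH = "<devnet-vm-hash-placeholder>"
```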

aleph_scoring/utils.py (line 218): Missing blank line before the timeout_generator function definition (PEP 8 / existing code style).
