[BUG] KeyError: 'appId' in QualX feature extraction for certain GPU eventlogs #2050

@parthosa

Description

Summary

DefaultFeaturesExtractor.extract_raw_features() raises KeyError: 'appId' when processing GPU eventlogs whose feature extraction does not produce an appId column. The exception is unhandled and carries no context (which app failed, or which profiling input was missing), so callers have no easy way to diagnose it.

Steps to Reproduce

  1. Obtain a GPU eventlog where the Spark application does not emit data source information (e.g., the data_source_information CSV is missing or empty in the profiling output).
  2. Call predict_from_profiles() with that eventlog's profiling output:
```python
from spark_rapids_tools.tools.qualx.predict import predict_from_profiles

results = predict_from_profiles(
    model_type="xgboost",
    model_path="<path_to_model>",
    profile_output_dirs=["<path_to_profiling_output>"],
)
```
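Before running predict_from_profiles(), it can help to confirm that a given profiling output actually matches the condition in step 1. The helper below is a minimal, hypothetical sketch: it assumes the data source information is written to a file named data_source_information.csv inside each per-app directory, and it treats any directory containing at least one CSV as a per-app directory. Neither assumption is taken from the tools' documentation, so adjust both to match your output layout.

```python
import os
import tempfile
from pathlib import Path
from typing import List

# ASSUMPTION: the data source CSV is named like this in each per-app directory.
DATA_SOURCE_CSV = "data_source_information.csv"


def find_missing_data_source_info(profile_output_dir: str) -> List[str]:
    """Hypothetical helper: list app dirs whose data source CSV is missing or has no data rows.

    ASSUMPTION: any directory (including the root) that holds at least one
    CSV file is treated as a per-app profiling output directory.
    """
    root = Path(profile_output_dir)
    candidates = [root] + [p for p in root.rglob("*") if p.is_dir()]
    suspects = []
    for app_dir in sorted(candidates):
        if not any(app_dir.glob("*.csv")):
            continue  # no CSVs here, so not a per-app output directory
        csv_path = app_dir / DATA_SOURCE_CSV
        if not csv_path.exists():
            suspects.append(str(app_dir))
            continue
        with csv_path.open() as f:
            non_blank = [line for line in f if line.strip()]
        # An empty file, or a header line with no data rows, counts as "no data".
        if len(non_blank) <= 1:
            suspects.append(str(app_dir))
    return suspects
```

Any directory returned by find_missing_data_source_info("<path_to_profiling_output>") is a likely candidate for reproducing the KeyError.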

Observed Behavior

A bare KeyError is raised with no actionable context:

```
Traceback (most recent call last):
  File ".../spark_rapids_tools/tools/qualx/predict.py", line ..., in predict_from_profiles
    ...
  File ".../spark_rapids_tools/tools/qualx/featurizers/default.py", line 438, in extract_raw_features
    ...
KeyError: 'appId'
```

The error originates in extract_raw_features(), which calls groupby(['appId', 'sqlID']) on a DataFrame that has no appId column.

Expected Behavior

Either:

  1. Raise a descriptive error, e.g., ValueError("Feature extraction failed: 'appId' column not found in extracted features for <app_id>. The eventlog may not contain sufficient profiling data.")
  2. Return an empty result (or skip the app) so that callers can handle it gracefully
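Both options could share a single guard that runs before the groupby. The function below is a hypothetical sketch, not the project's API: has_group_keys, its strict flag, and the GROUP_KEYS constant are illustrative names. With strict=True it raises the descriptive ValueError from option 1; with strict=False it logs a warning and returns False so the caller can skip the app (option 2).

```python
import logging
from typing import Optional

import pandas as pd

logger = logging.getLogger(__name__)

# Grouping keys used by extract_raw_features(), per the issue description.
GROUP_KEYS = ["appId", "sqlID"]


def has_group_keys(
    features: pd.DataFrame, app_id: Optional[str] = None, strict: bool = True
) -> bool:
    """Hypothetical guard: check that all grouping keys exist before grouping."""
    missing = [col for col in GROUP_KEYS if col not in features.columns]
    if not missing:
        return True
    app = app_id if app_id is not None else "<unknown app>"
    message = (
        f"Feature extraction failed: column(s) {missing} not found in extracted "
        f"features for {app}. The eventlog may not contain sufficient profiling data."
    )
    if strict:
        # Option 1: fail with context the caller can act on.
        raise ValueError(message)
    # Option 2: report the problem and let the caller skip this app.
    logger.warning(message)
    return False
```

A caller in the lenient mode would then guard the grouping with `if has_group_keys(features, app_id, strict=False): features.groupby(GROUP_KEYS)...` and otherwise move on to the next app.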

Metadata

    Labels

    ? - Needs Triage
    bug: Something isn't working
    user_tools: Scope the wrapper module running CSP, QualX, and reports (python)