Skip to content

[FEA] Spark Connect eventlog support for tools #2058

@sayedbilalbari

Description

@sayedbilalbari

Summary

Tools should be able to parse Spark Connect event logs. Connect introduces 9 new SparkListener event types that provide session-level and operation-level metadata not available in standard Spark. Additionally, the existing modifiedConfigs field in SparkListenerSQLExecutionStart — which the tools currently ignore — becomes critical for Connect workloads because spark.conf.set() is the only way Connect users can set configs (they don't control server startup).

This issue tracks the overall Spark Connect event-log support effort and is the parent issue for the phase-specific implementation work below.

Sub-issues

Suggested execution order:

  1. [FEA] Parse modifiedConfigs from SparkListenerSQLExecutionStart #2063
  2. [FEA] Add minimum Spark Connect event-log awareness to parser #2064
  3. [FEA] Extract and report Spark Connect session and operation metadata #2065

Part 1: Parse modifiedConfigs from SparkListenerSQLExecutionStart (Bug Fix)

Not Connect-specific — this is an existing gap in OSS Spark parsing that Connect makes more visible.

SparkListenerSQLExecutionStart has a modifiedConfigs field that captures session-level config overrides (set via spark.conf.set()). The tools already parse this event but ignore modifiedConfigs entirely.

Why it matters more for Connect

Aspect spark-submit Spark Connect
Baseline (SharedState) Set at spark-submit time, includes --conf Set at server startup, shared across all sessions
Config mechanism --conf flags + spark.conf.set() spark.conf.set() only
modifiedConfigs importance Low (often empty, since configs passed via --conf land in SparkListenerEnvironmentUpdate) High — primary config visibility mechanism; only way to see per-session overrides

What modifiedConfigs contains

Session-level configs that differ from the global baseline (SharedState.conf, frozen at startup). spark.driver.* and spark.executor.* prefixes are excluded. Example:

{
  "Event": "org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart",
  "executionId": 0,
  "modifiedConfigs": {
    "spark.sql.autoBroadcastJoinThreshold": "-1",
    "spark.app.name": "saralihalli-test"
  }
}

See core/docs/spark-connect-modifiedConfigs-analysis.md for the full code-path trace.


Part 2: Support Connect-Specific Events (New Feature)

Spark Connect introduces 9 new SparkListenerEvent types posted to the standard LiveListenerBus (so they appear in event logs). These enable multi-user attribution, operation-level timing, session lifecycle tracking, and error reporting at the Connect operation level.

Event Catalog

Service/Session Lifecycle

Event Key Fields Purpose
SparkListenerConnectServiceStarted hostAddress, bindingPort Identifies the log as a Connect server log (1 per app)
SparkListenerConnectSessionStarted sessionId, userId Marks client connection; enables multi-user attribution
SparkListenerConnectSessionClosed sessionId, userId Session end; paired with Started gives session duration

Operation Lifecycle

Event Key Fields Purpose
ConnectOperationStarted jobTag, operationId, sessionId, userId, statementText Richest event — contains protobuf plan text, correlation key
ConnectOperationAnalyzed jobTag, operationId Marks end of analysis phase
ConnectOperationReadyForExecution jobTag, operationId Marks end of planning phase
ConnectOperationFinished jobTag, operationId, producedRowCount Marks execution complete
ConnectOperationClosed jobTag, operationId Results sent to client, operation cleaned up
ConnectOperationFailed jobTag, operationId, errorMessage Error attribution with full traceback
ConnectOperationCanceled jobTag, operationId Client-initiated cancellation (not observed in test data)

Correlation Model

The jobTag is the universal correlation key linking Connect operations to existing Spark events:

Format: SparkConnect_OperationTag_User_{userId}_Session_{sessionId}_Operation_{operationId}

It appears in:

  • SparkListenerConnectOperation*.jobTag
  • SparkListenerSQLExecutionStart.jobTags (array)
  • SparkListenerJobStart.Properties["spark.job.tags"] (comma-separated string)
ConnectServiceStarted (1 per server)
  └── ConnectSessionStarted (1+ per server, one per client)
        ├── ConnectOperationStarted ──────────────────┐
        │     (jobTag, sessionId, userId,              │
        │      statementText)                          │ linked via jobTag
        ├── ConnectOperationAnalyzed                   │
        ├── ConnectOperationReadyForExecution           │
        ├── ConnectOperationFinished (or Failed)       │
        ├── ConnectOperationClosed                     │
        │                                              │
        │   SQLExecutionStart ─────────────────────────┤
        │     (executionId, physicalPlan,               │ jobTags contains
        │      modifiedConfigs, jobTags)               │ the same jobTag
        │                                              │
        │   JobStart ──────────────────────────────────┘
        │     (Properties["spark.job.tags"]             linked via spark.job.tags
        │      contains the same jobTag)
        │
        └── ConnectSessionClosed

Operation Lifecycle Timing

Connect events provide phase-level timing breakdown not available from standard Spark events:

OperationStarted    ─┐
                     ├─ analysis time (~500-700ms)
OperationAnalyzed   ─┤
                     ├─ planning time (~400ms)
ReadyForExecution   ─┤
                     ├─ execution time (~2-6s)
OperationFinished   ─┤
                     ├─ result transfer time (~250ms)
OperationClosed     ─┘

Implementation Phases

Phase 1: modifiedConfigs parsing (Part 1)

Tracked by: #2063

  • Parse modifiedConfigs from SparkListenerSQLExecutionStart
  • Surface per-execution config overrides in qualification/profiling output
  • Benefits all Spark workloads, not just Connect

Phase 2: Connect event awareness (Part 2 — minimum)

Tracked by: #2064

  • Detect Connect mode via presence of SparkListenerConnectServiceStarted
  • Accept Connect events in the event log parser (currently filtered by eventlog-parser.yaml)
  • Parse jobTag for operation↔SQL execution correlation

Phase 3: Connect metadata extraction (Part 2 — full)

Tracked by: #2065

  • Parse session events for user attribution (userId, sessionId)
  • Parse operation lifecycle events for phase-level timing
  • Parse OperationFailed for error attribution
  • Per-session config tracking via modifiedConfigs + sessionId correlation
  • Multi-user reporting (break down qualification/profiling results by user)

Metadata

Metadata

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions