Distributed may deadlock in `one_rank_first` when an exception occurs on the target rank #3675

## Bug description

In `ignite/distributed/utils.py`, the `one_rank_first` context manager is susceptible to a distributed deadlock if an exception occurs within the context block on the target rank.

The current implementation synchronizes ranks with two `barrier()` call sites, and each rank goes through exactly one of them: every rank other than `rank` calls `barrier()` before the `yield`, and the process with the designated `rank` calls it after the `yield`. If the designated rank encounters an error (e.g., a network failure during data download or a disk I/O error) within the `yield` block, it never reaches its `barrier()` call. The other processes are already blocked in their own `barrier()` call waiting for it, so they hang indefinitely.
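
The hang can be illustrated on a single machine without a distributed backend. The following is a minimal simulation, not ignite code: threads stand in for ranks, `threading.Barrier` stands in for the collective barrier, and the names `one_rank_first_current` and `simulate_current` are hypothetical. The barrier timeout exists only so that the demo terminates; a real collective barrier would block without a timeout.

```python
import threading
from contextlib import contextmanager


@contextmanager
def one_rank_first_current(current_rank, rank, barrier):
    # Same control flow as the snippet in this issue; `barrier` is any
    # zero-argument callable (an assumption made for the simulation).
    if current_rank != rank:
        barrier()
    yield
    if current_rank == rank:
        barrier()  # skipped when the body raises on `rank`


def simulate_current(world_size=2, timeout=1.0):
    # Thread-based stand-in for a collective barrier across `world_size` ranks.
    sim_barrier = threading.Barrier(world_size, timeout=timeout)
    outcome = {}

    def worker(current_rank):
        try:
            with one_rank_first_current(current_rank, 0, sim_barrier.wait):
                if current_rank == 0:
                    raise RuntimeError("Rank 0 crashed!")
                outcome[current_rank] = "ran body"
        except threading.BrokenBarrierError:
            # BrokenBarrierError subclasses RuntimeError, so it is caught first.
            outcome[current_rank] = "timed out at barrier"
        except RuntimeError as exc:
            outcome[current_rank] = f"raised: {exc}"

    threads = [
        threading.Thread(target=worker, args=(r,), daemon=True)
        for r in range(world_size)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcome


if __name__ == "__main__":
    print(simulate_current())
```

In this simulation rank 0 raises and rank 1 only escapes the barrier because of the artificial timeout, which corresponds to the indefinite hang in a real job.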

## Steps to reproduce

Run the following logic in a distributed environment (2+ ranks):

```Python
import ignite.distributed as idist

# Simulate a crash only on rank 0
with idist.one_rank_first(rank=0):
    if idist.get_rank() == 0:
        raise RuntimeError("Rank 0 crashed!")
    # Ranks other than 0 never get here: they are still blocked in the
    # barrier() call made on entry, waiting for rank 0's barrier() after the block
```

## Expected behavior

The exception should propagate and the entire distributed job should terminate. Instead, the ranks that did not crash hang, and their processes have to be killed manually.

## Code snippet

Current implementation in `ignite/distributed/utils.py`:

```Python
if current_rank != rank:
    barrier()

yield  # <--- If an exception is raised here on 'rank' ...

if current_rank == rank:
    barrier()  # <--- ... this call is skipped, so the other ranks stay blocked in their barrier() above
```
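
One possible direction, sketched here under assumptions rather than as the project's fix, is to wrap the `yield` in `try`/`finally` so that the designated rank always reaches its barrier, even when the body raises. The sketch below is not ignite code: `one_rank_first_guarded` and `simulate_guarded` are hypothetical names, and a `threading.Barrier` shared by threads stands in for the collective barrier so the example runs on one machine.

```python
import threading
from contextlib import contextmanager


@contextmanager
def one_rank_first_guarded(current_rank, rank, barrier):
    # Hypothetical variant; `barrier` is any zero-argument callable.
    if current_rank != rank:
        barrier()
    try:
        yield
    finally:
        # Runs whether the body succeeded or raised, so waiting ranks
        # are released in both cases; the exception then propagates.
        if current_rank == rank:
            barrier()


def simulate_guarded(world_size=2, timeout=5.0):
    # Thread-based stand-in for a collective barrier across `world_size` ranks.
    sim_barrier = threading.Barrier(world_size, timeout=timeout)
    outcome = {}

    def worker(current_rank):
        try:
            with one_rank_first_guarded(current_rank, 0, sim_barrier.wait):
                if current_rank == 0:
                    raise RuntimeError("Rank 0 crashed!")
                outcome[current_rank] = "ran body"
        except threading.BrokenBarrierError:
            outcome[current_rank] = "timed out at barrier"
        except RuntimeError as exc:
            outcome[current_rank] = f"raised: {exc}"

    threads = [
        threading.Thread(target=worker, args=(r,), daemon=True)
        for r in range(world_size)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcome


if __name__ == "__main__":
    print(simulate_guarded())
```

With this variant the exception still propagates on rank 0, and the other simulated rank is released instead of waiting forever. Note that releasing the barrier only removes the hang: the other ranks then run the body even though the designated rank failed, so terminating the whole job would still depend on how the failure is surfaced to the launcher or to the other ranks, which this sketch does not address.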

