
Optimize World::get_entity_mut for large entity slices #23740

Open
CrazyRoka wants to merge 3 commits into bevyengine:main from CrazyRoka:optimize-entity-fetch-mut

Conversation

@CrazyRoka
Contributor

Objective

Optimize World::get_entity_mut (and the underlying WorldEntityFetch trait) when a slice of entities is passed in.
The previous duplicate-entity check used nested loops (O(N²)) and showed up as a CPU hotspot in real workloads with thousands of entities.

This PR makes the check O(N) while preserving exact behaviour and error semantics.

Solution

  • Replaced the nested for i in 0..len { for j in 0..i } duplicate check with a single-pass EntityHashSet in both &[Entity] and &[Entity; N] implementations of WorldEntityFetch::fetch_mut.
  • Added a dedicated Criterion benchmark (get_entity_mut_slice) that exercises the hot path with slices up to 2000 entities.
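The two duplicate checks can be sketched in plain Rust. This is a minimal illustration, not the PR's actual code: it uses std's `HashSet` and a `u64` stand-in for `Entity`, whereas the real implementation uses Bevy's `EntityHashSet` (a `HashSet` with a hasher specialized for `Entity` values) inside `WorldEntityFetch::fetch_mut`.

```rust
use std::collections::HashSet;

// Stand-in for Bevy's Entity type, for illustration only.
type Entity = u64;

// Previous approach: nested loops, O(N^2) comparisons.
fn has_duplicates_quadratic(entities: &[Entity]) -> bool {
    for i in 0..entities.len() {
        for j in 0..i {
            if entities[i] == entities[j] {
                return true;
            }
        }
    }
    false
}

// New approach: single pass through a hash set, O(N) expected time.
fn has_duplicates_linear(entities: &[Entity]) -> bool {
    let mut seen = HashSet::with_capacity(entities.len());
    // `insert` returns false when the value was already present.
    !entities.iter().all(|e| seen.insert(*e))
}

fn main() {
    let unique = [1, 2, 3, 4];
    let duped = [1, 2, 3, 1];
    // Both checks agree; only their asymptotic cost differs.
    assert_eq!(has_duplicates_quadratic(&unique), has_duplicates_linear(&unique));
    assert_eq!(has_duplicates_quadratic(&duped), has_duplicates_linear(&duped));
    println!("ok");
}
```

The trade-off visible in the benchmarks below follows directly from this: the hash-set version pays a fixed allocation and hashing cost per call, which dominates for tiny slices but is quickly amortized as N grows.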

Testing

  • Ran the new benchmark before and after the change using cargo bench -p benches --bench ecs -- get_entity_mut_slice.
  • Tested on Linux (x86_64).

Showcase

Benchmark results before:

baseline

Benchmark results after:

optimized

Criterion table summary:

get_entity_mut_slice/size/10
                        time:   [145.55 ns 145.85 ns 146.16 ns]
                        change: [+149.24% +150.50% +151.72%] (p = 0.00 < 0.05)
                        Performance has regressed.
get_entity_mut_slice/size/100
                        time:   [1.3810 µs 1.3848 µs 1.3886 µs]
                        change: [−32.743% −32.141% −31.647%] (p = 0.00 < 0.05)
                        Performance has improved.
get_entity_mut_slice/size/200
                        time:   [2.7047 µs 2.7117 µs 2.7187 µs]
                        change: [−58.343% −58.163% −57.978%] (p = 0.00 < 0.05)
                        Performance has improved.
get_entity_mut_slice/size/400
                        time:   [5.3071 µs 5.3299 µs 5.3552 µs]
                        change: [−77.351% −77.065% −76.815%] (p = 0.00 < 0.05)
                        Performance has improved.
get_entity_mut_slice/size/600
                        time:   [7.9886 µs 8.0188 µs 8.0498 µs]
                        change: [−83.848% −83.675% −83.525%] (p = 0.00 < 0.05)
                        Performance has improved.
get_entity_mut_slice/size/1000
                        time:   [13.218 µs 13.275 µs 13.334 µs]
                        change: [−89.906% −89.836% −89.763%] (p = 0.00 < 0.05)
                        Performance has improved.
get_entity_mut_slice/size/2000
                        time:   [27.051 µs 27.107 µs 27.160 µs]
                        change: [−94.644% −94.611% −94.581%] (p = 0.00 < 0.05)
                        Performance has improved.

Replace nested loops with a HashSet for O(N) duplicate entity detection.
This improves performance significantly for larger entity lists.
@kfc35 kfc35 added C-Performance A change motivated by improving speed, memory usage or compile times S-Needs-Review Needs reviewer attention (from anyone!) to move forward A-ECS Entities, components, systems, and events labels Apr 9, 2026
@github-project-automation github-project-automation bot moved this to Needs SME Triage in ECS Apr 9, 2026
@Victoronz
Contributor

Good observation that EntityHashSet is more efficient for larger entity counts!
These impls were originally only intended for low N, so that use case should not see a regression.
The regression at small sizes can be addressed by falling back to the previous duplicate check below a certain N.
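A hybrid like this could be sketched as follows. Note this is an illustrative sketch, not Bevy code: the threshold value is hypothetical and would need to be picked from benchmarks (the +150% regression at size 10 versus the improvement at size 100 suggests a crossover somewhere in between), and `u64` again stands in for `Entity`.

```rust
use std::collections::HashSet;

// Stand-in for Bevy's Entity type, for illustration only.
type Entity = u64;

// Hypothetical cutoff; the real value should come from benchmarking.
const SMALL_N_THRESHOLD: usize = 16;

fn has_duplicates(entities: &[Entity]) -> bool {
    if entities.len() <= SMALL_N_THRESHOLD {
        // Nested loops win for tiny slices: no allocation, no hashing.
        for i in 0..entities.len() {
            for j in 0..i {
                if entities[i] == entities[j] {
                    return true;
                }
            }
        }
        false
    } else {
        // Hash-set path for larger slices: O(N) expected time.
        let mut seen = HashSet::with_capacity(entities.len());
        !entities.iter().all(|e| seen.insert(*e))
    }
}

fn main() {
    assert!(!has_duplicates(&[1, 2, 3]));
    assert!(has_duplicates(&[1, 2, 1]));
    let big: Vec<Entity> = (0..1000).collect();
    assert!(!has_duplicates(&big));
    println!("ok");
}
```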

I'll also note that the vast, vast majority of arrays are small, so the EntityHashSet branch there should practically never be hit.

As an isolated change this makes sense (and we can merge it in the meantime)!
Overall though, I am still of the opinion that implicit get_entity_mut for slices is iffy design as was previously discussed here (and its follow-up PR).

I am curious, in the use case you've seen, is the entire result always used, or only partially iterated/consumed?

If only partial use is needed, then an iteration-based duplication check would be more performant.
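One way to fold the check into iteration (again an illustrative sketch, not Bevy API: `fetch_checked` is a hypothetical helper and `u64` stands in for `Entity`):

```rust
use std::collections::HashSet;

// Stand-in for Bevy's Entity type, for illustration only.
type Entity = u64;

// Lazily checks for duplicates while yielding, so a caller that stops
// consuming early also stops paying for the duplicate check.
// Ok(e) = fresh entity, Err(e) = duplicate encountered.
fn fetch_checked(entities: &[Entity]) -> impl Iterator<Item = Result<Entity, Entity>> + '_ {
    let mut seen = HashSet::new();
    entities
        .iter()
        .map(move |&e| if seen.insert(e) { Ok(e) } else { Err(e) })
}

fn main() {
    let results: Vec<_> = fetch_checked(&[1, 2, 1]).collect();
    assert_eq!(results, vec![Ok(1), Ok(2), Err(1)]);

    // Partial consumption: only one hash insert happens here.
    assert_eq!(fetch_checked(&[1, 2, 1]).next(), Some(Ok(1)));
    println!("ok");
}
```

Under this design, fetching k of N entities costs O(k) duplicate-check work instead of O(N) up front, at the price of surfacing the duplicate error mid-iteration rather than before any access.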

Additionally, if the source slices/arrays are not mutated between each get_entity_mut call, then placing the check after each mutation (by turning it into an EntitySet) could also reduce unnecessary work.

@Victoronz Victoronz added S-Waiting-on-Author The author needs to make changes or address concerns before this can be merged and removed S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Apr 9, 2026
