Use caching allocator for runner (#15730)
When prefilling a quantized KV cache with a large prefill length, parallelizing this op helps. Differential Revision: [D84962234](https://our.internmc.facebook.com/intern/diff/D84962234/) **NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D84962234/)! [ghstack-poisoned]
This doesn't use the Vectorize class directly because the equivalent APIs don't exist in the Vectorize class. Differential Revision: [D84962236](https://our.internmc.facebook.com/intern/diff/D84962236/) **NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D84962236/)! [ghstack-poisoned]
As the title says. Differential Revision: [D84962233](https://our.internmc.facebook.com/intern/diff/D84962233/) **NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D84962233/)! [ghstack-poisoned]
For small models, dequantizing portions of the V cache causes extra allocation overhead. A better way to handle this is probably to dequantize the entire V cache outside the model. There isn't a significant perf advantage from this yet, but subsequent diffs will use a caching allocator, where this refactor helps. Differential Revision: [D85532077](https://our.internmc.facebook.com/intern/diff/D85532077/) [ghstack-poisoned]
This is meant to be used as the temp allocator for kernels. Specifically for SDPA, it seems that on iOS there is significant overhead coming from allocations. Differential Revision: [D85532079](https://our.internmc.facebook.com/intern/diff/D85532079/) [ghstack-poisoned]
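For readers unfamiliar with the idea, here is a minimal sketch of what a caching temp allocator can look like. It is an illustration only, not the allocator added in this stack: the class name `CachingTempAllocator`, its methods, and the exact-size caching policy are all assumptions made for this example. The point it shows is that once a kernel's scratch buffers have been allocated for one step, later steps asking for the same sizes can reuse them instead of going back to the system allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <vector>

// Hypothetical illustration; names and policy are assumptions, not ExecuTorch code.
class CachingTempAllocator {
 public:
  CachingTempAllocator() = default;
  CachingTempAllocator(const CachingTempAllocator&) = delete;
  CachingTempAllocator& operator=(const CachingTempAllocator&) = delete;

  ~CachingTempAllocator() {
    // Release both cached and still-in-use blocks back to the system.
    for (auto& entry : free_blocks_) {
      for (void* p : entry.second) {
        std::free(p);
      }
    }
    for (auto& entry : in_use_) {
      std::free(entry.first);
    }
  }

  // Hand out a cached block of exactly this size if one exists,
  // otherwise fall back to malloc and count the system allocation.
  void* allocate(std::size_t size) {
    assert(size > 0);
    auto it = free_blocks_.find(size);
    if (it != free_blocks_.end() && !it->second.empty()) {
      void* p = it->second.back();
      it->second.pop_back();
      in_use_[p] = size;
      return p;
    }
    void* p = std::malloc(size);
    if (p != nullptr) {
      in_use_[p] = size;
      ++system_allocs_;
    }
    return p;
  }

  // Called between steps: every block handed out so far becomes reusable.
  void reset() {
    for (auto& entry : in_use_) {
      free_blocks_[entry.second].push_back(entry.first);
    }
    in_use_.clear();
  }

  std::size_t system_allocs() const { return system_allocs_; }

 private:
  std::map<std::size_t, std::vector<void*>> free_blocks_;  // size -> cached blocks
  std::map<void*, std::size_t> in_use_;                    // block -> size
  std::size_t system_allocs_ = 0;
};
```

In this sketch a kernel would call `allocate` for its scratch space and the caller would call `reset` after each step. A real implementation would also need to consider alignment, a cap on cached memory, and thread safety, none of which are handled here.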
This allows us to leverage the temp memory allocator; if that allocator is a caching allocator, it reduces the allocation overhead. Differential Revision: [D85532076](https://our.internmc.facebook.com/intern/diff/D85532076/) [ghstack-poisoned]
The existing constructors don't compose well: if you want the data loader or data files constructor, you cannot override the memory allocator. Fix that. Differential Revision: [D86120037](https://our.internmc.facebook.com/intern/diff/D86120037/) [ghstack-poisoned]
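One common way to make such constructors compose is to route every input through a single options struct, so that choosing data files never rules out overriding the allocator. The sketch below shows that pattern under stated assumptions: `Allocator`, `DefaultAllocator`, `CachingAllocator`, `LoaderOptions`, and `Loader` are hypothetical names invented for this example, and this is not necessarily how the diff restructures the real constructors.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical allocator interface used only for this illustration.
struct Allocator {
  virtual ~Allocator() = default;
  virtual const char* name() const = 0;
};

struct DefaultAllocator : Allocator {
  const char* name() const override { return "default"; }
};

struct CachingAllocator : Allocator {
  const char* name() const override { return "caching"; }
};

// All optional inputs live in one struct, so any combination can be set.
struct LoaderOptions {
  std::string program_path;
  std::vector<std::string> data_files;
  std::unique_ptr<Allocator> temp_allocator;  // null means "use the default"
};

class Loader {
 public:
  explicit Loader(LoaderOptions options)
      : program_path_(std::move(options.program_path)),
        data_files_(std::move(options.data_files)),
        temp_allocator_(std::move(options.temp_allocator)) {
    // Fall back to a default allocator only when the caller did not supply one.
    if (!temp_allocator_) {
      temp_allocator_ = std::make_unique<DefaultAllocator>();
    }
  }

  const Allocator& temp_allocator() const { return *temp_allocator_; }
  const std::vector<std::string>& data_files() const { return data_files_; }

 private:
  std::string program_path_;
  std::vector<std::string> data_files_;
  std::unique_ptr<Allocator> temp_allocator_;
};
```

With this shape, a caller who needs data files and a custom allocator sets both fields on `LoaderOptions` before constructing the `Loader`, and a caller who sets neither still gets the default allocator.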
We observed that on iOS this improves perf by 6% because the SDPA op does temp allocations. There is no significant difference on Android, though. Differential Revision: [D86120038](https://our.internmc.facebook.com/intern/diff/D86120038/) ghstack-source-id: 322321964 Pull Request resolved: #15730
@kimishpatel has exported this pull request. If you are a Meta employee, you can view the originating Diff in D86120038.
Summary:
We observed that on iOS this improves perf by 6% because the SDPA op does temp allocations.
There is no significant difference on Android, though.
ghstack-source-id: 328001114
exported-using-ghexport
Reviewed By: navsud, derekdixu
Differential Revision: D86120038