Add LoRA multimethod export to CoreML static LLM export #18347
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18347
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures, 2 Unrelated Failures as of commit 0c2b493 with merge base e90d3c8.
NEW FAILURES - The following jobs have failed:
BROKEN TRUNK - The following jobs failed but were present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
| "forward": _export_model(model, example_inputs, "base"), | ||
| } | ||
| for name, lora_model in lora_models.items(): | ||
| methods[name] = _export_model(lora_model, example_inputs, name) |
add methods for each lora
| methods[f"{name}_forward"] = _export_model( | ||
| lora_model, decode_inputs, f"{name} decode" | ||
| ) | ||
| methods[f"{name}_prefill"] = _export_model( |
add methods for each lora with separate prefill, decode
Not sure if this is how we want to do it, though.
Are we hardcoding these still? I thought you were going to include in the config?
I guess we might not include it in the config yet...
It also requires some refactoring, since the CoreML export doesn't take in llm_config - that might be better as a separate PR.
```python
constant_methods=constant_methods,
compile_config=edge_compile_config,
if has_adapters:
    constant_methods["has_lora"] = True
```
Not sure if we should add a method like this here. Or if we do add it, it should be more granular, like a list of LoRA methods.
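A minimal sketch of the more granular option, reusing the bool-valued constant_methods style the diff already uses; the `lora_method_` key prefix here is a hypothetical naming choice, not something from the PR:

```python
# Hypothetical alternative: record one metadata entry per exported adapter
# instead of a single has_lora flag, so consumers can enumerate adapters.
if has_adapters:
    for name in lora_models:
        constant_methods[f"lora_method_{name}"] = True
```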
| methods[f"{name}_forward"] = _export_model( | ||
| lora_model, decode_inputs, f"{name} decode" | ||
| ) | ||
| methods[f"{name}_prefill"] = _export_model( |
There was a problem hiding this comment.
Are we hardcoding these still? I thought you were going to include in the config?
```python
compile_config=edge_compile_config,
if has_adapters:
    constant_methods["has_lora"] = True
elif has_adapters:
```
Why is has_adapters mutually exclusive from multifunction?
Hmm, it isn't - this is the path if we have LoRA adapters without multifunction. The block above is multifunction, with adapters if they exist.
But I think we can combine the adapter-no-multifunction branch and the default branch (no adapters, no multifunction, single-method export).
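A sketch of that combined branch, using the names from the diffs above and assuming lora_models is simply empty when no adapters are passed:

```python
# Unified single-method path: the base "forward" method plus one method per
# adapter. With no adapters, lora_models is empty and this reduces to the
# original default branch.
methods = {"forward": _export_model(model, example_inputs, "base")}
for name, lora_model in lora_models.items():
    methods[name] = _export_model(lora_model, example_inputs, name)
```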
Pull request overview
Adds optional LoRA adapter export support to the CoreML static LLM export script by emitting adapters as additional methods in a single multi-method PTE, enabling CoreML positional multi-method weight sharing to deduplicate base weights.
Changes:
- Extend `export_static_llm_coreml.py` with an `--adapter` CLI to load LoRA adapter checkpoints/configs and export them as separate methods (optionally alongside `--multifunction` decode/prefill variants).
- Update model loading and state_dict key remapping to support LoRA modules under `static_mha` attention (e.g., `wq.*` -> `wqs.0.*`).
- Adjust linear quantization filtering to avoid quantizing LoRA A/B projection weights while still quantizing base linear weights (see the filter sketch below).
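As a rough illustration of the third bullet, a filter of the kind torchao's quantize_ accepts could skip the adapter projections; the "lora_a"/"lora_b" module names below are assumptions, not the PR's actual identifiers:

```python
import torch.nn as nn


def quantize_linear_filter(module: nn.Module, fqn: str) -> bool:
    # Quantize base nn.Linear layers but leave the (small) LoRA A/B
    # projections in float; the lora_a/lora_b naming is assumed here.
    if not isinstance(module, nn.Linear):
        return False
    return "lora_a" not in fqn and "lora_b" not in fqn
```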
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| examples/apple/coreml/llama/utils.py | Signature formatting only (no behavioral change). |
| examples/apple/coreml/llama/export_static_llm_coreml.py | Adds LoRA adapter loading/remapping and exports adapters as additional methods with multi-method weight sharing support. |
```python
from executorch.examples.models.llama.convert_weights import (
    load_and_convert_unsloth_to_meta,
)
```
--adapter introduces a runtime dependency on safetensors (via load_and_convert_unsloth_to_meta importing safetensors.torch). If safetensors isn't installed, this will crash with a ModuleNotFoundError that doesn't explain how to fix it. Consider catching ModuleNotFoundError around this import/load and raising a clearer error (e.g., instructing to pip install safetensors) or documenting the dependency in the CLI help.
```suggestion
try:
    from executorch.examples.models.llama.convert_weights import (
        load_and_convert_unsloth_to_meta,
    )
except ModuleNotFoundError as e:
    raise ModuleNotFoundError(
        "Using --adapter requires the 'safetensors' package. "
        "Install it with 'pip install safetensors' and try again."
    ) from e
```
```diff
@@ -157,6 +170,23 @@ def load_model(
             f"layers.{i}.attention.wv.weight"
         )
```
load_model() accepts adapter_checkpoint / adapter_config independently. If adapter_checkpoint is provided without adapter_config, the model likely won’t be constructed with LoRA modules, so adapter keys get treated as unexpected and the export silently becomes the base model. Consider validating that both arguments must be provided together (raise ValueError early) to avoid a confusing no-op export.
```suggestion
if (adapter_checkpoint is None) != (adapter_config is None):
    raise ValueError(
        "adapter_checkpoint and adapter_config must both be provided together "
        "when loading LoRA adapters."
    )
```
```python
    adapter_checkpoint=adapter_ckpt,
    adapter_config=adapter_cfg,
)
lora_model = _transform_eager_model(lora_model, args, float_dtype)
lora_models[name] = lora_model
```
Adapter NAMEs are used as dictionary keys (lora_models[name] = ... / methods[name] = ...). Duplicate names will silently overwrite earlier adapters, and in single-method mode NAME="forward" would override the base method. Consider validating adapter names are unique and don’t collide with reserved method names like forward/prefill before starting the export.
| help="LoRA adapter: method name, path to adapter.safetensors, " | ||
| "path to adapter_config.json. Can be repeated for multiple adapters.", |
The --adapter help says NAME is the “method name”, but in --multifunction mode the exported method names are suffixed ({NAME}_forward / {NAME}_prefill). Consider updating the CLI help (or the method naming) so users can predict the exported method names.
| help="LoRA adapter: method name, path to adapter.safetensors, " | |
| "path to adapter_config.json. Can be repeated for multiple adapters.", | |
| help=( | |
| "LoRA adapter: base method name, path to adapter.safetensors, path to " | |
| "adapter_config.json. In --multifunction mode, the exported adapter " | |
| "methods will be named {NAME}_prefill and {NAME}_forward. Can be " | |
| "repeated for multiple adapters." | |
| ), |
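For reference, the kind of argparse declaration this help string would attach to; `action="append"` with `nargs=3` is an assumption inferred from the `for name, adapter_ckpt, adapter_cfg in args.adapter:` unpacking later in the thread:

```python
parser.add_argument(
    "--adapter",
    action="append",  # each --adapter occurrence appends one (NAME, CKPT, CFG) triple
    nargs=3,
    metavar=("NAME", "CHECKPOINT", "CONFIG"),
    default=None,
    help="LoRA adapter: method name, path to adapter.safetensors, "
    "path to adapter_config.json. Can be repeated for multiple adapters.",
)
```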
```python
if adapter_config is not None:
    with open(adapter_config, "r") as f:
        lora_config = json.loads(f.read())
```
adapter_config is parsed by indexing required keys ("r", "lora_alpha", "target_modules"); if the JSON is missing/renamed fields this will raise KeyError with little context. Consider validating the schema and raising a ValueError that points to the config path and the missing field(s) (or using .get() with an explicit error).
```suggestion
lora_config = json.loads(f.read())
required_keys = ("r", "lora_alpha", "target_modules")
missing_keys = [key for key in required_keys if key not in lora_config]
if missing_keys:
    raise ValueError(
        f"Adapter config '{adapter_config}' is missing required field(s): "
        f"{', '.join(missing_keys)}"
    )
```
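For context, a minimal adapter_config.json carrying just the three fields this code reads; the values and target module names below are illustrative, not taken from the PR:

```json
{
  "r": 16,
  "lora_alpha": 32,
  "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"]
}
```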
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
```python
if adapter_config is not None:
    with open(adapter_config, "r") as f:
        lora_config = json.loads(f.read())
        args.r = lora_config["r"]
        args.lora_alpha = lora_config["lora_alpha"]
        args.target_modules = lora_config["target_modules"]
```
load_model() accepts adapter_checkpoint and adapter_config independently, but if adapter_checkpoint is set without adapter_config the model will be constructed without LoRA modules and the adapter weights will be ignored (they become unexpected keys under strict=False). Consider validating that both are provided together (or neither), and raise a clear error when only one is set.
```python
# Load adapter models
lora_models = {}
if has_adapters:
    for name, adapter_ckpt, adapter_cfg in args.adapter:
        print(f"\nLoading adapter '{name}' from {adapter_ckpt}...")
        lora_model, _ = load_model(
            args.checkpoint,
            args.params,
            args.max_context_len,
            generate_full_logits=generate_full_logits,
            adapter_checkpoint=adapter_ckpt,
            adapter_config=adapter_cfg,
        )
        lora_model = _transform_eager_model(lora_model, args, float_dtype)
        lora_models[name] = lora_model
```
Loading each adapter model by calling load_model() re-reads and remaps the full base checkpoint for every adapter, which can be very slow and memory-intensive for large LLM checkpoints. Consider loading the base checkpoint once and reusing it (e.g., keep the base checkpoint dict in memory and create per-adapter copies with checkpoint | adapter_weights, or refactor load_model() to accept a preloaded state_dict) to avoid repeated disk I/O and key-renaming work.
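A rough sketch of the reuse idea, under the assumption that load_model() (or a variant of it) could accept a preloaded state_dict; merged_state_dicts below is hypothetical and skips the wq.* -> wqs.0.* key remapping the script performs:

```python
import torch
from safetensors.torch import load_file


def merged_state_dicts(base_ckpt_path, adapters):
    """Yield (name, state_dict) pairs while reading the base checkpoint once.

    `adapters` is the parsed --adapter list of (name, ckpt, cfg) triples.
    """
    base_sd = torch.load(base_ckpt_path, map_location="cpu")
    for name, adapter_ckpt, _adapter_cfg in adapters:
        adapter_sd = load_file(adapter_ckpt)
        # Python 3.9+ dict merge: adapter keys extend/override base keys.
        yield name, base_sd | adapter_sd
```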
```python
    )

    args = parser.parse_args()
```
--adapter method names are used as keys in lora_models/methods without validation. Duplicate adapter names will silently overwrite earlier entries, and in fixed-seqlen mode an adapter named forward will overwrite the base method. Consider validating adapter names for uniqueness and for collisions with reserved/base method names before exporting.
```suggestion
# Validate adapter method names to avoid silent overwrites or collisions
if args.adapter is not None:
    adapter_names = [a[0] for a in args.adapter]
    # Ensure adapter names are unique
    seen = set()
    duplicates = set()
    for name in adapter_names:
        if name in seen:
            duplicates.add(name)
        else:
            seen.add(name)
    if duplicates:
        raise ValueError(
            f"Duplicate adapter method name(s) specified: {sorted(duplicates)}. "
            "Adapter names must be unique."
        )
    # Prevent collisions with reserved/base method names
    reserved_method_names = {"forward"}
    if args.multifunction:
        # In multifunction mode, prefill/decode are reserved method names
        reserved_method_names.update({"prefill", "decode"})
    colliding = reserved_method_names.intersection(adapter_names)
    if colliding:
        raise ValueError(
            "Adapter method name(s) collide with reserved method names "
            f"{sorted(reserved_method_names)}: {sorted(colliding)}. "
            "Please choose different adapter names."
        )
```
```python
if has_adapters:
    constant_methods["has_lora"] = True
```
constant_methods["has_lora"] is written when adapters are present, but there are no other references to has_lora in the CoreML llama examples. If this metadata isn’t consumed by the C++ runner (or elsewhere), consider removing it or documenting/implementing the consumer so it doesn’t become stale/unused program metadata.
```diff
-    if has_adapters:
-        constant_methods["has_lora"] = True
```
Add --adapter CLI for exporting LoRA adapters as separate methods in
a CoreML PTE. CoreML POSITIONAL weight sharing deduplicates base weights
across methods. Supports combination with --multifunction for
decode/prefill variants per adapter.
Authored with Claude.