
Add LoRA multimethod export to CoreML static LLM export #18347

Open
lucylq wants to merge 37 commits into main from gh/lucylq/146/head

Conversation


lucylq (Contributor) commented Mar 19, 2026

Add --adapter CLI for exporting LoRA adapters as separate methods in
a CoreML PTE. CoreML POSITIONAL weight sharing deduplicates base weights
across methods. Supports combination with --multifunction for
decode/prefill variants per adapter.

Authored with Claude.
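For illustration, a minimal, self-contained argparse sketch of how a repeatable three-value --adapter flag like this can be declared; the help text is quoted from the diff below, but the script's actual declaration may differ:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--adapter",
    nargs=3,
    metavar=("NAME", "CHECKPOINT", "CONFIG"),
    action="append",
    help="LoRA adapter: method name, path to adapter.safetensors, "
    "path to adapter_config.json. Can be repeated for multiple adapters.",
)
parser.add_argument("--multifunction", action="store_true")

# Two adapters exported alongside the base weights (paths are placeholders).
args = parser.parse_args(
    [
        "--adapter", "coding", "a/adapter.safetensors", "a/adapter_config.json",
        "--adapter", "chat", "b/adapter.safetensors", "b/adapter_config.json",
        "--multifunction",
    ]
)
print(args.adapter)  # [['coding', 'a/adapter.safetensors', ...], ['chat', ...]]
```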

lucylq added 5 commits March 19, 2026 14:40

lucylq commented Mar 19, 2026


pytorch-bot Bot commented Mar 19, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18347

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 2 Unrelated Failures

As of commit 0c2b493 with merge base e90d3c8:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

lucylq added a commit that referenced this pull request Mar 19, 2026
Add --adapter CLI for exporting LoRA adapters as separate methods in
a CoreML PTE. CoreML POSITIONAL weight sharing deduplicates base weights
across methods. Supports combination with --multifunction for
decode/prefill variants per adapter.

Authored with Claude.


ghstack-source-id: 16ab852
ghstack-comment-id: 4093498502
Pull-Request: #18347
meta-cla Bot added the CLA Signed label Mar 19, 2026. (This label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed.)
lucylq added 5 commits March 19, 2026 14:50
lucylq added a commit that referenced this pull request Mar 19, 2026
ghstack-source-id: eaac058
ghstack-comment-id: 4093498502
Pull-Request: #18347
lucylq added 5 commits March 19, 2026 14:53
lucylq added a commit that referenced this pull request Mar 19, 2026
ghstack-source-id: d3fc801
ghstack-comment-id: 4093498502
Pull-Request: #18347
lucylq added 3 commits March 19, 2026 15:00
"forward": _export_model(model, example_inputs, "base"),
}
for name, lora_model in lora_models.items():
methods[name] = _export_model(lora_model, example_inputs, name)
lucylq (Author):
add methods for each lora

methods[f"{name}_forward"] = _export_model(
lora_model, decode_inputs, f"{name} decode"
)
methods[f"{name}_prefill"] = _export_model(
lucylq (Author) commented Mar 20, 2026:
add methods for each lora with separate prefill, decode

Not sure if this is how we want to do it, though.

Contributor:
Are we hardcoding these still? I thought you were going to include in the config?

lucylq (Author):

I guess we might not include it in the config yet...

It also requires some refactor as coreml export doesn't take in llm_config - that might be better as a separate PR.
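To make the naming scheme under discussion concrete, here is a small self-contained sketch enumerating the methods a multimethod PTE would expose. The {NAME}_forward/{NAME}_prefill pattern comes from the excerpt above; the base method names in multifunction mode are an assumption:

```python
def expected_method_names(adapter_names, multifunction):
    """Enumerate the method names a multimethod PTE would expose (sketch)."""
    if multifunction:
        names = ["forward", "prefill"]  # assumed base decode/prefill names
        for name in adapter_names:
            names += [f"{name}_forward", f"{name}_prefill"]
    else:
        names = ["forward"] + list(adapter_names)
    return names

print(expected_method_names(["coding", "chat"], multifunction=True))
# ['forward', 'prefill', 'coding_forward', 'coding_prefill',
#  'chat_forward', 'chat_prefill']
```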

lucylq added 4 commits March 20, 2026 10:27
```python
    constant_methods=constant_methods,
    compile_config=edge_compile_config,

if has_adapters:
    constant_methods["has_lora"] = True
```
lucylq (Author):
Not sure if we should add a method like this here. Or if we add, it should be more granular, like a list of lora methods.

lucylq added 3 commits March 20, 2026 15:52
lucylq changed the base branch from gh/lucylq/145/head to gh/lucylq/144/head March 20, 2026 22:52
Base automatically changed from gh/lucylq/144/head to main March 23, 2026 16:55
methods[f"{name}_forward"] = _export_model(
lora_model, decode_inputs, f"{name} decode"
)
methods[f"{name}_prefill"] = _export_model(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we hardcoding these still? I thought you were going to include in the config?

```python
    compile_config=edge_compile_config,
if has_adapters:
    constant_methods["has_lora"] = True
elif has_adapters:
```
Contributor:
Why is has_adapters mutually exclusive from multifunction?

lucylq (Author) commented Mar 23, 2026:
Hmm, it isn't - this is the path if we have lora adapters without multifunction. The block above is multifunction with adapters, if they exist.

But I think we can combine the adapter-no-multifunction and default branch (no adapters, no multifunction, single method export).
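A toy sketch of the combination proposed here: with an empty adapter dict, the adapter loop degenerates to the single-method export, so the two branches can share one path. _export_model is a stand-in for the script's helper, with a signature assumed from the excerpts above:

```python
def _export_model(model, inputs, label):
    # Stand-in for the script's torch.export helper.
    print(f"exporting {label}")
    return (model, inputs)

model, example_inputs = object(), ()
lora_models = {}  # stays empty when no --adapter flags are passed

methods = {"forward": _export_model(model, example_inputs, "base")}
for name, lora_model in lora_models.items():
    methods[name] = _export_model(lora_model, example_inputs, name)

print(list(methods.keys()))  # ['forward'] when there are no adapters
```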

Copilot AI review requested due to automatic review settings March 24, 2026 22:22
github-actions Bot commented

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Copilot AI left a comment


Pull request overview

Adds optional LoRA adapter export support to the CoreML static LLM export script by emitting adapters as additional methods in a single multi-method PTE, enabling CoreML positional multi-method weight sharing to deduplicate base weights.

Changes:

  • Extend export_static_llm_coreml.py with an --adapter CLI flag to load LoRA adapter checkpoints/configs and export them as separate methods (optionally alongside --multifunction decode/prefill variants).
  • Update model loading and state_dict key remapping to support LoRA modules under static_mha attention (e.g., wq.* -> wqs.0.*); see the sketch after this list.
  • Adjust linear quantization filtering to avoid quantizing LoRA A/B projection weights while still quantizing base linear weights.
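A minimal illustration of the remapping pattern described in the second bullet; the exact mapping is an assumption, and the authoritative version lives in export_static_llm_coreml.py:

```python
import re

def remap_static_mha_keys(state_dict):
    # Rewrite attention projection keys for static_mha, e.g.
    # "layers.0.attention.wq.weight" -> "layers.0.attention.wqs.0.weight".
    return {
        re.sub(r"attention\.(wq|wk|wv)\.", r"attention.\g<1>s.0.", key): value
        for key, value in state_dict.items()
    }

sd = {"layers.0.attention.wq.weight": 0, "layers.0.feed_forward.w1.weight": 1}
print(list(remap_static_mha_keys(sd)))
# ['layers.0.attention.wqs.0.weight', 'layers.0.feed_forward.w1.weight']
```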

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

Files reviewed:
  • examples/apple/coreml/llama/utils.py: Signature formatting only (no behavioral change).
  • examples/apple/coreml/llama/export_static_llm_coreml.py: Adds LoRA adapter loading/remapping and exports adapters as additional methods with multi-method weight sharing support.


Comment on lines +174 to +176
```python
from executorch.examples.models.llama.convert_weights import (
    load_and_convert_unsloth_to_meta,
)
```

Copilot AI Mar 24, 2026


--adapter introduces a runtime dependency on safetensors (via load_and_convert_unsloth_to_meta importing safetensors.torch). If safetensors isn't installed, this will crash with a ModuleNotFoundError that doesn't explain how to fix it. Consider catching ModuleNotFoundError around this import/load and raising a clearer error (e.g., instructing to pip install safetensors) or documenting the dependency in the CLI help.

Suggested change:
```diff
-from executorch.examples.models.llama.convert_weights import (
-    load_and_convert_unsloth_to_meta,
-)
+try:
+    from executorch.examples.models.llama.convert_weights import (
+        load_and_convert_unsloth_to_meta,
+    )
+except ModuleNotFoundError as e:
+    raise ModuleNotFoundError(
+        "Using --adapter requires the 'safetensors' package. "
+        "Install it with 'pip install safetensors' and try again."
+    ) from e
```

```diff
@@ -157,6 +170,23 @@ def load_model(
                 f"layers.{i}.attention.wv.weight"
             )
```


Copilot AI Mar 24, 2026


load_model() accepts adapter_checkpoint / adapter_config independently. If adapter_checkpoint is provided without adapter_config, the model likely won’t be constructed with LoRA modules, so adapter keys get treated as unexpected and the export silently becomes the base model. Consider validating that both arguments must be provided together (raise ValueError early) to avoid a confusing no-op export.

Suggested change:
```diff
+    if (adapter_checkpoint is None) != (adapter_config is None):
+        raise ValueError(
+            "adapter_checkpoint and adapter_config must both be provided together "
+            "when loading LoRA adapters."
+        )
```

Comment on lines +586 to +590
```python
            adapter_checkpoint=adapter_ckpt,
            adapter_config=adapter_cfg,
        )
        lora_model = _transform_eager_model(lora_model, args, float_dtype)
        lora_models[name] = lora_model
```

Copilot AI Mar 24, 2026


Adapter NAMEs are used as dictionary keys (lora_models[name] = ... / methods[name] = ...). Duplicate names will silently overwrite earlier adapters, and in single-method mode NAME="forward" would override the base method. Consider validating adapter names are unique and don’t collide with reserved method names like forward/prefill before starting the export.

Comment on lines +525 to +526
help="LoRA adapter: method name, path to adapter.safetensors, "
"path to adapter_config.json. Can be repeated for multiple adapters.",

Copilot AI Mar 24, 2026


The --adapter help says NAME is the “method name”, but in --multifunction mode the exported method names are suffixed ({NAME}_forward / {NAME}_prefill). Consider updating the CLI help (or the method naming) so users can predict the exported method names.

Suggested change:
```diff
-        help="LoRA adapter: method name, path to adapter.safetensors, "
-        "path to adapter_config.json. Can be repeated for multiple adapters.",
+        help=(
+            "LoRA adapter: base method name, path to adapter.safetensors, path to "
+            "adapter_config.json. In --multifunction mode, the exported adapter "
+            "methods will be named {NAME}_prefill and {NAME}_forward. Can be "
+            "repeated for multiple adapters."
+        ),
```


```python
    if adapter_config is not None:
        with open(adapter_config, "r") as f:
            lora_config = json.loads(f.read())
```

Copilot AI Mar 24, 2026


adapter_config is parsed by indexing required keys ("r", "lora_alpha", "target_modules"); if the JSON is missing/renamed fields this will raise KeyError with little context. Consider validating the schema and raising a ValueError that points to the config path and the missing field(s) (or using .get() with an explicit error).

Suggested change:
```diff
-            lora_config = json.loads(f.read())
+            lora_config = json.loads(f.read())
+            required_keys = ("r", "lora_alpha", "target_modules")
+            missing_keys = [key for key in required_keys if key not in lora_config]
+            if missing_keys:
+                raise ValueError(
+                    f"Adapter config '{adapter_config}' is missing required field(s): "
+                    f"{', '.join(missing_keys)}"
+                )
```

lucylq added 2 commits March 30, 2026 16:56
Copilot AI review requested due to automatic review settings March 31, 2026 00:42

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.



Comment on lines +140 to +146
```python
    if adapter_config is not None:
        with open(adapter_config, "r") as f:
            lora_config = json.loads(f.read())
        args.r = lora_config["r"]
        args.lora_alpha = lora_config["lora_alpha"]
        args.target_modules = lora_config["target_modules"]
```


Copilot AI Mar 31, 2026


load_model() accepts adapter_checkpoint and adapter_config independently, but if adapter_checkpoint is set without adapter_config the model will be constructed without LoRA modules and the adapter weights will be ignored (they become unexpected keys under strict=False). Consider validating that both are provided together (or neither), and raise a clear error when only one is set.

Comment on lines +578 to +592
```python
# Load adapter models
lora_models = {}
if has_adapters:
    for name, adapter_ckpt, adapter_cfg in args.adapter:
        print(f"\nLoading adapter '{name}' from {adapter_ckpt}...")
        lora_model, _ = load_model(
            args.checkpoint,
            args.params,
            args.max_context_len,
            generate_full_logits=generate_full_logits,
            adapter_checkpoint=adapter_ckpt,
            adapter_config=adapter_cfg,
        )
        lora_model = _transform_eager_model(lora_model, args, float_dtype)
        lora_models[name] = lora_model
```

Copilot AI Mar 31, 2026


Loading each adapter model by calling load_model() re-reads and remaps the full base checkpoint for every adapter, which can be very slow and memory-intensive for large LLM checkpoints. Consider loading the base checkpoint once and reusing it (e.g., keep the base checkpoint dict in memory and create per-adapter copies with checkpoint | adapter_weights, or refactor load_model() to accept a preloaded state_dict) to avoid repeated disk I/O and key-renaming work.
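A toy illustration of the reuse pattern suggested here, using dummy tensors and a dict overlay rather than the script's real loading helpers (the adapter key names are hypothetical):

```python
import torch

# Toy stand-ins for a base checkpoint and two adapters' extra weights.
base_sd = {"layers.0.attention.wq.weight": torch.zeros(4, 4)}
adapters = {
    "coding": {"layers.0.attention.wq.lora_a.weight": torch.zeros(2, 4)},
    "chat": {"layers.0.attention.wq.lora_a.weight": torch.ones(2, 4)},
}

# Read the base once, then overlay each adapter's weights, instead of
# re-reading (and re-remapping) the full checkpoint per adapter.
merged_per_adapter = {name: {**base_sd, **sd} for name, sd in adapters.items()}
print(sorted(merged_per_adapter["coding"]))
```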

```python
)

args = parser.parse_args()
```


Copilot AI Mar 31, 2026


--adapter method names are used as keys in lora_models/methods without validation. Duplicate adapter names will silently overwrite earlier entries, and in fixed-seqlen mode an adapter named forward will overwrite the base method. Consider validating adapter names for uniqueness and for collisions with reserved/base method names before exporting.

Suggested change:
```diff
+# Validate adapter method names to avoid silent overwrites or collisions
+if args.adapter is not None:
+    adapter_names = [a[0] for a in args.adapter]
+    # Ensure adapter names are unique
+    seen = set()
+    duplicates = set()
+    for name in adapter_names:
+        if name in seen:
+            duplicates.add(name)
+        else:
+            seen.add(name)
+    if duplicates:
+        raise ValueError(
+            f"Duplicate adapter method name(s) specified: {sorted(duplicates)}. "
+            "Adapter names must be unique."
+        )
+    # Prevent collisions with reserved/base method names
+    reserved_method_names = {"forward"}
+    if args.multifunction:
+        # In multifunction mode, prefill/decode are reserved method names
+        reserved_method_names.update({"prefill", "decode"})
+    colliding = reserved_method_names.intersection(adapter_names)
+    if colliding:
+        raise ValueError(
+            "Adapter method name(s) collide with reserved method names "
+            f"{sorted(reserved_method_names)}: {sorted(colliding)}. "
+            "Please choose different adapter names."
+        )
```

Comment on lines +705 to +706
```python
    if has_adapters:
        constant_methods["has_lora"] = True
```

Copilot AI Mar 31, 2026


constant_methods["has_lora"] is written when adapters are present, but there are no other references to has_lora in the CoreML llama examples. If this metadata isn’t consumed by the C++ runner (or elsewhere), consider removing it or documenting/implementing the consumer so it doesn’t become stale/unused program metadata.

Suggested change:
```diff
-    if has_adapters:
-        constant_methods["has_lora"] = True
```
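If the metadata is kept, a consumer could read it back through the ExecuTorch Python runtime. A sketch, assuming the documented Runtime API and a hypothetical output file name:

```python
from executorch.runtime import Runtime

# "static_llm_lora.pte" is a hypothetical name for this export's output.
program = Runtime.get().load_program("static_llm_lora.pte")
print(program.method_names)  # base, adapter, and constant methods

# Constant methods take no inputs and return the stored value.
(has_lora,) = program.load_method("has_lora").execute([])
print(bool(has_lora))
```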

Labels

CLA Signed: This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
