Add LoRA multimethod export to CoreML static LLM export #18347
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18347
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures, 2 Unrelated Failures as of commit 0c2b493 with merge base e90d3c8.
NEW FAILURES - The following jobs have failed:
BROKEN TRUNK - The following jobs failed but were present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
| "forward": _export_model(model, example_inputs, "base"), | ||
| } | ||
| for name, lora_model in lora_models.items(): | ||
| methods[name] = _export_model(lora_model, example_inputs, name) |
add methods for each lora
| methods[f"{name}_forward"] = _export_model( | ||
| lora_model, decode_inputs, f"{name} decode" | ||
| ) | ||
| methods[f"{name}_prefill"] = _export_model( |
add methods for each lora with separate prefill, decode
Not sure if this is how we want to do it, though.
Are we hardcoding these still? I thought you were going to include in the config?
I guess we might not include it in the config yet...
It also requires some refactoring, since the CoreML export doesn't take in llm_config - that might be better as a separate PR.
```python
constant_methods=constant_methods,
compile_config=edge_compile_config,
if has_adapters:
    constant_methods["has_lora"] = True
```
Not sure if we should add a method like this here. Or if we do add it, it should be more granular, like a list of LoRA methods.
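A minimal sketch of the more granular option, reusing the bool-valued constant_methods style the diff already uses; the `lora_method_` key prefix here is a hypothetical naming choice, not something from the PR:

```python
# Hypothetical alternative: record one metadata entry per exported adapter
# instead of a single has_lora flag, so consumers can enumerate adapters.
if has_adapters:
    for name in lora_models:
        constant_methods[f"lora_method_{name}"] = True
```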
| methods[f"{name}_forward"] = _export_model( | ||
| lora_model, decode_inputs, f"{name} decode" | ||
| ) | ||
| methods[f"{name}_prefill"] = _export_model( |
There was a problem hiding this comment.
Are we hardcoding these still? I thought you were going to include in the config?
```python
compile_config=edge_compile_config,
if has_adapters:
    constant_methods["has_lora"] = True
elif has_adapters:
```
Why is has_adapters mutually exclusive from multifunction?
Hmm, it isn't - this is the path if we have LoRA adapters without multifunction. The block above is multifunction, with adapters if they exist.
But I think we can combine the adapter-no-multifunction branch and the default branch (no adapters, no multifunction, single-method export).
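A sketch of that combined branch, using the names from the diffs above and assuming lora_models is simply empty when no adapters are passed:

```python
# Unified single-method path: the base "forward" method plus one method per
# adapter. With no adapters, lora_models is empty and this reduces to the
# original default branch.
methods = {"forward": _export_model(model, example_inputs, "base")}
for name, lora_model in lora_models.items():
    methods[name] = _export_model(lora_model, example_inputs, name)
```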
Pull request overview
Adds optional LoRA adapter export support to the CoreML static LLM export script by emitting adapters as additional methods in a single multi-method PTE, enabling CoreML positional multi-method weight sharing to deduplicate base weights.
Changes:
- Extend `export_static_llm_coreml.py` with an `--adapter` CLI to load LoRA adapter checkpoints/configs and export them as separate methods (optionally alongside `--multifunction` decode/prefill variants).
- Update model loading and state_dict key remapping to support LoRA modules under `static_mha` attention (e.g., `wq.*` -> `wqs.0.*`).
- Adjust linear quantization filtering to avoid quantizing LoRA A/B projection weights while still quantizing base linear weights (see the filter sketch below).
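As a rough illustration of the third bullet, a filter of the kind torchao's quantize_ accepts could skip the adapter projections; the "lora_a"/"lora_b" module names below are assumptions, not the PR's actual identifiers:

```python
import torch.nn as nn


def quantize_linear_filter(module: nn.Module, fqn: str) -> bool:
    # Quantize base nn.Linear layers but leave the (small) LoRA A/B
    # projections in float; the lora_a/lora_b naming is assumed here.
    if not isinstance(module, nn.Linear):
        return False
    return "lora_a" not in fqn and "lora_b" not in fqn
```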
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| examples/apple/coreml/llama/utils.py | Signature formatting only (no behavioral change). |
| examples/apple/coreml/llama/export_static_llm_coreml.py | Adds LoRA adapter loading/remapping and exports adapters as additional methods with multi-method weight sharing support. |
```python
from executorch.examples.models.llama.convert_weights import (
    load_and_convert_unsloth_to_meta,
)
```
--adapter introduces a runtime dependency on safetensors (via load_and_convert_unsloth_to_meta importing safetensors.torch). If safetensors isn't installed, this will crash with a ModuleNotFoundError that doesn't explain how to fix it. Consider catching ModuleNotFoundError around this import/load and raising a clearer error (e.g., instructing to pip install safetensors) or documenting the dependency in the CLI help.
```suggestion
try:
    from executorch.examples.models.llama.convert_weights import (
        load_and_convert_unsloth_to_meta,
    )
except ModuleNotFoundError as e:
    raise ModuleNotFoundError(
        "Using --adapter requires the 'safetensors' package. "
        "Install it with 'pip install safetensors' and try again."
    ) from e
```
```diff
@@ -157,6 +170,23 @@ def load_model(
             f"layers.{i}.attention.wv.weight"
         )
```
load_model() accepts adapter_checkpoint / adapter_config independently. If adapter_checkpoint is provided without adapter_config, the model likely won’t be constructed with LoRA modules, so adapter keys get treated as unexpected and the export silently becomes the base model. Consider validating that both arguments must be provided together (raise ValueError early) to avoid a confusing no-op export.
```suggestion
if (adapter_checkpoint is None) != (adapter_config is None):
    raise ValueError(
        "adapter_checkpoint and adapter_config must both be provided together "
        "when loading LoRA adapters."
    )
```
```python
    adapter_checkpoint=adapter_ckpt,
    adapter_config=adapter_cfg,
)
lora_model = _transform_eager_model(lora_model, args, float_dtype)
lora_models[name] = lora_model
```
Adapter NAMEs are used as dictionary keys (lora_models[name] = ... / methods[name] = ...). Duplicate names will silently overwrite earlier adapters, and in single-method mode NAME="forward" would override the base method. Consider validating adapter names are unique and don’t collide with reserved method names like forward/prefill before starting the export.
| help="LoRA adapter: method name, path to adapter.safetensors, " | ||
| "path to adapter_config.json. Can be repeated for multiple adapters.", |
The --adapter help says NAME is the “method name”, but in --multifunction mode the exported method names are suffixed ({NAME}_forward / {NAME}_prefill). Consider updating the CLI help (or the method naming) so users can predict the exported method names.
| help="LoRA adapter: method name, path to adapter.safetensors, " | |
| "path to adapter_config.json. Can be repeated for multiple adapters.", | |
| help=( | |
| "LoRA adapter: base method name, path to adapter.safetensors, path to " | |
| "adapter_config.json. In --multifunction mode, the exported adapter " | |
| "methods will be named {NAME}_prefill and {NAME}_forward. Can be " | |
| "repeated for multiple adapters." | |
| ), |
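For reference, the kind of argparse declaration this help string would attach to; `action="append"` with `nargs=3` is an assumption inferred from the `for name, adapter_ckpt, adapter_cfg in args.adapter:` unpacking later in the thread:

```python
parser.add_argument(
    "--adapter",
    action="append",  # each --adapter occurrence appends one (NAME, CKPT, CFG) triple
    nargs=3,
    metavar=("NAME", "CHECKPOINT", "CONFIG"),
    default=None,
    help="LoRA adapter: method name, path to adapter.safetensors, "
    "path to adapter_config.json. Can be repeated for multiple adapters.",
)
```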
```python
if adapter_config is not None:
    with open(adapter_config, "r") as f:
        lora_config = json.loads(f.read())
```
adapter_config is parsed by indexing required keys ("r", "lora_alpha", "target_modules"); if the JSON is missing/renamed fields this will raise KeyError with little context. Consider validating the schema and raising a ValueError that points to the config path and the missing field(s) (or using .get() with an explicit error).
```suggestion
lora_config = json.loads(f.read())
required_keys = ("r", "lora_alpha", "target_modules")
missing_keys = [key for key in required_keys if key not in lora_config]
if missing_keys:
    raise ValueError(
        f"Adapter config '{adapter_config}' is missing required field(s): "
        f"{', '.join(missing_keys)}"
    )
```
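For context, a minimal adapter_config.json carrying just the three fields this code reads; the values and target module names below are illustrative, not taken from the PR:

```json
{
  "r": 16,
  "lora_alpha": 32,
  "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"]
}
```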
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
```python
if adapter_config is not None:
    with open(adapter_config, "r") as f:
        lora_config = json.loads(f.read())
        args.r = lora_config["r"]
        args.lora_alpha = lora_config["lora_alpha"]
        args.target_modules = lora_config["target_modules"]
```
load_model() accepts adapter_checkpoint and adapter_config independently, but if adapter_checkpoint is set without adapter_config the model will be constructed without LoRA modules and the adapter weights will be ignored (they become unexpected keys under strict=False). Consider validating that both are provided together (or neither), and raise a clear error when only one is set.
```python
# Load adapter models
lora_models = {}
if has_adapters:
    for name, adapter_ckpt, adapter_cfg in args.adapter:
        print(f"\nLoading adapter '{name}' from {adapter_ckpt}...")
        lora_model, _ = load_model(
            args.checkpoint,
            args.params,
            args.max_context_len,
            generate_full_logits=generate_full_logits,
            adapter_checkpoint=adapter_ckpt,
            adapter_config=adapter_cfg,
        )
        lora_model = _transform_eager_model(lora_model, args, float_dtype)
        lora_models[name] = lora_model
```
Loading each adapter model by calling load_model() re-reads and remaps the full base checkpoint for every adapter, which can be very slow and memory-intensive for large LLM checkpoints. Consider loading the base checkpoint once and reusing it (e.g., keep the base checkpoint dict in memory and create per-adapter copies with checkpoint | adapter_weights, or refactor load_model() to accept a preloaded state_dict) to avoid repeated disk I/O and key-renaming work.
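A rough sketch of the reuse idea, under the assumption that load_model() (or a variant of it) could accept a preloaded state_dict; merged_state_dicts below is hypothetical and skips the wq.* -> wqs.0.* key remapping the script performs:

```python
import torch
from safetensors.torch import load_file


def merged_state_dicts(base_ckpt_path, adapters):
    """Yield (name, state_dict) pairs while reading the base checkpoint once.

    `adapters` is the parsed --adapter list of (name, ckpt, cfg) triples.
    """
    base_sd = torch.load(base_ckpt_path, map_location="cpu")
    for name, adapter_ckpt, _adapter_cfg in adapters:
        adapter_sd = load_file(adapter_ckpt)
        # Python 3.9+ dict merge: adapter keys extend/override base keys.
        yield name, base_sd | adapter_sd
```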
```python
    )

    args = parser.parse_args()
```
--adapter method names are used as keys in lora_models/methods without validation. Duplicate adapter names will silently overwrite earlier entries, and in fixed-seqlen mode an adapter named forward will overwrite the base method. Consider validating adapter names for uniqueness and for collisions with reserved/base method names before exporting.
```suggestion
# Validate adapter method names to avoid silent overwrites or collisions
if args.adapter is not None:
    adapter_names = [a[0] for a in args.adapter]
    # Ensure adapter names are unique
    seen = set()
    duplicates = set()
    for name in adapter_names:
        if name in seen:
            duplicates.add(name)
        else:
            seen.add(name)
    if duplicates:
        raise ValueError(
            f"Duplicate adapter method name(s) specified: {sorted(duplicates)}. "
            "Adapter names must be unique."
        )
    # Prevent collisions with reserved/base method names
    reserved_method_names = {"forward"}
    if args.multifunction:
        # In multifunction mode, prefill/decode are reserved method names
        reserved_method_names.update({"prefill", "decode"})
    colliding = reserved_method_names.intersection(adapter_names)
    if colliding:
        raise ValueError(
            "Adapter method name(s) collide with reserved method names "
            f"{sorted(reserved_method_names)}: {sorted(colliding)}. "
            "Please choose different adapter names."
        )
```
```python
if has_adapters:
    constant_methods["has_lora"] = True
```
constant_methods["has_lora"] is written when adapters are present, but there are no other references to has_lora in the CoreML llama examples. If this metadata isn’t consumed by the C++ runner (or elsewhere), consider removing it or documenting/implementing the consumer so it doesn’t become stale/unused program metadata.
```diff
-    if has_adapters:
-        constant_methods["has_lora"] = True
```
Add --adapter CLI for exporting LoRA adapters as separate methods in
a CoreML PTE. CoreML POSITIONAL weight sharing deduplicates base weights
across methods. Supports combination with --multifunction for
decode/prefill variants per adapter.
Authored with Claude.