Pull requests: Blaizzy/mlx-vlm
- #1023 Expose presence_penalty, frequency_penalty, and per-penalty context_size on the server API (opened Apr 14, 2026 by esaruoho)
- #1019 refactor: improve model loading and resource handling in utils.py (opened Apr 13, 2026 by SyedaAnshrahGillani)
- #1014 server: indicate finish reason properly when model made a tool call (opened Apr 12, 2026 by viktike, Contributor)
- #1013 Resolve no images crash for qwen3_vl and qwen3_vl_moe generate call (opened Apr 11, 2026 by urimem)
- #1012 perf: close 5.5% decode gap vs mlx_lm.server on streaming chat endpoint (opened Apr 11, 2026 by chilang)
- #1009 fix: use OpenAI chat-completion field names in /chat/completions usage (opened Apr 10, 2026 by chilang)
- #1006 fix: replace NaN from all-masked SDPA padding rows in Gemma 4 vision (opened Apr 10, 2026 by fabiopili)
- #996 feat: OpenAI Responses API with structured tool calling and multi-turn support (opened Apr 9, 2026 by eloe)
- #995 feat: prompt prefix caching with TTL eviction and TurboQuant support (opened Apr 9, 2026 by eloe)
- #990 fix: return finish_reason=tool_calls when tool calls detected (opened Apr 9, 2026 by eloe)
- #985 Add TriAttention KV cache compression (opened Apr 9, 2026 by Blaizzy, Owner)
- #979 fix(trainer): pass images to prepare_inputs for Gemma, Qwen, and SmolVLM (opened Apr 8, 2026 by ukint-vs)
- #977 fix(trainer): flatten input_ids before measuring length in batch padding (opened Apr 8, 2026 by ukint-vs)
- #974 Strip tool-call markup from streamed delta.content (opened Apr 7, 2026 by michaelstingl, Contributor; Draft)