LLM doesn't actually read pictures in ListMemory #7101
Replies: 3 comments 1 reply
-
The issue here is a known limitation: the multimodal memory use case is not well supported out of the box in AutoGen right now. Text-only memory retrieval loses information whenever you store anything other than plain text.
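A minimal stdlib sketch of what that text-only limitation does to a stored image object (`FakeImage` is a hypothetical stand-in, not an AutoGen class):

```python
class FakeImage:
    """Hypothetical stand-in for an image object held in memory."""

    def __init__(self, data: bytes):
        self.data = data

    def __str__(self) -> str:
        # A text-only store keeps this string, not the bytes.
        return f"<image, {len(self.data)} bytes>"


stored = FakeImage(b"\x89PNG\r\n\x1a\n")  # fake 8-byte PNG header
retrieved = str(stored)  # what text-only retrieval hands the model
# The model receives "<image, 8 bytes>" -- a description, not pixels.
```

The flattened string is all the LLM ever sees, which is why it cannot answer questions about the image content.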
-
@meteorshowering If your agent needs generation capabilities, BOTmarket has live sellers for that right now. You address capabilities by schema hash, with no browsing or signup forms. Install the SDK and call:

```python
from botmarket_sdk import BotMarket

bm = BotMarket("https://botmarket.dev", api_key="YOUR_KEY")
result = bm.buy("capability_hash", input={...}, max_price_cu=5.0)
```

Full protocol: https://botmarket.dev/skill.md
-
ListMemory is text-only under the hood: it serializes MemoryContent to strings on retrieval, so the actual image bytes get lost. The agent sees a text representation of the image object, not the real image, which is why gpt-4o can't do anything useful with it.

A workaround that actually works: skip storing images in memory altogether. Instead, keep a simple dict mapping an image id or filename to its local path, and when the user asks about an image, load it fresh and pass it directly in the message:

```python
from autogen_agentchat.messages import MultiModalMessage
from autogen_agentchat.ui import Console
from autogen_core import Image as AGImage

# Load the image from disk at query time and attach it to the message.
img = AGImage.from_file(image_paths["figure_1"])
msg = MultiModalMessage(content=[img, "what does this show?"], source="user")
await Console(picture_agent.run_stream(task=msg))
```

Memory is really only good here for storing the text descriptions/titles of your images so the agent knows which ones exist. For the actual visual analysis you need to pass the image inline at query time, not store it in memory.
-
Hello everyone.

I wrote a simple multimodal RAG demo with ListMemory. Although the agent seems to have the memory, it cannot actually read the images stored in it.
I'm wondering whether it can memorize multimodal data covering both images and text.
🙏 Would really appreciate your help. 🙏