CLI: send screenshots from `studio code` to Telegram remote sessions by gcsecsey · Pull Request #3272 · Automattic/studio

gcsecsey · 2026-04-28T15:55:20Z

Related issues

Related to STU-1652
Builds on the Telegram remote-session bridge from apps/cli: Telegram remote-session bridge for studio code (PoC) #3196
Complements the wpcom PR 212845-ghe-Automattic/wpcom

How AI was used in this PR

Claude wrote the bulk of the implementation and the tests, on top of a handoff prompt I drafted for the wpcom backend agent. I reviewed and tested every change end-to-end against a sandbox before opening the PR; the manual test plan is in the testing instructions below.

Proposed Changes

The Telegram remote-session bridge is currently text-only. When the agent finishes a visible task, the user gets a prose summary but no image. This PR lets the local agent deliver screenshots inline:

- New `share_screenshot` tool. By default it captures a 16:9 above-the-fold view of a URL and emits a `media.share` JSON event. `fullPage: true` is opt-in for the rare case where the user wants the whole scroll length. `take_screenshot` is unchanged and remains the model-internal reasoning tool.
- The remote-session controller (`turn-runner` + `poll-loop`) collects `media.share` events from the spawned `studio code --json` child and posts each photo before the text reply.
- `respondMessage` now picks its transport based on the payload. If a photo is present, it sends `multipart/form-data` with a `photo` file part (matching the wpcom contract); text-only replies stay on the existing JSON path.
- The spawned child gets `STUDIO_REMOTE_SESSION=1`, so the system prompt tells the agent to keep replies short, deliver visible work via `share_screenshot`, follow up with a "Want me to publish this as a preview site?" line, and stop fabricating "gist stored / preview link saved" epilogues that aren't backed by any actual storage.
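The controller side can be pictured roughly like this: the child writes one JSON event per line on stdout, and the turn runner keeps every `media.share` event it sees so the poll loop can post the photos before the text reply. This is a minimal sketch, not the code in `turn-runner.ts`. It assumes newline-delimited JSON and that the `mediaType`, `mimeType` and `dataBase64` fields seen in the debug log below sit at the top level of the event; the `caption` field, the `MediaShareLike` type and the `collectMediaShares` helper are hypothetical.

```typescript
// Hypothetical approximation of a media.share event. mediaType, mimeType and
// dataBase64 mirror the fields logged in this PR; caption is an assumption.
interface MediaShareLike {
	type: 'media.share';
	mediaType: string;
	mimeType: string;
	dataBase64: string;
	caption?: string;
}

function isMediaShare( value: unknown ): value is MediaShareLike {
	if ( typeof value !== 'object' || value === null ) {
		return false;
	}
	const event = value as Record< string, unknown >;
	return (
		event.type === 'media.share' &&
		typeof event.mediaType === 'string' &&
		typeof event.mimeType === 'string' &&
		typeof event.dataBase64 === 'string'
	);
}

// Hypothetical helper: scan newline-delimited JSON from the child process and
// keep media.share events in emission order. Non-JSON lines are skipped
// instead of aborting the whole turn.
function collectMediaShares( stdout: string ): MediaShareLike[] {
	const media: MediaShareLike[] = [];
	for ( const line of stdout.split( '\n' ) ) {
		const trimmed = line.trim();
		if ( ! trimmed ) {
			continue;
		}
		try {
			const parsed: unknown = JSON.parse( trimmed );
			if ( isMediaShare( parsed ) ) {
				media.push( parsed );
			}
		} catch {
			// Ignore non-JSON noise on stdout.
		}
	}
	return media;
}
```

Keeping the events in emission order is what lets the poll loop post the photos one by one and only then send the text reply.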

Testing Instructions

Prerequisites: you must be an Automattician (the backend gates on `is_automattician()`), be logged in via `studio auth login` so the bearer token falls back to the one stored in `~/.studio/shared.json`, and have a Telegram bot routing into your account.

1. Build the CLI: `npm run cli:build`
2. Start the bridge: `node apps/cli/dist/cli/main.mjs code --remote-session`
3. In a second terminal, tail the log: `tail -F ~/.studio/remote-session.log`
4. From Telegram, send: "send to my local agent: take a screenshot of and show me"
5. Verify in Telegram:
   - A 1280×720 above-the-fold screenshot arrives inline (not the full-page strip).
   - The caption is a short one-liner that does NOT mention "full page" or "viewport".
   - A follow-up text message asks about publishing a preview site.
   - No "Screenshot shared with the user" progress message appears before the photo.
   - No "gist stored" or "preview link saved" epilogue appears.
6. Check that the text-only path has not regressed by sending a non-visual request such as "list my local sites": a text reply arrives, with no photo.
7. Optional: ask explicitly for "the full page" and confirm that `share_screenshot` is called with `fullPage: true` and the long capture is delivered.
8. Optional backend-direct check: after the routing has set the auth key, POST a multipart photo with `curl` as documented in the wpcom PR; it should return `{ "success": true, "photo_sent": true }`.
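The 1280×720 expectation in step 5 can also be checked without eyeballing the image by reading the PNG header. This is a minimal sketch that assumes you have the raw PNG bytes, for example decoded from a `media.share` event's `dataBase64`; the copy Telegram delivers may be re-encoded, so the PNG bytes are the more reliable input. `readPngSize` and `isSixteenByNine` are hypothetical helpers, not part of the PR.

```typescript
// The 8-byte signature every PNG file starts with.
const PNG_SIGNATURE = [ 0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a ];

interface ImageSize {
	width: number;
	height: number;
}

// Read width and height from the IHDR chunk, which must come first:
// bytes 8-11 hold its length, 12-15 the "IHDR" tag, and 16-23 the
// big-endian width and height.
function readPngSize( bytes: Uint8Array ): ImageSize {
	if ( bytes.length < 24 || ! PNG_SIGNATURE.every( ( value, i ) => bytes[ i ] === value ) ) {
		throw new Error( 'Not a PNG file' );
	}
	const chunkType = String.fromCharCode( bytes[ 12 ], bytes[ 13 ], bytes[ 14 ], bytes[ 15 ] );
	if ( chunkType !== 'IHDR' ) {
		throw new Error( 'PNG does not start with an IHDR chunk' );
	}
	// Respect byteOffset: Node Buffers often share a larger pooled ArrayBuffer.
	const view = new DataView( bytes.buffer, bytes.byteOffset, bytes.byteLength );
	return { width: view.getUint32( 16 ), height: view.getUint32( 20 ) };
}

// True when the dimensions are exactly 16:9 (e.g. 1280x720).
function isSixteenByNine( size: ImageSize ): boolean {
	return size.width * 9 === size.height * 16;
}
```

In a Node script this would be called as `readPngSize( Buffer.from( event.dataBase64, 'base64' ) )`, since a `Buffer` is a `Uint8Array`.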

Pre-merge Checklist

Have you checked for TypeScript, React or other console errors?

…ions Adds a `share_screenshot` tool that captures a 16:9 above-the-fold view of a URL and emits a `media.share` JSON event that the remote-session controller forwards to Telegram via the existing `/local-agent-respond` endpoint as multipart/form-data with a `photo` part. The agent uses this to deliver visible results back to the user; `take_screenshot` stays internal for visual reasoning. Also threads `STUDIO_REMOTE_SESSION=1` to the spawned child so the system prompt can favor short, visual replies and steer the agent away from fabricating "gist stored / preview link saved" epilogues.
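The transport switch can be sketched as a function that builds the request body from the payload: a photo means `multipart/form-data` with a `photo` part, otherwise a JSON body as before. This assumes Node 18+ (global `FormData` and `Blob`). Only the `photo` part name comes from the PR; the `RespondPayload` shape, the `caption` and `text` field names, the generated filename and `buildRespondBody` itself are hypothetical.

```typescript
// Hypothetical payload accepted by a respondMessage-like helper.
interface RespondPayload {
	text?: string;
	photo?: { mimeType: string; dataBase64: string; caption?: string };
}

interface RequestParts {
	body: FormData | string;
	// Left undefined for multipart so fetch sets the boundary parameter itself.
	contentType?: string;
}

function buildRespondBody( payload: RespondPayload ): RequestParts {
	if ( payload.photo ) {
		const { mimeType, dataBase64, caption } = payload.photo;
		const bytes = Buffer.from( dataBase64, 'base64' );
		const extension = mimeType.split( '/' )[ 1 ] ?? 'png';
		const form = new FormData();
		// Upload the decoded image as a file part named "photo".
		form.append( 'photo', new Blob( [ bytes ], { type: mimeType } ), `screenshot.${ extension }` );
		if ( caption ) {
			form.append( 'caption', caption );
		}
		return { body: form };
	}
	// Text-only replies keep a plain JSON body.
	return {
		body: JSON.stringify( { text: payload.text ?? '' } ),
		contentType: 'application/json',
	};
}
```

Not setting `Content-Type` by hand for the multipart case matters: `fetch` only adds the `boundary` parameter when it generates the header from a `FormData` body.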

Copilot

Pull request overview

This PR extends the existing Telegram remote-session bridge for studio code to support inline screenshot delivery by introducing a new user-facing screenshot tool and a new JSON event type that the remote-session controller forwards as Telegram photos.

Changes:

- Add a new `share_screenshot` tool that emits a `media.share` JSON event (and also returns the image to the agent) for user-visible screenshot delivery.
- Extend the remote-session controller to collect `media.share` events and post photos before the text reply.
- Update the Telegram response transport to use `multipart/form-data` when a photo is present (JSON for text-only), and add remote-session-specific system prompt guidance toggled by `STUDIO_REMOTE_SESSION=1`.
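The environment toggle amounts to appending an addendum to the base prompt only when the flag is set. A minimal sketch follows; the addendum wording, the exact-match on `'1'`, and the `isRemoteSession` and `buildSystemPrompt` names are assumptions, and the real text lives in `apps/cli/ai/system-prompt.ts`.

```typescript
// Hypothetical addendum; the real guidance is in apps/cli/ai/system-prompt.ts.
const REMOTE_SESSION_ADDENDUM = [
	'You are replying through a Telegram remote session.',
	'Keep replies short and deliver visible work with share_screenshot.',
].join( '\n' );

type Env = Record< string, string | undefined >;

// Only the exact value "1" enables remote-session mode in this sketch.
function isRemoteSession( env: Env ): boolean {
	return env.STUDIO_REMOTE_SESSION === '1';
}

function buildSystemPrompt( base: string, env: Env ): string {
	return isRemoteSession( env ) ? `${ base }\n\n${ REMOTE_SESSION_ADDENDUM }` : base;
}
```

In the agent this would be called with `process.env`; the parent sets the flag when spawning the child, for example by passing `{ ...process.env, STUDIO_REMOTE_SESSION: '1' }` as the child's environment.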

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.


| File | Description |
| --- | --- |
| `tools/common/ai/tools.ts` | Adds display name + URL detail extraction for the new `share_screenshot` tool. |
| `tools/common/ai/json-events.ts` | Introduces `MediaShareEvent` and extends `JsonEvent` union with `media.share`. |
| `apps/cli/ai/tools.ts` | Implements `share_screenshot`, refactors screenshot capture into `captureScreenshotPng`, and registers the new tool. |
| `apps/cli/ai/system-prompt.ts` | Adds Telegram remote-session guidance addendum (including `share_screenshot` usage expectations). |
| `apps/cli/ai/agent.ts` | Enables the remote-session system prompt addendum when `STUDIO_REMOTE_SESSION=1`. |
| `apps/cli/remote-session/turn-runner.ts` | Collects `media.share` events from the subprocess and returns them in `TurnOutcome`. |
| `apps/cli/remote-session/poll-loop.ts` | Posts collected media shares to Telegram before posting the text reply; avoids “no result” warning when media exists. |
| `apps/cli/remote-session/telegram-client.ts` | Updates `respondMessage` to support multipart photo uploads + caption; logs partial failures without throwing. |
| `apps/cli/remote-session/tests/*` | Adds/updates unit tests for media collection, ordering, and multipart photo transport behavior. |
| `apps/cli/remote-session/tests/fixtures/mock-studio-code.mjs` | Adds a `media-share` fixture scenario emitting `media.share` events. |
| `apps/cli/ai/tests/system-prompt.test.ts` | New tests verifying remote-session prompt addendum is included/excluded appropriately. |


Copilot · 2026-04-29T10:13:14Z

```diff
+		// Scroll through the page to trigger lazy-loaded images, then wait
+		// for all images to finish loading (with a timeout so we don't hang
+		// on images that never settle).
+		await page.evaluate( async () => {
+			const delay = ( ms: number ) =>
+				new Promise< void >( ( resolve ) => setTimeout( resolve, ms ) );
+			const scrollHeight = document.body.scrollHeight;
+			const viewportHeight = window.innerHeight;
+			for ( let y = 0; y < scrollHeight; y += viewportHeight ) {
+				window.scrollTo( 0, y );
+				await delay( 100 );
+			}
+			window.scrollTo( 0, 0 );
+
+			const timeout = new Promise< void >( ( resolve ) => setTimeout( resolve, 5000 ) );
+			const allImages = Promise.all(
+				Array.from( document.images )
+					.filter( ( img ) => ! img.complete )
+					.map(
+						( img ) =>
+							new Promise< void >( ( resolve ) => {
+								img.addEventListener( 'load', () => resolve() );
+								img.addEventListener( 'error', () => resolve() );
+							} )
+					)
+			);
+			await Promise.race( [ allImages, timeout ] );
+		} );
```


captureScreenshotPng() scrolls the entire document to the bottom to trigger lazy-loading even when options.fullPage is false. For the default share_screenshot above-the-fold use case this can add significant latency on long pages and undermines the goal of a quick viewport capture. Consider skipping the full-page scroll/wait logic when fullPage is false (or limiting it to the first viewport), and only doing the full scroll pass for full-page captures.

Suggested change

```diff
-		// Scroll through the page to trigger lazy-loaded images, then wait
-		// for all images to finish loading (with a timeout so we don't hang
-		// on images that never settle).
-		await page.evaluate( async () => {
-			const delay = ( ms: number ) =>
-				new Promise< void >( ( resolve ) => setTimeout( resolve, ms ) );
-			const scrollHeight = document.body.scrollHeight;
-			const viewportHeight = window.innerHeight;
-			for ( let y = 0; y < scrollHeight; y += viewportHeight ) {
-				window.scrollTo( 0, y );
-				await delay( 100 );
-			}
-			window.scrollTo( 0, 0 );
-
-			const timeout = new Promise< void >( ( resolve ) => setTimeout( resolve, 5000 ) );
-			const allImages = Promise.all(
-				Array.from( document.images )
-					.filter( ( img ) => ! img.complete )
-					.map(
-						( img ) =>
-							new Promise< void >( ( resolve ) => {
-								img.addEventListener( 'load', () => resolve() );
-								img.addEventListener( 'error', () => resolve() );
-							} )
-					)
-			);
-			await Promise.race( [ allImages, timeout ] );
-		} );
+		// For full-page screenshots, scroll through the entire page to trigger
+		// lazy-loaded images. For viewport screenshots, avoid the expensive
+		// full-document scroll and only wait on images intersecting the first
+		// viewport so above-the-fold captures stay fast.
+		await page.evaluate( async ( fullPage ) => {
+			const delay = ( ms: number ) =>
+				new Promise< void >( ( resolve ) => setTimeout( resolve, ms ) );
+			if ( fullPage ) {
+				const scrollHeight = document.body.scrollHeight;
+				const viewportHeight = window.innerHeight;
+				for ( let y = 0; y < scrollHeight; y += viewportHeight ) {
+					window.scrollTo( 0, y );
+					await delay( 100 );
+				}
+				window.scrollTo( 0, 0 );
+			}
+			const timeout = new Promise< void >( ( resolve ) => setTimeout( resolve, 5000 ) );
+			const pendingImages = Array.from( document.images ).filter( ( img ) => {
+				if ( img.complete ) {
+					return false;
+				}
+				if ( fullPage ) {
+					return true;
+				}
+				const rect = img.getBoundingClientRect();
+				return rect.bottom > 0 && rect.top < window.innerHeight;
+			} );
+			const allImages = Promise.all(
+				pendingImages.map(
+					( img ) =>
+						new Promise< void >( ( resolve ) => {
+							img.addEventListener( 'load', () => resolve(), { once: true } );
+							img.addEventListener( 'error', () => resolve(), { once: true } );
+						} )
+				)
+			);
+			await Promise.race( [ allImages, timeout ] );
+		}, options.fullPage );
```
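The first-viewport condition in Copilot's suggestion reduces to a pure check on an image's bounding rectangle, which can be unit-tested outside the browser. A minimal sketch; the `Rect` type and the `shouldAwaitImage` name are hypothetical, and the intersection rule is the same `rect.bottom > 0 && rect.top < innerHeight` test from the suggestion.

```typescript
// Minimal rectangle shape; a DOMRect provides these fields in the browser.
interface Rect {
	top: number;
	bottom: number;
}

// Hypothetical helper: should the capture wait for this image?
// Complete images never need waiting; full-page captures wait on all the
// others; viewport captures only wait on images overlapping [0, viewportHeight).
function shouldAwaitImage(
	complete: boolean,
	rect: Rect,
	viewportHeight: number,
	fullPage: boolean
): boolean {
	if ( complete ) {
		return false;
	}
	if ( fullPage ) {
		return true;
	}
	return rect.bottom > 0 && rect.top < viewportHeight;
}
```

Because `page.evaluate` serializes its callback into the page context, a helper like this would have to be inlined in the callback (or injected into the page) rather than imported there.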

Copilot · 2026-04-29T10:13:15Z

```diff
+				...logContext,
+				media_type: media.mediaType,
+				mime_type: media.mimeType,
+				bytes: media.dataBase64.length,
```


In the media.share debug log, bytes: media.dataBase64.length is reporting base64 character count, not decoded byte length. To avoid misleading telemetry, consider renaming this field (e.g., base64_chars) or computing the decoded byte length when you actually need “bytes”.

Suggested change

```diff
-				bytes: media.dataBase64.length,
+				base64_chars: media.dataBase64.length,
```
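If the log should keep reporting real bytes instead, the decoded size can be derived from the base64 length without decoding: every 4 characters encode 3 bytes, minus one byte per trailing `=`. A minimal sketch assuming standard base64 with no embedded whitespace; `base64DecodedLength` is a hypothetical helper. In Node, `Buffer.byteLength( str, 'base64' )` gives the same number.

```typescript
// Decoded size of a base64 string without allocating the decoded buffer.
// Assumes the standard alphabet and no whitespace; works with or without
// trailing "=" padding.
function base64DecodedLength( b64: string ): number {
	let padding = 0;
	if ( b64.endsWith( '==' ) ) {
		padding = 2;
	} else if ( b64.endsWith( '=' ) ) {
		padding = 1;
	}
	// floor(len * 3 / 4) also handles unpadded input (len % 4 of 2 or 3).
	return Math.floor( ( b64.length * 3 ) / 4 ) - padding;
}
```

The log line would then read `bytes: base64DecodedLength( media.dataBase64 )`, keeping the `bytes` field name accurate.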

github-actions Bot assigned gcsecsey Apr 28, 2026

gcsecsey changed the title ~~apps/cli: send screenshots from studio code to Telegram remote sessions~~ CLI: send screenshots from studio code to Telegram remote sessions Apr 28, 2026

gcsecsey requested a review from Copilot April 29, 2026 10:06

Copilot started reviewing on behalf of gcsecsey April 29, 2026 10:07

Copilot AI reviewed Apr 29, 2026

gcsecsey wants to merge 1 commit into trunk from gcsecsey/screenshot-support

2 participants