Skip to content

解决 #138 的错误html识别报错#141

Open
Mr-xn wants to merge 5 commits intoxxnuo:mainfrom
Mr-xn:main
Open

解决 #138 的错误html识别报错#141
Mr-xn wants to merge 5 commits intoxxnuo:mainfrom
Mr-xn:main

Conversation

@Mr-xn
Copy link
Copy Markdown

@Mr-xn Mr-xn commented Apr 2, 2026

我也遇到了同样的问题 #138 经过vibe coding 和自测 issues 里的翻译文本 目前正常。此pr 仅供参考

Copilot AI and others added 5 commits April 2, 2026 05:05
Agent-Logs-Url: https://github.com/Mr-xn/MTranServer/sessions/f51bebab-7d96-4c04-9ff4-9cccaa57cc40

Co-authored-by: Mr-xn <18260135+Mr-xn@users.noreply.github.com>
Agent-Logs-Url: https://github.com/Mr-xn/MTranServer/sessions/f51bebab-7d96-4c04-9ff4-9cccaa57cc40

Co-authored-by: Mr-xn <18260135+Mr-xn@users.noreply.github.com>
…ability

Harden HTML translation against malformed advisory-style input
Agent-Logs-Url: https://github.com/Mr-xn/MTranServer/sessions/a753b62b-24bf-4ff5-9abe-249ffbff8fed

Co-authored-by: Mr-xn <18260135+Mr-xn@users.noreply.github.com>
Add manual target selection for Docker image publishing
Copilot AI review requested due to automatic review settings April 2, 2026 07:21
@Mr-xn Mr-xn changed the title 解决 https://github.com/xxnuo/MTranServer/issues/138 的错误html识别报错 解决 #138 的错误html识别报错 Apr 2, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to address issue #138 where “HTML mode” translation can fail on inputs that contain HTML-like characters (e.g., <=, &something[0]) but are not actually valid HTML, causing WASM/HTML parsing aborts.

Changes:

  • Add HTML-likeness detection + stronger HTML sanitization in TranslationEngine, and avoid forcing HTML mode for plain-text inputs.
  • Add a service-level retry path that falls back to plain text when an “HTML parse error” is detected, and refactor engine teardown into destroyEngine.
  • Add Bun tests covering the new HTML/plain-text behavior; update Docker publishing workflow to support selectable publish targets on manual runs.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File Description
src/core/engine.ts Detects whether text “looks like HTML”, sanitizes stray &/<, adjusts placeholder tagging behavior, and wraps certain HTML-mode abort errors as “HTML parse error”.
src/services/engine.ts Adds destroyEngine() helper and retries HTML parse failures as plain text.
tests/engine-html.test.ts Adds regression tests for plain-text fallback and HTML sanitization behavior.
.github/workflows/docker.yml Adds workflow_dispatch input to choose publish target(s) and conditionally logs into registries / sets images.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/services/engine.ts
Comment on lines +49 to +59
function destroyEngine(fromLang: string, toLang: string, reason: string) {
const key = `${fromLang}-${toLang}`;
const info = engines.get(key);
if (!info) {
return;
}

logger.warn(`${reason}: ${key}`);
info.engine.destroy();
engines.delete(key);
}
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

destroyEngine deletes the engine entry but doesn’t clear the idle timer stored on EngineInfo. That timeout will still fire later, keeping the engine object alive longer than necessary and potentially logging a second “idle timeout” shutdown after a crash/parse failure. Consider clearTimeout(info.timer) (and unset it) before destroying/deleting the engine.

Copilot uses AI. Check for mistakes.
Comment thread src/services/engine.ts
Comment on lines 176 to +194
@@ -174,16 +187,15 @@ async function translateSingleLanguageText(
error.message.includes('abort')
);

if (isHTMLParseError) {
logger.warn(`HTML parsing failed during translation (${fromLang}->${toLang}), retrying as plain text`);
destroyEngine(fromLang, toLang, 'Destroying engine after HTML parse failure');
return translateSingleLanguageText(fromLang, toLang, text, false);
}
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isHTMLParseError is checked before isMemoryError, and core engine wraps any HTML-mode abort/aborted as HTML parse error. This means a genuine WASM abort/memory failure in HTML mode will be treated as a parse failure and immediately fall back to plain-text translation (skipping the retry loop). Consider tightening the parse-error signal (e.g., error code/name) or checking memory errors first when the message contains both.

Copilot uses AI. Check for mistakes.
Comment thread src/core/engine.ts
Comment on lines 232 to +238
} catch (error: any) {
console.error(`WASM Error Context: TextLength=${cleanedText.length}, Options=${JSON.stringify(options)}`);
if (options.html && error?.message && /aborted|abort/i.test(error.message)) {
const wrappedError = new Error(`HTML parse error: ${error.message}`);
(wrappedError as Error & { cause?: unknown }).cause = error;
throw wrappedError;
}
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The HTML error wrapping triggers for any abort/aborted when options.html is true. Since abort is also used for non-HTML failures (e.g., OOM/unreachable), this can mislabel unrelated failures as “HTML parse error” and change recovery behavior upstream. Consider narrowing this condition to a more specific HTML-parse signature from Bergamot, or attach a structured flag (name/code) instead of relying on the substring “abort”.

Copilot uses AI. Check for mistakes.
Comment thread src/core/engine.ts
Comment on lines 391 to 411
private _tagPlaceholders(
text: string,
htmlEnabled: boolean = false
): { taggedText: string; replacements: string[]; forceHtml: boolean } {
const placeholderRegex = /(\{\d+\}|\[\d+\])/g;
if (!placeholderRegex.test(text)) {
return { taggedText: text, replacements: [], forceHtml: false };
}

const replacements: string[] = [];
const taggedText = text.replace(placeholderRegex, (match) => {
const index = replacements.length;
replacements.push(match);
return `<mt${index} />`;
if (htmlEnabled) {
return `<mt${index} />`;
}
return `MT_PLACEHOLDER_${index}_TOKEN`;
});

return { taggedText, replacements, forceHtml: !htmlEnabled };
return { taggedText, replacements, forceHtml: false };
}
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_tagPlaceholders still returns forceHtml but always sets it to false, so forceHtml can no longer influence effectiveOptions and is now dead state. Consider removing forceHtml from the return type and the translate() flow (or reintroducing the behavior intentionally) to reduce confusion.

Copilot uses AI. Check for mistakes.
Comment on lines 6 to +16
workflow_dispatch:
inputs:
publish_target:
description: Publish target
required: true
type: choice
default: both
options:
- docker
- github
- both
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This workflow adds a new workflow_dispatch input and conditional publishing logic, which is unrelated to the PR’s stated purpose (fixing HTML parse/recognition errors). Consider splitting the workflow change into a separate PR (or update the PR description to explicitly include CI/publishing changes) to keep the fix reviewable and revertable independently.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants