解决 #138 的错误html识别报错 by Mr-xn · Pull Request #141 · xxnuo/MTranServer

Mr-xn · 2026-04-02T07:21:56Z

我也遇到了同样的问题 #138 经过vibe coding 和自测 issues 里的翻译文本目前正常。此pr 仅供参考

Agent-Logs-Url: https://github.com/Mr-xn/MTranServer/sessions/f51bebab-7d96-4c04-9ff4-9cccaa57cc40 Co-authored-by: Mr-xn <18260135+Mr-xn@users.noreply.github.com>

…ability Harden HTML translation against malformed advisory-style input

Agent-Logs-Url: https://github.com/Mr-xn/MTranServer/sessions/a753b62b-24bf-4ff5-9abe-249ffbff8fed Co-authored-by: Mr-xn <18260135+Mr-xn@users.noreply.github.com>

Add manual target selection for Docker image publishing

Copilot

Pull request overview

This PR aims to address issue #138 where “HTML mode” translation can fail on inputs that contain HTML-like characters (e.g., <=, &something[0]) but are not actually valid HTML, causing WASM/HTML parsing aborts.

Changes:

Add HTML-likeness detection + stronger HTML sanitization in TranslationEngine, and avoid forcing HTML mode for plain-text inputs.
Add a service-level retry path that falls back to plain text when an “HTML parse error” is detected, and refactor engine teardown into destroyEngine.
Add Bun tests covering the new HTML/plain-text behavior; update Docker publishing workflow to support selectable publish targets on manual runs.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File	Description
`src/core/engine.ts`	Detects whether text “looks like HTML”, sanitizes stray `&`/`<`, adjusts placeholder tagging behavior, and wraps certain HTML-mode abort errors as “HTML parse error”.
`src/services/engine.ts`	Adds `destroyEngine()` helper and retries HTML parse failures as plain text.
`tests/engine-html.test.ts`	Adds regression tests for plain-text fallback and HTML sanitization behavior.
`.github/workflows/docker.yml`	Adds `workflow_dispatch` input to choose publish target(s) and conditionally logs into registries / sets images.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-02T07:26:02Z

+function destroyEngine(fromLang: string, toLang: string, reason: string) {
+  const key = `${fromLang}-${toLang}`;
+  const info = engines.get(key);
+  if (!info) {
+    return;
+  }
+
+  logger.warn(`${reason}: ${key}`);
+  info.engine.destroy();
+  engines.delete(key);
+}


destroyEngine deletes the engine entry but doesn’t clear the idle timer stored on EngineInfo. That timeout will still fire later, keeping the engine object alive longer than necessary and potentially logging a second “idle timeout” shutdown after a crash/parse failure. Consider clearTimeout(info.timer) (and unset it) before destroying/deleting the engine.

Copilot · 2026-04-02T07:26:03Z

@@ -174,16 +187,15 @@ async function translateSingleLanguageText(
        error.message.includes('abort')
      );

+      if (isHTMLParseError) {
+        logger.warn(`HTML parsing failed during translation (${fromLang}->${toLang}), retrying as plain text`);
+        destroyEngine(fromLang, toLang, 'Destroying engine after HTML parse failure');
+        return translateSingleLanguageText(fromLang, toLang, text, false);
+      }


isHTMLParseError is checked before isMemoryError, and core engine wraps any HTML-mode abort/aborted as HTML parse error. This means a genuine WASM abort/memory failure in HTML mode will be treated as a parse failure and immediately fall back to plain-text translation (skipping the retry loop). Consider tightening the parse-error signal (e.g., error code/name) or checking memory errors first when the message contains both.

Copilot · 2026-04-02T07:26:03Z

    } catch (error: any) {
      console.error(`WASM Error Context: TextLength=${cleanedText.length}, Options=${JSON.stringify(options)}`);
+      if (options.html && error?.message && /aborted|abort/i.test(error.message)) {
+        const wrappedError = new Error(`HTML parse error: ${error.message}`);
+        (wrappedError as Error & { cause?: unknown }).cause = error;
+        throw wrappedError;
+      }


The HTML error wrapping triggers for any abort/aborted when options.html is true. Since abort is also used for non-HTML failures (e.g., OOM/unreachable), this can mislabel unrelated failures as “HTML parse error” and change recovery behavior upstream. Consider narrowing this condition to a more specific HTML-parse signature from Bergamot, or attach a structured flag (name/code) instead of relying on the substring “abort”.

Copilot · 2026-04-02T07:26:03Z

  private _tagPlaceholders(
    text: string,
    htmlEnabled: boolean = false
  ): { taggedText: string; replacements: string[]; forceHtml: boolean } {
    const placeholderRegex = /(\{\d+\}|\[\d+\])/g;
    if (!placeholderRegex.test(text)) {
      return { taggedText: text, replacements: [], forceHtml: false };
    }

    const replacements: string[] = [];
    const taggedText = text.replace(placeholderRegex, (match) => {
      const index = replacements.length;
      replacements.push(match);
-      return `<mt${index} />`;
+      if (htmlEnabled) {
+        return `<mt${index} />`;
+      }
+      return `MT_PLACEHOLDER_${index}_TOKEN`;
    });

-    return { taggedText, replacements, forceHtml: !htmlEnabled };
+    return { taggedText, replacements, forceHtml: false };
  }


_tagPlaceholders still returns forceHtml but always sets it to false, so forceHtml can no longer influence effectiveOptions and is now dead state. Consider removing forceHtml from the return type and the translate() flow (or reintroducing the behavior intentionally) to reduce confusion.

Copilot · 2026-04-02T07:26:04Z

  workflow_dispatch:
+    inputs:
+      publish_target:
+        description: Publish target
+        required: true
+        type: choice
+        default: both
+        options:
+          - docker
+          - github
+          - both


This workflow adds a new workflow_dispatch input and conditional publishing logic, which is unrelated to the PR’s stated purpose (fixing HTML parse/recognition errors). Consider splitting the workflow change into a separate PR (or update the PR description to explicitly include CI/publishing changes) to keep the fix reviewable and revertable independently.

Copilot AI and others added 5 commits April 2, 2026 05:05

fix: harden html translation fallback

4487383

Agent-Logs-Url: https://github.com/Mr-xn/MTranServer/sessions/f51bebab-7d96-4c04-9ff4-9cccaa57cc40 Co-authored-by: Mr-xn <18260135+Mr-xn@users.noreply.github.com>

test: cover html advisory fallback

b02ebf2

Agent-Logs-Url: https://github.com/Mr-xn/MTranServer/sessions/f51bebab-7d96-4c04-9ff4-9cccaa57cc40 Co-authored-by: Mr-xn <18260135+Mr-xn@users.noreply.github.com>

Merge pull request #1 from Mr-xn/copilot/analyze-file-download-vulner…

1c7c366

…ability Harden HTML translation against malformed advisory-style input

ci: add manual docker publish targets

5fbb70c

Agent-Logs-Url: https://github.com/Mr-xn/MTranServer/sessions/a753b62b-24bf-4ff5-9abe-249ffbff8fed Co-authored-by: Mr-xn <18260135+Mr-xn@users.noreply.github.com>

Merge pull request #2 from Mr-xn/copilot/update-docker-publish-action

391775c

Add manual target selection for Docker image publishing

Copilot AI review requested due to automatic review settings April 2, 2026 07:21

Mr-xn changed the title ~~解决 https://github.com/xxnuo/MTranServer/issues/138 的错误html识别报错~~ 解决 #138 的错误html识别报错 Apr 2, 2026

Copilot started reviewing on behalf of Mr-xn April 2, 2026 07:22 View session

Copilot AI reviewed Apr 2, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

解决 #138 的错误html识别报错#141

解决 #138 的错误html识别报错#141
Mr-xn wants to merge 5 commits intoxxnuo:mainfrom
Mr-xn:main

Mr-xn commented Apr 2, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 2, 2026

Uh oh!

Copilot AI Apr 2, 2026

Uh oh!

Copilot AI Apr 2, 2026

Uh oh!

Copilot AI Apr 2, 2026

Uh oh!

Copilot AI Apr 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Mr-xn commented Apr 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Mr-xn commented Apr 2, 2026 •

edited

Loading