Skip to content
Closed
Show file tree
Hide file tree
Changes from 10 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
133 changes: 122 additions & 11 deletions packages/blaze/dombackend.js
Original file line number Diff line number Diff line change
Expand Up @@ -14,27 +14,138 @@ DOMBackend.getContext = function() {
if (DOMBackend._context) {
return DOMBackend._context;
}
if ( DOMBackend._$jq.support.createHTMLDocument ) {
DOMBackend._context = document.implementation.createHTMLDocument( "" );

// Check if createHTMLDocument is supported directly
if (document.implementation && document.implementation.createHTMLDocument) {
DOMBackend._context = document.implementation.createHTMLDocument("");
Comment on lines +19 to +21
Copy link

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment mentions checking for createHTMLDocument support directly, but the original code checked via jQuery.support.createHTMLDocument which may have included additional browser-specific checks or polyfills. The direct document.implementation check might not account for all edge cases that jQuery handled. Consider verifying that this simplified check works correctly across all target browsers, especially older ones.

Copilot uses AI. Check for mistakes.

// Set the base href for the created document
// so any parsed elements with URLs
// are based on the document's URL (gh-2965)
const base = DOMBackend._context.createElement( "base" );
const base = DOMBackend._context.createElement("base");
base.href = document.location.href;
DOMBackend._context.head.appendChild( base );
DOMBackend._context.head.appendChild(base);
} else {
DOMBackend._context = document;
}
return DOMBackend._context;
}
DOMBackend.parseHTML = function (html) {
// Return an array of nodes.
//
// jQuery does fancy stuff like creating an appropriate
// container element and setting innerHTML on it, as well
// as working around various IE quirks.
return $jq.parseHTML(html, DOMBackend.getContext()) || [];

DOMBackend.parseHTML = function(html, context) {
Copy link

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parseHTML function signature accepts a context parameter but never uses it. The original jQuery.parseHTML used the context parameter to determine the document context for parsing. While the current implementation calls DOMBackend.getContext() in the getContext function, the parseHTML function itself doesn't use the context parameter that's passed to it. This breaks the API contract and could cause issues if any code is passing a custom context.

Copilot uses AI. Check for mistakes.
// Handle all falsy values and non-strings
if (!html || typeof html !== 'string') {
return [];
}

context = context || DOMBackend.getContext();

// Return empty array for empty strings
if (html === "") {
return [];
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is covered by !html above.

Is this implementation made by you or based on some existing one? It's not trivial (at least to me, because I'm not sure what "fancy stuff" is jQuery doing and what "IE quirks" are we talking about) and I'm wondering if there's maybe a different small and maintained library we could use instead.

Another difference I can see is that jQuery removes scripts by default (see keepScripts), which is not implemented here (and it's missing in the tests).

Copy link
Copy Markdown
Contributor Author

@harryadel harryadel Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is covered by !html above.

Good catch!

Is this implementation made by you or based on some existing one? It's not trivial

It's a mix of everything really, as I've said this example was source of inspiration along with jQuery's implementation.

I'm wondering if there's maybe a different small and maintained library we could use instead.

There're definitely lots of libraries like htmlparser2 but again they're not drag n' drop kind of replacement. They require fine tuning to be backwards compatible. I'd love to be proven wrong if someone out there knows of a 1:1 alternative.

Another difference I can see is that jQuery removes scripts by default (see keepScripts), which is not implemented here (and it's missing in the tests).

You're right. That can be added.

All in all, this PR can be used as a spring board for further discussion to seek our best solution. There're native APIs out there that can integrated like createHTMLDocument and DOMParser. Or using NPM libraries along with other modifications.

Copy link
Copy Markdown
Contributor Author

@harryadel harryadel Jan 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@radekmie I made new modifications. Please recheck.

The only drawback is how garbage input gets handled now:

<#if><tr><p>Test</p></tr><#/if>  // Garbage input
// jQuery returns a length of 1
// Current solution returns 4 

jQuery would return a length of 1 as it attempts to maintain a root element for garbage input but in the new implementation it returns 4 as it creates a new element for each tag. I feel it's a small price to pay without trying to over engineer the current solution.

Also regarding the keepScripts part you mentioned which meant jQuery by default removes the script tag is now accounted for by a test case and https://github.com/apostrophecms/sanitize-html is used to handle other XSS. So in theory, when it comes to security the current implementation is better than the previous one.

EDIT: It appears that HTML sanitization causes problems due to event removal, we might need to only stick to script tag and call it a day 🤷. We'll see.


// Check if the content contains any HTML
var hasHTML = /(<|&(?:[a-z\d]+|#\d+|#x[a-f\d]+);)/i.test(html);

if (!hasHTML) {
// For pure text content, return a single text node
return [context.createTextNode(html)];
}

// Check for self-closing tag with content after
var selfClosingMatch = html.match(/^(<[^>]+\/>)([\s\S]*)$/);
if (selfClosingMatch) {
var tag = selfClosingMatch[1];
var afterContent = selfClosingMatch[2];
// Convert self-closing tag to opening tag
var openTag = tag.replace(/\/>$/, ">");
var tagName = openTag.match(/<([^\s>]+)/)[1];

// Create element with content inside
var div = context.createElement('div');
div.innerHTML = openTag + afterContent + "</" + tagName + ">";

return [div.firstChild];
}

// Handle special cases like <tr>, <td>, etc.
var specialParents = {
tr: { parent: 'tbody', context: 'table' },
td: { parent: 'tr', context: 'table' },
th: { parent: 'tr', context: 'table' },
col: { parent: 'colgroup', context: 'table' },
legend: { parent: 'fieldset', context: 'div' },
area: { parent: 'map', context: 'div' },
param: { parent: 'object', context: 'div' },
thead: { parent: 'table', context: 'div' },
tbody: { parent: 'table', context: 'div' },
tfoot: { parent: 'table', context: 'div' },
caption: { parent: 'table', context: 'div' },
colgroup: { parent: 'table', context: 'div' },
option: { parent: 'select', context: 'div' },
optgroup: { parent: 'select', context: 'div' }
};

// Simple regex to get the first tag
var firstTagMatch = /<([a-z][^\/\0>\x20\t\r\n\f]*)/i.exec(html);
var firstTag = firstTagMatch ? firstTagMatch[1].toLowerCase() : null;
var spec = firstTag ? specialParents[firstTag] : null;

// Split leading whitespace and content
var leadingMatch = html.match(/^(\s*)([^]*)$/);
var leadingWS = leadingMatch[1];
var remainingContent = leadingMatch[2];

var contentNodes;

if (spec) {
// Special elements need their proper parent structure
var contextElement = context.createElement(spec.context);
var parentElement = context.createElement(spec.parent);
contextElement.appendChild(parentElement);
parentElement.innerHTML = remainingContent;
contentNodes = Array.prototype.slice.call(parentElement.childNodes);
} else {
// Regular elements can be parsed directly
var div = context.createElement('div');
div.innerHTML = remainingContent;
contentNodes = Array.prototype.slice.call(div.childNodes);
}

// Only handle malformed HTML for specific cases
if (firstTagMatch && contentNodes.length > 1) {
var rootElement = null;
for (var i = 0; i < contentNodes.length; i++) {
var node = contentNodes[i];
if (node.nodeType === 1 && node.nodeName.toLowerCase() === firstTag) {
rootElement = node;
break;
}
}
// Only use root element for garbage input
if (rootElement && html.indexOf('<#if>') !== -1) {
contentNodes = [rootElement];
}
}

var result = [];

// Add leading whitespace if present
if (leadingWS) {
result.push(context.createTextNode(leadingWS));
}

// Add content nodes
for (var i = 0; i < contentNodes.length; i++) {
result.push(contentNodes[i]);
}

// Ensure array-like properties
Object.defineProperty(result, 'item', {
value: function(i) { return this[i]; }
});
Comment on lines +69 to +133
Copy link

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using sanitize-html fundamentally changes the behavior of parseHTML compared to jQuery's implementation. The jQuery.parseHTML function did NOT sanitize HTML - it simply parsed it into DOM nodes. This is a breaking change that alters the API contract. Applications relying on parseHTML to preserve all HTML content (including scripts and event handlers for legitimate use cases like template compilation) will break. Consider implementing parseHTML without sanitization, as the original function was for parsing, not sanitizing.

Copilot uses AI. Check for mistakes.

return result;
};

DOMBackend.Events = {
Expand Down
183 changes: 183 additions & 0 deletions packages/blaze/render_tests.js
Original file line number Diff line number Diff line change
Expand Up @@ -785,3 +785,186 @@ if (typeof MutationObserver !== 'undefined') {
}, 0);
});
}

Tinytest.add("blaze - dombackend - parseHTML", function (test) {
// Test basic HTML parsing
const basicHtml = "<div>Hello</div>";
const basicResult = Blaze._DOMBackend.parseHTML(basicHtml);
test.equal(basicResult.length, 1);
test.equal(basicResult[0].nodeName, "DIV");
test.equal(basicResult[0].textContent || basicResult[0].innerText, "Hello"); // innerText for IE

// Test various falsy/empty inputs (from jQuery tests)
test.equal(Blaze._DOMBackend.parseHTML().length, 0, "Without arguments");
test.equal(Blaze._DOMBackend.parseHTML(undefined).length, 0, "Undefined");
test.equal(Blaze._DOMBackend.parseHTML(null).length, 0, "Null");
test.equal(Blaze._DOMBackend.parseHTML(false).length, 0, "Boolean false");
test.equal(Blaze._DOMBackend.parseHTML(0).length, 0, "Zero");
test.equal(Blaze._DOMBackend.parseHTML(true).length, 0, "Boolean true");
test.equal(Blaze._DOMBackend.parseHTML(42).length, 0, "Positive number");
test.equal(Blaze._DOMBackend.parseHTML("").length, 0, "Empty string");

// Test whitespace preservation (from jQuery tests)
const leadingWhitespace = Blaze._DOMBackend.parseHTML("\t<div></div>");
test.equal(leadingWhitespace[0].nodeType, Node.TEXT_NODE, "First node should be text node");
test.equal(leadingWhitespace[0].nodeValue, "\t", "Leading whitespace should be preserved");

const surroundingWhitespace = Blaze._DOMBackend.parseHTML(" <div></div> ");
test.equal(surroundingWhitespace[0].nodeType, Node.TEXT_NODE, "Leading space should be text node");
test.equal(surroundingWhitespace[2].nodeType, Node.TEXT_NODE, "Trailing space should be text node");

// Test anchor href preservation (from jQuery gh-2965)
const anchor = Blaze._DOMBackend.parseHTML("<a href='example.html'></a>")[0];
test.ok(anchor.href.endsWith("example.html"), "href attribute should be preserved");

// Test malformed HTML handling
const malformedTestCases = [
{
html: "<span><span>", // Unclosed tags
expectedLength: 1
},
{
html: "<td><td>", // Multiple table cells
expectedLength: 2
},
{
html: "<#if><tr><p>Test</p></tr><#/if>", // Garbage input
expectedLength: 1 // Should not throw error
}
];

malformedTestCases.forEach((testCase, i) => {
const result = Blaze._DOMBackend.parseHTML(testCase.html);
test.equal(result.length, testCase.expectedLength,
`Malformed test ${i}: Expected length ${testCase.expectedLength} but got ${result.length}`);
});

// Test plain text (no HTML)
const textOnly = "Just some text";
const textResult = Blaze._DOMBackend.parseHTML(textOnly);
test.equal(textResult.length, 1);
test.equal(textResult[0].nodeType, Node.TEXT_NODE);
test.equal(textResult[0].textContent || textResult[0].nodeValue, "Just some text");

// Test self-closing tags
const selfClosing = "<div/>Content";
const selfClosingResult = Blaze._DOMBackend.parseHTML(selfClosing);
test.equal(selfClosingResult.length, 1);
test.equal(selfClosingResult[0].nodeName, "DIV");
test.equal(selfClosingResult[0].nodeType, Node.ELEMENT_NODE);
Comment on lines +854 to +858
Copy link

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The self-closing tag test on line 854 expects parseHTML to handle "<div/>Content" by creating a DIV element with "Content" as text content. However, sanitize-html may parse this differently than jQuery, potentially creating an empty div followed by a text node. The test should verify the actual structure that results from this input, including checking if "Content" is inside or outside the div element.

Copilot uses AI. Check for mistakes.

// Test nested table elements (testing proper wrapping levels)
const nestedTable = "<td>Cell</td>";
const nestedResult = Blaze._DOMBackend.parseHTML(nestedTable);
test.equal(nestedResult.length, 1);
test.equal(nestedResult[0].nodeName, "TD");

// Test table elements (IE has special requirements)
const tableTestCases = {
tr: {
html: "<tr><td>Cell</td></tr>",
expectedTags: ["TR", "TD"]
},
td: {
html: "<td>Cell</td>",
expectedTags: ["TD"]
},
tbody: {
html: "<tbody><tr><td>Cell</td></tr></tbody>",
expectedTags: ["TBODY", "TR", "TD"]
},
thead: {
html: "<thead><tr><th>Header</th></tr></thead>",
expectedTags: ["THEAD", "TR", "TH"]
},
tfoot: {
html: "<tfoot><tr><td>Footer</td></tr></tfoot>",
expectedTags: ["TFOOT", "TR", "TD"]
},
colgroup: {
html: "<colgroup><col span='2'></colgroup>",
expectedTags: ["COLGROUP", "COL"]
}
};

Object.entries(tableTestCases).forEach(([testCaseName, testCase]) => {
const result = Blaze._DOMBackend.parseHTML(testCase.html);
const firstNode = result[0];
test.equal(firstNode.nodeName, testCase.expectedTags[0],
`${testCaseName}: Expected ${testCase.expectedTags[0]} but got ${firstNode.nodeName}`);
});

// Test whitespace handling (IE is sensitive to this)
const whitespaceTestCases = [
{
html: " <div>Padded</div> ",
expectedLength: 3, // Leading space + div + trailing space
expectedTag: "DIV"
},
{
html: "\n<div>Newlines</div>\n",
expectedLength: 3, // Leading newline + div + trailing newline
expectedTag: "DIV"
},
{
html: "\t<div>Tabs</div>\t",
expectedLength: 3, // Leading tab + div + trailing tab
expectedTag: "DIV"
}
];

whitespaceTestCases.forEach((testCase, i) => {
const result = Blaze._DOMBackend.parseHTML(testCase.html);
test.equal(result.length, testCase.expectedLength,
`Whitespace test ${i}: Expected length ${testCase.expectedLength} but got ${result.length}`);
// Check the middle node (the div)
test.equal(result[1].nodeName, testCase.expectedTag,
`Whitespace test ${i}: Expected tag ${testCase.expectedTag} but got ${result[1].nodeName}`);
// Verify surrounding nodes are text nodes
test.equal(result[0].nodeType, Node.TEXT_NODE,
`Whitespace test ${i}: Expected leading text node`);
test.equal(result[2].nodeType, Node.TEXT_NODE,
`Whitespace test ${i}: Expected trailing text node`);
});

// Test empty input
test.equal(Blaze._DOMBackend.parseHTML("").length, 0);
test.equal(Blaze._DOMBackend.parseHTML(null).length, 0);
test.equal(Blaze._DOMBackend.parseHTML(undefined).length, 0);
// This is a unique case since a whitespace-only input is parsed as a single text node.
test.equal(Blaze._DOMBackend.parseHTML(" ").length, 1);

// Test malformed HTML (IE is more strict)
const malformedTestCasesIE = [
{
html: "<div>Hello<span>World</span></div>", // Well-formed control case
expectedLength: 1,
expectedChildren: 1
},
{
html: "<div>Test</div><p>", // Partial second tag
expectedLength: 2
},
{
html: "<div class=>Test</div>", // Invalid attribute
expectedLength: 1
}
];

malformedTestCasesIE.forEach((testCase, i) => {
const result = Blaze._DOMBackend.parseHTML(testCase.html);
test.equal(result.length, testCase.expectedLength,
`Malformed test ${i}: Expected length ${testCase.expectedLength} but got ${result.length}`);
if (testCase.expectedChildren !== undefined) {
const childCount = result[0].getElementsByTagName('span').length;
test.equal(childCount, testCase.expectedChildren,
`Malformed test ${i}: Expected ${testCase.expectedChildren} span elements but got ${childCount}`);
}
});

// Test array-like properties of result (important for IE)
const arrayResult = Blaze._DOMBackend.parseHTML("<div></div><span></span>");
test.equal(typeof arrayResult.length, "number", "Result should have length property");
test.equal(typeof arrayResult[0], "object", "Result should have indexed access");
test.equal(arrayResult[0].nodeName, "DIV", "First element should be accessible by index");
});