Performance: std::regex with picomatch-generated patterns causes 10-17x slowdown on large repos #244

@pastelsky


wrapper.js converts globs like **/node_modules/** into regex via picomatch.makeRe(), producing patterns with nested negative lookaheads:

^(?:(?:^|\/|(?:(?:(?!(?:^|\/)\.{1,2}(?:\/|$)).)*?)\/)node_modules(?:\/(?!\.{1,2}(?:\/|$))(?:(?:(?!(?:^|\/)\.{1,2}(?:\/|$)).)*?)|$))$

These are pathological for std::regex (libstdc++ uses a backtracking NFA). On a monorepo with ~650K files, subscribe() with just 4 such patterns takes 120s, of which 111s is CPU time in std::regex_match(). The actual fts traversal plus inotify_add_watch() takes ~5s.

The lookaheads prevent ** from matching . and .. path components, but paths from fts_read() never contain those: they're resolved by the kernel.

A simpler regex like (^|.*/)node_modules(/.*|$) is functionally equivalent for real paths and runs about 11x faster in wall-clock time (about 17x less user CPU):

| Config | Time | User CPU |
|---|---|---|
| picomatch regex (current) | 120s | 111s |
| Simple regex, no lookaheads | 11s | 6.6s |
| ignorePaths only, no regex | 9.5s | 2.8s |
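
To make the second row concrete, here is a minimal sketch of matching paths with the lookahead-free pattern quoted above. The helper name `is_ignored` and the constant `kNodeModules` are illustrative and not taken from the library's source; the sketch assumes paths never contain `.` or `..` components, as argued above.

```cpp
#include <regex>
#include <string>

// Lookahead-free pattern quoted in this issue.
// Assumption: paths come from fts_read() and never contain "." or ".." components.
static const std::regex kNodeModules(R"((^|.*/)node_modules(/.*|$))",
                                     std::regex::ECMAScript | std::regex::optimize);

// Hypothetical helper: true if `path` is a node_modules directory or lies inside one.
bool is_ignored(const std::string& path) {
    return std::regex_match(path, kNodeModules);
}
```

With this pattern, `/repo/node_modules/pkg/index.js` and a bare `node_modules` match, while `/repo/node_modules_backup/x` and `/repo/my_node_modules/x` do not, because the component must be delimited by `/` or the string boundaries.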

Suggestion: It would be worth documenting in the README that glob patterns with leading ** (e.g. **/node_modules/**) generate complex regex with negative lookaheads that can cause significant performance degradation on large directory trees via std::regex. Users can avoid this by using simpler patterns like node_modules/** or by passing absolute paths via ignorePaths instead.
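
For comparison, the sketch below shows why an absolute-path ignore list can be cheap: it needs only a prefix comparison per entry and no regex engine. This is an assumption about how such a check could look, not the library's actual implementation; `is_under_ignored_path` is a hypothetical name, and the ignore entries are assumed to be absolute paths without a trailing slash.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not the library's code: true if `path` equals one of the
// absolute `ignore_paths` or lies beneath it.
// Assumption: entries in `ignore_paths` have no trailing '/'.
bool is_under_ignored_path(const std::string& path,
                           const std::vector<std::string>& ignore_paths) {
    for (const std::string& dir : ignore_paths) {
        // Skip entries that are empty or are not a prefix of `path`.
        if (dir.empty() || path.compare(0, dir.size(), dir) != 0) continue;
        // Require a component boundary so "/repo/build" does not match "/repo/build2".
        if (path.size() == dir.size() || path[dir.size()] == '/') return true;
    }
    return false;
}
```

The cost is linear in the number of ignore entries and the prefix length, with no backtracking.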

I also looked into using glob-to-regexp as a replacement for picomatch. It generates lookahead-free regex and is a much simpler library (130 lines, zero deps). However, it's not a full drop-in replacement: it doesn't support extglob features like (a|b) alternation, [a-e] character classes as glob syntax, or {a,b} brace expansion. See #245 for a PR that uses glob-to-regexp for simple patterns and falls back to picomatch for patterns containing extglob syntax.
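
The "simple patterns first, fall back otherwise" idea can be sketched as follows. This is not the code from #245 and not glob-to-regexp's algorithm; it is a hypothetical C++ illustration that handles only `*`, `?`, `**/` and a trailing `/**`, and both function names are made up for this example.

```cpp
#include <regex>
#include <string>

// Hypothetical check: true if the glob uses syntax this sketch does not handle
// (extglob groups, character classes, braces, negation) and needs a full glob engine.
bool needs_fallback(const std::string& glob) {
    return glob.find_first_of("()[]{}|!+@") != std::string::npos;
}

// Hypothetical converter: translate a simple glob into a lookahead-free regex.
// Assumption: matched paths never contain "." or ".." components.
std::string simple_glob_to_regex(const std::string& glob) {
    std::string out = "^";
    for (std::size_t i = 0; i < glob.size(); ++i) {
        // A trailing "/**" matches the directory itself or anything beneath it.
        if (glob.compare(i, std::string::npos, "/**") == 0) {
            out += "(?:/.*)?";
            break;
        }
        const char c = glob[i];
        if (c == '*') {
            const bool globstar = i + 1 < glob.size() && glob[i + 1] == '*';
            if (!globstar) {
                out += "[^/]*";          // "*": anything within one path component
            } else if (i + 2 < glob.size() && glob[i + 2] == '/') {
                out += "(?:.*/)?";       // "**/": zero or more leading directories
                i += 2;
            } else {
                out += ".*";             // "**": anything, including '/'
                i += 1;
            }
        } else if (c == '?') {
            out += "[^/]";               // "?": one character other than '/'
        } else if (std::string(".\\+^$|()[]{}").find(c) != std::string::npos) {
            out += '\\';                 // escape regex metacharacters
            out += c;
        } else {
            out += c;
        }
    }
    out += "$";
    return out;
}
```

A caller would test `needs_fallback(pattern)` first and only hand the pattern to the simple converter when it returns false. For `**/node_modules/**` the converter produces `^(?:.*/)?node_modules(?:/.*)?$`, which contains no lookaheads.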
