Conversation
a2ae684 to
1a2f7f0
Compare
ae0ae76 to
cc117c3
Compare
|
This should be ready for review now, one open question: In v3, when an env var was set to an empty string (
The number behavior was almost certainly a bug — nobody sets In the new Zod-based Configuration, Option A: Option B: Option C: I'm leaning toward A — clean break, simple code, and nobody intentionally relies on this. Thoughts? There is also a connected SDK PR: apify/apify-sdk-js#583 |
Hard to disagree with that. My hunch would be that this should mean "unset", i.e., use the default.
True, I always find myself wondering how to set an environment variable to boolean false. But I think that we can accept the fact that environment variables are strings.
True, on the other hand, an empty string is a string, too. And if Of the three proposed solutions, A is the elegant one, C is backwards compatible and B tries to bolt a type system onto environment variables, which might either be just the solution that we're looking for, or a road to hell of confusion and unforeseen edge cases. Is there a known use case that would break with A? Do people run |
|
I think/hope people don't do this on purpose, I'd expect 99% of users will just go with 1/0. So a break here feels pretty fine. |
|
Just a passing-by comment:
This doesn't seem to be true currently; see the snippet below (the default value for both fields is crawlee/packages/core/test/core/configuration.test.ts Lines 109 to 115 in 4063bf6 Anyway, I'm also on board with Option A, it seems like the most logical / least surprising one. |
|
Yeah, the current implementation differs, I initially wanted to resolve the booleans mapping from |
c62a42e to
0747cc8
Compare
Replace Map-based get()/set() API with Zod-based field definitions and direct property access. Configuration is now immutable, and constructor options take precedence over environment variables. Priority order: constructor options > env vars > crawlee.json > defaults Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… and z Allow `envVar` to accept `string | string[]` so the Apify SDK can map ACTOR_/APIFY_/CRAWLEE_ prefixed env vars to a single config field. Also export `coerceNumber` and `z` for SDK reuse. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Constructor options and crawlee.json values were returned raw from _resolve(), bypassing schema validation that env vars already went through. Now all four resolution paths (constructor, env, file, defaults) consistently run through fieldDef.schema.parse(). Adds tests for crawlee.json loading and constructor option coercion. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The declaration merging pattern is intentional — properties are defined at runtime via Object.defineProperties in registerAccessors(). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…link The ConfigurationOptions interface was replaced by ConfigurationInput type alias in v4. Updated docs to reference the Configuration class directly and reflect the new immutable API (no more get/set methods). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…x broken refs - Cache all resolved Configuration values at construction time instead of re-running Zod parsing on every property access (perf fix for hot paths like emitSystemInfoEvent which reads config every ~1s) - Replace 13-method manual serviceLocator proxy with a Proxy object - Fix broken Configuration.getStorageClient() calls in test and e2e files to use serviceLocator.getStorageClient() - Move EventEmitter.defaultMaxListeners to module scope (run once, not per instance) - Extract readEnvVar() helper, remove TOCTOU pathExistsSync before readFileSync - Add systemInfoV2 to "Removed symbols" in upgrade guide Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove unnecessary exports (coerceNumber, logLevelSchema, z from zod) - Add set trap to serviceLocator proxy for a helpful error message - Document resetGlobalState() removal in upgrade guide - Fix mock logger method name (internal → logWithLevel) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…et() API Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…v var coercion Empty-string env vars now reach the schema coercion layer (matching v3 behavior where '' produced false for booleans and 0 for numbers). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sync versioned_docs/version-4.0 with current /docs/ and regenerate api-typedoc.json from the latest TypeScript sources. The snapshot still showed the removed Configuration.get/set methods and stale guide copy; regenerating pulls in the new property-based getters and the updated configuration guide. Also adds the custom-logger guide that landed in v4 after the snapshot was taken. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- docs: mention serviceLocator as the source of truth behind getGlobalConfig - configuration: expand comment explaining EventEmitter.defaultMaxListeners - configuration: make resolveAll private by passing fields as an argument - tests: document why disableBrowserSandbox/containerized/logLevel stay undefined - tests: cover crawlee.json in the priority describe (env > json > defaults) - tests: rename "accesses all fields as properties" for clearer wording
- Default disableBrowserSandbox to false; only containerized and logLevel remain optional (their undefined value carries runtime semantics). - Inline the serviceLocator mention into the existing getGlobalConfig paragraph instead of appending a separate one.
769ea05 to
22802e2
Compare
janbuchar
left a comment
There was a problem hiding this comment.
A couple of nits. Let's test this live once they're resolved.
| // Now we need to pass the configuration to the crawler | ||
| const crawler = new CheerioCrawler({}, config); | ||
| // Pass the configuration to the crawler | ||
| const crawler = new CheerioCrawler({ configuration: config }); |
There was a problem hiding this comment.
Missed opportunity to use an object shorthand - just rename config to configuration
There was a problem hiding this comment.
Yeah, when I was going through the diff and saw this, I was actually going to write here that I'd vote for renaming the option to just config 🙃
But then I realized we already have proxyConfiguration, so better keep the long form.
| ### Global Configuration | ||
|
|
||
| By default, there is a global singleton instance of `Configuration` class, it is used by the crawlers and some other classes that depend on a configurable behavior. In most cases you don't need to adjust any options there, but if needed - you can get access to it via <ApiLink to="core/class/Configuration#getGlobalConfig">`Configuration.getGlobalConfig()`</ApiLink> function. Now you can easily <ApiLink to="core/class/Configuration#get">`get`</ApiLink> and <ApiLink to="core/class/Configuration#set">`set`</ApiLink> the <ApiLink to="core/interface/ConfigurationOptions">`ConfigurationOptions`</ApiLink>. | ||
| By default, there is a global singleton instance of `Configuration` class, it is used by the crawlers and some other classes that depend on a configurable behavior. In most cases you don't need to adjust any options there, but if needed - you can access it via <ApiLink to="core/class/Configuration#getGlobalConfig">`Configuration.getGlobalConfig()`</ApiLink>, which delegates to the global <ApiLink to="core/class/ServiceLocator">`serviceLocator`</ApiLink> — the single source of truth for Crawlee's shared services (configuration, event manager, storage client, logger). You can also reach the same instance directly via `serviceLocator.getConfiguration()` or swap services globally with `serviceLocator.setConfiguration(...)` before any crawler is created. Configuration values are accessible directly as properties on the instance. |
There was a problem hiding this comment.
I would add "for example" to the list of services tracked by service locator so that it doesn't go stale later on.
| // Crawlee attaches many listeners to shared EventEmitters (one per crawler/session/autoscaled pool), | ||
| // which can exceed Node's default limit of 10 and trigger spurious MaxListenersExceededWarning logs. | ||
| // Raising the global default avoids false positives; real leaks will still manifest as unbounded growth. | ||
| EventEmitter.defaultMaxListeners = 50; |
There was a problem hiding this comment.
Thanks for adding the explanation. I still think there might be a better place for this, I don't like modules to have side effects.
Perhaps you could make an issue so that we could resolve this later (way later)?
- docs: switch the custom-configuration examples to object-shorthand
(`{ configuration }`) and clarify that the serviceLocator service list
is illustrative ("for example ...").
- configuration: link issue #3615 next to the EventEmitter side effect
so the long-term cleanup is tracked.
Per the resolution on the open question in #3484, empty-string env vars now fall through to crawlee.json or schema defaults instead of being coerced (booleans → false, numbers → 0, strings → ''). `readEnvVar` skips `''` alongside `undefined`, and `coerceBoolean` no longer needs `''` in its falsy list (env vars filter it out before reach, and constructor options get the same fall-through via Zod's `.default()`). This is a deliberate v3 → v4 break — the v3 number-coercion case (`CRAWLEE_PERSIST_STATE_INTERVAL_MILLIS='' → 0ms`) was almost certainly a bug, and nobody intentionally relies on the boolean/string coercions.
|
Almost forgot to resolve the priorities (#3484 (comment)), handled in the latest commit. |
Configuration class for v4
Summary
get()/set()with direct property access —config.headlessinstead ofconfig.get('headless'). Both methods are removed (v4 breaking change).new Configuration({ headless: false })works even whenCRAWLEE_HEADLESS=trueis set. New priority: constructor > env vars > crawlee.json > schema defaults.TypeError.crawleeConfigFields).protected static fieldsto register additional config fields (e.g. Apify SDK).Closes #3080
🤖 Generated with Claude Code