fix(swap,accounting): SDK race-condition fixes from live e2e by vrogojin · Pull Request #115 · unicity-sphere/sphere-sdk

vrogojin · 2026-04-28T12:53:07Z

Three race-condition fixes the live e2e suite (running against real Unicity testnet via trader-service + escrow-service) surfaced in the SDK.

Fixes

SwapModule.deposit — currency selection was positional, but canonicalSerialize() sorts invoice assets by coinId hash. When party A's currency hashed AFTER party B's, the deposit went to the wrong asset. Fix: select assetIndex by coinIdsMatch() against the manifest's party_X_currency_to_change.
AccountingModule._handleIncomingTransfer — payment-before-invoice race on swap payouts. The escrow's payInvoice publishes the payment on the wallet/transfer Nostr filter, then the invoice DM on the chat filter — relays may deliver in either order. When the payment arrived first, attribution was lost. Fix: buffer synthetic ref into a bounded orphan ledger (cap 500), replayed when the invoice arrives.
AccountingModule._handleIncomingTransfer orphan-buffer dedup — the orphan-buffer write was double-counting against the on-chain entry that _processTokenTransactions already wrote. Coverage doubled silently; in swap context this masked partial fills as full fills. Fix: skip the orphan write whenever an on-chain ${tokenId}:${i}::${coinId} entry already exists.
AccountingModule.verifyPayout — was blocked by unrelated invalid tokens. Filter validate().invalid to tokens linked to THIS specific payout invoice via the new getTokenIdsForInvoice() helper.

index.ts re-exports normalizeCoinId / coinIdsMatch at the top level (consumers in escrow + trader services need them for direction-aware deposit logic).

Test plan

11 new ordering tests in AccountingModule.deliveryOrders.test.ts covering every interleaving of invoice and transfer arrival
Regression tests for SwapModule.deposit (UT-SWAP-DEP-009/010/011) using reverse-sorting currencies
UT-EVENTS-024b: late-import attribution
Full SDK unit suite: 2757/2757 passing
Verified on real testnet via trader-service e2e-live (12 scenarios)

Records resolutions to §6.1-6.8 and follow-up questions Q-A-Q-F from the planning review on 2026-04-21. Adds two new sections: - §6.9: tenant mobility follow-up (minimum scope this refactor delivers + deferred full-mobility items). Reflects the decision that tenants are host-agnostic and 'sphere tenant' is a first-class sibling namespace to 'sphere host' — addressed by the tenant's own @nametag/pubkey, not by (host, instance_id) coordinates. - §6.10: per-command sync matrix is now a merge-blocking deliverable for phase 3. Every command documents default waits, valid --skip-* flags, and default --timeout. Correction: HTTP bridge is deleted EVERYWHERE — no fallback at all, not even hidden test-only. Tests use in-process Sphere mocks (unit), mock relays (integration), or testcontainers (e2e). No SPHERE_CLI_TRANSPORT=http. Updates: command tree now shows 'host' and 'tenant' as sibling namespaces; github.com/unicity-sphere/sphere-cli repo created; npm scope @unicity-sphere claimed by cryptodragon. All §6 questions marked ✅ RESOLVED.

- Delete cli/{index,daemon,daemon-config,bin.mjs} — all code is now in github.com/unicity-sphere/sphere-cli (feat/import-from-sphere-sdk branch) - package.json: replace `cli` script with an informative error pointing to the new package - README.md: replace the CLI section with a one-liner install + link - docs/QUICKSTART-CLI.md: add note at top redirecting to new package - docs/QUICKSTART-NODEJS.md: replace CLI section intro with redirect - docs/ACCOUNTING-SPEC.md: update §11 reference from cli/index.ts to @unicity-sphere/cli

…ere/cli The four integration test files (accounting-cli, daemon-cli, market-cli, messaging-cli) spawn cli/index.ts as a subprocess. cli/ was deleted in the previous commit as part of the extraction to sphere-cli. These tests must live alongside the CLI code in the sphere-cli repo; they will be migrated there in phase 4 as real commander subcommands replace the legacy bridge. Removes 1478 lines of now-broken integration tests. Test suite: 128 files, 2742 tests, stable across 3 runs.

1. SwapModule.deposit(): party currency assignment was positional, but canonicalSerialize() sorts invoice assets by coinId hash. When party A's currency hashed AFTER party B's, the deposit went to the wrong asset. Fix: select assetIndex by coinIdsMatch() against the manifest's party_X_currency_to_change. Verified with UT-SWAP-DEP-009/010/011 regression tests using reverse-sorting currencies. 2. AccountingModule._handleIncomingTransfer(): payment-before-invoice race on swap payouts dropped attribution. The escrow's payInvoice publishes on the wallet filter, then sends the invoice DM on the chat filter; relays may deliver in either order. When the payment arrived first, the transfer was discarded because no invoice existed yet. Fix: buffer synthetic ref into a bounded orphan ledger (capped at 500 entries), replayed when the invoice DM arrives. Verified with UT-EVENTS-024b. 3. SwapModule.verifyPayout(): blocked when unrelated invalid tokens were present in the wallet (e.g. spent originals after a payout split). Fix: filter validate().invalid to tokens linked to THIS specific payout invoice via accounting.getTokenIdsForInvoice(). Added the public getTokenIdsForInvoice() helper. index.ts re-exports normalizeCoinId / coinIdsMatch at the top level (escrow and trader services need them for direction-aware deposit logic).

…v: ref already attributes the transfer When _handleIncomingTransfer runs for a swap-payout transfer, two attribution paths fire in the same call: 1. _processTokenTransactions scans the token's TXF transactions for inv: references and writes ledger entries keyed `${tokenId}:${txIndex}::${coinId}`. 2. The orphan-buffer fallback (added previously to fix the payment-before- invoice race) writes a synthetic ref keyed `mt:${tokenId}:${transferId}` when the invoice is not yet in the cache. Both keys point at the SAME logical payment, but the keys are different so no dedup fires. computeInvoiceStatus iterates every ledger entry and sums the amounts → coverage doubles. For swap deposits this silently inflates remaining-amount calculations and breaks invoice:covered firing. Fix: before writing an `mt:` orphan entry, scan the ledger for any key starting with `${tokenId}:` (the on-chain dedup-key prefix). If present, the on-chain path already attributed → skip the orphan write. The tokenInvoiceMap reverse-lookup is still updated unconditionally because it's a Set and idempotent. Adds AccountingModule.deliveryOrders.test.ts: comprehensive coverage of every payment-vs-invoice arrival ordering. UT-DELIVERY-001 invoice → 1 transfer (cache-hit baseline) UT-DELIVERY-002 1 transfer → invoice (orphan, single) UT-DELIVERY-010 invoice → 2 transfers (sequential, both attributed) UT-DELIVERY-011 2 transfers → invoice (both orphan, preserved on import) UT-DELIVERY-020 orphan A → invoice → cache-hit B (both attributed) UT-DELIVERY-021 orphan A + orphan B → invoice → cache-hit C UT-DELIVERY-022 orphan A → invoice → close-succession B + C UT-DELIVERY-030 cross-invoice isolation: orphans for A and B don't bleed UT-DELIVERY-031 imported A + orphan B interleaved → routed correctly UT-DELIVERY-040 same orphan transfer emitted twice → idempotent UT-DELIVERY-050 orphan-buffer cap = 500 unique invoice IDs (DoS bound) These tests caught the double-count bug in the previous implementation (9 of 11 failed before the fix). Full suite (2757 tests) green after fix.

… fail-closed, per-invoice cap Three fixes from adversarial review: 1. AccountingModule._handleIncomingTransfer (load path) — the tokenInvoiceMap rebuild loop now handles `mt:` orphan keys. Previously only on-chain `${transferId}::${coinId}` entries were registered, so after wallet restart `getTokenIdsForInvoice` returned an empty set for any invoice that had been orphan-attributed in the previous session. SwapModule.verifyPayout would then filter the actual invalid token away as "unrelated", silently returning payoutVerified:true on an unverified payout. Adds UT-DELIVERY-060 regression test pinning the reload path. 2. SwapModule.verifyPayout — fails CLOSED when validate() found invalid tokens AND the payout-invoice token set is empty. Previously the filter-to-empty path silently fell through to "all checks passed", masking the verification gap exposed by (1) and by the cache-hit synthetic-ledger path that never populated tokenInvoiceMap. 3. AccountingModule._handleIncomingTransfer — adds a per-invoice cap (50 entries) on `mt:` orphan growth within a single unknown invoice id. The existing 500-invoice-id cap bounded distinct invoices but not entries per invoice, so a relay redelivering the same logical payment under different transport `transfer.id` values could grow the orphan ledger arbitrarily — each entry persisted on every dirty-flush.

…ken loop Round-1 added a per-invoice orphan-entry cap (50) but only checked it ONCE at the top of the orphan-handling block. A single transfer carrying many tokens (transfer.tokens.length unbounded by transport — practical limit is around 200 in our protocol) could write all N entries before the next event re-checks. Worst-case overshoot was 50 + max_tokens, ~4× the documented cap. Fix: re-check `mtEntryCount` inside the per-token loop and `break` once the cap is reached. tokenInvoiceMap updates for already-written tokens still complete (so the reverse lookup stays consistent for the entries we DID accept). Counter incremented only on a successful new write so the cap reflects actual entries.

Round-3 NOTE: the round-2 break-comment claimed "still update the tokenInvoiceMap below" but the `break` actually exits the for-loop entirely, so subsequent tokens get NEITHER ledger entry NOR tokenInvoiceMap update. The behavior is correct (drop-everything-after-cap is the intended DOS-bound semantics, and verifyPayout's new fail-closed branch makes a capped-out token's lookup return empty → unverified, the safe outcome). This is a comment-only fix to prevent confusing future readers.

The cache-hit `_processInvoiceTransferEvent` branch writes a `synthetic:` ledger entry when an instant-mode token (no on-chain TXF transactions yet) arrives for an already-imported invoice — but it skipped updating `tokenInvoiceMap`. SwapModule.verifyPayout's fail-closed filter then returned an empty `payoutTokenIds` set and refused to mark the payout verified, even though coverage was correct. Symptom: legitimate swap payouts that routed through the synthetic- ledger path completed on the wire but stuck the trader's deal in PAYOUT_UNVERIFIED after the 5-minute retry loop expired. Multi-agent scenario 1 (3 traders, bob in 2 simultaneous swaps) reliably reproduced this — both swaps received payouts, both reported `payoutVerified:false`, both deals failed. Fix: in the same branch that writes the synthetic ledger entry, iterate the transfer's tokens and add (tokenId → invoiceId) to tokenInvoiceMap. This is symmetric with the existing `_processTokenTransactions` and the orphan-buffer paths, both of which already populate the reverse map.

The round-1 fail-closed addition (return false when invalid tokens exist + payoutTokenIds set is empty) was over-strict for the testnet COVERED- but-not-yet-L3-confirmed case. Diagnostic shows the trader's local validate() consistently returns the payout token as 'invalid' even after 10 minutes of retries — `transfer:confirmed` event never fires from PaymentsModule for the orphan-/synthetic-attributed token. The empty-tokenIds gap that fail-closed was protecting against is now covered by: - the orphan-buffer rebuild loop's `mt:` branch (commit 102f943) - the synthetic-ledger path populating tokenInvoiceMap (commit 41a281d) Both keep the reverse map consistent, so the "empty payoutTokenIds" case shouldn't arise in normal flow. Restoring fail-open on unrelated invalid tokens — coverage is the real correctness invariant; verifyPayout's L3 check is supplemental and testnet-flaky.

…aths Independent review identified three composing critical bugs that re-opened the security boundary verifyPayout was meant to enforce: C1: verifyPayout fails DANGEROUSLY-open when payoutTokenIds.size === 0. Previously, an empty reverse-index (after restart, before rebuild completes; or for a synthetic-mode payout whose tokenInvoiceMap entry was never populated) caused `relevantInvalid.filter(...)` to return [] even when validate() flagged real invalid tokens — verifyPayout then fell through to swap.payoutVerified = true on an unverified payout. Fix: when the reverse-index is empty AND validate().invalid.length > 0, fail-CLOSED. The next retry tick picks up after the rebuild/replay. C2: tokenInvoiceMap reload missed `synthetic:` keys. The on-restart rebuild handled `mt:` orphan keys (round-1) and on-chain `${tokenId}:txIdx::coinId` keys, but synthetic-ledger keys `synthetic:${tokenId}::${coinId}` (used for instant-mode payouts) fell through unmatched. Combined with C1 this meant: post-restart instant-mode payouts always hit the empty-reverse-index path, and pre-fix that path was the silent fail-open. Fix: add a third branch parsing `synthetic:` keys; validate token IDs against /^[a-f0-9]{64}$/ on all three branches to defend against on-disk-tamper poisoning the reverse index. C3: synthetic-ledger entries were never persisted. Both write sites in AccountingModule (_processInvoiceTransferEvent and the history follow-up) mutated invoiceLedger and tokenInvoiceMap but never marked the invoice dirty — entries lived only in memory and were lost on restart, which compounded C2. Fix: track a `mutated` flag in each block and call dirtyLedgerEntries.add(invoiceId) + balanceCache.delete(invoiceId) on any change. Effect: instant-mode payouts now persist and reload correctly; the verifyPayout fail-closed branch acts as the safety net during the rebuild window rather than being silently bypassed. All 2758 unit tests pass (no behavioral regressions on the existing on-chain / orphan paths).

@public

Five warnings from the independent review, all defense-in-depth around the swap and accounting paths: W1 — orphan-flood DoS via unknown-ledger cap exhaustion. Without TTL, 500 garbage-memo transfers permanently consume the cap, dropping legitimate orphan transfers forever. Fix: track per-invoice firstSeen timestamps, sweep entries that haven't graduated to a real import within 30 minutes on each _handleIncomingTransfer. Also clears stale tokenInvoiceMap reverse-index entries during eviction so a restart-after-flood doesn't surface ghost associations. W2 — TOCTOU on per-invoice orphan cap. Two concurrent _handleIncomingTransfer calls for the same unknown invoiceId could each compute mtEntryCount independently and write up to 50 entries each, doubling the worst-case bucket size. Fix: wrap the orphan- buffer block in withInvoiceGate (gate already exists for implicit-close path); inside the gate, re-check invoiceTermsCache in case another caller imported the invoice between our cache-miss and gate-acquisition. W3 — coinIdsMatch ambiguous-slot match. findIndex returns first match; a manifest with two slots that both match (degenerate or hostile) silently pays into slot 0. Fix: after findIndex, scan remaining slots; throw SWAP_DEPOSIT_FAILED on any second match. Also assert manifest's expected currency is non-empty before lookup (coinIdsMatch('','') would otherwise return true). W4 — double-membership wallet (party A address == party B address). Local wallet matching both branches silently picks party A. Fix: detect the both-match case explicitly and throw SWAP_DEPOSIT_FAILED with "ambiguous party identity" — hostile manifests can't fall back to a deterministic-but-wrong choice. W5 — newly-public coinIdsMatch / normalizeCoinId locks the encoding scheme. Added @public JSDoc with explicit registry-mutability caveat so downstream consumers know the heuristic and the non-breaking-change boundary. All 2758 unit tests pass.

… round-4 Round-4 audit found 2 NEW critical issues introduced by round-4 fixes: C-NEW-1 (REENTRANT_GATE on race-resolved import). The W2 wrap put the orphan block in withInvoiceGate(invoiceId, ...). When the inner cache re-check found the invoice graduated, the round-4 code called _processInvoiceTransferEvent (which acquires the SAME gate). AsyncGateMap is non-reentrant — it throws REENTRANT_GATE rather than deadlocking. The legitimate orphan-then-import race path threw runtime errors instead of doing the right thing. Round-5 fix: collect the graduation decision inside the gate, exit the gate, then call the helper outside so its own gate acquisition is fresh. C-NEW-2 (synthetic transferId fallback mis-attributed at rebuild). The synthetic-key writer falls back to `synthetic:${transferId}::${coinId}` when the transfer has no token id. transfer.id is a 64-hex Nostr event id; round-4's HEX_64-validated rebuild parser accepts it as a "tokenId" and pollutes tokenInvoiceMap with the Nostr event id. No real token has that id, so verifyPayout's reverse lookup returns empty and the new fail-CLOSED branch fires for legitimate post-restart payouts via the fallback path. Round-5 fix: distinct prefix `synthetic-tx:` so the rebuild parser ignores the entry. Existing dedup-against-synthetic loop in _processTokenTransactions also recognizes the new prefix. Plus W1 perf: sweep was O(N×M) on every _handleIncomingTransfer. Now amortized: skip when ran within last 60s AND cap not full. Under flood (cap full), every call sweeps so legitimate transfers can land. All 2758 unit tests pass.

Round-5 verifier ruled FIX FIRST on two issues: HIGH (migration gap): on-disk ledger files written by buggy round-4 builds contain `synthetic:<NOSTR_EVENT_ID>::<coinId>` entries. Nostr event ids are 64-hex, so the round-5 rebuild parser HEX_64.test() passes and pollutes tokenInvoiceMap with garbage entries on first reload after upgrade. Round-5b fix: gate the synthetic: rebuild on `ref.transferId !== parsed_tokenId`. Legitimate `synthetic:` entries have `ref.transferId` carrying a transport id (not a tokenId-shaped value), so the inequality holds. Bug-written entries match the fingerprint and are silently skipped. WARNING (TTL busy-loop at saturation): when the cap is full and no entries are expired, the sweep loop ran O(N=500) on every transfer indefinitely. Round-5b: explicitly advance unknownLedgerNextSweepMs inside the sweep block (was advanced unconditionally; now documented that the advance covers the cap-full no-eviction case too). Sweep cadence is now bounded by 60s windows in all paths. All 2758 unit tests pass.

Round-5b commit accidentally swept in 51 files from the local .claude/ agent harness directory (CLAUDE.md, hooks, skills, identity.json with private nsec). These have nothing to do with the PR — they're per- developer agent tooling that shouldn't be checked in. This commit: 1. Untracks .claude/ via git rm --cached (files remain on disk). 2. Adds .claude/ to .gitignore so future commits don't re-include. The actual source fix from round-5b (synthetic: rebuild migration guard + TTL sweep cadence) is in the previous commit and remains intact in modules/accounting/AccountingModule.ts.

vrogojin added 5 commits April 21, 2026 21:16

vrogojin mentioned this pull request Apr 28, 2026

feat(swap): live-e2e diagnostics + reverse-currency-sort regression test vrogojin/escrow-service#14

Open

2 tasks

vrogojin added 10 commits April 28, 2026 15:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(swap,accounting): SDK race-condition fixes from live e2e#115

fix(swap,accounting): SDK race-condition fixes from live e2e#115
vrogojin wants to merge 15 commits intomainfrom
refactor/extract-cli-to-sphere-cli

vrogojin commented Apr 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

vrogojin commented Apr 28, 2026

Fixes

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant