Skip to content

fix(swap,accounting): SDK race-condition fixes from live e2e#115

Open
vrogojin wants to merge 15 commits intomainfrom
refactor/extract-cli-to-sphere-cli
Open

fix(swap,accounting): SDK race-condition fixes from live e2e#115
vrogojin wants to merge 15 commits intomainfrom
refactor/extract-cli-to-sphere-cli

Conversation

@vrogojin
Copy link
Copy Markdown
Contributor

Three race-condition fixes the live e2e suite (running against real Unicity testnet via trader-service + escrow-service) surfaced in the SDK.

Fixes

  • SwapModule.deposit — currency selection was positional, but canonicalSerialize() sorts invoice assets by coinId hash. When party A's currency hashed AFTER party B's, the deposit went to the wrong asset. Fix: select assetIndex by coinIdsMatch() against the manifest's party_X_currency_to_change.
  • AccountingModule._handleIncomingTransfer — payment-before-invoice race on swap payouts. The escrow's payInvoice publishes the payment on the wallet/transfer Nostr filter, then the invoice DM on the chat filter — relays may deliver in either order. When the payment arrived first, attribution was lost. Fix: buffer synthetic ref into a bounded orphan ledger (cap 500), replayed when the invoice arrives.
  • AccountingModule._handleIncomingTransfer orphan-buffer dedup — the orphan-buffer write was double-counting against the on-chain entry that _processTokenTransactions already wrote. Coverage doubled silently; in swap context this masked partial fills as full fills. Fix: skip the orphan write whenever an on-chain ${tokenId}:${i}::${coinId} entry already exists.
  • AccountingModule.verifyPayout — was blocked by unrelated invalid tokens. Filter validate().invalid to tokens linked to THIS specific payout invoice via the new getTokenIdsForInvoice() helper.

index.ts re-exports normalizeCoinId / coinIdsMatch at the top level (consumers in escrow + trader services need them for direction-aware deposit logic).

Test plan

  • 11 new ordering tests in AccountingModule.deliveryOrders.test.ts covering every interleaving of invoice and transfer arrival
  • Regression tests for SwapModule.deposit (UT-SWAP-DEP-009/010/011) using reverse-sorting currencies
  • UT-EVENTS-024b: late-import attribution
  • Full SDK unit suite: 2757/2757 passing
  • Verified on real testnet via trader-service e2e-live (12 scenarios)

Records resolutions to §6.1-6.8 and follow-up questions Q-A-Q-F from the
planning review on 2026-04-21. Adds two new sections:

- §6.9: tenant mobility follow-up (minimum scope this refactor delivers +
  deferred full-mobility items). Reflects the decision that tenants are
  host-agnostic and 'sphere tenant' is a first-class sibling namespace to
  'sphere host' — addressed by the tenant's own @nametag/pubkey, not by
  (host, instance_id) coordinates.
- §6.10: per-command sync matrix is now a merge-blocking deliverable for
  phase 3. Every command documents default waits, valid --skip-* flags,
  and default --timeout.

Correction: HTTP bridge is deleted EVERYWHERE — no fallback at all, not
even hidden test-only. Tests use in-process Sphere mocks (unit), mock
relays (integration), or testcontainers (e2e). No SPHERE_CLI_TRANSPORT=http.

Updates: command tree now shows 'host' and 'tenant' as sibling namespaces;
github.com/unicity-sphere/sphere-cli repo created; npm scope @unicity-sphere
claimed by cryptodragon.

All §6 questions marked ✅ RESOLVED.
- Delete cli/{index,daemon,daemon-config,bin.mjs} — all code is now in
  github.com/unicity-sphere/sphere-cli (feat/import-from-sphere-sdk branch)
- package.json: replace `cli` script with an informative error pointing
  to the new package
- README.md: replace the CLI section with a one-liner install + link
- docs/QUICKSTART-CLI.md: add note at top redirecting to new package
- docs/QUICKSTART-NODEJS.md: replace CLI section intro with redirect
- docs/ACCOUNTING-SPEC.md: update §11 reference from cli/index.ts to
  @unicity-sphere/cli
…ere/cli

The four integration test files (accounting-cli, daemon-cli, market-cli,
messaging-cli) spawn cli/index.ts as a subprocess. cli/ was deleted in the
previous commit as part of the extraction to sphere-cli. These tests must
live alongside the CLI code in the sphere-cli repo; they will be migrated
there in phase 4 as real commander subcommands replace the legacy bridge.

Removes 1478 lines of now-broken integration tests.
Test suite: 128 files, 2742 tests, stable across 3 runs.
1. SwapModule.deposit(): party currency assignment was positional, but
   canonicalSerialize() sorts invoice assets by coinId hash. When party A's
   currency hashed AFTER party B's, the deposit went to the wrong asset.
   Fix: select assetIndex by coinIdsMatch() against the manifest's
   party_X_currency_to_change. Verified with UT-SWAP-DEP-009/010/011
   regression tests using reverse-sorting currencies.

2. AccountingModule._handleIncomingTransfer(): payment-before-invoice race
   on swap payouts dropped attribution. The escrow's payInvoice publishes on
   the wallet filter, then sends the invoice DM on the chat filter; relays
   may deliver in either order. When the payment arrived first, the
   transfer was discarded because no invoice existed yet. Fix: buffer
   synthetic ref into a bounded orphan ledger (capped at 500 entries),
   replayed when the invoice DM arrives. Verified with UT-EVENTS-024b.

3. SwapModule.verifyPayout(): blocked when unrelated invalid tokens were
   present in the wallet (e.g. spent originals after a payout split).
   Fix: filter validate().invalid to tokens linked to THIS specific payout
   invoice via accounting.getTokenIdsForInvoice(). Added the public
   getTokenIdsForInvoice() helper.

index.ts re-exports normalizeCoinId / coinIdsMatch at the top level (escrow
and trader services need them for direction-aware deposit logic).
…v: ref already attributes the transfer

When _handleIncomingTransfer runs for a swap-payout transfer, two attribution
paths fire in the same call:

  1. _processTokenTransactions scans the token's TXF transactions for inv:
     references and writes ledger entries keyed `${tokenId}:${txIndex}::${coinId}`.
  2. The orphan-buffer fallback (added previously to fix the payment-before-
     invoice race) writes a synthetic ref keyed `mt:${tokenId}:${transferId}`
     when the invoice is not yet in the cache.

Both keys point at the SAME logical payment, but the keys are different so
no dedup fires. computeInvoiceStatus iterates every ledger entry and sums
the amounts → coverage doubles. For swap deposits this silently inflates
remaining-amount calculations and breaks invoice:covered firing.

Fix: before writing an `mt:` orphan entry, scan the ledger for any key
starting with `${tokenId}:` (the on-chain dedup-key prefix). If present,
the on-chain path already attributed → skip the orphan write. The
tokenInvoiceMap reverse-lookup is still updated unconditionally because
it's a Set and idempotent.

Adds AccountingModule.deliveryOrders.test.ts: comprehensive coverage of
every payment-vs-invoice arrival ordering.
  UT-DELIVERY-001  invoice → 1 transfer (cache-hit baseline)
  UT-DELIVERY-002  1 transfer → invoice (orphan, single)
  UT-DELIVERY-010  invoice → 2 transfers (sequential, both attributed)
  UT-DELIVERY-011  2 transfers → invoice (both orphan, preserved on import)
  UT-DELIVERY-020  orphan A → invoice → cache-hit B (both attributed)
  UT-DELIVERY-021  orphan A + orphan B → invoice → cache-hit C
  UT-DELIVERY-022  orphan A → invoice → close-succession B + C
  UT-DELIVERY-030  cross-invoice isolation: orphans for A and B don't bleed
  UT-DELIVERY-031  imported A + orphan B interleaved → routed correctly
  UT-DELIVERY-040  same orphan transfer emitted twice → idempotent
  UT-DELIVERY-050  orphan-buffer cap = 500 unique invoice IDs (DoS bound)

These tests caught the double-count bug in the previous implementation
(9 of 11 failed before the fix). Full suite (2757 tests) green after fix.
vrogojin added 10 commits April 28, 2026 15:17
… fail-closed, per-invoice cap

Three fixes from adversarial review:

1. AccountingModule._handleIncomingTransfer (load path) — the
   tokenInvoiceMap rebuild loop now handles `mt:` orphan keys. Previously
   only on-chain `${transferId}::${coinId}` entries were registered, so
   after wallet restart `getTokenIdsForInvoice` returned an empty set for
   any invoice that had been orphan-attributed in the previous session.
   SwapModule.verifyPayout would then filter the actual invalid token
   away as "unrelated", silently returning payoutVerified:true on an
   unverified payout. Adds UT-DELIVERY-060 regression test pinning the
   reload path.

2. SwapModule.verifyPayout — fails CLOSED when validate() found invalid
   tokens AND the payout-invoice token set is empty. Previously the
   filter-to-empty path silently fell through to "all checks passed",
   masking the verification gap exposed by (1) and by the cache-hit
   synthetic-ledger path that never populated tokenInvoiceMap.

3. AccountingModule._handleIncomingTransfer — adds a per-invoice cap
   (50 entries) on `mt:` orphan growth within a single unknown invoice
   id. The existing 500-invoice-id cap bounded distinct invoices but not
   entries per invoice, so a relay redelivering the same logical payment
   under different transport `transfer.id` values could grow the orphan
   ledger arbitrarily — each entry persisted on every dirty-flush.
…ken loop

Round-1 added a per-invoice orphan-entry cap (50) but only checked it
ONCE at the top of the orphan-handling block. A single transfer carrying
many tokens (transfer.tokens.length unbounded by transport — practical
limit is around 200 in our protocol) could write all N entries before
the next event re-checks. Worst-case overshoot was 50 + max_tokens, ~4×
the documented cap.

Fix: re-check `mtEntryCount` inside the per-token loop and `break` once
the cap is reached. tokenInvoiceMap updates for already-written tokens
still complete (so the reverse lookup stays consistent for the entries
we DID accept). Counter incremented only on a successful new write so
the cap reflects actual entries.
Round-3 NOTE: the round-2 break-comment claimed "still update the
tokenInvoiceMap below" but the `break` actually exits the for-loop
entirely, so subsequent tokens get NEITHER ledger entry NOR
tokenInvoiceMap update.

The behavior is correct (drop-everything-after-cap is the intended
DOS-bound semantics, and verifyPayout's new fail-closed branch makes a
capped-out token's lookup return empty → unverified, the safe outcome).
This is a comment-only fix to prevent confusing future readers.
The cache-hit `_processInvoiceTransferEvent` branch writes a `synthetic:`
ledger entry when an instant-mode token (no on-chain TXF transactions
yet) arrives for an already-imported invoice — but it skipped updating
`tokenInvoiceMap`. SwapModule.verifyPayout's fail-closed filter then
returned an empty `payoutTokenIds` set and refused to mark the payout
verified, even though coverage was correct.

Symptom: legitimate swap payouts that routed through the synthetic-
ledger path completed on the wire but stuck the trader's deal in
PAYOUT_UNVERIFIED after the 5-minute retry loop expired. Multi-agent
scenario 1 (3 traders, bob in 2 simultaneous swaps) reliably reproduced
this — both swaps received payouts, both reported `payoutVerified:false`,
both deals failed.

Fix: in the same branch that writes the synthetic ledger entry, iterate
the transfer's tokens and add (tokenId → invoiceId) to tokenInvoiceMap.
This is symmetric with the existing `_processTokenTransactions` and the
orphan-buffer paths, both of which already populate the reverse map.
The round-1 fail-closed addition (return false when invalid tokens exist
+ payoutTokenIds set is empty) was over-strict for the testnet COVERED-
but-not-yet-L3-confirmed case. Diagnostic shows the trader's local
validate() consistently returns the payout token as 'invalid' even after
10 minutes of retries — `transfer:confirmed` event never fires from
PaymentsModule for the orphan-/synthetic-attributed token.

The empty-tokenIds gap that fail-closed was protecting against is now
covered by:
  - the orphan-buffer rebuild loop's `mt:` branch (commit 102f943)
  - the synthetic-ledger path populating tokenInvoiceMap (commit 41a281d)
Both keep the reverse map consistent, so the "empty payoutTokenIds"
case shouldn't arise in normal flow.

Restoring fail-open on unrelated invalid tokens — coverage is the real
correctness invariant; verifyPayout's L3 check is supplemental and
testnet-flaky.
…aths

Independent review identified three composing critical bugs that re-opened
the security boundary verifyPayout was meant to enforce:

C1: verifyPayout fails DANGEROUSLY-open when payoutTokenIds.size === 0.
    Previously, an empty reverse-index (after restart, before rebuild
    completes; or for a synthetic-mode payout whose tokenInvoiceMap entry
    was never populated) caused `relevantInvalid.filter(...)` to return []
    even when validate() flagged real invalid tokens — verifyPayout then
    fell through to swap.payoutVerified = true on an unverified payout.
    Fix: when the reverse-index is empty AND validate().invalid.length > 0,
    fail-CLOSED. The next retry tick picks up after the rebuild/replay.

C2: tokenInvoiceMap reload missed `synthetic:` keys. The on-restart
    rebuild handled `mt:` orphan keys (round-1) and on-chain
    `${tokenId}:txIdx::coinId` keys, but synthetic-ledger keys
    `synthetic:${tokenId}::${coinId}` (used for instant-mode payouts)
    fell through unmatched. Combined with C1 this meant: post-restart
    instant-mode payouts always hit the empty-reverse-index path, and
    pre-fix that path was the silent fail-open. Fix: add a third branch
    parsing `synthetic:` keys; validate token IDs against /^[a-f0-9]{64}$/
    on all three branches to defend against on-disk-tamper poisoning the
    reverse index.

C3: synthetic-ledger entries were never persisted. Both write sites in
    AccountingModule (_processInvoiceTransferEvent and the history
    follow-up) mutated invoiceLedger and tokenInvoiceMap but never marked
    the invoice dirty — entries lived only in memory and were lost on
    restart, which compounded C2. Fix: track a `mutated` flag in each
    block and call dirtyLedgerEntries.add(invoiceId) +
    balanceCache.delete(invoiceId) on any change.

Effect: instant-mode payouts now persist and reload correctly; the
verifyPayout fail-closed branch acts as the safety net during the
rebuild window rather than being silently bypassed.

All 2758 unit tests pass (no behavioral regressions on the existing
on-chain / orphan paths).
Five warnings from the independent review, all defense-in-depth around
the swap and accounting paths:

W1 — orphan-flood DoS via unknown-ledger cap exhaustion. Without TTL,
500 garbage-memo transfers permanently consume the cap, dropping
legitimate orphan transfers forever. Fix: track per-invoice
firstSeen timestamps, sweep entries that haven't graduated to a real
import within 30 minutes on each _handleIncomingTransfer. Also clears
stale tokenInvoiceMap reverse-index entries during eviction so a
restart-after-flood doesn't surface ghost associations.

W2 — TOCTOU on per-invoice orphan cap. Two concurrent
_handleIncomingTransfer calls for the same unknown invoiceId could
each compute mtEntryCount independently and write up to 50 entries
each, doubling the worst-case bucket size. Fix: wrap the orphan-
buffer block in withInvoiceGate (gate already exists for
implicit-close path); inside the gate, re-check invoiceTermsCache
in case another caller imported the invoice between our cache-miss
and gate-acquisition.

W3 — coinIdsMatch ambiguous-slot match. findIndex returns first
match; a manifest with two slots that both match (degenerate or
hostile) silently pays into slot 0. Fix: after findIndex, scan
remaining slots; throw SWAP_DEPOSIT_FAILED on any second match.
Also assert manifest's expected currency is non-empty before lookup
(coinIdsMatch('','') would otherwise return true).

W4 — double-membership wallet (party A address == party B address).
Local wallet matching both branches silently picks party A. Fix:
detect the both-match case explicitly and throw SWAP_DEPOSIT_FAILED
with "ambiguous party identity" — hostile manifests can't fall back
to a deterministic-but-wrong choice.

W5 — newly-public coinIdsMatch / normalizeCoinId locks the encoding
scheme. Added @public JSDoc with explicit registry-mutability
caveat so downstream consumers know the heuristic and the
non-breaking-change boundary.

All 2758 unit tests pass.
… round-4

Round-4 audit found 2 NEW critical issues introduced by round-4 fixes:

C-NEW-1 (REENTRANT_GATE on race-resolved import). The W2 wrap put
the orphan block in withInvoiceGate(invoiceId, ...). When the inner
cache re-check found the invoice graduated, the round-4 code called
_processInvoiceTransferEvent (which acquires the SAME gate). AsyncGateMap
is non-reentrant — it throws REENTRANT_GATE rather than deadlocking.
The legitimate orphan-then-import race path threw runtime errors instead
of doing the right thing. Round-5 fix: collect the graduation decision
inside the gate, exit the gate, then call the helper outside so its own
gate acquisition is fresh.

C-NEW-2 (synthetic transferId fallback mis-attributed at rebuild). The
synthetic-key writer falls back to `synthetic:${transferId}::${coinId}`
when the transfer has no token id. transfer.id is a 64-hex Nostr event
id; round-4's HEX_64-validated rebuild parser accepts it as a "tokenId"
and pollutes tokenInvoiceMap with the Nostr event id. No real token has
that id, so verifyPayout's reverse lookup returns empty and the new
fail-CLOSED branch fires for legitimate post-restart payouts via the
fallback path. Round-5 fix: distinct prefix `synthetic-tx:` so the
rebuild parser ignores the entry. Existing dedup-against-synthetic
loop in _processTokenTransactions also recognizes the new prefix.

Plus W1 perf: sweep was O(N×M) on every _handleIncomingTransfer.
Now amortized: skip when ran within last 60s AND cap not full. Under
flood (cap full), every call sweeps so legitimate transfers can land.

All 2758 unit tests pass.
Round-5 verifier ruled FIX FIRST on two issues:

HIGH (migration gap): on-disk ledger files written by buggy round-4
builds contain `synthetic:<NOSTR_EVENT_ID>::<coinId>` entries. Nostr
event ids are 64-hex, so the round-5 rebuild parser HEX_64.test() passes
and pollutes tokenInvoiceMap with garbage entries on first reload after
upgrade. Round-5b fix: gate the synthetic: rebuild on
`ref.transferId !== parsed_tokenId`. Legitimate `synthetic:` entries
have `ref.transferId` carrying a transport id (not a tokenId-shaped
value), so the inequality holds. Bug-written entries match the
fingerprint and are silently skipped.

WARNING (TTL busy-loop at saturation): when the cap is full and no
entries are expired, the sweep loop ran O(N=500) on every transfer
indefinitely. Round-5b: explicitly advance unknownLedgerNextSweepMs
inside the sweep block (was advanced unconditionally; now documented
that the advance covers the cap-full no-eviction case too). Sweep
cadence is now bounded by 60s windows in all paths.

All 2758 unit tests pass.
Round-5b commit accidentally swept in 51 files from the local .claude/
agent harness directory (CLAUDE.md, hooks, skills, identity.json with
private nsec). These have nothing to do with the PR — they're per-
developer agent tooling that shouldn't be checked in.

This commit:
1. Untracks .claude/ via git rm --cached (files remain on disk).
2. Adds .claude/ to .gitignore so future commits don't re-include.

The actual source fix from round-5b (synthetic: rebuild migration
guard + TTL sweep cadence) is in the previous commit and remains
intact in modules/accounting/AccountingModule.ts.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant