WP Origin: async initial-import seeder with progress UI #31

Merged
adamziel merged 8 commits into trunk from adamziel/origin-async-seed
Apr 28, 2026

@adamziel
Contributor

Stacks on #28. Once that lands, GitHub will retarget this PR to trunk.

The wpdb-filesystem PR makes the plugin's first request build a single "Sync from WordPress" commit covering every post on the site. On a small dev site that's instant; on a real site with thousands of posts it dies on max_execution_time. This adds a resumable state-machine seeder that runs from WP-Cron and only opens the repository for clone/push/pull once the import has finished.

How it works

State machine, persisted in wp_options:

  • pending → activation queues a one-off cron event.
  • in_progress → each tick converts a batch of posts to Markdown, creates a "Seed batch" commit on refs/heads/_wp_origin_seed, updates the progress option, and reschedules itself once 15 s have elapsed or memory use reaches 70% of the limit.
  • finalizing → the next tick reads the seed branch's tree, creates a single parent-less "Initial import from WordPress" commit pointing at it, sets refs/heads/trunk to that commit, and drops the seed branch. Clones now see one clean root commit.
  • done → repository is open for business.
  • failed → admin can retry from the Tools → WP Origin page.

A transient lock prevents concurrent ticks. No Action Scheduler dependency — plain WP-Cron only.
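
The state machine above can be sketched roughly as follows. This is a minimal, language-neutral illustration in Python, not the plugin's PHP code: the `Seeder` class, its `options` dict (standing in for wp_options), the `locked` flag (standing in for the transient lock), and the default batch size are all assumptions, and the 70% memory check is left out.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Seeder:
    posts: list                       # stand-in for the site's posts
    batch_size: int = 50              # assumed default; not stated in the PR
    options: dict = field(
        default_factory=lambda: {"state": "pending", "processed": 0}
    )
    locked: bool = False              # stand-in for the transient lock

    def tick(self, budget_s: float = 15.0) -> str:
        """Advance the import by one cron tick and return the new state."""
        if self.locked:               # another tick is running: do nothing
            return self.options["state"]
        self.locked = True
        try:
            state = self.options["state"]
            if state == "pending":
                state = self.options["state"] = "in_progress"
            if state == "in_progress":
                deadline = time.monotonic() + budget_s
                while self.options["processed"] < len(self.posts):
                    start = self.options["processed"]
                    batch = self.posts[start:start + self.batch_size]
                    # The real seeder would convert `batch` to Markdown and
                    # create a "Seed batch" commit here.
                    self.options["processed"] = start + len(batch)
                    if time.monotonic() >= deadline:
                        break         # budget spent: the next tick resumes
                if self.options["processed"] == len(self.posts):
                    self.options["state"] = "finalizing"
            elif state == "finalizing":
                # The real seeder collapses the seed branch at this point.
                self.options["state"] = "done"
        except Exception as exc:      # any error parks the import in "failed"
            self.options["state"] = "failed"
            self.options["message"] = str(exc)
        finally:
            self.locked = False
        return self.options["state"]
```

With a zero budget, `Seeder(posts=list(range(12)), batch_size=5)` needs three ticks to reach finalizing and a fourth to reach done, which is the resumable behaviour the description relies on.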

What clients see

While state is anything but done, every Smart-HTTP request returns HTTP 503 + Retry-After: 15 with a plaintext body:

WP Origin is preparing the repository (40%, 200/500 posts). Please try again shortly.

Better than a half-built history.
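
As a rough illustration of that gate, the response could be assembled like this. The function name, the tuple return shape, and the Content-Type header are assumptions made for the sketch; only the status code, the Retry-After value, and the message wording come from the description above.

```python
def seeding_response(state, processed, total):
    """Return (status, headers, body) while seeding, or None once open."""
    if state == "done":
        return None                   # repository is open: serve normally
    percent = processed * 100 // total if total else 0
    body = (
        "WP Origin is preparing the repository "
        f"({percent}%, {processed}/{total} posts). Please try again shortly."
    )
    headers = {
        "Retry-After": "15",
        "Content-Type": "text/plain",  # assumed; the PR only says "plaintext"
    }
    return 503, headers, body
```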

What admins see

A new Tools → WP Origin page shows a live progress bar (state, percent, processed / total, last message) by polling /wp-json/wp-origin/v1/seed-status every 2 s. A "Retry import" button POSTs to /wp-json/wp-origin/v1/seed-retry for the failed-state case.

Test plan

The existing wp-origin e2e job in CI is the proof:

  • Workflow now updates the seeded posts before activating the plugin, then drives wp cron event run --due-now in a 30-iteration loop until wp_origin_seed_state reaches done. Failure of that loop fails the job.
  • Two new PHPUnit assertions in EndToEndTest.php:
    • testSeedStatusReportsDone: /wp-json/wp-origin/v1/seed-status returns state=done, percent=100, total>0.
    • testInitialCommitIsParentless: the first commit on trunk is "Initial import from WordPress" with zero parents, and no "Seed batch" subjects leak through.
  • Existing round-trip test still passes (clone, push, fresh-clone-sees-history, etc).
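
The workflow's bounded wait loop can be pictured as below. This is a hedged sketch, not the workflow itself: `run_due_cron` and `read_state` are hypothetical callables that, in CI, would presumably shell out to `wp cron event run --due-now` and read wp_origin_seed_state.

```python
def wait_until_done(run_due_cron, read_state, max_iterations=30):
    """Run due cron events until the seeder reports done.

    Returns the number of iterations used, or raises if the limit is hit,
    which is what fails the CI job.
    """
    for iteration in range(1, max_iterations + 1):
        run_due_cron()
        if read_state() == "done":
            return iteration
    raise RuntimeError(
        f"seeder did not reach 'done' within {max_iterations} iterations"
    )
```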

@adamziel adamziel force-pushed the adamziel/origin-async-seed branch from 8415bf7 to 14abe68 Compare April 28, 2026 15:27
@adamziel adamziel changed the base branch from adamziel/wpdb-filesystem to trunk April 28, 2026 15:27

The plugin's first run on an existing WordPress install needs to
convert every supported post and page into Markdown. On a site with
many posts that easily blows past max_execution_time, so this
introduces a state-machine seeder driven by WP-Cron — no Action
Scheduler dependency.

Each cron tick converts a chunk of posts to Markdown, creates a
"Seed batch" commit on a side branch, and re-schedules itself when
the time/memory budget is up. Once every post is staged the next
tick collapses the side branch into a single parent-less "Initial
import from WordPress" commit on trunk and the side branch is
dropped, so clones see one clean initial commit no matter how many
batches it took.
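
The collapse step can be sketched with an in-memory stand-in for the object store. Everything here is illustrative: the `Commit` dataclass, `commit_id` (not real Git hashing), and the dict-based `refs` and `commits` are assumptions. Only the ref names and the commit message come from the PR description.

```python
import hashlib
from dataclasses import dataclass

SEED_REF = "refs/heads/_wp_origin_seed"
TRUNK_REF = "refs/heads/trunk"


@dataclass(frozen=True)
class Commit:
    message: str
    tree: str                # id of the tree snapshot this commit points at
    parents: tuple = ()


def commit_id(commit):
    """Toy content hash; real Git hashes a different serialization."""
    raw = f"{commit.message}\n{commit.tree}\n{commit.parents}".encode()
    return hashlib.sha1(raw).hexdigest()


def collapse_seed_branch(refs, commits):
    """Replace the chain of seed commits with one parent-less root commit."""
    tip = commits[refs[SEED_REF]]
    root = Commit("Initial import from WordPress", tip.tree, parents=())
    root_id = commit_id(root)
    commits[root_id] = root
    refs[TRUNK_REF] = root_id        # trunk now points at the root commit
    del refs[SEED_REF]               # drop the side branch
    return root_id


# Usage: three "Seed batch" commits collapse into a single root commit.
commits, refs = {}, {}
parent = ()
for n in (1, 2, 3):
    c = Commit(f"Seed batch {n}", tree=f"tree-{n}", parents=parent)
    commits[commit_id(c)] = c
    parent = (commit_id(c),)
refs[SEED_REF] = parent[0]
root_id = collapse_seed_branch(refs, commits)
```

The root commit reuses the tip's tree, so the final snapshot is identical to the last batch while none of the intermediate commits are reachable from trunk.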

Smart-HTTP requests are rejected with HTTP 503 + Retry-After until
state reaches `done`, so a client cloning during seeding sees
"WP Origin is preparing the repository (40%, 200/500 posts). Please
try again shortly." rather than a half-built history.

A new Tools → "WP Origin" admin page polls a small REST status
endpoint for live progress (state, percent, processed/total, last
message). The same endpoint backs the e2e test's "wait until done"
loop in CI.

Until now the e2e job seeded 2 posts and finished the import in a
single tick — which left "the seeder reschedules itself when the
budget runs out and resumes correctly across cron runs" completely
untested. Real production sites will hit that path, so make CI hit
it too.

The seeder grows three filters — `wp_origin_seed_batch_size`,
`wp_origin_seed_time_budget_seconds`, and
`wp_origin_seed_tick_reschedule_seconds` — and ships a mu-plugin in
`plugins/wp-origin/Tests/ci-mu-test-helper.php` that the workflow
drops into `wp-content/mu-plugins/`. The mu-plugin shrinks the batch
to 5, the time budget to 0 seconds, and the reschedule delay to 0
seconds, which guarantees the seeder reschedules itself after every
batch.

The workflow now generates 28 bulk posts (30 total with the seeded
two), so the import requires 6 batches plus a finalize tick. The
existing `wp cron event run --due-now` polling loop drives them,
with no Action Scheduler dependency.
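
The batch arithmetic can be checked with a small helper. It assumes, as the mu-plugin's zero time budget implies, that each tick handles exactly one batch and that finalization takes one further tick; `expected_ticks` is a hypothetical name.

```python
import math


def expected_ticks(total_posts, batch_size):
    """Return (batches, ticks) when every tick processes one batch."""
    batches = math.ceil(total_posts / batch_size)
    return batches, batches + 1      # +1 for the finalize tick


print(expected_ticks(30, 5))         # (6, 7)
```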

Tracking is exposed: a `tick_count` field is incremented in `tick()`
and surfaced through `/wp-json/wp-origin/v1/seed-status`. A new
PHPUnit assertion, `testSeedingSpansMultipleCronTicks`, fails if the
import ever finishes in a single tick — the resumability guarantee
is now part of CI.
@adamziel adamziel force-pushed the adamziel/origin-async-seed branch from b7d2281 to 3392231 Compare April 28, 2026 18:28

A self-contained docker compose stack that spins up MySQL + WordPress
on http://localhost:8090, generates 120 posts, mounts the WP Origin
plugin source from the host (so live edits work), and leaves the
plugin DEACTIVATED. The user clicks Activate from
/wp-admin/plugins.php and watches the seeder build the initial
import at /wp-admin/tools.php?page=wp-origin.

A small mu-plugin shrinks the seeder's batch size to 5 and its time
budget to 0 seconds, so finishing 120 posts takes ~24 visible cron
ticks instead of one, so the progress bar actually has work to show.

After clicking Activate, the user landed on the plugins screen with
the seeder still in `pending` because WP-Cron only fires when a
visitor hits the site. On a fresh install with no traffic, the
progress bar would sit at 0% indefinitely until the user discovered
the Tools → WP Origin page on their own.

Two small changes fix that:

The activated_plugin hook now redirects single-plugin activations to
/wp-admin/tools.php?page=wp-origin so progress is the first thing the
user sees. Bulk activations and CLI/AJAX flows are skipped.

The seed-status REST endpoint also runs `WP_Origin_Seeder::tick()`
when state isn't `done`. The transient lock inside tick() makes that
safe to call on every poll, so the admin page itself drives the
import forward — one batch every time the JS polls — without
depending on outside traffic. End result: open the page, watch the
bar move, no waiting.

The activated_plugin redirect lands the user on the progress page,
but render_page() previously read state from the option as it stood
at activation time — pending, "Queued. Waiting for the first cron
tick." The user briefly saw a stalled-looking page before the JS
poll fired the first tick.

Run a tick synchronously at the top of render_page() so the option
is already advanced when we read it. By the time the HTML hits the
browser the bar is in motion and the message reads "Imported N / M
posts." instead of "Queued."

Three fixes for the seeder admin UX.

The duplicate-key errors on re-activate came from WpdbFilesystem
rename() running UPDATE files SET path=$dest WHERE path=$tmp against
a row that already existed at $dest. Git's atomic-write pattern
(write to .tmp, rename to the content-addressed final path) replays
the same final path many times, once per identical blob, so the
collision is normal and idempotent — the file already there has the
same content. Fix: DELETE $dest before the UPDATE, and REPLACE INTO
the directory_entries row instead of INSERT. Without this the .tmp
row sticks around forever, corrupting the object store.
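
The rename fix can be reproduced in miniature. This sketch uses SQLite from Python's standard library as a stand-in for MySQL, and the two-table schema is a guess that keeps only a path column; the real wp_*_files and wp_*_directory_entries tables will differ.

```python
import sqlite3


def rename(db, tmp, dest):
    """Move tmp to dest, tolerating a row that already exists at dest."""
    with db:                          # commit on success, roll back on error
        db.execute("DELETE FROM files WHERE path = ?", (dest,))
        db.execute("UPDATE files SET path = ? WHERE path = ?", (dest, tmp))
        db.execute("REPLACE INTO directory_entries (path) VALUES (?)", (dest,))


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE files (path TEXT PRIMARY KEY, content BLOB)")
db.execute("CREATE TABLE directory_entries (path TEXT PRIMARY KEY)")

# Write the same blob twice, as Git does for identical objects.
for _ in range(2):
    db.execute("INSERT INTO files VALUES (?, ?)", ("objects/ab/cd.tmp", b"blob"))
    rename(db, "objects/ab/cd.tmp", "objects/ab/cd")

paths = [row[0] for row in db.execute("SELECT path FROM files")]
entries = db.execute("SELECT COUNT(*) FROM directory_entries").fetchone()[0]
```

Without the DELETE, the second UPDATE would hit the primary-key constraint and the .tmp row would be left behind, which matches the failure described above.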

Activation and the Retry button now drop the wp_*_files /
wp_*_directory_entries tables before scheduling the seeder, so a
fresh import always starts on an empty repository even if a previous
attempt left half-written objects behind. (The docker demo re-runs
this path every time we reset the container.)

The seed-status REST endpoint and render_page() now drive the
seeder in a 1.5-second loop instead of one tick per call. With the
demo's 0-second batch budget that's many batches per page hit, so
the progress bar shows real movement on every paint instead of
stalling between JS polls. The first render lands at ~80/123 posts
with a populated commit log instead of stalling at "Imported 5 / 123
posts."
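
A time-boxed drive loop of this kind might look like the sketch below. The `drive` function, its injected `tick` and `get_state` callables, and the early exit on failed are assumptions; only the 1.5-second window comes from the commit message.

```python
import time


def drive(tick, get_state, window_s=1.5, clock=time.monotonic):
    """Call tick() repeatedly until the window closes or the import ends."""
    deadline = clock() + window_s
    ticks = 0
    while get_state() not in ("done", "failed") and clock() < deadline:
        tick()
        ticks += 1
    return ticks
```

Injecting `clock` keeps the loop testable without sleeping. In production the default monotonic clock bounds how long a single page render or REST call can spend seeding.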

A new "Commit log" section on the admin page lists the most recent
commits on the staging branch (or trunk after finalization), backed
by a new WP_Origin_Seeder::get_commit_log() helper that walks back
from the branch tip. The list updates live as the JS poll runs.
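
Walking back from a branch tip can be sketched as follows. This is not the body of WP_Origin_Seeder::get_commit_log(): the dict-based commit store, the `limit` parameter, and following only the first parent are assumptions made for illustration.

```python
def commit_log(commits, tip, limit=10):
    """Return up to `limit` (id, message) pairs, newest first."""
    log = []
    current = tip
    while current is not None and len(log) < limit:
        commit = commits[current]
        log.append((current, commit["message"]))
        parents = commit["parents"]
        current = parents[0] if parents else None   # stop at the root commit
    return log


commits = {
    "c1": {"message": "Seed batch 1", "parents": []},
    "c2": {"message": "Seed batch 2", "parents": ["c1"]},
    "c3": {"message": "Seed batch 3", "parents": ["c2"]},
}
recent = commit_log(commits, "c3", limit=2)
```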
@adamziel adamziel merged commit 9ef97f9 into trunk Apr 28, 2026
23 checks passed