WP Origin: async initial-import seeder with progress UI #31
Merged
Force-pushed: 8415bf7 to 14abe68
The plugin's first run on an existing WordPress install needs to convert every supported post and page into Markdown. On a site with many posts that easily blows past max_execution_time, so this introduces a state-machine seeder driven by WP-Cron, with no Action Scheduler dependency.

Each cron tick converts a chunk of posts to Markdown, creates a "Seed batch" commit on a side branch, and reschedules itself when the time/memory budget is up. Once every post is staged, the next tick collapses the side branch into a single parent-less "Initial import from WordPress" commit on trunk and drops the side branch, so clones see one clean initial commit no matter how many batches it took.

Smart-HTTP requests are rejected with HTTP 503 + Retry-After until state reaches `done`, so a client cloning during seeding sees "WP Origin is preparing the repository (40%, 200/500 posts). Please try again shortly." rather than a half-built history.

A new Tools → "WP Origin" admin page polls a small REST status endpoint for live progress (state, percent, processed/total, last message). The same endpoint backs the e2e test's "wait until done" loop in CI.
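The tick-and-reschedule behavior described above can be modeled in a few lines. This is a language-neutral Python sketch, not the plugin's PHP: `SeedState`, `tick`, and the injectable `clock` are hypothetical names, only the time budget is modeled (the memory check is omitted), and the finalize step just flips the state instead of touching a repository.

```python
import time
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class SeedState:
    """Hypothetical stand-in for the progress the seeder persists between ticks."""
    total: int = 0
    processed: int = 0
    state: str = "pending"
    seed_commits: int = 0  # number of "Seed batch" commits staged so far


def tick(
    progress: SeedState,
    post_ids: Sequence[int],
    batch_size: int,
    time_budget_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Run one tick. Returns True if another tick should be scheduled."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if progress.state == "done":
        return False
    if progress.state == "finalizing":
        # The real seeder collapses the side branch here; the model only flips state.
        progress.state = "done"
        return False

    progress.state = "in_progress"
    progress.total = len(post_ids)
    started = clock()
    while progress.processed < progress.total:
        chunk = post_ids[progress.processed:progress.processed + batch_size]
        progress.processed += len(chunk)  # stand-in for converting the chunk
        progress.seed_commits += 1        # stand-in for one "Seed batch" commit
        if clock() - started >= time_budget_seconds:
            break  # budget spent: stop and let the next tick resume

    if progress.processed >= progress.total:
        progress.state = "finalizing"
    return True
```

Because progress lives in `SeedState` rather than in local variables, a tick that stops early loses nothing: the next call resumes from `processed`. A zero-second budget forces exactly one batch per tick, which is the setting the CI helper relies on.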
Until now the e2e job seeded 2 posts and finished the import in a single tick, which left "the seeder reschedules itself when the budget runs out and resumes correctly across cron runs" completely untested. Real production sites will hit that path, so make CI hit it too.

The seeder grows three filters (`wp_origin_seed_batch_size`, `wp_origin_seed_time_budget_seconds`, and `wp_origin_seed_tick_reschedule_seconds`) and ships a mu-plugin in `plugins/wp-origin/Tests/ci-mu-test-helper.php` that the workflow drops into `wp-content/mu-plugins/`. The mu-plugin shrinks the batch to 5, the time budget to 0 seconds, and the reschedule delay to 0 seconds, which guarantees the seeder reschedules itself after every batch.

The workflow now generates 28 bulk posts (30 total with the seeded two), so the import requires 6 batches plus a finalize tick. The existing `wp cron event run --due-now` polling loop drives them, with no Action Scheduler dependency.

The number of ticks is now tracked: a `tick_count` field is incremented in `tick()` and surfaced through `/wp-json/wp-origin/v1/seed-status`. A new PHPUnit assertion, `testSeedingSpansMultipleCronTicks`, fails if the import ever finishes in a single tick, so the resumability guarantee is now part of CI.
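The batch arithmetic above is easy to check by hand. The sketch below is illustrative Python with hypothetical helper names, and it assumes one batch per tick (which a zero-second time budget forces) plus exactly one finalize tick.

```python
import math


def batches_needed(total_posts: int, batch_size: int) -> int:
    """Number of "Seed batch" commits needed to stage every post."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return math.ceil(total_posts / batch_size)


def expected_ticks(total_posts: int, batch_size: int) -> int:
    """Ticks for the whole import, assuming one batch per tick plus one finalize tick."""
    return batches_needed(total_posts, batch_size) + 1


print(batches_needed(30, 5))  # 6
print(expected_ticks(30, 5))  # 7
```

Any result above 1 is enough for an assertion like `testSeedingSpansMultipleCronTicks` to pass; the exact value the real `tick_count` reaches depends on how the seeder counts ticks, which this sketch does not claim to reproduce.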
Force-pushed: b7d2281 to 3392231
A self-contained docker compose stack that spins up MySQL + WordPress on http://localhost:8090, generates 120 posts, mounts the WP Origin plugin source from the host (so live edits work), and leaves the plugin DEACTIVATED. The user clicks Activate from /wp-admin/plugins.php and watches the seeder build the initial import at /wp-admin/tools.php?page=wp-origin. A small mu-plugin shrinks the seeder's batch size to 5 and its time budget to 0 seconds, so finishing 120 posts takes ~24 visible cron ticks instead of one — the progress bar actually has work to show.
After clicking Activate, the user landed on the plugins screen with the seeder still in `pending` because WP-Cron only fires when a visitor hits the site. On a fresh install with no traffic, the progress bar would sit at 0% indefinitely until the user discovered the Tools → WP Origin page on their own. Two small changes fix that:

- The activated_plugin hook now redirects single-plugin activations to /wp-admin/tools.php?page=wp-origin so progress is the first thing the user sees. Bulk activations and CLI/AJAX flows are skipped.
- The seed-status REST endpoint also runs `WP_Origin_Seeder::tick()` when state isn't `done`. The transient lock inside tick() makes that safe to call on every poll, so the admin page itself drives the import forward, one batch every time the JS polls, without depending on outside traffic.

End result: open the page, watch the bar move, no waiting.
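Why a lock makes `tick()` safe to call from every poll can be illustrated with a small model. This is a Python sketch under stated assumptions: `TransientStore` is an in-memory stand-in for an expiring key/value store, and the key name `wp_origin_seed_lock` and the 60-second TTL are hypothetical, not taken from the plugin.

```python
import time
from typing import Callable, Dict


class TransientStore:
    """In-memory stand-in for an expiring key/value store."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._expiry: Dict[str, float] = {}  # key -> time at which the lock expires

    def acquire(self, key: str, ttl_seconds: float) -> bool:
        """Take the lock unless an unexpired one already exists."""
        now = self._clock()
        expires_at = self._expiry.get(key)
        if expires_at is not None and expires_at > now:
            return False
        self._expiry[key] = now + ttl_seconds
        return True

    def release(self, key: str) -> None:
        self._expiry.pop(key, None)


def guarded_tick(
    store: TransientStore,
    work: Callable[[], None],
    lock_key: str = "wp_origin_seed_lock",  # hypothetical key name
    ttl_seconds: float = 60.0,              # hypothetical TTL
) -> bool:
    """Run `work` only if no other tick holds the lock. Returns True if it ran."""
    if not store.acquire(lock_key, ttl_seconds):
        return False  # another tick is in flight: do nothing
    try:
        work()
        return True
    finally:
        store.release(lock_key)
```

The TTL matters as much as the lock itself: if a tick dies mid-batch without releasing, the lock expires on its own and a later poll can resume the import.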
The activated_plugin redirect lands the user on the progress page, but render_page() previously read state from the option as it stood at activation time — pending, "Queued. Waiting for the first cron tick." The user briefly saw a stalled-looking page before the JS poll fired the first tick. Run a tick synchronously at the top of render_page() so the option is already advanced when we read it. By the time the HTML hits the browser the bar is in motion and the message reads "Imported N / M posts." instead of "Queued."
Three fixes for the seeder admin UX.

- The duplicate-key errors on re-activate were caused by the WpdbFilesystem rename() running UPDATE files SET path=$dest WHERE path=$tmp against a row that already existed at $dest. Git's atomic-write pattern (write to .tmp, rename to the content-addressed final path) replays the same final path many times, once per identical blob, so the collision is normal and idempotent: the file already there has the same content. Fix: DELETE $dest before the UPDATE, and REPLACE INTO the directory_entries row instead of INSERT. Without this the .tmp row sticks around forever, corrupting the object store.
- Activation and the Retry button now drop the wp_*_files / wp_*_directory_entries tables before scheduling the seeder, so a fresh import always starts on an empty repository even if a previous attempt left half-written objects behind. (The docker demo re-runs this path every time we reset the container.)
- The seed-status REST endpoint and render_page() now drive the seeder in a 1.5-second loop instead of one tick per call. With the demo's 0-second batch budget that's many batches per page hit, so the progress bar shows real movement on every paint instead of stalling between JS polls. The first render lands at ~80/123 posts with a populated commit log instead of showing "Imported 5 / 123 posts." and then stalling.

A new "Commit log" section on the admin page lists the most recent commits on the staging branch (or trunk after finalization), backed by a new WP_Origin_Seeder::get_commit_log() helper that walks back from the branch tip. The list updates live as the JS poll runs.
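The rename fix can be reproduced in miniature. The sketch below uses Python's built-in `sqlite3` as a stand-in for MySQL behind wpdb. The table names come from the description above, but the column layout, the `make_store`/`write_file`/`rename` helpers, and the single-transaction wrapping are assumptions made for illustration.

```python
import sqlite3


def make_store() -> sqlite3.Connection:
    """Create an in-memory store with an assumed, simplified schema."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE files (path TEXT PRIMARY KEY, content BLOB NOT NULL)")
    db.execute(
        "CREATE TABLE directory_entries ("
        "parent TEXT NOT NULL, name TEXT NOT NULL, PRIMARY KEY (parent, name))"
    )
    return db


def write_file(db: sqlite3.Connection, path: str, content: bytes) -> None:
    db.execute("REPLACE INTO files (path, content) VALUES (?, ?)", (path, content))


def rename(db: sqlite3.Connection, tmp: str, dest: str) -> None:
    """Move tmp to dest, tolerating a row that already exists at dest."""
    if tmp == dest:
        return
    parent, _, name = dest.rpartition("/")
    with db:  # commit all three statements together, or roll back on error
        # Clear any previous copy so the UPDATE cannot hit a duplicate key.
        db.execute("DELETE FROM files WHERE path = ?", (dest,))
        db.execute("UPDATE files SET path = ? WHERE path = ?", (dest, tmp))
        # REPLACE instead of INSERT: the directory entry may already exist.
        db.execute(
            "REPLACE INTO directory_entries (parent, name) VALUES (?, ?)",
            (parent, name),
        )


db = make_store()
for attempt in range(2):  # the same blob written twice, as Git may do
    tmp_path = f"objects/ab/tmp_{attempt}.tmp"
    write_file(db, tmp_path, b"blob contents")
    rename(db, tmp_path, "objects/ab/cdef")
```

After both iterations the store holds a single row at `objects/ab/cdef` and no leftover `.tmp` rows. Running the bare UPDATE without the preceding DELETE on the second iteration would raise a uniqueness error instead.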
Stacks on #28. Once that lands, GitHub will retarget this PR to trunk.
The wpdb-filesystem PR makes the plugin's first request build a single "Sync from WordPress" commit covering every post on the site. On a small dev site that's instant; on a real site with thousands of posts it dies on max_execution_time. This adds a resumable state-machine seeder that runs from WP-Cron and only opens the repository for clone/push/pull once the import has finished.
How it works
State machine, persisted in `wp_options`:

- `pending` → activation queues a one-off cron event.
- `in_progress` → each tick converts a batch of posts to Markdown, creates a "Seed batch" commit on `refs/heads/_wp_origin_seed`, updates the progress option, and reschedules itself once 15 s have elapsed or memory hits 70% of the limit.
- `finalizing` → the next tick reads the seed branch's tree, creates a single parent-less "Initial import from WordPress" commit pointing at it, sets `refs/heads/trunk` to that commit, and drops the seed branch. Clones now see one clean root commit.
- `done` → repository is open for business.
- `failed` → admin can retry from the Tools → WP Origin page.

A transient lock prevents concurrent ticks. No Action Scheduler dependency: plain WP-Cron only.
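The `finalizing` step is the least obvious transition, so here is a toy model of it. This is a Python sketch, not the plugin's implementation: commits are plain objects, trees are dictionaries from path to content, and `stage_batch`, `finalize`, and `log` are hypothetical helpers. Only the two ref names and the commit subjects come from the description above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

SEED_REF = "refs/heads/_wp_origin_seed"
TRUNK_REF = "refs/heads/trunk"


@dataclass
class Commit:
    message: str
    tree: Dict[str, str]  # path -> Markdown content (toy tree)
    parent: Optional["Commit"] = None


def stage_batch(refs: Dict[str, Commit], files: Dict[str, str]) -> None:
    """Add one "Seed batch" commit on top of the seed branch."""
    tip = refs.get(SEED_REF)
    tree = dict(tip.tree) if tip else {}
    tree.update(files)
    refs[SEED_REF] = Commit("Seed batch", tree, tip)


def finalize(refs: Dict[str, Commit]) -> Commit:
    """Collapse the seed branch into one parent-less commit on trunk."""
    tip = refs.pop(SEED_REF)  # drop the seed branch
    root = Commit("Initial import from WordPress", dict(tip.tree), None)
    refs[TRUNK_REF] = root
    return root


def log(commit: Optional[Commit]) -> List[str]:
    """Walk back from a tip and collect commit subjects, newest first."""
    subjects = []
    while commit is not None:
        subjects.append(commit.message)
        commit = commit.parent
    return subjects


refs: Dict[str, Commit] = {}
stage_batch(refs, {"posts/hello.md": "# Hello"})
stage_batch(refs, {"posts/about.md": "# About"})
finalize(refs)
print(log(refs[TRUNK_REF]))  # ['Initial import from WordPress']
```

The key point is that the root commit reuses the seed tip's tree, so the final content is identical to what the batches built while the batch history itself never reaches trunk.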
What clients see
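While state is anything but `done`, every Smart-HTTP request returns HTTP 503 + `Retry-After: 15` with a plaintext body such as:

WP Origin is preparing the repository (40%, 200/500 posts). Please try again shortly.

Better than a half-built history.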
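The gate can be sketched as a pure function that either lets a request through or produces the 503 response. This is illustrative Python with a hypothetical `gate_request` helper. The status code, the `Retry-After` value, and the message wording follow the description in this PR, while the `Content-Type` header and the integer rounding of the percentage are assumptions.

```python
from typing import Dict, Optional, Tuple

Response = Tuple[int, Dict[str, str], str]


def gate_request(
    state: str,
    processed: int,
    total: int,
    retry_after_seconds: int = 15,
) -> Optional[Response]:
    """Return None when the repository is open, else a 503 response tuple."""
    if state == "done":
        return None  # let the Smart-HTTP handler serve the request
    percent = (processed * 100) // total if total > 0 else 0
    body = (
        f"WP Origin is preparing the repository "
        f"({percent}%, {processed}/{total} posts). Please try again shortly."
    )
    headers = {
        "Content-Type": "text/plain; charset=utf-8",  # assumed header value
        "Retry-After": str(retry_after_seconds),
    }
    return 503, headers, body


print(gate_request("in_progress", 200, 500))
```

Gating on "anything but `done`" rather than on specific in-flight states means `pending`, `finalizing`, and `failed` are all covered without listing them.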
What admins see
A new Tools → WP Origin page shows a live progress bar (state, percent, processed / total, last message) by polling `/wp-json/wp-origin/v1/seed-status` every 2 s. A "Retry import" button POSTs to `/wp-json/wp-origin/v1/seed-retry` for the failed-state case.

Test plan
The existing wp-origin e2e job in CI is the proof:
- `wp cron event run --due-now` in a 30-iteration loop until `wp_origin_seed_state` reaches `done`. Failure of that loop fails the job.
- `EndToEndTest.php`:
  - `testSeedStatusReportsDone`: `/wp-json/wp-origin/v1/seed-status` returns state=done, percent=100, total>0.
  - `testInitialCommitIsParentless`: the first commit on `trunk` is "Initial import from WordPress" with zero parents and no "Seed batch" subjects leaked through.
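The bounded "wait until done" loop from the first item can be modeled as follows. This is a Python sketch with injected callables, so it does not depend on WP-CLI: `run_due_cron` and `read_status` are hypothetical stand-ins for running due cron events and reading the seed state, and failing fast on `failed` is an assumption rather than something the CI job is known to do.

```python
from typing import Callable, Dict


def wait_until_done(
    run_due_cron: Callable[[], None],
    read_status: Callable[[], Dict[str, object]],
    max_iterations: int = 30,
) -> int:
    """Drive cron until the seeder reports done. Returns the iterations used."""
    for iteration in range(1, max_iterations + 1):
        run_due_cron()
        status = read_status()
        if status["state"] == "done":
            return iteration
        if status["state"] == "failed":
            # Assumed behavior: stop early instead of burning the remaining iterations.
            raise RuntimeError(f"seeder failed: {status.get('message', '')}")
    raise TimeoutError(f"seeder not done after {max_iterations} iterations")
```

With 6 batch ticks plus a finalize tick, a 30-iteration cap leaves generous headroom while still bounding the job if the seeder stops rescheduling itself.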