This is a content migration tool that migrates the content differential from one or more remote WordPress sites onto a destination site, while keeping the destination site's existing content intact.
The Newspack Content Diff Migrator is designed to synchronize content between remote sites (also referred to as the "live site", after Newspack's own migration workflow) and a target destination site (also referred to as the "staging site" or "local site") by importing only the new and modified content from the remote site. This is useful for maintaining staging environments that need to stay current with production content without overwriting staging-specific changes.
The plugin migrates database content only; file synchronization (e.g. the uploads directory) must be handled separately.
- Incremental "diff" Migration: Can be run multiple times to migrate the entire differential of new and modified content; subsequent runs resume from the last successful step
- Preserves Local Content: Keeps existing local content intact during migration
- Multi-Source Migration: Supports importing content from multiple source hostnames, each with its own database tables and content, all the while preserving the integrity of the local content
- Selective Import: Only imports new or modified content from the live site
- Content Coverage: Handles posts, pages, attachments, users, comments, basic taxonomies (category, post_tag, author), as well as custom taxonomies and custom post types
- Error Handling, Logging, and Graceful Degradation: Comprehensive error logging and recovery mechanisms, detailed logs for troubleshooting, user-friendly CSV reports, and processing that continues even when individual items fail
- Side-by-side Tables: Works with remote site's database tables temporarily imported alongside local tables with a different table prefix
Use the latest release from the Newspack Plugins Repository, or install from the repository:
- Clone or download the plugin to your WordPress plugins directory
- Navigate to the plugin directory
- Run `composer install` to install dependencies
- Import Live Tables: Import the live site's database tables with a specific prefix (e.g., `cdiff_`)
- Run Migration (two commands, run in sequence):
# Step 1: Search for new/modified content
wp newspack-content-diff-migrator search-new-content-on-live \
--live-table-prefix=cdiff_ \
--source-hostname=www.example-1.com \
--data-dir=/tmp/cdiff_data
# Step 2: Migrate the identified content
wp newspack-content-diff-migrator migrate-live-content \
--live-table-prefix=cdiff_ \
--source-hostname=www.example-1.com \
--data-dir=/tmp/cdiff_data
⚠️ Important:
- Always run `search-new-content-on-live` before `migrate-live-content` for each migration cycle; it is a prerequisite. The one exception is resuming a previously interrupted migration, in which case you re-run the interrupted migrate command using that same `--data-dir`
- Always use the same `--data-dir` for the search and migrate commands in the same migration cycle; it is used to store the migration run-state data and logs. See the recommendation below
- Always use a new `--data-dir` for a new migration cycle, to keep separate records and preserve previous logs for debugging or for resuming a previously interrupted migration
New to the plugin or prefer a guided interactive mode? Use the index command:
wp newspack-content-diff-migrator index

This launches an interactive menu that guides you through selecting any of the available commands (such as search-new-content-on-live and migrate-live-content), shows their descriptions, and prompts you for all required arguments step by step. Perfect for learning the plugin or running commands without memorizing syntax.
The two migration commands must be executed in sequence: first search-new-content-on-live, then migrate-live-content.
Searches for and identifies new and modified posts in the live site tables, and records their IDs to be migrated/updated by the migrate command. This command must be run before migrate-live-content.
How the search command works:
- Auto-attribution: Before checking for new/modified content, the command scans for unattributed local content (i.e. local content without `newspackcontentdiff_oldid_*` metas, which serve as migration labels and contain the original content's ID and source hostname) and automatically matches it against the live tables by comparing identifier fields (like title, slug, date, type, etc.). This handles scenarios like cloned sites where content exists locally but hasn't been attributed yet. It runs automatically, with brief messages printed to the CLI and detailed logs of the auto-attribution operations. Content is matched against the live tables in this way:
- Posts/CPTs: Matched by title + slug + date + type
- Attachments: Matched by title + slug + date
- Users: Matched by user_login
- Terms: Matched by slug + taxonomy
- New content detection: The command determines "new" content by checking whether each live content object is referenced in the local `newspackcontentdiff_oldid_*` metas. If a live ID is not in this local meta mapping, it's considered new and queued for import.
- Modified content detection: Following the Newspack Migration Data Consistency Standard (described in detail below), specific objects and fields are examined for changes (like the post_modified date, post_status, post_author, featured image, and taxonomies). If any of these fields have changed, the content is considered modified and queued for update (either by a full reimport, as for posts, or by individual field updates, depending on the object type).
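The new-content decision above can be sketched in a few lines. This is a simplification, assuming the old-ID metas are already collected into a plain dict (the real data lives in `newspackcontentdiff_oldid_*` postmeta rows):

```python
# Sketch of new-content detection (simplified): the old-ID metas map local IDs
# to the live IDs they were imported from; any live ID missing from that
# mapping is flagged as new and queued for import.
def find_new_live_ids(live_ids, oldid_metas):
    """oldid_metas: {local_id: live_id}, built from newspackcontentdiff_oldid_* rows."""
    known_live_ids = set(oldid_metas.values())
    return [live_id for live_id in live_ids if live_id not in known_live_ids]

# Live posts 10 and 11 were imported earlier (as local posts 456 and 457);
# live post 12 has no old-ID meta locally, so it is flagged as new.
new_ids = find_new_live_ids([10, 11, 12], {456: 10, 457: 11})  # → [12]
```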
wp newspack-content-diff-migrator search-new-content-on-live \
--live-table-prefix=<prefix> \   # Prefix of the live site tables in DB (e.g., cdiff_)
--source-hostname=<hostname> \   # e.g. www.example-1.com
--data-dir=<path> \              # Directory to store migration run-state data, progress and logs
[--post-types-csv]               # Optionally extend the default post types, e.g. with guest-author for CAP's Guest Authors

Reviewing "modified" posts:
After running the search command, and before running the migrate command, you can review which posts were flagged as modified. The file <data-dir>/run-state/modified_ids.json contains all "modified" posts which will be deleted and fully reimported by the migrate command (and they will preserve the same local ID on this reimport, to ensure any references to that reimported local post ID remain valid). Example of the file contents:
[
{"live_id": 123, "local_id": 456, "changes": {"post_modified": {"live": "2025-03-10 12:00:00", "local": "2025-01-15 08:30:00"}}},
{"live_id": 789, "local_id": 101, "changes": {"post_status": {"live": "publish", "local": "draft"}}}
]

To exclude specific "modified" posts from being reimported, simply remove their entries from modified_ids.json before running migrate-live-content. This is useful when content was manually attributed via attribute-ids and field differences are expected (e.g., timezone-shifted dates, transformed slugs from external migration tools).
Note: Excluding posts from modified_ids.json only prevents the full post reimport. Other objects (users, attachments, terms) and their fields are still checked against the Newspack Migration Data Consistency Standard (MDCS) during the migrate command — those per-field updates are presently not overridable (though they could be, with a custom flag in the future, if needed) and will still be applied if their field values differ. See the Migration Data Consistency Standard for which fields are updated on each object type.
Imports the content differential identified by search-new-content-on-live. Must be run after the search command.
wp newspack-content-diff-migrator migrate-live-content \
--live-table-prefix=<prefix> \   # Prefix of the live site tables in DB (e.g., cdiff_)
--source-hostname=<hostname> \   # e.g. www.example-1.com
--data-dir=<path> \              # Same directory used in the search command
[--custom-taxonomies-csv]        # Optional list of custom taxonomies to migrate

Generated Reports
At the end of each migration import run, CSV reports are created in the reports/ subfolder within your --data-dir. These CSVs are human-friendly summaries of all the key migration activity, and they are derived from (duplicated from) the run-state JSONL files for your easy review.
| File | Description | Columns |
|---|---|---|
| reports/posts.csv | All migrated or reimported posts, pages, attachments, and custom post types | status, post_type, id_old, id_new |
| reports/users.csv | All imported, merged, or modified users | status, id_old, id_new |
| reports/terms.csv | All imported or merged terms (categories, tags, custom taxonomies) | status, term_id_old, term_id_new, taxonomy |
status column possible values:
- imported: Record was newly created during this migration
- modified: Record was deleted and reimported, or had fields updated
- merged: Record with the same unique identifier already existed locally and was merged/reused
Note: The ID update operations (post parent, featured image, block attachment IDs) process all previously imported content from the source hostname, not just the current batch. This serves as a self-healing mechanism: if an attachment import failed in a previous run but succeeds in a later run, all posts that reference that attachment will have their IDs correctly updated.
Simply lists all previously migrated source hostnames (by looking up the metas newspackcontentdiff_oldid_{hostname} in the local postmeta, usermeta, and termmeta tables). Useful for quickly checking which sources have already been migrated.
wp newspack-content-diff-migrator list-previously-migrated-source-hostnames

This command was created for custom migration scenarios where multiple migration tools are used for the same source site. It is used to "attribute" (i.e. label, set metas on) the content to a source hostname, so that the CDiff can properly work with it.
Use case: Let's say some content from a source hostname was migrated using a different migration tool (for example the Ghost CMS migrator, a custom migrator, WP Importer, etc.) which may have transformed some of the content identifier fields (e.g. it got different slugs, dates, etc.). And let's say that for some reason you then also wish to use the CDiff to migrate the second part of this same source site's content. If you have the ID mappings from the external tool (old IDs => new IDs), you just need to run this command to "attribute" the custom-migrated content (i.e. set its "old ID and source hostname" metas), and the CDiff will be able to migrate the rest without creating duplicates.
How it works:
- New content. New content is determined solely by whether the local content has the "old ID meta" linking it to the live content. If a live content ID is present in the local metas, it is not new; if a live ID is not present, it will be marked as new by the search command and imported by the migrate command. Content fields (title, slug, date) are not re-compared for new content. This is intentional: as mentioned, external migration tools may have transformed those fields, making field-based matching impossible.
- Modified content. Modified content is determined by comparing the local content (at the attributed local ID) with the live content (at the old live ID); if they differ, the content is marked as modified and will be deleted and reimported by the migrate command. See "Reviewing "modified" posts" above for more details.
Arguments (at least one of the `--*-ids` arguments is required):
- `--source-hostname` (required): Source hostname (e.g., www.example.com)
- `--data-dir` (required): Data directory for logs and reports
- `--post-ids` (optional): Path to JSONL file with post ID pairs
- `--attachment-ids` (optional): Path to JSONL file with attachment ID pairs
- `--user-ids` (optional): Path to JSONL file with user ID pairs
- `--term-ids` (optional): Path to JSONL file with term ID pairs
wp newspack-content-diff-migrator attribute-ids \
--source-hostname=www.example.com \
--data-dir=/tmp/migration_data \
--post-ids=/tmp/post_ids.jsonl \
--user-ids=/tmp/user_ids.jsonl

Displays a comparison table of collations between live and local WordPress tables. Useful for diagnosing character encoding issues.
wp newspack-content-diff-migrator display-collations-comparison \
--live-table-prefix=<prefix> \
[--skip-tables=<csv>] \
[--different-collations-only]

Automatically fixes collation mismatches between live and local tables. Speed is auto-determined based on total table size.
wp newspack-content-diff-migrator correct-collations-for-live-wp-tables \
--live-table-prefix=<prefix> \
[--skip-tables=<csv>]

Note: The migration commands (search and migrate) automatically run collation fixes when needed, so you typically don't need to run this manually.
The --data-dir parameter stores logs and run-state data in a run-state subfolder (e.g., /tmp/cdiff_data/run-state/).
The run-state data includes:
- `manifest.json` — Migration summary (see fields below)
- `new_ids.json` / `modified_ids.json` — IDs to be imported or reimported
- `imported_posts.jsonl`, `updated_*.jsonl` — Progress tracking files for resume capability
Example manifest.json:
{
"created_at": "2025-03-15 14:30:00",
"source_hostname": "www.example.com",
"search_status": "completed",
"migrate_status": "completed",
"live_table_prefix": "cdiff_",
"post_types": ["post", "page", "attachment", "wp_block"],
"taxonomies": ["category", "post_tag", "author", "brand"],
"counts": {
"new_ids": 150,
"modified_ids": 25
}
}

If a migration command is interrupted (e.g., timeout, crash), simply run the same command again with the same --data-dir. The migration will pick up from where it left off.
Once a migration completes successfully, you can start a fresh migration cycle at any time. Use a new --data-dir to keep separate records and preserve previous logs for debugging.
If you attempt to run search-new-content-on-live with a --data-dir that contains existing run-state files, the command will exit with a message to use a new --data-dir to protect your previous migration data and logs (useful for debugging or resuming a previous migration which was interrupted).
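Conceptually, resuming works by skipping IDs already recorded in the progress JSONL files. A minimal sketch, assuming each progress record carries a `live_id` key (the real record shape may differ):

```python
import json

def remaining_ids(all_ids, progress_jsonl_lines):
    """Filter out IDs already recorded in a progress file such as imported_posts.jsonl."""
    done = {json.loads(line)["live_id"] for line in progress_jsonl_lines if line.strip()}
    return [i for i in all_ids if i not in done]

# If posts 1 and 2 were imported before the interruption, a rerun only
# processes post 3:
todo = remaining_ids([1, 2, 3], ['{"live_id": 1}', '{"live_id": 2}'])  # → [3]
```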
The Migration Data Consistency Standard (see the internal P2 for more details) defines how new and modified content is handled: which fields get updated on subsequent migration runs and which fields are ignored.
It contains specific filtering rules, which are optimal for Newspack's own migration workflow, and ensures that changes made on the live site are reflected on the local site with a curated set of rules.
The plugin uses two complementary update strategies internally, depending on the object type:
| Object Type | Detection | Update Method |
|---|---|---|
| Posts | Checked on each run | Full reimport (delete + reimport) |
| Custom Post Types | Same fields as posts | Full reimport (delete + reimport) |
| Pages | Not checked | Import once only (first run) |
| Attachments | Checked on each run | Individual field updates |
| Users | Checked on each run | Individual field updates |
| Terms (categories, tags) | Checked on each run | Individual field updates |
During the search-new-content-on-live command, each object type is checked for modifications according to the Migration Data Consistency Standard. Understanding this logic may be important for debugging why a particular object was (or wasn't) flagged as modified.
The search command runs 6 checks in order for each previously imported post. Once one of these checks triggers, the post is marked as modified and will be deleted and reimported (with the same existing local ID, to preserve any references to that reimported local post ID).
| # | Field | Comparison |
|---|---|---|
| 1 | post_modified | Local and live values directly compared |
| 2 | post_status | Local and live values directly compared |
| 3 | comment_count | Local and live values directly compared |
| 4 | post_author | Old vs. new ID mapping is compared to detect a change |
| 5 | _thumbnail_id | Old vs. new ID mapping is compared to detect a change |
| 6 | Taxonomies | Old vs. new ID mappings are compared to detect any changes |
Pages and attachments are excluded from these modification checks as per the Migration Data Consistency Standard.
Note that locally added terms (such as Newspack Brands assigned during Newspackification) will not trigger modification detection, because they don't have a newspackcontentdiff_oldid_{hostname} termmeta. Such local terms without the metas are skipped during the taxonomy comparison, to allow for Newspack customizations to posts during migration.
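The ordered checks can be sketched as a short-circuiting function. This is a simplification: the field names mirror the check list above, `id_map` maps live IDs to local IDs (built from the old-ID metas), and the term lists are assumed to already exclude locally added terms without old-ID metas:

```python
def is_post_modified(live, local, id_map):
    """Return True as soon as one of the six ordered checks detects a change."""
    if live["post_modified"] > local["post_modified"]:      # 1. post_modified (live must be newer)
        return True
    if live["post_status"] != local["post_status"]:         # 2. post_status
        return True
    if live["comment_count"] != local["comment_count"]:     # 3. comment_count
        return True
    if id_map.get(live["post_author"]) != local["post_author"]:    # 4. post_author, via ID mapping
        return True
    if id_map.get(live["thumbnail_id"]) != local["thumbnail_id"]:  # 5. _thumbnail_id, via ID mapping
        return True
    live_terms = {id_map.get(t) for t in live["term_ids"]}         # 6. taxonomies, via ID mapping
    return live_terms != set(local["term_ids"])
```

The ID-mapped checks (4-6) compare through the old=>new mapping, so a different raw ID on live does not by itself count as a change if it maps to the same local object.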
All users are processed on every migration run, and modification is detected by comparing:
- Email — live vs. local `user_email`
- Display name — live vs. local `display_name`
- Avatar — Simple Local Avatars attachment ID (old vs. new ID mapping is compared to detect a change)
Users are uniquely matched by user_login, not by old_id meta. If any of the above fields differ, the user is marked as modified and the changed fields are updated individually/directly (not by a full reimport; only posts are reimported that way).
Attachments are not checked for modification in the same way as posts. Instead, specific fields are compared individually and updated directly:
- Caption (`post_excerpt`)
- Alt text (`_wp_attachment_image_alt`)
- Description (`post_content`)
- Credit (`_media_credit`)
- Credit URL (`_media_credit_url`)
Each field is compared independently, and only the fields that actually differ are updated.
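This per-field strategy can be sketched as follows; the field list comes from above, and the helper name is ours:

```python
# Field list from the section above: caption, alt text, description, credit,
# and credit URL (mix of wp_posts columns and postmeta keys).
ATTACHMENT_FIELDS = (
    "post_excerpt",               # Caption
    "_wp_attachment_image_alt",   # Alt text
    "post_content",               # Description
    "_media_credit",              # Credit
    "_media_credit_url",          # Credit URL
)

def attachment_updates(live, local):
    """Return only the fields whose live value differs from the local one."""
    return {f: live.get(f) for f in ATTACHMENT_FIELDS if live.get(f) != local.get(f)}
```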
Terms are matched by name + taxonomy + parent (not by old_id meta). Specific fields are compared and updated individually:
- Slug — updated if different
- Description — updated if different
Name and parent are identifier fields and are not updated.
The following fields are directly scanned for changes (see check order above) and a change on any of these fields triggers a full post reimport ("modified" post is deleted and reimported):
- Date modified — `post_modified` compared directly, and the live timestamp must be newer
- Status — `post_status` compared directly
- Comment count — `comment_count` compared directly
- Author — local `post_author` (old vs. new ID mapping is compared to detect a change)
- Featured image — local `_thumbnail_id` (old vs. new ID mapping is compared to detect a change)
- Categories/tags/taxonomies — local term IDs (old vs. new ID mappings are compared to detect any changes), which also covers changes to 'author' (Guest Author) taxonomy terms
The following fields are not scanned directly, but changes to them will bump post_modified (when edited through Gutenberg), which triggers the post_modified check above and causes a full reimport:
- Content — detected indirectly via the `post_modified` change
- Excerpt — detected indirectly via the `post_modified` change
The following fields are not detected (changes on live will not trigger reimport, unless another scanned field also changed):
- Postmeta
- Comments
The following are identifier fields — changes to these fields are ignored unless another field triggers reimport:
- Title — not scanned; only updated if post is reimported
- Slug — not scanned; only updated if post is reimported
- Date published — not scanned; only updated if post is reimported
When any directly-scanned field has changed, or when post_modified is newer on live, the post is marked as modified and will be deleted and fully reimported. Such a full reimport preserves its local wp_posts.ID to ensure any references to the reimported local post ID remain valid.
The directly-scanned fields for author, featured image, and taxonomies exist because WordPress does NOT update post_modified when these are changed in Gutenberg — so they must be checked independently.
Same rules as posts.
- Pages are migrated only once during the first migration run
- Changes to page fields on live do not get updated on local
- New pages are not imported during consecutive migration runs (this specifically serves Newspack's own migration workflow best)
- Name — does not get updated (identifier field)
- Parent — does not get updated (identifier field)
- Slug — gets updated directly
- Description — gets updated directly
Same rules as categories.
- Name — does not get updated (identifier field)
- Slug — gets updated directly
- Description — gets updated directly
Same rules as post tags.
All users get migrated fully during every migration run (to migrate subscribers and subscription data).
- Username/login — identifier field; if changed, a new user gets inserted (original user remains)
- Email — gets updated directly
- Display name — gets updated directly
- Avatar (Simple Local Avatars) — gets updated directly
- Caption (`post_excerpt`) — gets updated directly
- Alt text (`_wp_attachment_image_alt`) — gets updated directly
- Description (`post_content`) — gets updated directly
- Credit (`_media_credit`) — gets updated directly
- Credit URL (`_media_credit_url`) — gets updated directly
Why Full Reimport?
Modified posts use a "full reimport" strategy: the local post is deleted and then reimported fresh from the live site.
However this reimport of the modified post preserves its local wp_posts.ID. The post gets reimported with the same ID, ensuring any external references to the reimported local post ID remain valid. The original live ID is also preserved in the newspackcontentdiff_oldid_{hostname} postmeta for mapping.
This approach elegantly handles the complexity of post updates:
- Block content: Attachment IDs embedded in Gutenberg blocks are automatically updated to local IDs
- Featured images: Thumbnail references are properly mapped to local attachment IDs
- Taxonomies: All term relationships are reimported fresh
- Postmeta: All post metadata is synchronized (note: postmeta changes do not trigger reimport)
- Comments: All comments and comment metadata are reimported (note: comment-only changes do not trigger reimport)
- Parent references: Post parent IDs are updated to local IDs
This single operation ensures all related data is consistent, rather than attempting to diff and update individual fields, which could miss embedded ID references in content. Additionally, it causes no performance overhead compared to that alternative of updating individual fields.
When importing from multiple sources, certain entities with the same unique identifiers get merged into a single local entity rather than creating duplicates. This is by design and handled gracefully without crashes.
Fields that cause merging:
| Entity | Unique Field(s) | Merge Behavior |
|---|---|---|
| Users | user_login | Same username from different sources → merged to one user |
| Categories | name + parent | Same category name under same parent → merged to one |
| Tags | name | Same tag name → merged to one |
What happens when entities are merged:
- The first import creates the entity (user, term) with the source's old_id meta
- Subsequent imports from other sources find the existing entity and:
  - Add their own source-specific old_id meta (e.g., `newspackcontentdiff_oldid_www.source-b.com`)
  - Log a WARNING (once per source per entity) noting the merge, so you can review the merged entities and their old_id metas
- The result is a single entity with multiple old_id metas — one per contributing source
Example: If www.source-a.com and www.source-b.com both have a user "admin", after importing both:
- One local "admin" user exists
- That user has TWO old_id metas:
  - `newspackcontentdiff_oldid_www.source-a.com` = 123
  - `newspackcontentdiff_oldid_www.source-b.com` = 456
Entities that are NOT merged (always create new):
- Posts — different sources = different posts (even with same title/slug)
- Attachments — different sources = different attachments (even with same filename)
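The merge keys from the table above can be sketched as a small lookup-key function (the function and key tuples are ours, for illustration):

```python
def merge_key(entity_type, data):
    """Return the local-lookup key for mergeable entities, or None if never merged."""
    if entity_type == "user":
        return ("user", data["user_login"])                 # matched by username
    if entity_type == "category":
        return ("category", data["name"], data["parent"])   # name under same parent
    if entity_type == "tag":
        return ("tag", data["name"])                        # matched by name
    return None  # posts and attachments always create new rows
```

An import can keep a set of seen keys; an incoming entity whose key is already in the set is merged into the existing local entity instead of being created anew.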
Some object types rely on the source hostname meta to be properly compared/diffed against the live tables (for them, the meta affects whether content gets imported), while other object types carry the meta purely to track origin.
| Entity | Meta Table | Meta used for import decision? | What happens WITHOUT the meta? |
|---|---|---|---|
| Posts (all CPTs) | wp_postmeta | YES | Object is considered "new" → DUPLICATE CREATED |
| Attachments | wp_postmeta | YES | Object is considered "new" → DUPLICATE CREATED |
| Users | wp_usermeta | No | Matched by user_login → merged/reused existing object |
| Terms | wp_termmeta | No | Matched by name+taxonomy+parent → merged/reused existing object |
Posts and Attachments — The source hostname meta is critical for these. During migration, the plugin builds a mapping of live_id => local_id from the metas. If a live post's ID is not in this map, it's considered "new" and will be imported — potentially creating a duplicate if that content already exists locally but wasn't attributed.
Users — The meta is for tracking/mapping only. User lookup during import is done by user_login match, not by meta, and if a user with the same login exists locally, it will be reused regardless of whether it has the source hostname meta (with an appropriate warning in the log), while the meta is added for internal reference.
Terms (Categories, Tags, etc.) — The meta is for tracking/mapping only. Term lookup during import is done by name + taxonomy + parent, not by meta. If a term with the same name exists in the same taxonomy and under the same parent, it will be reused regardless of whether it has the source hostname meta; the meta is added for internal reference.
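The term reuse decision can be sketched as a simplified in-memory lookup (the real lookup runs against the WordPress tables; the helper is ours):

```python
def find_or_create_term(local_terms, name, taxonomy, parent):
    """Reuse a local term matched by name + taxonomy + parent, else create it."""
    for term in local_terms:
        if (term["name"], term["taxonomy"], term["parent"]) == (name, taxonomy, parent):
            return term, False                    # merged/reused existing term
    term = {"name": name, "taxonomy": taxonomy, "parent": parent}
    local_terms.append(term)
    return term, True                             # newly created term
```

Note that the source hostname meta plays no part in this decision; it is attached afterwards so the term can be traced back to each contributing source.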
- Backup First: Always backup your local staging site before running migrations
- Monitor Logs: Check log files for any issues or warnings
Run ./vendor/bin/phpcs --standard=phpcs.xml {File} to check for coding standards issues.
Run ./vendor/bin/phpcbf --standard=phpcs.xml {File} to apply automatic fixes.
This plugin points to the Newspack Migration Tools dev-trunk branch. Whenever newer code has been merged to trunk in the NMT, run composer update automattic/newspack-migration-tools to update the lockfile and get the latest from the NMT. If nothing happens when you update, then run composer clear-cache and try again.
Here is a one-liner (well – there are multiple lines for readability) that is safe to use even if you have the NMT symlinked into the vendor directory. From your PR's branch run:
rm -rf vendor/automattic/newspack-migration-tools && \
composer update automattic/newspack-migration-tools && \
git add composer.lock && \
git commit -m 'Updating NMT composer pointer' && \
git push origin $(git symbolic-ref --short HEAD)

- Check the log files in the `--data-dir` directory
- Review the error log for specific error messages
- CLI Timeout Issues: Consider running migrations in smaller batches, or simply rerun to resume the migration from the last successful step
- Memory Exhaustion: Increase PHP memory limits or reduce batch size
This plugin is part of the Newspack ecosystem and follows the same licensing terms as other Newspack plugins.
This plugin is provided as-is without any warranty or support. Use at your own risk. The authors and contributors are not responsible for any data loss or any kind of damage caused by the use of this plugin.