Consistently encode DRR_BEGIN packed nvlist payloads with NV_ENCODE_XDR#18372
Consistently encode DRR_BEGIN packed nvlist payloads with NV_ENCODE_XDR#18372GarthSnyder wants to merge 1 commit intoopenzfs:masterfrom
Conversation
This is a fix for openzfs#18360. Currently, zfs send generates a mix of nvlist encodings in DRR_BEGIN records, some XDR and some in native byte order. The result is that most streams currently can't be zfs received on opposite-endian systems. zfs send generates the outer wrappers for compound streams in userspace, and it explicitly requests NV_ENCODE_XDR format for those records. But the BEGIN records for individual datasets are generated on the kernel side, in dmu_send.c, where fnvlist_pack() is used for encoding. That routine hard-wires NV_ENCODE_NATIVE format. This PR replaces the fnvlist_pack() call with a direct call to nvlist_pack() that specifies NV_ENCODE_XDR. Signed-off-by: Garth Snyder <garth@garthsnyder.com>
6bac4a7 to
0b521d6
Compare
There was a problem hiding this comment.
Pull request overview
This PR fixes cross-endian zfs send | zfs recv incompatibility by ensuring packed nvlist payloads in kernel-generated DRR_BEGIN records are encoded using XDR, consistent with userspace-generated compound stream wrapper records.
Changes:
- Replace
fnvlist_pack()(native-endian) with an explicitnvlist_pack(..., NV_ENCODE_XDR, ...)forDRR_BEGINpayload encoding indmu_send_impl(). - Preserve existing behavior of including optional BEGIN nvlist fields (redaction/resume/crypto metadata), but now with a consistent on-wire encoding.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
0b521d6 to
4b4aaf7
Compare
4b4aaf7 to
6d369aa
Compare
|
I think Copilot's comment regarding payloads not being rounded up to an 8-byte boundary is correct. Internally, NV_ENCODE_NATIVE seems to work with 8-byte rounding, but NV_ENCODE_XDR uses 4-byte rounding. In theory, send_prelim_records() in libzfs_sendrecv.c should raise this same issue, as it does not appear to do any rounding. However, that file has its own private implementation of dump_record() that does not double-check payload sizes for 8-byte granularity. The receiving side is also special-cased and does not run through the usual assertions. I have verified that zfs send does in fact generate non-rounded DRR_BEGIN payloads in some cases. This PR does the stupidest possible thing: if a payload needs to be rounded up, a second, slightly longer buffer is allocated and the payload is copied into it, with the extra bytes being zeroed. The alternative would be to define a customized nv_alloc_t that does rounding on allocation, thus sparing the copy. That's how I did it at first, but it's inelegant. It is, in effect, a monkey patch, and requires a separate allocator function. It's also harder to make it compile in userspace since some of the normal nv_alloc_t infrastructure isn't available there. I suggest staying with the copy. BEGIN record payloads are typically small and infrequent, there's at most one per dataset, and the allocation is only active momentarily. But that other version exists and I'd be happy to sub it in if that's preferable. |
161e7a4 to
774b697
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
774b697 to
88bdf7e
Compare
88bdf7e to
3294a4d
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
3294a4d to
91bfcc9
Compare
behlendorf
left a comment
There was a problem hiding this comment.
Looks good, thanks for the fix.
91bfcc9 to
f5364b6
Compare
|
Since you've already written the test case for this in #18355 we should include it in this PR. |
This is a fix for #18360.
Currently, zfs send generates a mix of nvlist encodings in DRR_BEGIN records, some XDR and some in native byte order. The result is that many streams currently can't be zfs received on opposite-endian systems.
zfs send generates the outer wrappers for compound streams in userspace, and it explicitly requests NV_ENCODE_XDR format for those records. But the BEGIN records for individual datasets are generated on the kernel side, in dmu_send.c, where fnvlist_pack() is used for encoding. That routine hard-wires NV_ENCODE_NATIVE format.
This PR replaces the fnvlist_pack() call with a direct call to nvlist_pack() that specifies NV_ENCODE_XDR.
Motivation and Context
Currently, cross-endian zfs receives can fail because there is no facility within ZFS for byteswapping packed nvlists after the fact. When a DRR_BEGIN record with a cross-endian NV_ENCODE_NATIVE nvlist is received, the kernel rejects it with ENOTSUP, aborting the whole receive.
This PR is a step toward making any valid send stream readable and importable on any ZFS system. There are no doubt other stream encoding issues yet to be resolved, but in my limited testing, many opposite-endian-generated streams seem to be received just fine with this patch in place.
How Has This Been Tested?
This change likely affects the majority of nontrivial send streams, so the existing test suite is already a fairly comprehensive vetting, at least as far as same-endian functionality goes.
I have also built with this change on a big-endian system and generated several send streams that formerly were unimportable on little-endian systems. They now import fine.
Of note, this change requires no receiving-end support. All-XDR streams are already supported by the existing nvlist_unpack() infrastructure. There does not appear to be any stream-related code that does anything with packed nvlists other than passing them along to nvlist_unpack().
I will include cross-endian stream testing as part of a separate testing revamp for zstream. There are some interdependencies, so it would be helpful to have this PR in master before that PR is submitted.
Types of changes
Checklist:
Signed-off-by.