Replies: 4 comments 1 reply
-
Random 8KB writes with a record-/volblocksize of 32KB require a read-modify-write on every request, with predictable performance results. Enabling ZFS caching is the only way to avoid it, but if the active data set is too big, it won't help much. Obviously you can't do Direct I/O in that situation, as data has to be aggregated somewhere. Direct I/O indeed is not implemented for ZVOLs; passing O_DIRECT there will make Linux bypass its page cache, which is not the same thing. Direct I/O tries to save on memory copies, and with 8KB I/Os memory copies are the least of your bottlenecks. The first thing you must do is avoid read-modify-write by making application writes multiples of, and aligned to, the record-/volblocksize. After that you may experiment with either Uncached I/O (primarycache=metadata) or Direct I/O (O_DIRECT).
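As a concrete sketch of the aligned-write advice above, here is a minimal fio job file (the file path, dataset mountpoint, and size are assumptions, not from this thread; the key point is bs matching the record-/volblocksize):

```ini
; aligned-randwrite.fio -- hypothetical job: 32K writes matching recordsize=32K
[aligned-randwrite]
filename=/testpool/ds/fio.dat   ; assumed dataset mountpoint
rw=randwrite
bs=32k              ; multiple of, and aligned to, record-/volblocksize
direct=0            ; buffered baseline; try direct=1 only once aligned
ioengine=libaio
iodepth=8
runtime=120
time_based=1
size=10g
```

With bs a multiple of the block size, every write replaces whole records and the read half of the read-modify-write cycle disappears.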
-
Thanks for the suggestion! My ultimate goal is to find best practices for ZFS on NVMe. My use case is Oracle, which defaults to 8K blocks, so I will focus on testing with recordsize=8K and, on 2.4.0, explicitly set direct=always to verify the real benefit of Direct I/O. I will also use the same test method to compare 2.2.4 with 2.4.0, and I'll update this thread with the full test conditions and results.
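A sketch of how that test dataset could be prepared (pool and dataset names are hypothetical; the per-dataset direct property requires OpenZFS 2.3 or later):

```shell
# Hypothetical names; run as root on the test host.
zfs create -o recordsize=8k \
           -o compression=off \
           -o atime=off \
           testpool/oracle8k

# OpenZFS 2.3+ exposes the per-dataset direct property:
zfs set direct=always testpool/oracle8k

# Sanity-check what is actually in effect before running fio:
zfs get recordsize,direct,primarycache testpool/oracle8k
```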
-
I re-ran on 10.10.150.235, a single-disk pool (ashift=12), with the same fio job from this discussion (8k randwrite; 120s; warmup + 3 runs; median; caches dropped between runs). The disk and layout are essentially the same; the main difference is recordsize=8K. Dataset settings: recordsize=8K, atime=off, compression=off, dedup=off, primarycache=metadata. Results (median IOPS / BW / clat mean / p99):
The best result is 2.4.0 with direct=disabled (slightly better than 2.2.4); direct=always/standard is much worse in this workload. Could you help confirm why the O_DIRECT improvement in 2.4 doesn't show up here? Is there a specific alignment/IO-pattern requirement, or any tunables/flags needed for the direct property to be effective on datasets? Thanks! I can share the full fio outputs if helpful.
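One hedged suggestion for narrowing this down (dataset name and path are placeholders): confirm both that the property is applied and that fio itself opens the file with O_DIRECT, since the two are independent knobs.

```shell
# Confirm the dataset-level setting (OpenZFS 2.3+):
zfs get direct testpool/ds8k

# Make sure fio is actually issuing O_DIRECT at the syscall level:
fio --name=dw --filename=/testpool/ds8k/fio.dat \
    --rw=randwrite --bs=8k --direct=1 --ioengine=libaio \
    --iodepth=8 --runtime=120 --time_based
```

With direct=standard, only requests opened with O_DIRECT take the direct path; with direct=always, requests that don't meet the alignment requirements can silently fall back to the buffered path, which would mask any difference.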
-
Thanks for the detailed explanation. My understanding is that under my current test conditions (primarycache=metadata, fio iodepth=8, bs=8K), the O_DIRECT optimizations introduced after 2.3 are unlikely to be visible. In particular, if the zvol block size is 32K while fio writes are 8K, that triggers read-modify-write; and since zvols do not use a true direct I/O path, any optimization impact can be further masked. I'll run another comparison in a scenario closer to the intended one: set primarycache=all, align block sizes to 32K (recordsize and fio bs both set to 32K), and increase iodepth to 128. Under the same hardware and kernel conditions, I'll compare IOPS, throughput, and latency between 2.2.4 and 2.4.0. If my understanding of the applicable scenarios is still off, please feel free to correct me.
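For that follow-up comparison, a job file along these lines might serve (file path and size are placeholders; the essentials are bs=recordsize=32K and the deep queue):

```ini
; 32k-aligned.fio -- hypothetical job for the planned comparison
[aligned-32k]
filename=/testpool/ds32k/fio.dat   ; assumed mountpoint, primarycache=all
rw=randwrite
bs=32k             ; matches recordsize=32K, so no read-modify-write
iodepth=128
ioengine=libaio
direct=1           ; flip to 0 for the buffered baseline run
runtime=120
time_based=1
size=10g
```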
-
Background
I want to validate whether my test methodology is reasonable.
Environment
Methodology
ZFS configuration
Results (average of 3 runs)
Questions / feedback requested
Should I adjust to recordsize=8k or bs=32k to properly exercise Direct I/O?
(I can share the full fio outputs).
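The "average of 3 runs" reduction step above is easy to script; here is a tiny sketch using the median instead, which is more robust to a single outlier run (the helper name `median3` and the sample numbers are made up for illustration, not measured results):

```shell
# median3: print the median of three numeric values -- a hypothetical
# helper for the "3 runs, take the middle one" reduction step.
# Feed it the per-run IOPS (or bandwidth) figures from fio.
median3() {
  printf '%s\n' "$1" "$2" "$3" | sort -n | sed -n '2p'
}

median3 41210 40880 41350   # illustrative IOPS values, not real results
```

Sorting the three values numerically and taking the second line yields the median regardless of argument order.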