
Fix native linux device [dev]#1710

Merged
badrishc merged 2 commits into dev from
badrishc/update-native-device-dev
Apr 16, 2026

@badrishc
Collaborator

The Debian/Ubuntu t64 (64-bit time_t) ABI transition renamed libaio.so.1 to libaio.so.1t64, breaking libnative_device.so, which has a hard DT_NEEDED entry for libaio.so.1. Previously we worked around this with system-wide symlinks in every Dockerfile and CI workflow.

Fix this properly in the loader itself:

  • CMakeLists.txt now sets RPATH=$ORIGIN (via INSTALL_RPATH + BUILD_WITH_INSTALL_RPATH + --disable-new-dtags) so libnative_device.so searches its own directory for dependencies. This lets a managed-side compat symlink next to the native library satisfy the linker without any LD_LIBRARY_PATH contortions from the caller.

  • libaio_compat.h (new) pins the libaio entry points to the specific symbol versions that make libaio's userspace fast paths kick in:
    io_setup @LIBAIO_0.4
    io_destroy @LIBAIO_0.4
    io_getevents@LIBAIO_0.4 (userspace ring fast path)
    io_submit @LIBAIO_0.1
    Older libaio-dev marked LIBAIO_0.4 as the default version, so a plain
    link picked these up automatically. On t64 (libaio1t64-dev) the default
    is gone and libaio.h has no .symver redirects for x86_64, so a fresh
    link produces UNVERSIONED references that at runtime resolve to the
    slower LIBAIO_0.1 io_getevents, which always makes a syscall and blocks.
    That caused NativeStorageDevice probe/TryComplete paths to hang.

  • NativeStorageDevice.ImportResolver now resolves NativeLibraryPath to an absolute path (fixing a latent bug where the relative path bypassed .NET's runtimes/ probing) and, on Linux, catches DllNotFoundException referencing libaio.so.1, locates libaio.so.1t64 in standard multiarch paths, and drops a compat symlink next to libnative_device.so. The symlink creation tolerates the race where multiple processes start simultaneously and another process has already created a usable symlink. If repair still fails, the loader throws a descriptive DllNotFoundException explaining the t64 transition and offering three remediation options.

  • VectorManager.Initialize() and ResumePostRecovery() now early-return when IsEnabled is false. Vector Set preview is off by default; there is no reason these paths should touch storage when the feature is disabled.
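The race-tolerant compat symlink step described for NativeStorageDevice.ImportResolver can be illustrated with a minimal sketch. This is Python rather than the C# used in the PR, and the function name `ensure_compat_symlink` and its parameters are hypothetical; it only shows the idea that "link already exists" is treated as success as long as the resulting link is usable.

```python
import os


def ensure_compat_symlink(native_dir, target, link_name="libaio.so.1"):
    """Create native_dir/link_name -> target, tolerating concurrent creators.

    Returns True if a usable link (one that resolves to an existing file)
    is in place afterwards. All names here are illustrative assumptions.
    """
    link_path = os.path.join(native_dir, link_name)
    try:
        os.symlink(target, link_path)
    except FileExistsError:
        # Another process won the race, or a link was left by an earlier run.
        pass
    # os.path.exists follows symlinks, so a dangling link counts as unusable.
    return os.path.exists(link_path)
```

Because the final check looks at the link rather than at whether this process created it, several processes starting at once all observe success once any one of them has created a valid link.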

With the loader + build fixes in place, remove the now-redundant workarounds:

  • Dockerfile and Dockerfile.ubuntu: drop the ln -sf libaio.so.1 line. (Dockerfile.alpine and Dockerfile.azurelinux ship libaio.so.1 natively. Dockerfile.chiseled uses a restricted runtime and was not touched.)

  • .github/workflows/ci.yml and nightly.yml: drop the ubuntu-latest libaio pre-step; the managed ImportResolver now handles repair automatically and the test suite actually exercises the repair path.

  • validate_docker_images.py: accept either libaio.so.1 or libaio.so.1t64, since the former is only materialized lazily (on first native device init) for glibc images now.
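The relaxed presence check can be sketched as follows. This is not the code in validate_docker_images.py (which is not shown here); the helper name `has_libaio` and the idea of passing in a list of directories are assumptions made for illustration.

```python
import os

# Either soname is acceptable: the compat link libaio.so.1 may only appear
# after the first native device initialization on glibc images.
ACCEPTED_LIBAIO_NAMES = ("libaio.so.1", "libaio.so.1t64")


def has_libaio(search_dirs):
    """Return True if any accepted libaio file name exists in any directory."""
    return any(
        os.path.exists(os.path.join(directory, name))
        for directory in search_dirs
        for name in ACCEPTED_LIBAIO_NAMES
    )
```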

The bundled libnative_device.so has been rebuilt against the above sources with '-O3 -g -DNDEBUG' (project Release defaults). Verified via objdump -T that io_* references are correctly versioned.
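A check like the objdump -T verification above could be automated along these lines. This is a hedged sketch: the function names are hypothetical, the expected versions are taken from the list in this description, and the parser assumes the version tag appears immediately before the symbol name on undefined-symbol (`*UND*`) lines, with or without parentheses. Column layout can differ between binutils versions.

```python
import re

# Expected bindings, as listed in the PR description.
EXPECTED = {
    "io_setup": "LIBAIO_0.4",
    "io_destroy": "LIBAIO_0.4",
    "io_getevents": "LIBAIO_0.4",
    "io_submit": "LIBAIO_0.1",
}

# "(LIBAIO_x.y) io_name" or "LIBAIO_x.y io_name" at the end of a line.
_VERSIONED = re.compile(r"\(?(LIBAIO_[0-9.]+)\)?\s+(io_\w+)\s*$")
# A bare "io_name" at the end of a line (no version tag before it).
_BARE = re.compile(r"\s(io_\w+)\s*$")


def libaio_versions(objdump_text):
    """Map each undefined io_* symbol to its version tag (None if unversioned)."""
    found = {}
    for line in objdump_text.splitlines():
        if "*UND*" not in line:
            continue
        match = _VERSIONED.search(line)
        if match:
            found[match.group(2)] = match.group(1)
            continue
        bare = _BARE.search(line)
        if bare:
            found.setdefault(bare.group(1), None)
    return found


def mismatches(objdump_text):
    """Return {symbol: actual_version} for every symbol not bound as expected."""
    found = libaio_versions(objdump_text)
    return {sym: found.get(sym) for sym, ver in EXPECTED.items() if found.get(sym) != ver}
```

Feeding it the text output of objdump -T for libnative_device.so and requiring an empty result from `mismatches` would flag both unversioned references and references to the wrong version.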

Copilot AI review requested due to automatic review settings April 16, 2026 21:26
Contributor

Copilot AI left a comment


Pull request overview

This PR updates Garnet/Tsavorite’s native Linux storage device loading and build configuration to handle the Debian/Ubuntu t64 libaio transition without requiring system-wide symlinks in Docker images or CI.

Changes:

  • Add Linux loader auto-repair for missing libaio.so.1 by creating a local compat symlink next to libnative_device.so, and improve native library path resolution.
  • Pin libaio symbol versions at link time via a new libaio_compat.h, and adjust native build/linker RPATH behavior to prefer $ORIGIN.
  • Remove now-redundant libaio workaround steps from Dockerfiles and GitHub workflows, and relax Docker image validation to accept either libaio.so.1 or libaio.so.1t64.

Reviewed changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| libs/storage/Tsavorite/cs/src/core/Device/NativeStorageDevice.cs | Adds absolute-path resolution for the native library and Linux libaio t64 auto-repair with improved diagnostics. |
| libs/storage/Tsavorite/cc/src/device/libaio_compat.h | New header that forces versioned libaio symbol bindings to avoid performance regressions and hangs. |
| libs/storage/Tsavorite/cc/src/device/file_linux.h | Includes the new libaio compatibility header for Linux builds. |
| libs/storage/Tsavorite/cc/src/CMakeLists.txt | Adds $ORIGIN RPATH behavior and link options to ensure local dependency resolution for the native device. |
| libs/server/Resp/Vector/VectorManager.cs | Skips initialization/recovery paths when Vector Set preview is disabled. |
| test/docker-tests/validate_docker_images.py | Accepts either libaio.so.1 or libaio.so.1t64 since the symlink may now be created lazily. |
| Dockerfile | Removes build-time libaio symlink workaround. |
| Dockerfile.ubuntu | Removes build-time libaio symlink workaround. |
| .github/workflows/ci.yml | Removes Ubuntu libaio symlink workaround step so CI exercises loader repair. |
| .github/workflows/nightly.yml | Removes Ubuntu libaio symlink workaround step so nightly runs exercise loader repair. |


Comment thread libs/storage/Tsavorite/cs/src/core/Device/NativeStorageDevice.cs Outdated
Comment thread libs/storage/Tsavorite/cc/src/CMakeLists.txt
@badrishc badrishc force-pushed the badrishc/update-native-device-dev branch from 7de1020 to f47f903 Compare April 16, 2026 21:36
…nsition)

The t64 ABI transition renamed libaio.so.1 to libaio.so.1t64, breaking
libnative_device.so which has a hard DT_NEEDED of libaio.so.1. Fix the
problem in three places so both Docker and non-Docker users on t64 hosts
get a working native device without manual intervention.

1) libaio_compat.h (new) pins the libaio entry points to specific symbol
   versions at link time:
     io_setup    @LIBAIO_0.4
     io_destroy  @LIBAIO_0.4
     io_getevents@LIBAIO_0.4   (userspace ring fast path)
     io_submit   @LIBAIO_0.1
   Older libaio-dev marked LIBAIO_0.4 as the default version so a plain
   link picked these up automatically. On t64 (libaio1t64-dev) the default
   is gone and libaio.h has no .symver redirects for x86_64, so a fresh
   link produces UNVERSIONED references that at runtime resolve to the
   slower LIBAIO_0.1 io_getevents - which always syscalls and blocks -
   causing NativeStorageDevice probe/TryComplete paths to hang. With
   libaio_compat.h included first, any future rebuild on any distro
   reproduces the correct versioned bindings.

2) CMakeLists.txt sets RPATH=$ORIGIN (via INSTALL_RPATH +
   BUILD_WITH_INSTALL_RPATH + --disable-new-dtags) so libnative_device.so
   searches its own directory for dependencies. This enables the managed
   loader's fallback (below).

3) NativeStorageDevice.ImportResolver resolves NativeLibraryPath to an
   absolute path (fixing a latent bug where the relative path bypassed
   .NET's runtimes/ probing) and, on Linux, catches DllNotFoundException
   referencing libaio.so.1, locates libaio.so.1t64 in standard multiarch
   paths, and drops a compat symlink next to libnative_device.so. The
   symlink creation tolerates the race where multiple processes start
   simultaneously and another process has already created a usable
   symlink. If repair still fails, the loader throws a descriptive
   DllNotFoundException explaining the t64 transition and offering three
   remediation options. This path is primarily for non-Docker users
   (developers running dotnet GarnetServer on their own Debian 13 /
   Ubuntu 24.04 machines).
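The "locate libaio.so.1t64 in standard multiarch paths" step in item 3 can be sketched as below. The commit message does not list the exact directories probed, so the machine-to-tuple mapping and the directory order here are assumptions based on the usual Debian multiarch layout, and the function names are hypothetical.

```python
import os
import platform

# Assumed mapping from platform.machine() values to Debian multiarch tuples.
_MULTIARCH = {
    "x86_64": "x86_64-linux-gnu",
    "aarch64": "aarch64-linux-gnu",
    "armv7l": "arm-linux-gnueabihf",
}


def candidate_dirs(machine=None):
    """Return directories to probe, most specific first."""
    machine = machine or platform.machine()
    triplet = _MULTIARCH.get(machine)
    dirs = []
    if triplet:
        dirs += ["/usr/lib/" + triplet, "/lib/" + triplet]
    # Generic fallback for unknown architectures or non-multiarch layouts.
    dirs += ["/usr/lib", "/lib"]
    return dirs


def find_libaio_t64(dirs, name="libaio.so.1t64"):
    """Return the first existing path to `name` in `dirs`, or None."""
    for directory in dirs:
        path = os.path.join(directory, name)
        if os.path.exists(path):
            return path
    return None
```

A result of None corresponds to the case where repair cannot proceed and the loader falls through to its descriptive DllNotFoundException.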

Also:

- VectorManager.Initialize() and ResumePostRecovery() now early-return
  when IsEnabled is false. Vector Set preview is off by default; there
  is no reason these paths should touch storage when the feature is
  disabled.

- Dockerfile and Dockerfile.ubuntu still install libaio1t64 and
  pre-create the libaio.so.1 -> libaio.so.1t64 symlink at build time
  for maximum robustness (works on read-only filesystems and under
  restrictive seccomp profiles that block symlink(2)). The managed
  loader fallback is belt-and-braces for non-Docker users.
  (Dockerfile.alpine and Dockerfile.azurelinux ship libaio.so.1
  natively. Dockerfile.chiseled uses a restricted runtime image and
  was not changed - it already stages libaio.so.1 from a build stage.)

- .github/workflows/ci.yml and nightly.yml drop the ubuntu-latest
  libaio pre-step; the managed ImportResolver now handles repair
  automatically on any host.

- validate_docker_images.py accepts either libaio.so.1 or
  libaio.so.1t64 when checking library presence.

The bundled libnative_device.so has been rebuilt against the above
sources with '-O3 -g -DNDEBUG' (project Release defaults). Verified via
objdump -T that io_* references are correctly versioned.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@badrishc badrishc force-pushed the badrishc/update-native-device-dev branch from f47f903 to 70062be Compare April 16, 2026 21:41
- CMakeLists.txt: fix FNATIVE_DEVICE_HEADERS typo so file_linux.h and
  libaio_compat.h are actually associated with the native_device target
  (cosmetic, does not affect compiled binary).
- NativeStorageDevice: wrap Directory.GetCurrentDirectory() in a
  TryGetCurrentDirectory helper so a deleted/inaccessible CWD cannot
  block native library resolution when the library exists in the
  assembly or AppContext directory.
- NativeStorageDevice.BuildLibaioDiagnostic: expand architecture mapping
  (x64, Arm64, Arm) with a null fallback that emits a distro-agnostic
  fix instruction, and correct the remediation advice to suggest a
  valid DeviceType value ('RandomAccess') instead of the non-existent
  'Managed'.
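The TryGetCurrentDirectory idea from the second item can be shown with a small Python analogue. The real helper is C# and is not reproduced here; `try_get_cwd`, `resolve_library`, and the probing order are illustrative assumptions. The point is only that a failing current-directory lookup is turned into "no candidate" instead of an exception that aborts resolution.

```python
import os


def try_get_cwd():
    """Return the current directory, or None if it is deleted or inaccessible."""
    try:
        return os.getcwd()
    except OSError:
        return None


def resolve_library(relative_path, base_dirs):
    """Return the first existing absolute path for relative_path, or None.

    base_dirs must be absolute paths (for example the assembly directory);
    the current directory is only added when it can be determined.
    """
    candidates = list(base_dirs)
    cwd = try_get_cwd()
    if cwd is not None:
        candidates.append(cwd)
    for base in candidates:
        full = os.path.normpath(os.path.join(base, relative_path))
        if os.path.isfile(full):
            return full
    return None
```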

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@badrishc badrishc merged commit 8113f56 into dev Apr 16, 2026
30 checks passed
@badrishc badrishc deleted the badrishc/update-native-device-dev branch April 16, 2026 23:48

3 participants