
Commit f47f903

badrishc and Copilot committed
Fix native device loading on Debian 13 / Ubuntu 24.04 (libaio t64 transition)
The t64 ABI transition renamed libaio.so.1 to libaio.so.1t64, breaking libnative_device.so, which has a hard DT_NEEDED on libaio.so.1. Previously we worked around this with system-wide symlinks in every Dockerfile and CI workflow. Fix this properly in the loader itself:

- CMakeLists.txt now sets RPATH=$ORIGIN (via INSTALL_RPATH + BUILD_WITH_INSTALL_RPATH + --disable-new-dtags) so libnative_device.so searches its own directory for dependencies. This lets a managed-side compat symlink next to the native library satisfy the linker without any LD_LIBRARY_PATH contortions from the caller.

- libaio_compat.h (new) pins the libaio entry points to the specific symbol versions that make libaio's userspace fast paths kick in:

    io_setup     @LIBAIO_0.4
    io_destroy   @LIBAIO_0.4
    io_getevents @LIBAIO_0.4  (userspace ring fast path)
    io_submit    @LIBAIO_0.1

  Older libaio-dev marked LIBAIO_0.4 as the default version, so a plain link picked these up automatically. On t64 (libaio1t64-dev) the default is gone and libaio.h has no .symver redirects for x86_64, so a fresh link produces UNVERSIONED references that at runtime resolve to the slower LIBAIO_0.1 io_getevents, which always enters the kernel and blocks. That caused the NativeStorageDevice probe/TryComplete paths to hang.

- NativeStorageDevice.ImportResolver now resolves NativeLibraryPath to an absolute path (fixing a latent bug where the relative path bypassed .NET's runtimes/ probing) and, on Linux, catches a DllNotFoundException referencing libaio.so.1, locates libaio.so.1t64 in standard multiarch paths, and drops a compat symlink next to libnative_device.so. The symlink creation tolerates the race where multiple processes start simultaneously and another process has already created a usable symlink. If repair still fails, the loader throws a descriptive DllNotFoundException explaining the t64 transition and offering three remediation options.

- VectorManager.Initialize() and ResumePostRecovery() now return early when IsEnabled is false. Vector Set preview is off by default; there is no reason these paths should touch storage when the feature is disabled.

With the loader and build fixes in place, remove the now-redundant workarounds:

- Dockerfile and Dockerfile.ubuntu: drop the ln -sf libaio.so.1 line. (Dockerfile.alpine and Dockerfile.azurelinux ship libaio.so.1 natively. Dockerfile.chiseled uses a restricted runtime and was not touched.)

- .github/workflows/ci.yml and nightly.yml: drop the ubuntu-latest libaio pre-step; the managed ImportResolver now handles repair automatically, and the test suite actually exercises the repair path.

- validate_docker_images.py: accept either libaio.so.1 or libaio.so.1t64, since for glibc images the former is now only materialized lazily (on first native device init).

The bundled libnative_device.so has been rebuilt from the above sources with '-O3 -g -DNDEBUG' (the project's Release defaults). Verified via objdump -T that the io_* references are correctly versioned.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent 1737c8a commit f47f903

11 files changed

Lines changed: 275 additions & 21 deletions
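The message says the rebuilt library was checked with objdump -T. One way to automate that check is the Python sketch below. It assumes objdump prints the version of an undefined (*UND*) dynamic reference in parentheses immediately before the symbol name, for example "(LIBAIO_0.4) io_getevents"; the expected versions come from the commit message, and the function names are hypothetical, not part of the repository.

```python
from typing import Dict, List, Optional

# Versions pinned by libaio_compat.h, per the commit message.
EXPECTED_VERSIONS = {
    "io_setup": "LIBAIO_0.4",
    "io_destroy": "LIBAIO_0.4",
    "io_getevents": "LIBAIO_0.4",
    "io_submit": "LIBAIO_0.1",
}


def undefined_symbol_versions(objdump_output: str) -> Dict[str, Optional[str]]:
    """Map each *UND* dynamic symbol to its version string, or None when unversioned.

    Assumes the version of an undefined reference is printed in parentheses
    immediately before the symbol name, e.g. "... *UND* 0000000000000000 (LIBAIO_0.4) io_getevents".
    """
    versions: Dict[str, Optional[str]] = {}
    for line in objdump_output.splitlines():
        tokens = line.split()
        if "*UND*" not in tokens or len(tokens) < 2:
            continue
        name, before = tokens[-1], tokens[-2]
        if before.startswith("(") and before.endswith(")"):
            versions[name] = before[1:-1]
        else:
            versions[name] = None
    return versions


def libaio_version_problems(objdump_output: str) -> List[str]:
    """Return one message per io_* import that is missing or not pinned as expected."""
    found = undefined_symbol_versions(objdump_output)
    problems = []
    for name, want in EXPECTED_VERSIONS.items():
        if name not in found:
            problems.append(f"{name}: not referenced")
        elif found[name] != want:
            problems.append(f"{name}: got {found[name] or 'UNVERSIONED'}, want {want}")
    return problems
```

In practice the input would be the text captured from something like subprocess.run(["objdump", "-T", "libnative_device.so"], capture_output=True, text=True).stdout; an empty result means every io_* import carries the pinned version, while "got UNVERSIONED" flags exactly the fresh-link failure the commit describes.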


.github/workflows/ci.yml

Lines changed: 0 additions & 4 deletions
@@ -170,10 +170,6 @@ jobs:
     steps:
       - name: Check out code
         uses: actions/checkout@v4
-      - name: Set workaround for libaio on Ubuntu 24.04 (see https://askubuntu.com/questions/1512196/libaio1-on-noble/1512197#1512197)
-        run: |
-          sudo ln -s /usr/lib/x86_64-linux-gnu/libaio.so.1t64 /usr/lib/x86_64-linux-gnu/libaio.so.1
-        if: ${{ matrix.os == 'ubuntu-latest' }}
       - name: Set environment variable for Linux
         run: echo "RunAzureTests=yes" >> $GITHUB_ENV
         if: ${{ matrix.os == 'ubuntu-latest' }}

.github/workflows/nightly.yml

Lines changed: 0 additions & 5 deletions
@@ -43,11 +43,6 @@ jobs:
       - name: Check out code
         uses: actions/checkout@v4
 
-      - name: Set workaround for libaio on Ubuntu 24.04 (see https://askubuntu.com/questions/1512196/libaio1-on-noble/1512197#1512197)
-        run: |
-          sudo ln -s /usr/lib/x86_64-linux-gnu/libaio.so.1t64 /usr/lib/x86_64-linux-gnu/libaio.so.1
-        if: ${{ matrix.os == 'ubuntu-latest' }}
-
       - name: Set environment variable for Linux
         run: echo "RunAzureTests=yes" >> $GITHUB_ENV
         if: ${{ matrix.os == 'ubuntu-latest' }}

Dockerfile

Lines changed: 0 additions & 1 deletion
@@ -45,7 +45,6 @@ RUN apt-get update \
         liblua5.4-0 \
     && ARCH="$(uname -m)" \
     && case "$ARCH" in x86_64) MULTIARCH="x86_64-linux-gnu";; aarch64) MULTIARCH="aarch64-linux-gnu";; *) MULTIARCH="$ARCH-linux-gnu";; esac \
-    && ln -sf "/usr/lib/${MULTIARCH}/libaio.so.1t64" "/usr/lib/${MULTIARCH}/libaio.so.1" \
     && DN_DIR=$(ls -d /usr/share/dotnet/shared/Microsoft.NETCore.App/* 2>/dev/null | head -n1 || true) \
     && if [ -n "$DN_DIR" ]; then ln -sf "/usr/lib/${MULTIARCH}/liblua5.4.so.0" "$DN_DIR/liblua54.so"; fi \
     && rm -rf /var/lib/apt/lists/*
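The case statement in this RUN step maps uname -m to a Debian multiarch triplet, and the removed ln -sf line aliased libaio.so.1 to libaio.so.1t64 inside that directory. A Python rendering of the same mapping, with hypothetical helper names, shows where the two file names live on a given machine. Note that the explicit x86_64 and aarch64 arms yield exactly what the fallback arm would, and the fallback is only a heuristic: 32-bit ARM, for example, uses arm-linux-gnueabihf rather than armv7l-linux-gnu.

```python
import platform


def debian_multiarch(machine: str) -> str:
    """Mirror the Dockerfile's case "$ARCH" mapping from uname -m to a multiarch triplet."""
    if machine == "x86_64":
        return "x86_64-linux-gnu"
    if machine == "aarch64":
        return "aarch64-linux-gnu"
    # Fallback arm of the case statement: append -linux-gnu to whatever uname -m reported.
    return f"{machine}-linux-gnu"


def libaio_paths(machine: str) -> tuple:
    """Return (legacy_name, t64_name) paths under /usr/lib/<multiarch> for this machine."""
    base = f"/usr/lib/{debian_multiarch(machine)}"
    return f"{base}/libaio.so.1", f"{base}/libaio.so.1t64"


if __name__ == "__main__":
    # On Linux, platform.machine() reports the same string as uname -m.
    print(libaio_paths(platform.machine()))
```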

Dockerfile.ubuntu

Lines changed: 0 additions & 1 deletion
@@ -46,7 +46,6 @@ RUN apt-get update \
         liblua5.4-0 \
     && ARCH="$(uname -m)" \
     && case "$ARCH" in x86_64) MULTIARCH="x86_64-linux-gnu";; aarch64) MULTIARCH="aarch64-linux-gnu";; *) MULTIARCH="$ARCH-linux-gnu";; esac \
-    && ln -sf "/usr/lib/${MULTIARCH}/libaio.so.1t64" "/usr/lib/${MULTIARCH}/libaio.so.1" \
     && DN_DIR=$(ls -d /usr/share/dotnet/shared/Microsoft.NETCore.App/* 2>/dev/null | head -n1 || true) \
     && if [ -n "$DN_DIR" ]; then ln -sf "/usr/lib/${MULTIARCH}/liblua5.4.so.0" "$DN_DIR/liblua54.so"; fi \
     && rm -rf /var/lib/apt/lists/*
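The commit message says validate_docker_images.py now accepts either libaio.so.1 or libaio.so.1t64, because with this symlink line gone the legacy name only appears after the native device has initialized. That script is not shown in this view, so the following is a hypothetical Python sketch of such a relaxed check; find_libaio and its parameters are illustrative, and the injectable exists callable only serves to make it runnable without a container.

```python
import os
from typing import Callable, Iterable, Optional

# Either name satisfies the check: glibc images ship libaio.so.1t64, while the
# legacy libaio.so.1 may only appear after the managed loader's lazy repair.
LIBAIO_NAMES = ("libaio.so.1", "libaio.so.1t64")


def find_libaio(lib_dirs: Iterable[str],
                exists: Callable[[str], bool] = os.path.exists) -> Optional[str]:
    """Return the first libaio path found in lib_dirs, or None if neither name is present."""
    for lib_dir in lib_dirs:
        for name in LIBAIO_NAMES:
            path = os.path.join(lib_dir, name)
            if exists(path):
                return path
    return None
```

Checking the legacy name first means images that already ship libaio.so.1 natively (Alpine and Azure Linux, per the commit message) are reported with that file, and only glibc images fall through to the t64 name.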

libs/server/Resp/Vector/VectorManager.cs

Lines changed: 4 additions & 0 deletions
@@ -116,6 +116,8 @@ public VectorManager(int dbId, GarnetServerOptions serverOptions, Func<IMessageC
         /// </summary>
         public void Initialize()
         {
+            if (!IsEnabled) return;
+
             using var session = (RespServerSession)getCleanupSession();
             if (session.activeDbId != dbId && !session.TrySwitchActiveDatabaseSession(dbId))
             {
@@ -153,6 +155,8 @@ public void Initialize()
         /// </summary>
         public void ResumePostRecovery()
         {
+            if (!IsEnabled) return;
+
             using var session = (RespServerSession)getCleanupSession();
 
             ref var ctx = ref session.storageSession.vectorBasicContext;

libs/storage/Tsavorite/cc/src/CMakeLists.txt

Lines changed: 12 additions & 0 deletions
@@ -26,6 +26,7 @@ set (NATIVE_DEVICE_HEADERS ${NATIVE_DEVICE_HEADERS}
 else()
 set (FNATIVE_DEVICE_HEADERS ${NATIVE_DEVICE_HEADERS}
     device/file_linux.h
+    device/libaio_compat.h
 )
 endif()
 
@@ -48,4 +49,15 @@ endif()
 add_library(native_device SHARED ${NATIVE_DEVICE_SOURCES} ${NATIVE_DEVICE_HEADERS})
 if (UNIX)
     target_link_libraries(native_device PRIVATE stdc++fs aio)
+    # Search the library's own directory for DT_NEEDED dependencies. This lets the C# loader drop
+    # a compat symlink (e.g., libaio.so.1 -> libaio.so.1t64 for the Debian/Ubuntu t64 ABI
+    # transition) next to libnative_device.so and have the dynamic linker pick it up without
+    # requiring LD_LIBRARY_PATH to be set by the caller. Use RPATH (DT_RPATH) instead of RUNPATH
+    # because RPATH is searched before LD_LIBRARY_PATH and applies transitively to all dependency
+    # lookups, whereas RUNPATH only applies to the direct library and is overridden by
+    # LD_LIBRARY_PATH.
+    set_target_properties(native_device PROPERTIES
+        INSTALL_RPATH "$ORIGIN"
+        BUILD_WITH_INSTALL_RPATH TRUE)
+    target_link_options(native_device PRIVATE "LINKER:--disable-new-dtags")
 endif()
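The comment's argument rests on the dynamic linker's search order for a DT_NEEDED entry, as documented in ld.so(8). The Python model below is a simplified sketch of that order for a single object: DT_RPATH (honored only when DT_RUNPATH is absent), then LD_LIBRARY_PATH, then DT_RUNPATH, then the ld.so cache and the default directories, with $ORIGIN expanded to the object's own directory. It deliberately omits secure-execution mode, hardware-capability subdirectories, lib64 defaults, and the walk up the chain of loading objects that DT_RPATH also triggers; the function names are hypothetical.

```python
import os
from typing import List, Optional

# Simplified stand-ins for the cache lookup and the trusted default directories.
LD_SO_CACHE = "<ld.so.cache>"
DEFAULT_DIRS = ["/lib", "/usr/lib"]


def expand_origin(entry: str, object_path: str) -> str:
    """Substitute $ORIGIN / ${ORIGIN} with the directory containing the object."""
    origin = os.path.dirname(object_path)
    return entry.replace("${ORIGIN}", origin).replace("$ORIGIN", origin)


def search_order(object_path: str,
                 rpath: Optional[str] = None,
                 runpath: Optional[str] = None,
                 ld_library_path: Optional[str] = None) -> List[str]:
    """Directories consulted for one of object_path's DT_NEEDED entries (simplified)."""

    def dirs(value: Optional[str]) -> List[str]:
        if not value:
            return []
        return [expand_origin(p, object_path) for p in value.split(":") if p]

    order: List[str] = []
    if runpath is None:
        order += dirs(rpath)  # DT_RPATH is ignored whenever DT_RUNPATH is present
    order += dirs(ld_library_path)
    order += dirs(runpath)
    order.append(LD_SO_CACHE)
    order += DEFAULT_DIRS
    return order
```

For a library at a hypothetical /app/runtimes/linux-x64/native/libnative_device.so, search_order(lib, rpath="$ORIGIN", ld_library_path="/opt/other") puts the library's own directory first, whereas passing runpath="$ORIGIN" instead moves it behind /opt/other. That is why the change pairs INSTALL_RPATH with LINKER:--disable-new-dtags; readelf -d on the built library should then list an (RPATH) entry rather than (RUNPATH).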

libs/storage/Tsavorite/cc/src/device/file_linux.h

Lines changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@
 #include <cstdint>
 #include <string>
 #include <libaio.h>
+#include "libaio_compat.h"
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <unistd.h>

libs/storage/Tsavorite/cc/src/device/libaio_compat.h

Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
+// Copyright (c) Microsoft Corporation.
+// Licensed under the MIT license.
+
+// Force specific versioned libaio symbols at link time.
+//
+// Background: libaio.so exports multiple versions of the core AIO entry points:
+//   LIBAIO_0.1 - original syscall-only ABI
+//   LIBAIO_0.4 - adds a userspace ring-buffer fast path for io_getevents
+//   LIBAIO_0.5 / LIBAIO_0.6 - later additions
+//
+// On older distros libaio-dev shipped a library that marked LIBAIO_0.4 as the
+// default version, so linking `-laio` recorded `io_getevents@LIBAIO_0.4` etc.
+// automatically. Starting with the t64 ABI transition (Debian 13 / Ubuntu
+// 24.04+), libaio1t64 no longer advertises a default version for these
+// symbols, and libaio.h has no `.symver` redirects for x86_64, so a fresh
+// link produces UNVERSIONED references. At runtime the dynamic linker then
+// picks the first matching definition - LIBAIO_0.1, whose `io_getevents`
+// always goes through the syscall and blocks until `min_nr` events arrive.
+// That caused Garnet's NativeStorageDevice probe/TryComplete paths to hang
+// indefinitely on an empty context when they expected the LIBAIO_0.4 fast
+// path to return 0 immediately.
+//
+// To keep rebuilds correct regardless of which distro they happen on, we
+// pin the exact versions we want:
+//   io_setup     @LIBAIO_0.4
+//   io_destroy   @LIBAIO_0.4
+//   io_getevents @LIBAIO_0.4
+//   io_submit    @LIBAIO_0.1
+// These match the versions that the original (pre-t64) build produced and
+// that Garnet has shipped against for years.
+//
+// The mechanism: declare aliases via `.symver`, then #define the unadorned
+// names to route to those aliases. This has no runtime cost; the resulting
+// binary simply records versioned UND references.
+//
+// IMPORTANT: Include this header before any use of io_setup / io_destroy /
+// io_getevents / io_submit. (file_linux.h already includes <libaio.h> then
+// this header, so including file_linux.h is sufficient.)
+
+#pragma once
+
+#ifdef __linux__
+
+#include <libaio.h>
+
+__asm__(".symver io_setup_0_4, io_setup@LIBAIO_0.4");
+__asm__(".symver io_destroy_0_4, io_destroy@LIBAIO_0.4");
+__asm__(".symver io_getevents_0_4, io_getevents@LIBAIO_0.4");
+__asm__(".symver io_submit_0_1, io_submit@LIBAIO_0.1");
+
+extern "C" {
+    int io_setup_0_4(int maxevents, io_context_t* ctxp);
+    int io_destroy_0_4(io_context_t ctx);
+    int io_getevents_0_4(io_context_t ctx, long min_nr, long nr,
+                         struct io_event* events, struct timespec* timeout);
+    int io_submit_0_1(io_context_t ctx, long nr, struct iocb** iocbs);
+}
+
+#define io_setup io_setup_0_4
+#define io_destroy io_destroy_0_4
+#define io_getevents io_getevents_0_4
+#define io_submit io_submit_0_1
+
+#endif // __linux__
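The header's explanation hinges on one behavioral difference: the LIBAIO_0.4 io_getevents can answer a zero-timeout poll of an empty completion ring in userspace, while the LIBAIO_0.1 entry point always enters the kernel. The toy Python model below illustrates only that difference. ToyAioRing and the two getevents functions are hypothetical stand-ins, not libaio's code or data layout, and the model counts kernel entries rather than reproducing the blocking behavior described in the comment.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Timespec = Tuple[int, int]  # (seconds, nanoseconds)


@dataclass
class ToyAioRing:
    """Hypothetical completion ring: 'events' is filled by the 'kernel', 'head' is the consumer index."""
    events: List[str] = field(default_factory=list)
    head: int = 0
    syscalls: int = 0  # number of simulated kernel entries

    @property
    def tail(self) -> int:
        return len(self.events)

    def kernel_getevents(self, nr: int) -> List[str]:
        """Stand-in for the io_getevents system call: count the entry, then drain up to nr events."""
        self.syscalls += 1
        got = self.events[self.head:self.head + nr]
        self.head += len(got)
        return got


def getevents_0_4(ring: ToyAioRing, nr: int, timeout: Optional[Timespec]) -> List[str]:
    # Userspace fast path: a zero timeout on an empty ring returns no events without a syscall.
    if timeout == (0, 0) and ring.head == ring.tail:
        return []
    return ring.kernel_getevents(nr)


def getevents_0_1(ring: ToyAioRing, nr: int, timeout: Optional[Timespec]) -> List[str]:
    # No userspace check: every poll enters the kernel, even when nothing has completed.
    return ring.kernel_getevents(nr)
```

Polling an empty ring 1,000 times with a zero timeout costs no simulated syscalls through getevents_0_4 and 1,000 through getevents_0_1; once events are queued, both paths drain them through the kernel.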

libs/storage/Tsavorite/cs/src/core/Device/NativeStorageDevice.cs

Lines changed: 173 additions & 4 deletions
@@ -54,10 +54,179 @@ static NativeStorageDevice()
 
         static IntPtr ImportResolver(string libraryName, Assembly assembly, DllImportSearchPath? searchPath)
         {
-            IntPtr libHandle = IntPtr.Zero;
-            if (libraryName == NativeLibraryName && NativeLibraryPath != null)
-                libHandle = NativeLibrary.Load(NativeLibraryPath);
-            return libHandle;
+            if (libraryName != NativeLibraryName || NativeLibraryPath == null)
+                return IntPtr.Zero;
+
+            var resolvedPath = ResolveNativeLibraryPath(assembly);
+
+            try
+            {
+                return NativeLibrary.Load(resolvedPath);
+            }
+            catch (DllNotFoundException ex) when (RuntimeInformation.IsOSPlatform(OSPlatform.Linux)
+                && ex.Message.Contains("libaio.so.1", StringComparison.Ordinal))
+            {
+                // Debian 13 (trixie) / Ubuntu 24.04 (noble) renamed libaio1 to libaio1t64 as part of the
+                // 64-bit time_t ABI transition. The package now ships libaio.so.1t64 (SONAME "libaio.so.1t64"),
+                // which does NOT satisfy our DT_NEEDED of "libaio.so.1". Try to repair by dropping a
+                // libaio.so.1 -> libaio.so.1t64 symlink next to libnative_device.so; this works because the
+                // native library is built with RPATH=$ORIGIN.
+                if (TryCreateLibaioCompatSymlink(resolvedPath, out var symlinkedPath))
+                {
+                    try
+                    {
+                        return NativeLibrary.Load(resolvedPath);
+                    }
+                    catch (DllNotFoundException)
+                    {
+                        // Fall through to the detailed error below.
+                    }
+                }
+
+                throw new DllNotFoundException(BuildLibaioDiagnostic(symlinkedPath, ex), ex);
+            }
+        }
+
+        /// <summary>
+        /// Resolve NativeLibraryPath (which is a NuGet-style "runtimes/&lt;rid&gt;/native/&lt;lib&gt;" relative
+        /// path) to an absolute filesystem path. We probe (in order) the assembly's own directory, the
+        /// application's base directory, and finally the current working directory. Falls back to the
+        /// raw relative path if none of these exist, so dlopen's error message surfaces as before.
+        /// </summary>
+        static string ResolveNativeLibraryPath(Assembly assembly)
+        {
+            string[] searchRoots =
+            [
+                Path.GetDirectoryName(assembly?.Location),
+                AppContext.BaseDirectory,
+                Directory.GetCurrentDirectory(),
+            ];
+
+            foreach (var root in searchRoots)
+            {
+                if (string.IsNullOrEmpty(root))
+                    continue;
+                var candidate = Path.Combine(root, NativeLibraryPath);
+                if (File.Exists(candidate))
+                    return Path.GetFullPath(candidate);
+            }
+
+            return NativeLibraryPath;
+        }
+
+        /// <summary>
+        /// Candidate paths for libaio.so.1t64 on Debian/Ubuntu multiarch layouts. These match what
+        /// libaio1t64 installs on amd64 and arm64; add more here if additional architectures appear.
+        /// </summary>
+        static readonly string[] LibaioT64CandidatePaths =
+        [
+            "/usr/lib/x86_64-linux-gnu/libaio.so.1t64",
+            "/usr/lib/aarch64-linux-gnu/libaio.so.1t64",
+            "/lib/x86_64-linux-gnu/libaio.so.1t64",
+            "/lib/aarch64-linux-gnu/libaio.so.1t64",
+            "/usr/lib64/libaio.so.1t64",
+            "/usr/lib/libaio.so.1t64",
+        ];
+
+        /// <summary>
+        /// Locate libaio.so.1t64 and create a libaio.so.1 symlink next to libnative_device.so so that
+        /// the dynamic linker (searching RPATH=$ORIGIN) can satisfy the DT_NEEDED entry. Returns true
+        /// when after the call a usable symlink exists at the expected path - whether we created it or
+        /// a concurrently-starting process did. Sets <paramref name="createdSymlink"/> to the link path
+        /// in that case.
+        /// </summary>
+        static bool TryCreateLibaioCompatSymlink(string resolvedNativeLibraryPath, out string createdSymlink)
+        {
+            createdSymlink = null;
+
+            string t64Path = null;
+            foreach (var candidate in LibaioT64CandidatePaths)
+            {
+                if (File.Exists(candidate))
+                {
+                    t64Path = candidate;
+                    break;
+                }
+            }
+            if (t64Path == null)
+                return false;
+
+            string shimPath;
+            try
+            {
+                var nativeDir = Path.GetDirectoryName(Path.GetFullPath(resolvedNativeLibraryPath));
+                if (string.IsNullOrEmpty(nativeDir) || !Directory.Exists(nativeDir))
+                    return false;
+
+                shimPath = Path.Combine(nativeDir, "libaio.so.1");
+            }
+            catch (Exception)
+            {
+                return false;
+            }
+
+            try
+            {
+                File.CreateSymbolicLink(shimPath, t64Path);
+                createdSymlink = shimPath;
+                return true;
+            }
+            catch (IOException)
+            {
+                // Either a concurrently-starting process already created the symlink (common in
+                // container fleets where multiple Garnet instances share an image), or a stale file
+                // of the same name is present. If it's a symlink resolving to libaio.so.1t64, treat
+                // that as success; otherwise fall through to the diagnostic error.
+                if (IsUsableLibaioShim(shimPath))
+                {
+                    createdSymlink = shimPath;
+                    return true;
+                }
+                return false;
+            }
+            catch (Exception)
+            {
+                return false;
+            }
+        }
+
+        /// <summary>
+        /// Returns true if <paramref name="shimPath"/> is an existing symlink that points to a
+        /// libaio.so.1t64 file (possibly via relative or absolute target).
+        /// </summary>
+        static bool IsUsableLibaioShim(string shimPath)
+        {
+            try
+            {
+                var info = new FileInfo(shimPath);
+                if (!info.Exists) return false;
+                var target = info.LinkTarget;
+                if (string.IsNullOrEmpty(target)) return false;
+                // LinkTarget can be a relative path (e.g., just "libaio.so.1t64"); accept either.
+                return target.EndsWith("libaio.so.1t64", StringComparison.Ordinal);
+            }
+            catch
+            {
+                return false;
+            }
+        }
+
+        static string BuildLibaioDiagnostic(string attemptedSymlinkPath, Exception inner)
+        {
+            var arch = RuntimeInformation.ProcessArchitecture == Architecture.Arm64
+                ? "aarch64-linux-gnu" : "x86_64-linux-gnu";
+            var attempted = attemptedSymlinkPath == null
+                ? "Could not find libaio.so.1t64 in standard multiarch paths; auto-repair skipped."
+                : $"Attempted to create '{attemptedSymlinkPath}' -> libaio.so.1t64 but the load still failed.";
+            return
+                $"Failed to load native storage device library '{NativeLibraryPath}' because its dependency 'libaio.so.1' " +
+                "is not resolvable by the dynamic linker. This typically happens on Debian 13 (trixie) or " +
+                "Ubuntu 24.04 (noble) where the libaio1 package was renamed to libaio1t64 (64-bit time_t ABI " +
+                "transition) and only ships 'libaio.so.1t64'. " + attempted + " " +
+                "To fix, either (a) install the legacy-named package if available for your distro, " +
+                $"(b) as root, create the compat symlink: sudo ln -s /usr/lib/{arch}/libaio.so.1t64 /usr/lib/{arch}/libaio.so.1, " +
+                "or (c) switch to the managed device by setting '--device-type Managed' (or removing '--use-native-device-linux'). " +
+                "Original loader error: " + inner.Message;
         }
 
         /// <summary>
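The repair flow in ImportResolver can be approximated outside .NET for experimentation. The Python sketch below is a hypothetical port, not Garnet code: it probes search roots for the native library, picks the first existing libaio.so.1t64 candidate, creates libaio.so.1 next to the native library, and, as TryCreateLibaioCompatSymlink does, treats an already-present symlink ending in libaio.so.1t64 as success when another process wins the race. Candidates are passed in rather than hard-coded so the logic can be exercised in a temporary directory.

```python
import os
from typing import Iterable, Optional

SHIM_NAME = "libaio.so.1"
T64_SUFFIX = "libaio.so.1t64"


def resolve_native_library_path(relative: str, roots: Iterable[Optional[str]]) -> str:
    """Return the absolute path under the first root containing the file, else the relative path."""
    for root in roots:
        if not root:
            continue
        candidate = os.path.join(root, relative)
        if os.path.isfile(candidate):
            return os.path.abspath(candidate)
    return relative


def is_usable_shim(shim_path: str) -> bool:
    """True if shim_path is a resolvable symlink whose target ends with libaio.so.1t64."""
    if not os.path.exists(shim_path):  # follows the link, so dangling links fail here
        return False
    try:
        target = os.readlink(shim_path)
    except OSError:
        return False  # a regular file, not a symlink
    return target.endswith(T64_SUFFIX)


def try_create_compat_symlink(native_lib_path: str, candidates: Iterable[str]) -> Optional[str]:
    """Create <native dir>/libaio.so.1 -> first existing candidate; return the link path or None."""
    t64_path = next((c for c in candidates if os.path.isfile(c)), None)
    if t64_path is None:
        return None
    native_dir = os.path.dirname(os.path.abspath(native_lib_path))
    if not os.path.isdir(native_dir):
        return None
    shim_path = os.path.join(native_dir, SHIM_NAME)
    try:
        os.symlink(t64_path, shim_path)
        return shim_path
    except FileExistsError:
        # Another process created it first, or a stale file is in the way.
        return shim_path if is_usable_shim(shim_path) else None
    except OSError:
        return None  # e.g. read-only directory or insufficient permissions
```

One detail the port makes visible: the C# method leaves createdSymlink null on every failure path, so BuildLibaioDiagnostic reports that libaio.so.1t64 could not be found even when it was found but the symlink could not be created (for example, in a read-only directory).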
Binary file not shown.
