Java API client version
9.3.2
Java version
21
Elasticsearch Version
9.3.2
Problem description
We first saw this issue with Elasticsearch client/server 18.19.12. We are now running 9.3.2 and see exactly the same behavior.
We are using the elasticsearch-java client directly.
When the error occurs, the service behaves as follows:
- At a seemingly random point in time, the EPoll.wait exception (Exception 1) is logged.
- For roughly the next 30 seconds, the next 5 Elasticsearch requests fail with "I/O reactor has been shut down" (Exception 2).
- After that, all subsequent Elasticsearch requests fail with "thread waiting for the response was interrupted" (Exception 3).
In short, once EPoll.wait fails with "Invalid argument", no Elasticsearch request succeeds and the application has to be restarted.
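As a possible stopgap while the root cause is unknown, the failure sequence above might be handled without a full restart by discarding the dead client and building a new one. The sketch below is only an assumption about how that could look: `SelfHealingClient`, `Call`, and `isReactorShutdown` are hypothetical names that are not part of elasticsearch-java, and detection relies on matching the message text from Exception 2. It uses only the JDK, so the factory passed in would be responsible for creating the real client or transport.

```java
import java.util.function.Supplier;

/**
 * Hypothetical helper (not part of elasticsearch-java): holds a closeable
 * client and replaces it once its I/O reactor is reported as shut down.
 */
final class SelfHealingClient<C extends AutoCloseable> {

    /** An operation on the client that may throw, e.g. an index request. */
    @FunctionalInterface
    public interface Call<C, R> {
        R apply(C client) throws Exception;
    }

    private final Supplier<C> factory;
    private C client;

    public SelfHealingClient(Supplier<C> factory) {
        this.factory = factory;
        this.client = factory.get();
    }

    /** Walks the cause chain looking for the message seen in Exception 2. */
    public static boolean isReactorShutdown(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            String msg = c.getMessage();
            if (msg != null && msg.contains("I/O reactor has been shut down")) {
                return true;
            }
        }
        return false;
    }

    /** Runs the call; on a reactor shutdown, rebuilds the client and retries once. */
    public synchronized <R> R execute(Call<C, R> call) throws Exception {
        try {
            return call.apply(client);
        } catch (Exception e) {
            if (!isReactorShutdown(e)) {
                throw e; // unrelated failure: leave the client untouched
            }
            // The old client is unusable; close it quietly and build a fresh one.
            try {
                client.close();
            } catch (Exception closeFailure) {
                e.addSuppressed(closeFailure);
            }
            client = factory.get();
            return call.apply(client); // single retry on the new client
        }
    }
}
```

Here `C` would be whatever closeable object owns the underlying HTTP client, and requests would go through `execute`. This does not explain or prevent the EPoll.wait failure; it only aims to avoid restarting the application when it happens.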
Regarding infrastructure, the application is containerized and runs on Kubernetes on AWS nodes with Bottlerocket OS (latest version). The Docker image is based on public.ecr.aws/docker/library/ibm-semeru-runtimes:open-21.0.7_6-jre-focal.
Exception 1:
java.io.IOException: Invalid argument
at java.base/sun.nio.ch.EPoll.wait(Native Method)
at java.base/sun.nio.ch.EPollSelectorImpl.doSelect(Unknown Source)
at java.base/sun.nio.ch.SelectorImpl.lockAndDoSelect(Unknown Source)
at java.base/sun.nio.ch.SelectorImpl.select(Unknown Source)
at org.apache.hc.core5.reactor.SingleCoreIOReactor.doExecute(SingleCoreIOReactor.java:113)
at org.apache.hc.core5.reactor.AbstractSingleCoreIOReactor.execute(AbstractSingleCoreIOReactor.java:86)
at org.apache.hc.core5.reactor.IOReactorWorker.run(IOReactorWorker.java:44)
at java.base/java.lang.Thread.run(Unknown Source)
Exception 2:
java.lang.RuntimeException: I/O reactor has been shut down
at co.elastic.clients.transport.rest5_client.low_level.Rest5Client.extractAndWrapCause(Rest5Client.java:953)
at co.elastic.clients.transport.rest5_client.low_level.Rest5Client.performRequest(Rest5Client.java:308)
at co.elastic.clients.transport.rest5_client.low_level.Rest5Client.performRequest(Rest5Client.java:293)
at co.elastic.clients.transport.rest5_client.Rest5ClientHttpClient.performRequest(Rest5ClientHttpClient.java:93)
at co.elastic.clients.transport.ElasticsearchTransportBase.performRequest(ElasticsearchTransportBase.java:153)
at co.elastic.clients.elasticsearch.ElasticsearchClient.index(ElasticsearchClient.java:3148)
at co.elastic.clients.elasticsearch.ElasticsearchClient.index(ElasticsearchClient.java:3359)
...
Caused by: org.apache.hc.core5.reactor.IOReactorShutdownException: I/O reactor has been shut down
at org.apache.hc.core5.reactor.IOWorkers.validate(IOWorkers.java:51)
at org.apache.hc.core5.reactor.IOWorkers.access$000(IOWorkers.java:31)
at org.apache.hc.core5.reactor.IOWorkers$PowerOfTwoSelector.next(IOWorkers.java:67)
at org.apache.hc.core5.reactor.AbstractIOReactorBase.connect(AbstractIOReactorBase.java:53)
at org.apache.hc.client5.http.impl.nio.MultihomeIOSessionRequester$2.executeNext(MultihomeIOSessionRequester.java:136)
at org.apache.hc.client5.http.impl.nio.MultihomeIOSessionRequester$2.run(MultihomeIOSessionRequester.java:185)
at org.apache.hc.client5.http.impl.nio.MultihomeIOSessionRequester.connect(MultihomeIOSessionRequester.java:189)
at org.apache.hc.client5.http.impl.nio.DefaultAsyncClientConnectionOperator.connect(DefaultAsyncClientConnectionOperator.java:100)
at org.apache.hc.client5.http.impl.nio.PoolingAsyncClientConnectionManager.connect(PoolingAsyncClientConnectionManager.java:449)
at org.apache.hc.client5.http.impl.async.InternalHttpAsyncExecRuntime.connectEndpoint(InternalHttpAsyncExecRuntime.java:216)
at org.apache.hc.client5.http.impl.async.AsyncConnectExec.proceedToNextHop(AsyncConnectExec.java:201)
at org.apache.hc.client5.http.impl.async.AsyncConnectExec.access$000(AsyncConnectExec.java:82)
at org.apache.hc.client5.http.impl.async.AsyncConnectExec$1.completed(AsyncConnectExec.java:153)
at org.apache.hc.client5.http.impl.async.AsyncConnectExec$1.completed(AsyncConnectExec.java:142)
at org.apache.hc.client5.http.impl.async.InternalHttpAsyncExecRuntime$1.completed(InternalHttpAsyncExecRuntime.java:119)
at org.apache.hc.client5.http.impl.async.InternalHttpAsyncExecRuntime$1.completed(InternalHttpAsyncExecRuntime.java:110)
at org.apache.hc.core5.concurrent.BasicFuture.completed(BasicFuture.java:123)
at org.apache.hc.client5.http.impl.nio.PoolingAsyncClientConnectionManager$3$1.leaseCompleted(PoolingAsyncClientConnectionManager.java:328)
at org.apache.hc.client5.http.impl.nio.PoolingAsyncClientConnectionManager$3$1.completed(PoolingAsyncClientConnectionManager.java:313)
at org.apache.hc.client5.http.impl.nio.PoolingAsyncClientConnectionManager$3$1.completed(PoolingAsyncClientConnectionManager.java:274)
at org.apache.hc.core5.concurrent.BasicFuture.completed(BasicFuture.java:123)
at org.apache.hc.core5.pool.StrictConnPool.fireCallbacks(StrictConnPool.java:402)
at org.apache.hc.core5.pool.StrictConnPool.lease(StrictConnPool.java:220)
at org.apache.hc.client5.http.impl.nio.PoolingAsyncClientConnectionManager$3.<init>(PoolingAsyncClientConnectionManager.java:271)
at org.apache.hc.client5.http.impl.nio.PoolingAsyncClientConnectionManager.lease(PoolingAsyncClientConnectionManager.java:266)
at org.apache.hc.client5.http.impl.async.InternalHttpAsyncExecRuntime.acquireEndpoint(InternalHttpAsyncExecRuntime.java:105)
at org.apache.hc.client5.http.impl.async.AsyncConnectExec.execute(AsyncConnectExec.java:141)
at org.apache.hc.client5.http.impl.async.AsyncExecChainElement.execute(AsyncExecChainElement.java:54)
at org.apache.hc.client5.http.impl.async.AsyncProtocolExec.internalExecute(AsyncProtocolExec.java:207)
at org.apache.hc.client5.http.impl.async.AsyncProtocolExec.execute(AsyncProtocolExec.java:172)
at org.apache.hc.client5.http.impl.async.AsyncExecChainElement.execute(AsyncExecChainElement.java:54)
at org.apache.hc.client5.http.impl.async.AsyncHttpRequestRetryExec.internalExecute(AsyncHttpRequestRetryExec.java:97)
at org.apache.hc.client5.http.impl.async.AsyncHttpRequestRetryExec.execute(AsyncHttpRequestRetryExec.java:184)
at org.apache.hc.client5.http.impl.async.AsyncExecChainElement.execute(AsyncExecChainElement.java:54)
at org.apache.hc.client5.http.impl.async.AsyncRedirectExec.internalExecute(AsyncRedirectExec.java:112)
at org.apache.hc.client5.http.impl.async.AsyncRedirectExec.execute(AsyncRedirectExec.java:278)
at org.apache.hc.client5.http.impl.async.AsyncExecChainElement.execute(AsyncExecChainElement.java:54)
at org.apache.hc.client5.http.impl.async.InternalAbstractHttpAsyncClient.executeImmediate(InternalAbstractHttpAsyncClient.java:347)
at org.apache.hc.client5.http.impl.async.InternalAbstractHttpAsyncClient.lambda$doExecute$0(InternalAbstractHttpAsyncClient.java:205)
at org.apache.hc.core5.http.nio.support.BasicRequestProducer.sendRequest(BasicRequestProducer.java:93)
at org.apache.hc.client5.http.impl.async.InternalAbstractHttpAsyncClient.doExecute(InternalAbstractHttpAsyncClient.java:178)
at org.apache.hc.client5.http.impl.async.CloseableHttpAsyncClient.execute(CloseableHttpAsyncClient.java:97)
at org.apache.hc.client5.http.impl.async.CloseableHttpAsyncClient.execute(CloseableHttpAsyncClient.java:107)
at co.elastic.clients.transport.rest5_client.low_level.Rest5Client.performRequest(Rest5Client.java:302)
Exception 3:
java.lang.RuntimeException: thread waiting for the response was interrupted
at co.elastic.clients.transport.rest5_client.low_level.Rest5Client.extractAndWrapCause(Rest5Client.java:914)
at co.elastic.clients.transport.rest5_client.low_level.Rest5Client.performRequest(Rest5Client.java:308)
at co.elastic.clients.transport.rest5_client.low_level.Rest5Client.performRequest(Rest5Client.java:293)
at co.elastic.clients.transport.rest5_client.Rest5ClientHttpClient.performRequest(Rest5ClientHttpClient.java:93)
at co.elastic.clients.transport.ElasticsearchTransportBase.performRequest(ElasticsearchTransportBase.java:153)
at co.elastic.clients.elasticsearch.ElasticsearchClient.healthReport(ElasticsearchClient.java:2936)
...
Caused by: java.lang.InterruptedException: null
at java.base/java.lang.Object.waitImpl(Native Method)
at java.base/java.lang.Object.wait(Unknown Source)
at java.base/java.lang.Object.wait(Unknown Source)
at org.apache.hc.core5.concurrent.BasicFuture.get(BasicFuture.java:83)
at co.elastic.clients.transport.rest5_client.low_level.Rest5Client.performRequest(Rest5Client.java:304)