KAFKA-15100; KRaft data race with the expiration service (#14141)
The KRaft client uses an expiration service to complete FETCH requests that have timed out. This expiration service runs on a different thread from the KRaft polling thread, so it is unsafe for the expiration service thread to call tryCompleteFetchRequest. tryCompleteFetchRequest reads and updates state that is assumed to be read and updated only from the polling thread.
The KRaft client no longer calls tryCompleteFetchRequest when the FETCH request has expired. Instead, it sends the FETCH response that was computed when the FETCH request was first handled.
This change also fixes a bug where the KRaft client did not send the FETCH response immediately when the response contained a diverging epoch or a snapshot id.
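The expiration handling described above can be sketched roughly as follows. This is a minimal illustration, not the code in KafkaRaftClient: the class DelayedFetchSketch, the method delayResponse, and the recomputeOnPollingThread supplier are hypothetical names, and the sketch assumes that a timeout is signalled by completing the delayed future exceptionally with a TimeoutException.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical sketch; none of these names come from KafkaRaftClient.
final class DelayedFetchSketch {
    /**
     * @param delayed                  future completed when new data arrives, or
     *                                 completed exceptionally on timeout (assumption)
     * @param initialResponse          response computed when the request was first handled
     * @param recomputeOnPollingThread rebuilds the response; assumed to be safe only
     *                                 when the future is completed by the polling thread
     */
    static <R> CompletableFuture<R> delayResponse(
        CompletableFuture<Long> delayed,
        R initialResponse,
        Supplier<R> recomputeOnPollingThread
    ) {
        return delayed.handle((completionTimeMs, exception) -> {
            if (exception != null) {
                // Unwrap in case an upstream stage wrapped the original cause.
                Throwable cause = exception instanceof CompletionException && exception.getCause() != null
                    ? exception.getCause()
                    : exception;
                if (cause instanceof TimeoutException) {
                    // Expired: this callback may be running on the expiration thread,
                    // so return the response computed earlier instead of recomputing.
                    return initialResponse;
                }
                // Any other failure is propagated to the caller.
                throw new CompletionException(cause);
            }
            // Completed normally: recompute the response with the new data.
            return recomputeOnPollingThread.get();
        });
    }
}
```

The point of the design is that the timeout branch touches no shared state: it only returns a value that was already computed on the polling thread.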
Reviewers: Jason Gustafson <jason@confluent.io>
José Armando García Sancio, 1 year ago, committed by GitHub
@@ -879,9 +879,9 @@ public class KafkaRaftClient<T> implements RaftClient<T> {
@@ -960,8 +960,9 @@ public class KafkaRaftClient<T> implements RaftClient<T> {
@@ -971,7 +972,15 @@ public class KafkaRaftClient<T> implements RaftClient<T> {
// Reply immediately if any of the following is true
// 1. The response contains an error
// 2. There are records in the response
// 3. The fetching replica doesn't want to wait for the partition to contain new data
// 4. The fetching replica needs to truncate because the log diverged
// 5. The fetching replica needs to fetch a snapshot
return completedFuture(response);
}
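The five conditions listed in the comment above can be read as a single predicate. The sketch below is illustrative only: ReplyImmediatelySketch and its parameters are hypothetical names rather than Kafka's API, and it assumes that "doesn't want to wait" is expressed as a maximum wait of 0 ms.

```java
// Hypothetical sketch; parameter names are illustrative, not Kafka's API.
final class ReplyImmediatelySketch {
    static boolean shouldReplyImmediately(
        boolean hasError,
        int recordsSizeInBytes,
        int maxWaitMs,
        boolean logDiverged,
        boolean needsSnapshot
    ) {
        return hasError                   // 1. the response contains an error
            || recordsSizeInBytes > 0     // 2. there are records in the response
            || maxWaitMs == 0             // 3. the replica does not want to wait (assumed encoding)
            || logDiverged                // 4. the replica must truncate because the log diverged
            || needsSnapshot;             // 5. the replica must fetch a snapshot
    }
}
```

Only when every condition is false would the request be parked until new data arrives or the wait expires.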
@@ -984,11 +993,16 @@ public class KafkaRaftClient<T> implements RaftClient<T> {
@@ -999,6 +1013,9 @@ public class KafkaRaftClient<T> implements RaftClient<T> {
logger.trace("Completing delayed fetch from {} starting at offset {} at {}",
@@ -1048,6 +1065,18 @@ public class KafkaRaftClient<T> implements RaftClient<T> {