During fast consecutive rebalances where a task is revoked from one worker and assigned to another one, it has been observed that there is a small time window and thus a race condition during which a RUNNING status record in the new generation is produced and is immediately followed by a delayed UNASSIGNED status record belonging to the same or a previous generation before the worker that sends this message reads the RUNNING status record that corresponds to the latest generation.
Although this doesn't inhibit the actual execution of tasks, it reports an incorrect status for those tasks(i.e UNASSIGNED). If the users have setup some kind of monitoring on tasks status then this could lead to false alarms for example.
This fix addresses this problem by checking if a status message is stale after reading it and updates it's status only when it is safe to.
Reviewers: Lucent-Wong <manchesterfans@live.cn>, Chris Egerton <chrise@aiven.io>, Yash Mayya <yash.mayya@gmail.com>, Konstantine Karantasis <k.karantasis@gmail.com>
It's good for us to add support for Java 20 in preparation for Java 21 - the next LTS.
Given that Scala 2.12 support has been deprecated, a Scala 2.12 variant is not included.
Also remove some branch builds that add load to the CI, but have
low value: JDK 8 & Scala 2.13 (JDK 8 support has been deprecated),
JDK 11 & Scala 2.12 (Scala 2.12 support has been deprecated) and
JDK 17 & Scala 2.12 (Scala 2.12 support has been deprecated).
A newer version of Mockito (4.9.0 -> 4.11.0) is required for Java 20 support, but we
only use it with Scala 2.13+ since it causes compilation errors with Scala 2.12. Similarly,
we upgrade easymock when the Java version is 16 or newer as it's incompatible
with powermock (which doesn't support Java 16 or newer).
Filed KAFKA-15117 for a test that fails with Java 20 (SslTransportLayerTest.testValidEndpointIdentificationCN).
Finally, fixed some lossy conversions that were added after #13582 was submitted.
Reviewers: Ismael Juma <ismael@juma.me.uk>
Fixed a regression described in KAFKA-15053 that security.protocol only allows uppercase values like PLAINTEXT, SSL, SASL_PLAINTEXT, SASL_SSL. With this fix, both lower case and upper case values will be supported (e.g. PLAINTEXT, SSL, SASL_PLAINTEXT, SASL_SSL, plaintext, ssl, sasl_plaintext, sasl_ssl)
Reviewers: Chris Egerton <chrise@aiven.io>, Divij Vaidya <diviv@amazon.com>
Discovered while researching KAFKA-14718
Currently, we perform a check during zombie fencing that causes the round of zombie fencing to fail when a rebalance is pending (i.e., when we've detected from a background poll of the config topic that a new connector has been created, that an existing connector has been deleted, or that a new set of connector tasks has been generated).
It's possible but not especially likely that this check causes issues when running vanilla Kafka Connect. Even when it does, it's easy enough to restart failed tasks via the REST API.
However, when running MirrorMaker 2 in dedicated mode, this check is more likely to cause issues as we write three connector configs to the config topic in rapid succession on startup. And in that mode, there is no API to restart failed tasks aside from restarting the worker that they are hosted on.
In either case, this check can lead to test flakiness in integration tests for MirrorMaker 2 both in dedicated mode and when deployed onto a vanilla Kafka Connect cluster.
This check is not actually necessary, and we can safely remove it. Copied from Jira:
>If the worker that we forward the zombie fencing request to is a zombie leader (i.e., a worker that believes it is the leader but in reality is not), it will fail to finish the round of zombie fencing because it won't be able to write to the config topic with a transactional producer.
>If the connector has just been deleted, we'll still fail the request since we force a read-to-end of the config topic and refresh our snapshot of its contents before checking to see if the connector exists.
>And regardless, the worker that owns the task will still do a read-to-end of the config topic and verify that (1) no new task configs have been generated for the connector and (2) the worker is still assigned the connector, before allowing the task to process any data.
In addition, while waiting on a fix for KAFKA-14718 that adds more granularity for diagnosing failures in the DedicatedMirrorIntegrationTest suite (#13284), some of the timeouts in that test are bumped to work better on our CI infrastructure.
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Yash Mayya <yash.mayya@gmail.com>, Viktor Somogyi-Vass <viktorsomogyi@gmail.com>
Most of the contents in the README.md was already covered in the docs therefore only had to add the section for Exactly Once support.
Reviewers: Luke Chen <showuon@gmail.com>
In case the Kafka Broker cluster and the Kafka Connect cluster is started together and Connect would want to create its topics, there's a high chance to fail the creation with InvalidReplicationFactorException.
---------
Co-authored-by: Daniel Urban <durban@cloudera.com>
Reviewers: Daniel Urban <durban@cloudera.com>, Mickael Maison <mickael.maison@gmail.com>, Viktor Somogyi-Vass <viktorsomogyi@gmail.com>, Chris Egerton <chrise@aiven.io>, Laszlo Hunyadi <laszlo.istvan.hunyady@gmail.com>
`KafkaBasedLog` is a widely used utility class that provides a generic implementation of a shared, compacted log of records in a Kafka topic. It isn't in Connect's public API, but has been used outside of Connect and we try to preserve backward compatibility whenever possible. KAFKA-14455 modified the two overloaded void `KafkaBasedLog::send` methods to return a `Future`. While this change is source compatible, it isn't binary compatible. We can restore backward compatibility simply by renaming the new Future returning send methods, and reinstating the older send methods to delegate to the newer methods.
This refactoring changes no functionality other than restoring the older methods.
Reviewers: Randall Hauch <rhauch@gmail.com>
#13557 introduced a utils method to close executors silently. This PR leverages that method to close executors in connect runtime. There was duplicate code while closing the executors which isn't the case with this PR.
Note that there are a few more executors used in Connect runtime but their close methods don't follow this pattern of shutdown, await and shutdown. Some of them have some logic like executor like Worker, so not changing at such places.
---------
Co-authored-by: Sagar Rao <sagarrao@Sagars-MacBook-Pro.local>
Reviewers: Daniel Urban <durban@cloudera.com>, Yash Mayya <yash.mayya@gmail.com>, Viktor Somogyi-Vass <viktorsomogyi@gmail.com>