Until recently revocation of connectors and tasks was the result of a rebalance that contained a new assignment. Therefore the view of the running assignment was kept consistent outside the call to `RebalanceListener#onRevoke`. However, after KAFKA-9184 the need appeared for the worker to revoke tasks voluntarily and proactively without having received a new assignment.
This commit allows the worker to restart tasks that were stopped as a result of voluntary revocation once a rebalance reassigns these tasks to the worker.
The fix is tested by extending an existing integration test.
Reviewers: Randall Hauch <rhauch@gmail.com>
This PR provides two fixes:
1. Skip offset validation if the current leader epoch cannot be reliably determined.
2. Raise an out of range error if the leader returns an undefined offset in response to the OffsetsForLeaderEpoch request.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>
Define SSL configs in all worker config classes, not just distributed
Author: Chris Egerton <chrise@confluent.io>
Reviewers: Nigel Liang <nigel@nigelliang.com>, Randall Hauch <rhauch@gmail.com>
Connector projects may have their own mock or testing implementations of the `SinkTaskContext`, and this newly-added method should be a default method to prevent breaking those projects. Changing this to a default method that returns null also makes sense w/r/t the method semantics, since the method is already defined to return null if the reporter has not been configured.
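A minimal sketch of why a default method keeps existing implementations source-compatible. All names here (`RecordReporterSketch`, `TaskContextSketch`, `LegacyMockContext`) are hypothetical stand-ins for illustration, not the actual Connect interfaces:

```java
// Hypothetical reporter type, standing in for whatever the new method returns.
interface RecordReporterSketch {
    void report(String record, Throwable error);
}

// Hypothetical context interface that gained a method after implementations existed.
interface TaskContextSketch {
    void requestCommit();

    // Added later as a default method: older implementations still compile,
    // and null already means "no reporter has been configured".
    default RecordReporterSketch recordReporter() {
        return null;
    }
}

// A mock written before the new method existed; it does not override it.
class LegacyMockContext implements TaskContextSketch {
    int commits = 0;

    @Override
    public void requestCommit() {
        commits++;
    }
}
```

Had the new method been abstract, `LegacyMockContext` would no longer compile until its owner added an override.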
Author: Randall Hauch <rhauch@gmail.com>
Reviewer: Konstantine Karantasis <konstantine@confluent.io>
Also added a new unit test to verify the functionality and expectations.
Author: Randall Hauch <rhauch@gmail.com>
Reviewer: Konstantine Karantasis <konstantine@confluent.io>
cc omkreddy this should also get backported to 2.6.x
Author: Xavier Léauté <xvrl@apache.org>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Closes #8813 from xvrl/fix-jmx-reset
- make task manager agnostic to task state
- make task state transitions idempotent
Reviewers: Boyang Chen <boyang@confluent.io>, A. Sophie Blee-Goldman <sophie@confluent.io>, John Roesler <john@confluent.io>, Guozhang Wang <guozhang@confluent.io>
Some links in the Streams javadoc are broken, for example:
ReadOnlyKeyValueStore
ReadOnlyWindowStore
Only javadoc is updated.
Reviewers: Bill Bejeck <bbejeck@apache.org>
This PR changes the way `PreferredReplicaImbalanceCount` is computed. It moves from re-computing after the processing of each event in the controller, which requires a full pass over all partitions, to incrementally maintaining the count as assignments and leaders are changing.
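The difference between the two approaches can be sketched as follows. This is an illustrative Java model, not the controller code; the class name, the string partition keys, and the `int[]` state layout are all assumptions made for the example:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: tracks, per partition, the preferred replica and the current leader.
class ImbalanceCounterSketch {
    // partition -> {preferredReplica, currentLeader}
    private final Map<String, int[]> state = new HashMap<>();
    private int imbalanceCount = 0;

    private static boolean imbalanced(int[] s) {
        return s != null && s[0] != s[1];
    }

    // Incremental maintenance: adjust the count only for the partition that changed.
    void update(String partition, int preferredReplica, int leader) {
        int[] next = {preferredReplica, leader};
        int[] prev = state.put(partition, next);
        if (imbalanced(prev)) imbalanceCount--;
        if (imbalanced(next)) imbalanceCount++;
    }

    void remove(String partition) {
        if (imbalanced(state.remove(partition))) imbalanceCount--;
    }

    int imbalanceCount() {
        return imbalanceCount;
    }

    // The old approach: a full pass over all partitions on every event.
    int recount() {
        int count = 0;
        for (int[] s : state.values()) {
            if (imbalanced(s)) count++;
        }
        return count;
    }
}
```

Each `update` or `remove` costs constant time, whereas `recount` is linear in the number of partitions.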
Reviewers: Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Jason Gustafson <jason@confluent.io>
Fixes KAFKA-10033.
Replace AdminOperationException with UnknownTopicOrPartitionException if topic does not exist when validating topic configs in AdminZkClient.
Author: gnkoshelev <gnkoshelev@gmail.com>
Author: Gregory <gnkoshelev@gmail.com>
Reviewers: Brian Byrne <bbyrne@confluent.io>, Manikumar Reddy <manikumar.reddy@gmail.com>
Closes #8715 from gnkoshelev/KAFKA-10033
This applies to the producer, consumer, admin client, connect worker
and inter broker communication.
`ClientDnsLookup.DEFAULT` has been deprecated and a warning
will be logged if it's explicitly set in a client config.
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Ismael Juma <ismael@juma.me.uk>
With the recent introduction of predicated SMTs, properties named "predicate" and "negate" should be ignored and removed in case they are present in transformation configs.
This commit fixes the equality check to be with the key of the config to apply proper removal.
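As a rough illustration of the intended behaviour, the sketch below drops the two properties by comparing against each entry's key. The class and method names are hypothetical and the real Connect code is structured differently:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not the actual Connect implementation.
class TransformConfigSketch {
    static Map<String, String> withoutPredicateProps(Map<String, String> config) {
        Map<String, String> copy = new HashMap<>(config);
        // Compare against the entry's key (the property name),
        // not against the entry itself or its value.
        copy.entrySet().removeIf(e ->
                e.getKey().equals("predicate") || e.getKey().equals("negate"));
        return copy;
    }
}
```

A property whose value happens to be `predicate` is left alone, since only the key takes part in the comparison.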
Reviewers: Tom Bentley <tbentley@redhat.com>, Konstantine Karantasis <konstantine@confluent.io>
Sensor objects are stored in the Kafka metrics registry and keyed by name. If a new sensor is created with the same name as an existing one, the existing one is returned rather than a new object being created. The partition load time sensors for the transaction and group coordinators used the same name, so data recorded to either was stored in the same object. This meant that the metrics values for both metrics were identical and consisted of the combined data. This patch changes the names to be distinct so that the data will be stored in separate Sensor objects.
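The collision can be reproduced with a toy name-keyed registry. The classes below are hypothetical simplifications, not the Kafka metrics API, and the sensor names are made up for the example:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sensor that tracks a running average.
class SensorSketch {
    final String name;
    private double total = 0;
    private long count = 0;

    SensorSketch(String name) {
        this.name = name;
    }

    synchronized void record(double value) {
        total += value;
        count++;
    }

    synchronized double avg() {
        return count == 0 ? 0.0 : total / count;
    }
}

// Hypothetical registry keyed by sensor name.
class SensorRegistrySketch {
    private final ConcurrentMap<String, SensorSketch> sensors = new ConcurrentHashMap<>();

    // Returns the existing sensor if the name is already registered.
    SensorSketch sensor(String name) {
        return sensors.computeIfAbsent(name, SensorSketch::new);
    }
}
```

Requesting the same name twice yields the same object, so two coordinators that record 10 and 30 under a shared name both report an average of 20; with distinct names they report 10 and 30.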
Reviewers: Jason Gustafson <jason@confluent.io>
Gradle 6.5 includes a fix for https://github.com/gradle/gradle/pull/12866, which
affects the performance of Scala compilation.
I profiled the scalac build with async profiler and 54% of the time was on GC
even after the Gradle upgrade (it was more than 60% before), so I switched to
the throughput GC (GC latency is less important for batch builds) and it
was reduced to 38%.
I also centralized the jvm configuration in `build.gradle` and simplified it a bit
by removing the minHeapSize configuration from the test tasks.
On my desktop, the time to execute clean builds with no cached Gradle daemon
was reduced from 127 seconds to 97 seconds. With a cached daemon, it was
reduced from 120 seconds to 88 seconds. The performance regression when
we upgraded to Gradle 6.x was 27 seconds with a cached daemon
(https://github.com/apache/kafka/pull/7677#issuecomment-616271179), so it
should be fixed now.
Gradle 6.4 with no cached daemon:
```
BUILD SUCCESSFUL in 2m 7s
115 actionable tasks: 112 executed, 3 up-to-date
./gradlew clean compileScala compileJava compileTestScala compileTestJava 1.15s user 0.12s system 0% cpu 2:08.06 total
```
Gradle 6.4 with cached daemon:
```
BUILD SUCCESSFUL in 2m 0s
115 actionable tasks: 111 executed, 4 up-to-date
./gradlew clean compileScala compileJava compileTestScala compileTestJava 0.95s user 0.10s system 0% cpu 2:01.42 total
```
Gradle 6.5 with no cached daemon:
```
BUILD SUCCESSFUL in 1m 46s
115 actionable tasks: 111 executed, 4 up-to-date
./gradlew clean compileScala compileJava compileTestScala compileTestJava 1.27s user 0.12s system 1% cpu 1:47.71 total
```
Gradle 6.5 with cached daemon:
```
BUILD SUCCESSFUL in 1m 37s
115 actionable tasks: 111 executed, 4 up-to-date
./gradlew clean compileScala compileJava compileTestScala compileTestJava 1.02s user 0.10s system 1% cpu 1:38.31 total
```
This PR with no cached Gradle daemon:
```
BUILD SUCCESSFUL in 1m 37s
115 actionable tasks: 81 executed, 34 up-to-date
./gradlew clean compileScala compileJava compileTestScala compileTestJava 1.27s user 0.10s system 1% cpu 1:38.70 total
```
This PR with cached Gradle daemon:
```
BUILD SUCCESSFUL in 1m 28s
115 actionable tasks: 111 executed, 4 up-to-date
./gradlew clean compileScala compileJava compileTestScala compileTestJava 1.02s user 0.10s system 1% cpu 1:29.35 total
```
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
The method `maybeWriteTxnCompletion` is unsafe for concurrent calls. This can cause duplicate attempts to write the completion record to the log, which can ultimately lead to illegal state errors and possibly to correctness violations if another transaction had been started before the duplicate was written. This patch fixes the problem by ensuring only one thread can successfully remove the pending completion from the map.
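One common way to get this "only one thread wins" guarantee is to rely on the atomicity of `ConcurrentHashMap.remove`. The sketch below is a hypothetical Java model of that idea (the class, the string-typed marker, and the `raceOnce` helper are assumptions), not the coordinator code itself:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of a map of pending transaction completions.
class PendingCompletionsSketch {
    private final ConcurrentHashMap<String, String> pending = new ConcurrentHashMap<>();
    final AtomicInteger writes = new AtomicInteger();

    void add(String transactionalId, String marker) {
        pending.put(transactionalId, marker);
    }

    boolean maybeWriteCompletion(String transactionalId) {
        // remove() is atomic: at most one caller observes the non-null value.
        String marker = pending.remove(transactionalId);
        if (marker == null) {
            return false; // another thread already claimed this completion
        }
        writes.incrementAndGet(); // stand-in for appending the completion record
        return true;
    }

    // Races several threads on one pending completion and returns the number of writes.
    static int raceOnce(int threadCount) {
        PendingCompletionsSketch sketch = new PendingCompletionsSketch();
        sketch.add("txn-1", "COMMIT");
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> sketch.maybeWriteCompletion("txn-1"));
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return sketch.writes.get();
    }
}
```

A check-then-act sequence such as `containsKey` followed by `remove` would not give the same guarantee, because two threads could both pass the check before either removes the entry.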
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
Check the uncommitted end offset after the committed end offset,
so we can be sure never to miss a pending end-transaction marker.
Reviewers: Matthias J. Sax <matthias@confluent.io>
Minimum fix needed to stop this test failing and unblock others
Co-authored-by: Luke Chen <showuon@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
1. Enable `TLSv1.3` by default with Java 11 or newer.
2. Add unit tests that cover the various TLSv1.2 and TLSv1.3 combinations.
3. Extend `benchmark_test.py` and `replication_test.py` to run with 'TLSv1.2'
or 'TLSv1.3'.
Reviewers: Ismael Juma <ismael@juma.me.uk>
Fix the failing testMultiConsumerStickyAssignment by correcting a logic error in the allSubscriptionsEqual method.
We build consumerToOwnedPartitions to hold the set of previously owned partitions encoded in each Subscription; it is the basis for the reassignment. In allSubscriptionsEqual, we read the member generation of each subscription and, if the current generation is higher, discard all previously owned partitions as invalid. However, before this fix the logic also removed the member with the current highest generation from consumerToOwnedPartitions, even though that member should be kept precisely because it has the highest generation. This commit fixes that logic error.
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Fixed spotBugs error introduced by c6633a1:
>Dead store to isFreshAssignment in org.apache.kafka.clients.consumer.internals.AbstractStickyAssignor.generalAssign(Map, Map)
Reviewers: Ismael Juma <ismael@juma.me.uk>
This class has been unused since KIP-146 was merged, which introduced another way to load Connect plugins. Removing from the code base.
Reviewers: Boyang Chen <boyang@confluent.io>, Konstantine Karantasis <konstantine@confluent.io>
1. Remove redundant connect#stop from test in InternalTopicsIntegrationTest since we'll do it after each test case in the @After method
2. Refine the error message in topic assertions so that it better explains the errors
Reviewers: Konstantine Karantasis <konstantine@confluent.io>
Motivation and pseudo code algorithm in the ticket.
Added a scale test with a large number of topic partitions and consumers and a 30s timeout.
With these changes, assignment with 2,000 consumers and 200 topics with 2,000 partitions each completes within a few seconds.
When the same test was ported to trunk, it took 2 minutes even with a 100x reduction in the number of topics (i.e., 2 minutes for 2,000 consumers and 2 topics with 2,000 partitions each).
Should be cherry-picked to 2.6, 2.5, and 2.4
Reviewers: Guozhang Wang <wangguoz@gmail.com>
It improves decompression speed:
>For x64 cpus, expect a speed bump of at least +5%, and up to +10% in favorable cases.
>ARM cpus receive more benefit, with speed improvements ranging from +15% vicinity,
>and up to +50% for certain SoCs and scenarios (ARM's situation is more complex due
>to larger differences in SoC designs).
See https://github.com/facebook/zstd/releases/tag/v1.4.5 for more details.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
1. Move KafkaProducer#propsToMap to Utils#propsToMap
2. Apply Utils#propsToMap to constructor of KafkaConsumer
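For readers unfamiliar with the helper, a conversion of this kind might look roughly like the sketch below. The class name is hypothetical, and `IllegalArgumentException` is used here as a stand-in for whatever exception the real utility throws on a non-string key:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch; not the actual Utils implementation.
class PropsToMapSketch {
    static Map<String, Object> propsToMap(Properties properties) {
        Map<String, Object> map = new HashMap<>(properties.size());
        for (Map.Entry<Object, Object> entry : properties.entrySet()) {
            if (!(entry.getKey() instanceof String)) {
                // Stand-in exception type; config keys are expected to be strings.
                throw new IllegalArgumentException("Key must be a string: " + entry.getKey());
            }
            map.put((String) entry.getKey(), entry.getValue());
        }
        return map;
    }
}
```

Sharing one helper means the producer and consumer constructors treat `Properties` input the same way, including how they reject non-string keys.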
Reviewers: Noa Resare <noa@resare.com>, Ismael Juma <ismael@juma.me.uk>
We have seen this test failing from time to time. In spite of the low quota, it is possible for one or more of the reassignments to complete before verification that the reassignment is in progress. The patch makes this less likely by reducing the quota even further and increasing the amount of data in the topic.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>