_unknownTaggedFields contains tagged fields which we don't understand
with the current schema. However, we still want to keep the data around
for various purposes. For example, if we are printing out a JSON form of
the message we received, we want to include a section containing the
tagged fields that couldn't be parsed. To leave these out would give an
incorrect impression of what was sent over the wire. Since the unknown
tagged fields represent real data, they should be included in the fields
checked by equals().
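The idea can be sketched as follows. This is a minimal illustration, not the generated Kafka message code: `UnknownField` and `ExampleMessage` are hypothetical names, and the only point is that the unparsed tagged fields take part in `equals()` and `hashCode()`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for a tagged field the current schema cannot parse.
final class UnknownField {
    final int tag;
    final byte[] data;

    UnknownField(int tag, byte[] data) {
        this.tag = tag;
        this.data = data.clone();
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof UnknownField)) return false;
        UnknownField other = (UnknownField) o;
        // Compare array contents, not array references.
        return tag == other.tag && Arrays.equals(data, other.data);
    }

    @Override
    public int hashCode() {
        return 31 * tag + Arrays.hashCode(data);
    }
}

// Hypothetical message with one known field plus the unparsed tagged fields.
final class ExampleMessage {
    final String name;
    final List<UnknownField> unknownTaggedFields;

    ExampleMessage(String name, List<UnknownField> unknownTaggedFields) {
        this.name = name;
        this.unknownTaggedFields =
            Collections.unmodifiableList(new ArrayList<>(unknownTaggedFields));
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ExampleMessage)) return false;
        ExampleMessage other = (ExampleMessage) o;
        // Unknown tagged fields are real wire data, so they count for equality.
        return name.equals(other.name)
            && unknownTaggedFields.equals(other.unknownTaggedFields);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, unknownTaggedFields);
    }
}
```

With this shape, two messages that agree on every known field but carry different unknown tagged fields compare as unequal.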
Reviewers: Ismael Juma <ismael@juma.me.uk>, Boyang Chen <boyang@confluent.io>
`SelectorMetrics` has per-connection metrics, which means the number of `MetricName` objects and the strings associated with them (such as group name and description) grows with the number of connections in the client. This overhead of duplicate string objects is amplified when there are multiple instances of Kafka clients within the same JVM. This patch addresses some of the memory overhead by making `metricGrpName` a constant field and introducing a new field `perConnectionMetricGrpName`.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Currently, Kafka Connect creates its config backing topic with a fire-and-forget approach.
This is fine unless someone has already created that topic manually with the wrong partition count.
In such a case Kafka Connect may run for some time, especially in standalone mode, but once switched to distributed mode it will almost certainly fail.
This commit adds a check when the KafkaConfigBackingStore is starting.
This check will throw a ConfigException if there is more than one partition in the backing store.
This exception is then caught upstream and logged by either:
- DistributedHerder#run
- ConnectStandalone#main
A unit test was added in KafkaConfigBackingStoreTest to verify the behaviour.
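The shape of the startup check can be sketched as below. This is an illustration under assumptions, not the code in KafkaConfigBackingStore: `ExampleConfigException` stands in for Kafka's `ConfigException` so the sketch has no dependencies, and `verifySinglePartition` is a hypothetical helper that is simply handed the partition count.

```java
// Stand-in for Kafka's ConfigException, so the sketch compiles on its own.
class ExampleConfigException extends RuntimeException {
    ExampleConfigException(String message) {
        super(message);
    }
}

final class ConfigTopicCheck {
    // Hypothetical helper: fail fast if the config topic has more than one partition.
    static void verifySinglePartition(String topic, int partitionCount) {
        if (partitionCount > 1) {
            throw new ExampleConfigException(
                "Topic '" + topic + "' must have exactly one partition but has "
                    + partitionCount);
        }
    }
}
```

Failing at startup surfaces the misconfiguration immediately instead of letting the worker run against a topic it cannot use correctly.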
Author: Evelyn Bayes <evelyn@confluent.io>
Co-authored-by: Randall Hauch <rhauch@gmail.com>
Reviewer: Konstantine Karantasis <konstantine@confluent.io>
Until recently revocation of connectors and tasks was the result of a rebalance that contained a new assignment. Therefore the view of the running assignment was kept consistent outside the call to `RebalanceListener#onRevoke`. However, after KAFKA-9184 the need appeared for the worker to revoke tasks voluntarily and proactively without having received a new assignment.
This commit will allow the worker to restart tasks that have been stopped as a result of voluntary revocation after a rebalance reassigns these tasks to the worker.
The fix is tested by extending an existing integration test.
Reviewers: Randall Hauch <rhauch@gmail.com>
This PR provides two fixes:
1. Skip offset validation if the current leader epoch cannot be reliably determined.
2. Raise an out of range error if the leader returns an undefined offset in response to the OffsetsForLeaderEpoch request.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>
Define SSL configs in all worker config classes, not just distributed
Author: Chris Egerton <chrise@confluent.io>
Reviewers: Nigel Liang <nigel@nigelliang.com>, Randall Hauch <rhauch@gmail.com>
Connector projects may have their own mock or testing implementations of the `SinkTaskContext`, and this newly-added method should be a default method to prevent breaking those projects. Changing this to a default method that returns null also makes sense with respect to the method semantics, since the method is already defined to return null if the reporter has not been configured.
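A small sketch of why a default method keeps existing implementations compiling. All names here (`ExampleReporter`, `ExampleTaskContext`, `LegacyMockContext`, `reporter()`) are hypothetical stand-ins, not the Connect API.

```java
// Hypothetical reporter type, standing in for the real one.
interface ExampleReporter {
    void report(String record, Throwable error);
}

// Hypothetical context interface, standing in for SinkTaskContext.
interface ExampleTaskContext {
    // Abstract method that existed before the change.
    void requestCommit();

    // Newly added as a default method: implementations written before it
    // existed still compile, and they report "no reporter configured" (null).
    default ExampleReporter reporter() {
        return null;
    }
}

// A third-party mock written before the new method was added.
final class LegacyMockContext implements ExampleTaskContext {
    int commits = 0;

    @Override
    public void requestCommit() {
        commits++;
    }
}
```

Had `reporter()` been declared abstract, `LegacyMockContext` would no longer compile until its authors added an implementation.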
Author: Randall Hauch <rhauch@gmail.com>
Reviewer: Konstantine Karantasis <konstantine@confluent.io>
Also added a new unit test to verify the functionality and expectations.
Author: Randall Hauch <rhauch@gmail.com>
Reviewer: Konstantine Karantasis <konstantine@confluent.io>
cc omkreddy this should also get backported to 2.6.x
Author: Xavier Léauté <xvrl@apache.org>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Closes #8813 from xvrl/fix-jmx-reset
- make task manager agnostic to task state
- make task state transitions idempotent
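To illustrate what idempotent transitions buy the caller, here is a minimal sketch. `ExampleTask` and its states are hypothetical and much simpler than the real Streams task classes; the point is only that repeating a transition is a no-op, so a manager can call it without first inspecting the task state.

```java
// Hypothetical task with a tiny state machine.
final class ExampleTask {
    enum State { CREATED, RUNNING, SUSPENDED, CLOSED }

    private State state = State.CREATED;
    private int suspendActions = 0;

    State state() {
        return state;
    }

    // Number of times the suspend side effects actually ran.
    int suspendActions() {
        return suspendActions;
    }

    void start() {
        if (state == State.RUNNING) return; // already running: no-op
        if (state == State.CLOSED) {
            throw new IllegalStateException("Cannot start from " + state);
        }
        state = State.RUNNING;
    }

    void suspend() {
        switch (state) {
            case SUSPENDED:
                return; // already suspended: no-op
            case CREATED:
            case RUNNING:
                suspendActions++; // side effects run exactly once
                state = State.SUSPENDED;
                return;
            default:
                throw new IllegalStateException("Cannot suspend from " + state);
        }
    }

    void close() {
        if (state == State.CLOSED) return; // already closed: no-op
        state = State.CLOSED;
    }
}
```

A caller can then invoke `suspend()` or `close()` on every task it holds, regardless of which state each task is currently in.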
Reviewers: Boyang Chen <boyang@confluent.io>, A. Sophie Blee-Goldman <sophie@confluent.io>, John Roesler <john@confluent.io>, Guozhang Wang <guozhang@confluent.io>
There are some broken links in the Streams javadoc, for example:
ReadOnlyKeyValueStore
ReadOnlyWindowStore
Only javadoc is updated.
Reviewers: Bill Bejeck <bbejeck@apache.org>
This PR changes the way `PreferredReplicaImbalanceCount` is computed. It moves from re-computing after the processing of each event in the controller, which requires a full pass over all partitions, to incrementally maintaining the count as assignments and leaders are changing.
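The difference between the two approaches can be sketched as follows. This is an illustration, not the controller code: `ImbalanceCounter` is a hypothetical class, and it assumes a partition counts as imbalanced when its leader is not the first replica in its assignment, ignoring any other conditions the real metric may apply.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical counter maintained incrementally as assignments and leaders change.
final class ImbalanceCounter {
    private static final class PartitionState {
        final List<Integer> replicas;
        final int leader;

        PartitionState(List<Integer> replicas, int leader) {
            this.replicas = replicas;
            this.leader = leader;
        }
    }

    private final Map<String, PartitionState> partitions = new HashMap<>();
    private int imbalanceCount = 0;

    // Assumption: the preferred replica is the first one in the assignment.
    private static boolean isImbalanced(PartitionState s) {
        return !s.replicas.isEmpty() && s.leader != s.replicas.get(0);
    }

    // O(1) per change: undo the old contribution, then add the new one.
    void update(String partition, List<Integer> replicas, int leader) {
        PartitionState next = new PartitionState(replicas, leader);
        PartitionState prev = partitions.put(partition, next);
        if (prev != null && isImbalanced(prev)) imbalanceCount--;
        if (isImbalanced(next)) imbalanceCount++;
    }

    void remove(String partition) {
        PartitionState prev = partitions.remove(partition);
        if (prev != null && isImbalanced(prev)) imbalanceCount--;
    }

    int count() {
        return imbalanceCount;
    }
}
```

Reading the metric becomes a field access, while each assignment or leader change pays a constant cost instead of triggering a pass over all partitions.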
Reviewers: Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Jason Gustafson <jason@confluent.io>
Fixes KAFKA-10033.
Replace AdminOperationException with UnknownTopicOrPartitionException if the topic does not exist when validating topic configs in AdminZkClient.
Author: gnkoshelev <gnkoshelev@gmail.com>
Author: Gregory <gnkoshelev@gmail.com>
Reviewers: Brian Byrne <bbyrne@confluent.io>, Manikumar Reddy <manikumar.reddy@gmail.com>
Closes #8715 from gnkoshelev/KAFKA-10033
This applies to the producer, consumer, admin client, connect worker
and inter broker communication.
`ClientDnsLookup.DEFAULT` has been deprecated and a warning
will be logged if it's explicitly set in a client config.
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Ismael Juma <ismael@juma.me.uk>
With the recent introduction of predicated SMTs, properties named "predicate" and "negate" should be ignored and removed in case they are present in transformation configs.
This commit fixes the equality check so that it compares against the key of the config, which makes the removal work as intended.
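A minimal sketch of removing these properties by key. `TransformConfigFilter` and `withoutPredicateProps` are hypothetical names, and the sketch works on a plain `Map<String, String>` rather than on the Connect config classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TransformConfigFilter {
    // Hypothetical helper: return a copy of the config without the
    // "predicate" and "negate" properties.
    static Map<String, String> withoutPredicateProps(Map<String, String> config) {
        Map<String, String> result = new LinkedHashMap<>(config);
        // The comparison must be against the entry's key, not the entry
        // object or its value, otherwise nothing is ever removed.
        result.entrySet().removeIf(
            e -> "predicate".equals(e.getKey()) || "negate".equals(e.getKey()));
        return result;
    }
}
```

Comparing the wrong object compiles without complaint, since `equals` accepts any `Object`, which is why this kind of mistake is easy to miss.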
Reviewers: Tom Bentley <tbentley@redhat.com>, Konstantine Karantasis <konstantine@confluent.io>
Sensor objects are stored in the Kafka metrics registry and keyed by name. If a new sensor is created with the same name as an existing one, the existing one is returned rather than a new object being created. The partition load time sensors for the transaction and group coordinators used the same name, so data recorded to either was stored in the same object. This meant that the metrics values for both metrics were identical and consisted of the combined data. This patch changes the names to be distinct so that the data will be stored in separate Sensor objects.
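The collision can be reproduced with a toy registry. `ExampleSensor` and `ExampleRegistry` are hypothetical and only mimic the get-or-create behaviour described above; the sensor names used below are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sensor that just remembers what was recorded to it.
final class ExampleSensor {
    final String name;
    final List<Double> values = new ArrayList<>();

    ExampleSensor(String name) {
        this.name = name;
    }

    void record(double value) {
        values.add(value);
    }
}

// Hypothetical registry keyed by sensor name.
final class ExampleRegistry {
    private final Map<String, ExampleSensor> sensors = new HashMap<>();

    // Get-or-create: an existing sensor with this name is returned as-is.
    ExampleSensor sensor(String name) {
        return sensors.computeIfAbsent(name, ExampleSensor::new);
    }
}
```

Usage: if two coordinators both ask for `registry.sensor("PartitionLoadTime")` they share one object and their recordings are mixed, whereas distinct names such as `"GroupPartitionLoadTime"` and `"TransactionsPartitionLoadTime"` (illustrative names) yield separate objects.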
Reviewers: Jason Gustafson <jason@confluent.io>
Gradle 6.5 includes a fix for https://github.com/gradle/gradle/pull/12866, which
affects the performance of Scala compilation.
I profiled the scalac build with async profiler and 54% of the time was on GC
even after the Gradle upgrade (it was more than 60% before), so I switched to
the throughput GC (GC latency is less important for batch builds) and it
was reduced to 38%.
I also centralized the jvm configuration in `build.gradle` and simplified it a bit
by removing the minHeapSize configuration from the test tasks.
On my desktop, the time to execute clean builds with no cached Gradle daemon
was reduced from 127 seconds to 97 seconds. With a cached daemon, it was
reduced from 120 seconds to 88 seconds. The performance regression when
we upgraded to Gradle 6.x was 27 seconds with a cached daemon
(https://github.com/apache/kafka/pull/7677#issuecomment-616271179), so it
should be fixed now.
Gradle 6.4 with no cached daemon:
```
BUILD SUCCESSFUL in 2m 7s
115 actionable tasks: 112 executed, 3 up-to-date
./gradlew clean compileScala compileJava compileTestScala compileTestJava 1.15s user 0.12s system 0% cpu 2:08.06 total
```
Gradle 6.4 with cached daemon:
```
BUILD SUCCESSFUL in 2m 0s
115 actionable tasks: 111 executed, 4 up-to-date
./gradlew clean compileScala compileJava compileTestScala compileTestJava 0.95s user 0.10s system 0% cpu 2:01.42 total
```
Gradle 6.5 with no cached daemon:
```
BUILD SUCCESSFUL in 1m 46s
115 actionable tasks: 111 executed, 4 up-to-date
./gradlew clean compileScala compileJava compileTestScala compileTestJava 1.27s user 0.12s system 1% cpu 1:47.71 total
```
Gradle 6.5 with cached daemon:
```
BUILD SUCCESSFUL in 1m 37s
115 actionable tasks: 111 executed, 4 up-to-date
./gradlew clean compileScala compileJava compileTestScala compileTestJava 1.02s user 0.10s system 1% cpu 1:38.31 total
```
This PR with no cached Gradle daemon:
```
BUILD SUCCESSFUL in 1m 37s
115 actionable tasks: 81 executed, 34 up-to-date
./gradlew clean compileScala compileJava compileTestScala compileTestJava 1.27s user 0.10s system 1% cpu 1:38.70 total
```
This PR with cached Gradle daemon:
```
BUILD SUCCESSFUL in 1m 28s
115 actionable tasks: 111 executed, 4 up-to-date
./gradlew clean compileScala compileJava compileTestScala compileTestJava 1.02s user 0.10s system 1% cpu 1:29.35 total
```
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
The method `maybeWriteTxnCompletion` is unsafe for concurrent calls. This can cause duplicate attempts to write the completion record to the log, which can ultimately lead to illegal state errors and possibly to correctness violations if another transaction had been started before the duplicate was written. This patch fixes the problem by ensuring only one thread can successfully remove the pending completion from the map.
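The "only one thread wins the removal" pattern can be sketched with a `ConcurrentHashMap`. This is an assumption-laden illustration, not the coordinator code: `PendingCompletions` is a hypothetical class, and the counter increment stands in for appending the completion record to the log.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical holder for completions waiting to be written.
final class PendingCompletions {
    private final ConcurrentMap<String, String> pending = new ConcurrentHashMap<>();

    // Counts how many times a completion was "written".
    final AtomicInteger writes = new AtomicInteger();

    void add(String transactionalId, String completion) {
        pending.put(transactionalId, completion);
    }

    // Safe for concurrent calls: remove() is atomic, so at most one caller
    // receives the non-null value and goes on to write it.
    boolean maybeWriteCompletion(String transactionalId) {
        String completion = pending.remove(transactionalId);
        if (completion == null) {
            return false; // nothing pending, or another thread already took it
        }
        writes.incrementAndGet(); // stands in for appending to the log
        return true;
    }
}
```

The unsafe variant would check for the entry, write, and only then remove it, which lets two threads both pass the check before either removes the entry.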
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
Check the uncommitted end offset after the committed end offset,
so we can be sure never to miss a pending end-transaction marker.
Reviewers: Matthias J. Sax <matthias@confluent.io>
Minimum fix needed to stop this test failing and unblock others
Co-authored-by: Luke Chen <showuon@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
1. Enable `TLSv1.3` by default with Java 11 or newer.
2. Add unit tests that cover the various TLSv1.2 and TLSv1.3 combinations.
3. Extend `benchmark_test.py` and `replication_test.py` to run with 'TLSv1.2'
or 'TLSv1.3'.
Reviewers: Ismael Juma <ismael@juma.me.uk>
Fix the failing testMultiConsumerStickyAssignment by correcting the logic error in the allSubscriptionsEqual method.
We create consumerToOwnedPartitions to keep the set of previously owned partitions encoded in each Subscription; it is the basis for the reassignment. In allSubscriptionsEqual, we get the member generation of the subscription and, if the current generation is higher, remove all previously owned partitions as invalid. However, before this fix the logic also removed the member with the current highest generation from consumerToOwnedPartitions, even though it should be kept because it is the member with the highest generation. This patch fixes that logic error.
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Fixed spotBugs error introduced by c6633a1:
>Dead store to isFreshAssignment in org.apache.kafka.clients.consumer.internals.AbstractStickyAssignor.generalAssign(Map, Map)
Reviewers: Ismael Juma <ismael@juma.me.uk>
This class has been unused since KIP-146 was merged, which introduced another way to load Connect plugins. Removing from the code base.
Reviewers: Boyang Chen <boyang@confluent.io>, Konstantine Karantasis <konstantine@confluent.io>
1. Remove redundant connect#stop from test in InternalTopicsIntegrationTest since we'll do it after each test case in the @After method
2. Refine the error message in topic assertions so that it explains the errors better
Reviewers: Konstantine Karantasis <konstantine@confluent.io>
The motivation and the algorithm's pseudocode are in the ticket.
Added a scale test with a large number of topic partitions and consumers and a 30s timeout.
With these changes, assignment with 2,000 consumers and 200 topics with 2,000 partitions each completes within a few seconds.
Porting the same test to trunk, it took 2 minutes even with a 100x reduction in the number of topics (i.e., 2 minutes for 2,000 consumers and 2 topics with 2,000 partitions).
Should be cherry-picked to 2.6, 2.5, and 2.4
Reviewers: Guozhang Wang <wangguoz@gmail.com>