This fixes a regression caused by KAFKA-4682 (KIP-211) which caused offset commit failures after upgrading from an older version which used the v1 inter-broker format.
Part 1 of the suppression API.
* add the DSL suppress method and config objects
* add the processor, but only in "identity" mode (i.e., it will forward only if the suppression spec says to forward immediately)
* add tests
Reviewers: Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
* KAFKA-7400: Compacted topic segments that precede the log start offset are not cleaned up
Currently we don't delete any log segments if the cleanup policy doesn't include delete. This patch changes the behavior to delete log segments that fully precede the log start offset even when deletion is not enabled. Tested with unit tests to verify that LogManager.cleanupLogs now cleans logs with cleanup.policy=compact and that Log.deleteOldSegments deletes segments that preced the start offset regardless of the cleanup policy.
Reviewers: Dhruvil Shah <dhruvil@confluent.io>, Jason Gustafson <jason@confluent.io>, Jun Rao <junrao@gmail.com>
This patch implements KIP-336. It adds a default implementation to the Serializer/Deserializer interface to support the use of headers and it deprecates the ExtendedSerializer and ExtendedDeserializer interfaces for later removal.
Reviewers: Satish Duggana <sduggana@hortonworks.com>, John Roesler <john@confluent.io>, Jason Gustafson <jason@confluent.io>
What changes were proposed in this pull request?
atLeast(0) in StreamsConfig, ProducerConfig and ConsumerConfig were replaced by SEND_BUFFER_LOWER_BOUND and RECEIVE_BUFFER_LOWER_BOUND from CommonClientConfigs.
How was this patch tested?
Three unit tests were added to KafkaStreamsTest
Reviewers: Guozhang Wang <guozhang@confluent.io>, John Roesler <john@confluent.io>, Matthias J. Sax <mjsax@apache.org>
Increasing the number of unique keys, to increase likelihood that the test exposes KAFKA-7192.
Reviewers: Apurva Mehta <apurva@confluent.io>, Guozhang Wang <guozhang@confluent.io>, Bill Bejeck <bill@confluent.io>, John Roesler <john@confluent.io>
In order to fix race condition between log cleaner thread and log retention thread when dynamically switching topic cleanup policy, existing log cleaner in-progress map is used to prevent more than one thread from working on the same topic partition.
Author: Xiongqi Wesley Wu <xiongqi.wu@gmail.com>
Reviewers: Dong Lin <lindong28@gmail.com>
Closes#5591 from xiowu0/trunk
This patch fixes the inconsistent handling of out of range errors in the replica fetcher. Previously we would raise a fatal error if the follower's offset is ahead of the leader's and unclean leader election is not enabled. The behavior was inconsistent depending on the message format. With KIP-101/KIP-279 and the new message format, upon becoming a follower, the replica would use leader epoch information to reconcile the end of the log with the leader and simply truncate. Additionally, with the old format, the check is not really bulletproof for detecting data loss since the unclean leader's end offset might have already caught up to the follower's offset at the time of its initial fetch or when it queries for the current log end offset.
With this patch, we simply skip the unclean leader election check and allow the needed truncation to occur. When the truncation offset is below the high watermark, a warning will be logged. This makes the behavior consistent for all message formats and removes a scenario in which an error on one partition can bring the broker down.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
This patch fixes unsafe concurrent access in the consumer by the heartbeat thread and the thread calling `poll()` to the fetch session state in `FetchSessionHandler`.
Reviewers: Viktor Somogyi <viktorsomogyi@gmail.com>, Jason Gustafson <jason@confluent.io>
Removed ignore annotations from the upgrade tests. This PR includes the following changes for updating the upgrade tests:
* Uploaded new versions 0.10.2.2, 0.11.0.3, 1.0.2, 1.1.1, and 2.0.0 (in the associated scala versions) to kafka-packages
* Update versions in version.py, Dockerfile, base.sh
* Added new versions to StreamsUpgradeTest.test_upgrade_downgrade_brokers including version 2.0.0
* Added new versions StreamsUpgradeTest.test_simple_upgrade_downgrade test excluding version 2.0.0
* Version 2.0.0 is excluded from the streams upgrade/downgrade test as StreamsConfig needs an update for the new version, requiring a KIP. Once the community votes the KIP in, a minor follow-up PR can be pushed to add the 2.0.0 version to the upgrade test.
* Fixed minor bug in kafka-run-class.sh for classpath in upgrade/downgrade tests across versions.
* Follow on PRs for 0.10.2x, 0.11.0x, 1.0.x, 1.1.x, and 2.0.x will be pushed soon with the same updates required for the specific version.
Reviewers: Eno Thereska <eno.thereska@gmail.com>, John Roesler <vvcephei@users.noreply.github.com>, Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <matthias@confluent.io>
Reviewers: Guozhang Wang <guozhang@confluent.io>, John Roessler <john@confluent.io>, Bill Bejeck <bill@confluent.io>, Eno Thereska <enother@amazon.com>
Modified several classes' `equals` methods and simplified a complex method to
reduce the NPath complexity so they could be removed from the checkstyle
suppressions that were required with the recent move to Java 8 and upgrade
of Checkstyle: https://github.com/apache/kafka/pull/5046.
Reviewers: Robert Yokota <rayokota@gmail.com>, Arjun Satish <arjun@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Currently, scala.Serdes.String, for example, invokes Serdes.String() once and caches the result.
However, the implementation of the String serde has a non-empty configure method that is variant in whether it's used as a key or value serde. So we won't get correct execution if we create one serde and use it for both keys and values.
Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
* Refactor the StreamThread main loop, in the following:
1. Fetch from consumer and enqueue data to tasks.
2. Check if any tasks should be enforced process.
3/ Loop over processable tasks and process them for N iterations, and then check for 1) commit, 2) punctuate, 3) need to call consumer.poll
4. Even if there is not data to process in this iteration, still need to check if commit / punctuate is needed
5. Finally, try update standby tasks.
*Add an optimization to only commit when it is needed (i.e. at least some process() or punctuate() was triggered since last commit).
*Found and fixed a ProducerFencedException scenario: producer.send() call would never throw a ProducerFencedException directly, but it may throw a KafkaException whose "cause" is a ProducerFencedException.
Reviewers: Matthias J. Sax <matthias@confluent.io>, John Roesler <john@confluent.io>, Bill Bejeck <bill@confluent.io>
A call to `kafka-consumer-groups --describe --group ...` can result in NullPointerException for two reasons:
1) `Fetcher.fetchOffsetsByTimes()` may return too early, without sending list offsets request for topic partitions that are not in cached metadata.
2) `ConsumerGroupCommand.getLogEndOffsets()` and `getLogStartOffsets()` assumed that endOffsets()/beginningOffsets() which eventually call Fetcher.fetchOffsetsByTimes(), would return a map with all the topic partitions passed to endOffsets()/beginningOffsets() and that values are not null. Because of (1), null values were possible if some of the topic partitions were already known (in metadata cache) and some not (metadata cache did not have entries for some of the topic partitions). However, even with fixing (1), endOffsets()/beginningOffsets() may return a map with some topic partitions missing, when list offset request returns a non-retriable error. This happens in corner cases such as message format on broker is before 0.10, or maybe in cases of some other errors.
Testing:
-- added unit test to verify fix in Fetcher.fetchOffsetsByTimes()
-- did some manual testing with `kafka-consumer-groups --describe`, causing NPE. Was not able to reproduce any NPE cases with DescribeConsumerGroupTest.scala,
Reviewers: Jason Gustafson <jason@confluent.io>
With KIP-320, the OffsetsForLeaderEpoch API is intended to be used by consumers to detect log truncation. Therefore the new response schema should expose a field for the throttle time like all the other APIs.
Reviewers: Dong Lin <lindong28@gmail.com>
findBugs is abandoned, it doesn't work with Java 9 and the Gradle plugin will be deprecated in
Gradle 5.0: https://github.com/gradle/gradle/pull/6664
spotBugs is actively maintained and it supports Java 8, 9 and 10. Java 11 is not supported yet,
but it's likely to happen soon.
Also fixed a file leak in Connect identified by spotbugs.
Manually tested spotBugsMain, jarAll and importing kafka in IntelliJ and running
a build in the IDE.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Colin Patrick McCabe <colin@cmccabe.xyz>, Dong Lin <lindong28@gmail.com>
Closes#5625 from ijuma/kafka-5887-spotbugs
[KAFKA-4932](https://issues.apache.org/jira/browse/KAFKA-4932)
Added a UUID Serializer / Deserializer.
Added the UUID type to the SerializationTest
Author: Brandon Kirchner <brandon.kirchner@civitaslearning.com>
Reviewers: Jeff Klukas <jeff@klukas.net>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#4438 from brandonkirchner/KAFKA-4932.uuid-serde
This patch contains the protocol updates needed for KIP-320 as well as some of the basic consumer APIs (e.g. `OffsetAndMetadata` and `ConsumerRecord`). The inter-broker format version has not been changed and the brokers will continue to use the current API versions.
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Dong Lin <lindong28@gmail.com>
Closes#5564 from hachikuji/KAFKA-7333
If a group metadata record size is higher than offsets.load.buffer.size,
loading offsets and group metadata from __consumer_offsets would hang
forever. This was due to the buffer being too small to fit any message
bigger than the maximum configuration. This patch grows the buffer
as needed so the large records will fit and the loading can move on.
A similar change was made to the logic for state loading in the transaction
coordinator.
Reviewers: John Roesler <vvcephei@users.noreply.github.com>, lambdaliu <lambdaliu@users.noreply.github.com>, Dhruvil Shah <dhruvil@confluent.io>, Jason Gustafson <jason@confluent.io>
With idempotent/transactional producers, we may leave empty batches in the log during log compaction. When filtering the data, we keep track of state like `maxOffset` and `maxTimestamp` of filtered data. This patch ensures we maintain this state correctly for the case when only empty batches are left in `MemoryRecords#filterTo`. Without this patch, we did not initialize `maxOffset` in this edge case which led us to append data to the log with `maxOffset` = -1L, causing the append to fail and log cleaner to crash.
Reviewers: Jason Gustafson <jason@confluent.io>
This patch removes the duplication of the out of range handling between `ReplicaFetcherThread` and `ReplicaAlterLogDirsThread` and attempts to expose a cleaner API for extension. It also adds a mock implementation to facilitate testing and several new test cases.
Reviewers: Jun Rao <junrao@gmail.com>