A call to `kafka-consumer-groups --describe --group ...` can result in a NullPointerException for two reasons:
1) `Fetcher.fetchOffsetsByTimes()` may return too early, without sending a list offsets request for topic partitions that are not in the cached metadata.
2) `ConsumerGroupCommand.getLogEndOffsets()` and `getLogStartOffsets()` assumed that `endOffsets()`/`beginningOffsets()`, which eventually call `Fetcher.fetchOffsetsByTimes()`, would return a map containing every topic partition passed in, with no null values. Because of (1), null values were possible when some of the topic partitions were already in the metadata cache and some were not. However, even with (1) fixed, `endOffsets()`/`beginningOffsets()` may return a map with some topic partitions missing when the list offsets request returns a non-retriable error. This happens in corner cases, such as when the broker's message format is older than 0.10, and possibly on other errors.
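For illustration, a sketch of the defensive handling callers need (topic name, partition numbers and broker address are hypothetical): the map returned by `endOffsets()`/`beginningOffsets()` may omit partitions or contain nulls, so it must not be dereferenced blindly.

```java
import java.util.*;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class EndOffsetsExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical broker
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            List<TopicPartition> partitions =
                Arrays.asList(new TopicPartition("my-topic", 0),   // hypothetical partitions
                              new TopicPartition("my-topic", 1));

            Map<TopicPartition, Long> endOffsets = consumer.endOffsets(partitions);

            for (TopicPartition tp : partitions) {
                Long offset = endOffsets.get(tp);
                if (offset == null) {
                    // Missing entry or null value: treat as unknown instead of
                    // dereferencing it, which is what caused the NPE.
                    System.out.println(tp + ": end offset unavailable");
                } else {
                    System.out.println(tp + ": " + offset);
                }
            }
        }
    }
}
```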
Testing:
-- added unit test to verify fix in Fetcher.fetchOffsetsByTimes()
-- did some manual testing with `kafka-consumer-groups --describe` in the scenario that previously caused the NPE. Was not able to reproduce any NPE cases with DescribeConsumerGroupTest.scala.
Reviewers: Jason Gustafson <jason@confluent.io>
With KIP-320, the OffsetsForLeaderEpoch API is intended to be used by consumers to detect log truncation. Therefore the new response schema should expose a field for the throttle time like all the other APIs.
Reviewers: Dong Lin <lindong28@gmail.com>
findBugs is abandoned; it doesn't work with Java 9, and the Gradle plugin will be deprecated in
Gradle 5.0: https://github.com/gradle/gradle/pull/6664
spotBugs is actively maintained and it supports Java 8, 9 and 10. Java 11 is not supported yet,
but support is likely to arrive soon.
Also fixed a file leak in Connect identified by spotbugs.
Manually tested spotBugsMain, jarAll and importing kafka in IntelliJ and running
a build in the IDE.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Colin Patrick McCabe <colin@cmccabe.xyz>, Dong Lin <lindong28@gmail.com>
Closes #5625 from ijuma/kafka-5887-spotbugs
[KAFKA-4932](https://issues.apache.org/jira/browse/KAFKA-4932)
Added a UUID Serializer / Deserializer.
Added the UUID type to the SerializationTest.
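A minimal usage sketch, assuming the new serializer is `org.apache.kafka.common.serialization.UUIDSerializer` (topic name and broker address are hypothetical):

```java
import java.util.Properties;
import java.util.UUID;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.common.serialization.UUIDSerializer;

public class UuidSerdeExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical broker
        props.put("key.serializer", UUIDSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<UUID, String> producer = new KafkaProducer<>(props)) {
            // "events" is a hypothetical topic; the key is handled by the new UUID serializer.
            producer.send(new ProducerRecord<>("events", UUID.randomUUID(), "payload"));
        }
    }
}
```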
Author: Brandon Kirchner <brandon.kirchner@civitaslearning.com>
Reviewers: Jeff Klukas <jeff@klukas.net>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes #4438 from brandonkirchner/KAFKA-4932.uuid-serde
This patch contains the protocol updates needed for KIP-320 as well as some of the basic consumer APIs (e.g. `OffsetAndMetadata` and `ConsumerRecord`). The inter-broker format version has not been changed and the brokers will continue to use the current API versions.
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Dong Lin <lindong28@gmail.com>
Closes #5564 from hachikuji/KAFKA-7333
If a group metadata record size is higher than offsets.load.buffer.size,
loading offsets and group metadata from __consumer_offsets would hang
forever. This was due to the buffer being too small to fit any message
bigger than the configured maximum. This patch grows the buffer
as needed so that large records fit and loading can move on.
A similar change was made to the logic for state loading in the transaction
coordinator.
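Conceptually, the loading loop now grows its read buffer instead of spinning forever on a batch that can never fit; a simplified sketch of that idea (not the actual coordinator code):

```java
import java.nio.ByteBuffer;

public class GrowableReadBuffer {
    private ByteBuffer buffer;

    public GrowableReadBuffer(int initialSize) {
        // initialSize corresponds to the configured value, e.g. offsets.load.buffer.size.
        this.buffer = ByteBuffer.allocate(initialSize);
    }

    /**
     * Ensure the buffer can hold a batch of the given size, growing it if
     * necessary rather than repeatedly attempting a read that can never succeed.
     */
    public ByteBuffer ensureCapacity(int batchSize) {
        if (batchSize > buffer.capacity()) {
            buffer = ByteBuffer.allocate(batchSize);
        }
        buffer.clear();
        buffer.limit(batchSize);
        return buffer;
    }
}
```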
Reviewers: John Roesler <vvcephei@users.noreply.github.com>, lambdaliu <lambdaliu@users.noreply.github.com>, Dhruvil Shah <dhruvil@confluent.io>, Jason Gustafson <jason@confluent.io>
With idempotent/transactional producers, we may leave empty batches in the log during log compaction. When filtering the data, we keep track of state like `maxOffset` and `maxTimestamp` of the filtered data. This patch ensures we maintain this state correctly for the case when only empty batches are left in `MemoryRecords#filterTo`. Without this patch, we did not initialize `maxOffset` in this edge case, which led us to append data to the log with `maxOffset` = -1L, causing the append to fail and the log cleaner to crash.
Reviewers: Jason Gustafson <jason@confluent.io>
This patch removes the duplication of the out of range handling between `ReplicaFetcherThread` and `ReplicaAlterLogDirsThread` and attempts to expose a cleaner API for extension. It also adds a mock implementation to facilitate testing and several new test cases.
Reviewers: Jun Rao <junrao@gmail.com>
This PR aims to enforce that the controller can only update zookeeper state after checking the controller epoch zkVersion. The check and the zookeeper state update are wrapped in a zookeeper multi() operation to ensure that they are done atomically. This PR is necessary to resolve issues related to multiple controllers (i.e. the old controller updates zookeeper state before resignation, which is possible during controller failover given the single-threaded event queue model we have).
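A rough sketch of the idea using the raw ZooKeeper client API (paths, versions and method names are illustrative, not the actual controller code): a check op on the controller epoch znode's zkVersion is bundled with the state update so that both succeed or fail together.

```java
import java.util.Arrays;
import java.util.List;
import org.apache.zookeeper.Op;
import org.apache.zookeeper.OpResult;
import org.apache.zookeeper.ZooKeeper;

public class ControllerEpochCheckedUpdate {
    /**
     * Update a state znode only if the controller epoch znode still has the
     * expected zkVersion. The multi() is atomic, so a stale controller's
     * update fails instead of overwriting the new controller's state.
     */
    public static List<OpResult> updateIfCurrentController(ZooKeeper zk,
                                                           int expectedEpochZkVersion,
                                                           String statePath,
                                                           byte[] newState,
                                                           int stateZkVersion) throws Exception {
        return zk.multi(Arrays.asList(
            Op.check("/controller_epoch", expectedEpochZkVersion), // guard on controller epoch zkVersion
            Op.setData(statePath, newState, stateZkVersion)        // the actual state update
        ));
    }
}
```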
This PR includes the following changes:
- Add MultiOp request and response in ZookeeperClient
- Ensure all zookeeper updates done by controller are protected by checking the current controller epoch zkVersion
- Modify test cases in KafkaZkClientTest to test mismatched controller epoch zkVersion
Tests Done:
- Unit tests (with updated tests to cover mismatched controller epoch zkVersion)
- Existing integration tests
Author: Zhanxiang (Patrick) Huang <hzxa21@hotmail.com>
Reviewers: Jun Rao <junrao@gmail.com>, Dong Lin <lindong28@gmail.com>, Manikumar Reddy O <manikumar.reddy@gmail.com>
Closes #5101 from hzxa21/KAFKA-6082
Replace 'this' references in anonymous inner class logs with the outer class's 'this'.
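For illustration (hypothetical class name), the pattern being fixed looks like this:

```java
public class ExampleTask {
    private final Runnable runnable = new Runnable() {
        @Override
        public void run() {
            // Inside the anonymous class, "this" refers to the Runnable instance,
            // which produces unhelpful log output; "ExampleTask.this" refers to
            // the enclosing object and logs the intended identity.
            System.out.println("Running task " + ExampleTask.this);
        }
    };

    public void start() {
        runnable.run();
    }
}
```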
Author: Kevin Lafferty <kevin.lafferty@gmail.com>
Reviewers: Randall Hauch <rhauch@gmail.com>, Arjun Satish <arjun@confluent.io>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes #5583 from kevin-laff/connect_logging
With KIP-266, MirrorMaker should handle the TimeoutException that commitSync() can now throw. In addition, MM should only commit offsets for existing topics.
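A sketch of the intended handling, with the surrounding MirrorMaker loop simplified away; `commitSync()` can throw `TimeoutException` since KIP-266 bounded its blocking time:

```java
import java.util.Map;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.errors.TimeoutException;

public class CommitWithTimeoutHandling {
    /**
     * Commit the given offsets, tolerating the TimeoutException that
     * commitSync() may throw now that its blocking time is bounded.
     */
    public static void commit(KafkaConsumer<?, ?> consumer,
                              Map<TopicPartition, OffsetAndMetadata> offsets) {
        try {
            consumer.commitSync(offsets);
        } catch (TimeoutException e) {
            // Offsets will be committed on the next attempt; do not kill the mirroring loop.
            System.err.println("Offset commit timed out, will retry later: " + e.getMessage());
        }
    }
}
```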
Author: huxihx <huxi_2b@hotmail.com>
Reviewers: Dong Lin <lindong28@gmail.com>
Closes #5492 from huxihx/KAFKA-7211
This fixes the following Gradle warning:
> The following annotation processors were detected on the compile classpath:
> 'org.openjdk.jmh.generators.BenchmarkProcessor'. Detecting annotation processors
> on the compile classpath is deprecated and Gradle 5.0 will ignore them. Please add
> them to the annotation processor path instead. If you did not intend to use annotation
> processors, you can use the '-proc:none' compiler argument to ignore them.
With this change, the warning went away and `./jmh-benchmarks/jmh.sh` continues
to work.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Dong Lin <lindong28@gmail.com>
Closes #5602 from ijuma/annotation-processor
"Jetty 9.4.12 includes compatibility for JDK 11. Additionally, TLS 1.3 support has been implemented. While full functionality for new JDK features is not yet supported, this release has been built and tested for compatibility with the latest releases from Oracle."
http://dev.eclipse.org/mhonarc/lists/jetty-announce/msg00124.html
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
This ensures that the whole test suite is executed even
if there are failures. It currently stops at a module
boundary if there are any failures. There is a discussion
to change the gradle default to stop after the first test failure:
https://github.com/gradle/gradle/issues/6513
`--continue` is recommended for CI in that discussion.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Dong Lin <lindong28@gmail.com>
Closes #5599 from ijuma/jenkins-script-continue
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <guozhang@confluent.io>, Bill Bejeck <bill@confluent.io>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>
We should retry when possible if ListGroups fails due to a retriable error (e.g. coordinator loading).
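A hedged sketch of the retry idea using the public admin client (retry bounds and method names are illustrative, not the internal AdminClient change):

```java
import java.util.Collection;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.ConsumerGroupListing;
import org.apache.kafka.common.errors.RetriableException;

public class ListGroupsWithRetry {
    public static Collection<ConsumerGroupListing> listGroups(AdminClient admin) throws Exception {
        int attempts = 0;
        while (true) {
            try {
                return admin.listConsumerGroups().all().get();
            } catch (ExecutionException e) {
                // Retry only retriable errors (e.g. the coordinator is still loading).
                if (e.getCause() instanceof RetriableException && ++attempts < 5) {
                    Thread.sleep(100L * attempts);
                } else {
                    throw e;
                }
            }
        }
    }
}
```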
Reviewers: Colin Patrick McCabe <colin@cmccabe.xyz>, Guozhang Wang <wangguoz@gmail.com>
Previously, the examples depicted creating a Jackson serde for every POJO class, which becomes a burden in practice. There are many ways to avoid this and use just a single serde, so we've decided to model that design choice instead.
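A minimal sketch of the single-serde approach (class name is illustrative): one Jackson-based serde parameterized by the target type, constructed once per type, e.g. `new JsonSerde<>(MyPojo.class)`, instead of a handwritten serde per POJO.

```java
import java.util.Map;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serializer;

public class JsonSerde<T> implements Serde<T>, Serializer<T>, Deserializer<T> {
    private final ObjectMapper mapper = new ObjectMapper();
    private final Class<T> type;

    public JsonSerde(Class<T> type) {
        this.type = type;
    }

    @Override
    public byte[] serialize(String topic, T data) {
        try {
            return data == null ? null : mapper.writeValueAsBytes(data);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public T deserialize(String topic, byte[] bytes) {
        try {
            return bytes == null ? null : mapper.readValue(bytes, type);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public Serializer<T> serializer() { return this; }

    @Override
    public Deserializer<T> deserializer() { return this; }

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) { }

    @Override
    public void close() { }
}
```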
Reviewers: Viktor Somogyi <viktorsomogyi@gmail.com>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Pull the epoch request build logic up to `AbstractFetcherThread`. Also get rid of the `FetchRequest` indirection.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Rajini Sivaram <rajinisivaram@googlemail.com>
If a broker or topic has a message format < 0.11, it does not track leader epochs. LeaderEpochRequests for such partitions will always return an undefined epoch, making the follower truncate to the high watermark. Since there is no point in using the network for these cases, don't send the request.
Reviewers: Anna Povzner <anna@confluent.io>, Jason Gustafson <jason@confluent.io>
Java 11 enables aes128-cts-hmac-sha256-128 and aes256-cts-hmac-sha384-192 by default. These are not supported in earlier versions of Java and not supported by the Apache DS libraries used by MiniKdc. To ensure that the default kerberos configuration used by Kafka integration and system tests works with all versions of Java 8 and above, configure default_tkt_enctypes and default_tgs_enctypes to use aes128-cts-hmac-sha1-96.
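For reference, a sketch of the corresponding krb5.conf stanza (realm and other settings omitted):

```
[libdefaults]
    default_tkt_enctypes = aes128-cts-hmac-sha1-96
    default_tgs_enctypes = aes128-cts-hmac-sha1-96
```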
Reviewers: Ismael Juma <ismael@juma.me.uk>
If the server fails to connect to zk, `kafkaScheduler` will be `null`
when closing `KafkaServer`. The NPE is swallowed so it's
harmless apart from the confusing log noise.
Reviewers: Ismael Juma <ismael@juma.me.uk>
This PR fixes the Streams Scala `peek` method, which previously called itself recursively.
Reviewers: Joan Goyeau <joan@goyeau.com>, Guozhang Wang <guozhang@confluent.io>, John Roesler <john@confluent.io>
Currently, the MBeans `kafka.network:type=Processor,name=IdlePercent,networkProcessor=*` and `kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent` could be greater than 1. However, these two values represent a percentage, which should not exceed 1.
Author: huxihx <huxi_2b@hotmail.com>
Reviewers: Dong Lin <lindong28@gmail.com>
Closes #5584 from huxihx/KAFKA-7354
Java 11 supports TLS 1.3, which has different cipher names than
previous TLS versions, so the simplistic way of choosing ciphers
is not guaranteed to work. Fix it by configuring the context
to use TLS 1.2.
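A minimal sketch of pinning the protocol (key/trust manager setup omitted for brevity):

```java
import javax.net.ssl.SSLContext;

public class Tls12Context {
    public static SSLContext create() throws Exception {
        // Request TLSv1.2 explicitly so TLS 1.3's differently named cipher suites
        // (enabled by default on Java 11) do not break simplistic cipher selection.
        SSLContext context = SSLContext.getInstance("TLSv1.2");
        context.init(null, null, null); // default key/trust managers for this sketch
        return context;
    }
}
```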
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Set empty extensions if null is passed in.
Reviewers: Satish Duggana <sduggana@hortonworks.com>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
During actions such as a reconfiguration, the task configs are obtained
via `Worker.connectorTaskConfigs` and then subsequently saved into an
instance of `ClusterConfigState`. The values of the properties that are saved
are post-transformation (of variable references) when they should be
pre-transformation. This is to avoid secrets appearing in plaintext in
the `connect-configs` topic, for example.
The fix is to change the 2 clients of `Worker.connectorTaskConfigs` to
perform a reverse transformation (values converted back into variable
references) before saving them into an instance of `ClusterConfigState`.
The 2 places where the save is performed are
`DistributedHerder.reconfigureConnector` and
`StandaloneHerder.updateConnectorTasks`.
The way that the reverse transformation works is by using the
"raw" connector config (with variable references still intact) from
`ClusterConfigState` to convert config values back into variable
references for those keys that are common between the task config
and the connector config.
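A hedged sketch of that reverse transformation as plain map manipulation (method and variable names are illustrative, not the actual Connect code):

```java
import java.util.HashMap;
import java.util.Map;

public class ReverseTransform {
    /**
     * For every key the task config shares with the raw connector config,
     * put back the raw (unresolved) value so variable references
     * (e.g. ${provider:path:key}) are stored instead of resolved secrets.
     */
    public static Map<String, String> reverseTransform(Map<String, String> rawConnectorConfig,
                                                       Map<String, String> taskConfig) {
        Map<String, String> result = new HashMap<>(taskConfig);
        for (Map.Entry<String, String> entry : rawConnectorConfig.entrySet()) {
            if (result.containsKey(entry.getKey())) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }
}
```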
There are 2 additional small changes that only affect `StandaloneHerder`:
1) `ClusterConfigState.allTasksConfigs` has been changed to perform a
transformation (resolution) on all variable references. This is
necessary because the result of this method is compared directly to
`Worker.connectorTaskConfigs`, which also has variable references
resolved.
2) `StandaloneHerder.startConnector` has been changed to match
`DistributedHerder.startConnector`. This is to fix an issue where
during `StandaloneHerder.restartConnector`, the post-transformed
connector config would be saved back into `ClusterConfigState`.
I also performed an analysis of all other code paths where configs are
saved back into `ClusterConfigState` and did not find any other
issues.
Author: Robert Yokota <rayokota@gmail.com>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes #5475 from rayokota/KAFKA-7242-reverse-xform-props
If a follower is not in the ISR, it has to fetch up to the start offset of the current leader epoch; otherwise we risk losing committed data. Added a unit test to verify this behavior.
Reviewers: Jason Gustafson <jason@confluent.io>
The broker should return NOT_LEADER_FOR_PARTITION for OffsetsForLeaderEpoch requests against non-replicas instead of UNKNOWN_TOPIC_OR_PARTITION. This patch also fixes a minor bug in the handling of ListOffsets request using the DEBUG replica id. We should return UNKNOWN_TOPIC_OR_PARTITION if the topic doesn't exist.
Reviewers: Jun Rao <junrao@gmail.com>
Jacoco 0.8.2 adds Java 11 support:
https://github.com/jacoco/jacoco/releases/tag/v0.8.2
Java 11 RC1 is out so it would be good for us to
get a working CI build.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Dong Lin <lindong28@gmail.com>
Closes #5568 from ijuma/jacoco-0.8.2