Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#4103 from rajinisivaram/KAFKA-6042-group-deadlock
(cherry picked from commit 5ee157126d)
Signed-off-by: Guozhang Wang <wangguoz@gmail.com>
Rearranged the testAddPartitionDuringDeleteTopic() test to keep the
likelyhood of the race condition.
Author: Maytee Chinavanichkit <maytee.chinavanichkit@linecorp.com>
Reviewers: Jun Rao <junrao@gmail.com>
Closes#4056 from mayt/KAFKA-6051
Kafka today uses ZkClient, a wrapper client around the raw Zookeeper client. This library only exposes synchronous apis to the user. Synchronous apis mean we must wait an entire round trip before doing the next operation.
This becomes problematic with partition-heavy clusters, as we find the controller spending a significant amount of time just sending many sequential reads and writes to zookeeper at the per-partition granularity. This especially becomes an issue during:
- controller failover, where the newly elected controller effectively reads all zookeeper state.
- broker failures and controlled shutdown. The controller tries to elect a new leader for partitions previously led by the broker. The controller also removes the broker from isr on partitions for which the broker was a follower. These all incur partition-granular reads and writes to zookeeper.
As a first step in addressing these issues, we built a low-level wrapper client called ZookeeperClient in KAFKA-5501 that encourages pipelined, asynchronous apis.
This patch converts the controller to use the async ZookeeperClient to improve controller failover, broker failure handling, and controlled shutdown times.
Some notable changes made in this patch:
- All ControllerEvents now defer access to zookeeper at processing time instead of enqueue time as was intended with the single-threaded event queue model patch from KAFKA-5028. This results in a fresh view of the zookeeper state by the time we process the event. This reverts the hacks from KAFKA-5502 and KAFKA-5879.
- We refactored PartitionStateMachine and ReplicaStateMachine to process multiple partitions and replicas in batch rather than one-at-a-time so that we can send a batch of requests over to ZookeeperClient to pipeline.
- We've decouple ZookeeperClient handler registration from watcher registration. Previously, these two were coupled, which meant handler registrations actually sent out a request to the zookeeper ensemble to do the actual watcher registration. In KafkaController.onControllerFailover, we register partition modification handlers (thereby registering watchers) and additionally lookup the partition assignments for every topic in the cluster. We can shave a bit of time off failover if we merge these two operations. We can do this by decoupling ZookeeperClient handler registration from watcher registration. This means ZookeeperClient's registration apis have been changed so that they are purely in-memory operations, and they only take effect when the client sends ExistsRequest, GetDataRequest, or GetChildrenRequest.
- We've simplified the logic for updating LeaderAndIsr such that if we get a BADVERSION error code, the controller will now just retry in the next round by reading the new state and trying the update again. This simplifies logic when updating the partition leader epoch, removing replicas from isr, and electing leaders for partitions.
- We've implemented KAFKA-5083: always leave the last surviving member of the ISR in ZK. This means that if people re-disabled unclean leader election, we can still try to elect the leader from the last in-sync replica.
- ZookeeperClient's handlers have been changed so that their methods default to no-ops for convenience.
- All znode paths and definitions for znode encoding and decoding have been consolidated as static methods in ZkData.scala.
- The partition leader election algorithms have been refactored as pure functions so that they can be easily unit tested.
- PartitionStateMachine and ReplicaStateMachine now have unit tests.
Author: Onur Karaman <okaraman@linkedin.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Closes#3765 from onurkaraman/KAFKA-5642
Author: Dong Lin <lindong28@gmail.com>
Reviewers: Tom Bentley <tbentley@redhat.com>, Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Closes#3874 from lindong28/KAFKA-5163
The following happens on Windows for `HUP`:
[2017-10-11 21:45:11,642] FATAL (kafka.Kafka$)
java.lang.IllegalArgumentException: Unknown signal: HUP
at sun.misc.Signal.<init>(Unknown Source)
at kafka.Kafka$.registerHandler$1(Kafka.scala:67)
at kafka.Kafka$.registerLoggingSignalHandler(Kafka.scala:73)
at kafka.Kafka$.main(Kafka.scala:82)
at kafka.Kafka.main(Kafka.scala)
I thought it was safer not to register them at all since the additional
logging is a nice to have and we haven't tested it on Windows.
Also changed map to be concurrent and removed stray
printStackTrace in test.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Damian Guy <damian.guy@gmail.com>
Closes#4066 from ijuma/dont-register-signal-handler-windows
Use Scala string templates instead of format
Author: Mickael Maison <mickael.maison@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#4058 from mimaison/minor_AFT_logging
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#4035 from hachikuji/KAFKA-5547-followup and squashes the following commits:
f6b04ce1a [Jason Gustafson] Add a couple missed common fields
d3473b14d [Jason Gustafson] Fix compilation errors and a few warnings
58a0ae695 [Jason Gustafson] MINOR: Avoid some unnecessary collection copies in KafkaApis
Author: Xin Li <Xin.Li@trivago.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Closes#4043 from lisa2lisa/fix
(cherry picked from commit bb27215cea)
Signed-off-by: Jason Gustafson <jason@confluent.io>
…up rebalances in progress
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3506 from cmccabe/KAFKA-5565
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#4023 from ijuma/kafka-5829-avoid-reading-older-segments-on-hard-shutdown
It is possible for batches with sequence numbers to be in the `deque` while at the same time the in flight batches in the `TransactionManager` are removed due to a producerId reset.
In this case, when the batches in the `deque` are drained, we will get a `NullPointerException` in the background thread due to this line:
```java
if (first.hasSequence() && first.baseSequence() != transactionManager.nextBatchBySequence(first.topicPartition).baseSequence())
```
Particularly, `transactionManager.nextBatchBySequence` will return null, because there no inflight batches being tracked.
In this patch, we simply allow the batches in the `deque` to be drained if there are no in flight batches being tracked in the TransactionManager. If they succeed, well and good. If the responses come back with an error, the batces will be ultimately failed in the producer with an `OutOfOrderSequenceException` when the response comes back.
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#4022 from apurvam/KAFKA-6015-npe-in-record-accumulator
Without this patch, if the replica's log was somehow truncated before
the leader's, it is possible for the replica fetcher thread to
continuously throw an OutOfOrderSequenceException because the
incoming sequence would be non-zero and there is no local state.
This patch changes the behavior so that the replica state is updated to
the leader's state if there was no local state for the producer at the
time of the append.
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Closes#4004 from apurvam/KAFKA-6003-handle-unknown-producer-on-replica
For record conversion tests, check time >=0 since conversion times may be too small to be measured accurately. Since default value is -1, the test is still useful. Also increase message size in SslTransportLayerTest#testNetworkThreadTimeRecorded to avoid failures when processing time is too small.
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#4018 from rajinisivaram/KAFKA-6010-MemoryRecordsBuilderTest
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Closes#3956 from rajinisivaram/KAFKA-5970-delayedproduce-deadlock
- Improve tests and javadoc (including expected exceptions)
- Return correct authorization error if no describe topic
permission
Author: Tom Bentley <tbentley@redhat.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#3937 from tombentley/KAFKA-5856-AdminClient.createPartitions-follow-up
- Simplify LogCleaner.cleanSegments and add comment regarding thread
unsafe usage of `LogSegment.append`. This was a result of investigating
KAFKA-4972.
- Fix compiler warnings (in some cases use the fully qualified name as a
workaround for deprecation warnings in import statements).
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Closes#4016 from ijuma/simplify-log-cleaner-and-fix-warnings
Since we removed the unused `TRACE` option from `SecurityProtocol`, it now seems safer to expose it from `AuthenticationContext`. Additionally this patch exposes javadocs under security.auth and relocates the `Login` and `AuthCallbackHandler` to a non-public package.
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#3863 from hachikuji/use-security-protocol-in-auth-context
This simplifies running a console consumer as part of a custom group. Note that group id can be provided via only one of the three possible means: directly using `--group ` option (as part of this PR), as property via `--consumer-property` option, or inside a config file via `--consumer.config` option.
Author: Vahid Hashemian <vahidhashemian@us.ibm.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Apurva Mehta <apurva@confluent.io>, Jason Gustafson <jason@confluent.io>
Closes#2150 from vahidhashemian/KAFKA-4416
- Upgrade Gradle to 4.2.1, which handles Azul Zulu 9's version
correctly.
- Add tests to our Java version handling code
- Refactor the code to make it possible to add tests
- Rename `isIBMJdk` method to use consistent naming
convention.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Closes#4007 from ijuma/java-9-version-handling-improvements
This PR improves the output format of DumpLogSegments when the `--offset-decoder` option is used for consuming `__consumer_offsets`, especially when it comes to group metadata.
An example of the partial output with existing formatting:
```
key: metadata::console-consumer-40190 payload: consumer:range:1:{consumer-1-20240b92-fbf4-44d5-bf8c-66b6d70c9948=[foo-0]}
```
An example of the same output with suggested formatting:
```
key: {"metadata":"console-consumer-40190"} payload: {"protocolType":"consumer","protocol":"range","generationId":1,"assignment":"{consumer-1-20240b92-fbf4-44d5-bf8c-66b6d70c9948=[foo-0]}"}
```
Author: Vahid Hashemian <vahidhashemian@us.ibm.com>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#1937 from vahidhashemian/KAFKA-4108
We log a warning instead, which is what we also do if the partition
hasn't been created yet.
A few other improvements:
- Return updated high watermark if fetch is returned immediately.
This seems to be more intuitive and is consistent with the case
where the fetch request is served from the purgatory.
- Centralise offline partition handling
- Remove unnecessary `tryCompleteDelayedProduce` that would
already have been done by the called method
- A few other minor clean-ups
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jun Rao <junrao@gmail.com>
Closes#3954 from ijuma/kafka-5758-dont-fail-fetch-request-if-replica-is-not-follower
Author: Matthias J. Sax <matthias@confluent.io>
Author: Bharat Viswanadham <bharatv@us.ibm.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Damian Guy <damian.guy@gmail.com>
Closes#3970 from mjsax/kafka-5225-streams-resetter-properties
Replaces the static `RequestMetrics` object with a class so that metrics
are created and removed during broker startup and shutdown to avoid metrics
tests being affected by metrics left behind by previous tests.
Also reinstates `kafka.api.MetricsTest` which was failing frequently earlier
due to tests removing the static request metrics.
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#3991 from rajinisivaram/KAFKA-5968
It should be the number of records instead of the
number of batches.
A few additional clean-ups:
- Minor renames
- Removed unused variable
- Some test fixes
- Ignore a flaky test
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, tedyu <yuzhihong@gmail.com>
Closes#3989 from ijuma/kafka-5746-health-metrics-follow-up
Adds new metrics to support health checks:
1. Error rates for each request type, per-error code
2. Request size and temporary memory size
3. Message conversion rate and time
4. Successful and failed authentication rates
5. ZooKeeper latency and status
6. Client version
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#3705 from rajinisivaram/KAFKA-5746-new-metrics
It now records the correct size for `Send` and does more appropriate
logging if the connection is being closed or if no response is being
sent.
Author: huxihx <huxi_2b@hotmail.com>
Reviewers: Denis Bolshakov, Ismael Juma <ismael@juma.me.uk>
Closes#3961 from huxihx/KAFKA-5976
This depends on sun.misc.Signal and sun.misc.SignalHandler, which may be
removed in future releases. But along with sun.misc.Unsafe, these classes
are available in Java 9 (see JEP 260), so they are safe to use for now.
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#3668 from rajinisivaram/KAFKA-5679
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Apurva Mehta <apurva@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#3944 from hachikuji/KAFKA-5960
Author: Mickael Maison <mickael.maison@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Closes#3036 from mimaison/KAFKA-3356
Offset and partition are not converted from String to
long and int correctly.
Running the command line with --from-file option causes
the following exception:
java.lang.ClassCastException: java.lang.String cannot be
cast to java.lang.Integer
Reason: asInstanceOf used for the conversion.
Also, unit test is using --to-earliest and --from-file
together when executing the test. This is executing
--to-earliest option only and ignoring --from-file
option. Since the preparation part is also using
--to-earliest to create the file, this unit test
passes without testing --from-file option. Fixed
the unit test too.
Author: Erkan Unal <eunal@cisco.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#3938 from eu657/eu657-patch-1
As mentioned in MappedByteBuffers' class documentation, its
implementation was inspired by Lucene's MMapDirectory:
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/6.6.1/lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java#L315
Without this change, unmapping fails with the following message:
> java.lang.IllegalAccessError: class kafka.log.AbstractIndex (in unnamed module 0x45103d6b) cannot access class jdk.internal.ref.Cleaner (in module java.base) because module java.base does not export jdk.internal.ref to unnamed module 0x45103d6b
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Closes#3879 from ijuma/kafka-5915-unmap-mapped-buffers-java-9
These methods help users check for cases in which this metadata was not returned by the broker (e.g. in the case of acks=0 or a duplicate error when idempotence is enabled).
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Closes#3878 from apurvam/KAFKA-5913-add-record-metadata-not-available-exception
Developed with edoardocomar
Author: Mickael Maison <mickael.maison@gmail.com>
Reviewers: Edoardo Comar <ecomar@uk.ibm.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
Closes#3906 from mimaison/KAFKA-5735
- Remove DelayedCreatePartitions to reduce code duplication
- Avoid unnecessary ZK calls (call it once per request instead
of once per topic, if possible)
- Simplify code
- A few minor clean-ups
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Tom Bentley <tbentley@redhat.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
Closes#3930 from ijuma/kafka-5856-admin-client-creation-partitions