Scenario is as follows:
1. Consumer subscribes to topic t1 and begins consuming
2. heartbeat fails as the group is rebalancing
3. ConsumerCoordinator.onJoinGroupPrepare is called
3.1 onPartitionsRevoked is called
4. consumer becomes the group leader
5. sends sync group request
6. sync group is cancelled due to disconnection
7. fetch request is sent for partitions that have previously been revoked
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3181 from dguy/kafka-5154
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Damian Guy <damian.guy@gmail.com>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3193 from mjsax/kafka-5361-add-eos-integration-tests-for-streams-api
More specifically, V2 messages are always batched (whether compressed or not) while
V0/V1 are only batched if they are compressed.
Clients like librdkafka expect to receive messages from the fetch offset when dealing with uncompressed V0/V1 messages. When converting from V2 to V0/1, we were returning all the
messages in the V2 batch.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#3191 from ijuma/kafka-5360-down-converted-uncompressed-respect-offset
Without this patch, future client retries would get the `CONCURRENT_TRANSACTIONS` error code indefinitely, since the pending state wouldn't be cleared when the append to the log failed.
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3184 from apurvam/KAFKA-5351-clear-pending-state-on-retriable-error
subscribed stream may not be detected by all followers until
onJoinComplete returns.
Author: Bill Bejeck <bill@confluent.io>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3157 from bbejeck/KAFKA-5226_null_pointer_source_node_deserialize
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Jason Gustafson <jason@confluent.io>
Closes#3172 from guozhangwang/K5350-compatibility-annotations
- Added a boolean `allow_auto_topic_creation` to MetadataRequest and
bumped the protocol version to V4.
- When connecting to brokers older than 0.11.0.0, the `allow_auto_topic_creation`
field won't be considered, so we send a metadata request for all topics
to keep the behavior consistent.
- Set `allow_auto_topic_creation` to false in the new AdminClient and
StreamsKafkaClient (which exists for the purpose of creating topics
manually); set it to true everywhere else for now. Other clients will eventually
rely on client-side auto topic creation, but that’s not there yet.
- Add `allowAutoTopicCreation` field to `Metadata`, which is used by
`DefaultMetadataUpdater`. This is not strictly needed for the new
`AdminClient`, but it avoids surprises if it ever adds a topic to `Metadata`
via `setTopics` or `addTopic`.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jun Rao <junrao@gmail.com>
Closes#3098 from ijuma/kafka-5291-admin-client-no-auto-topic-creation
This makes the case where we build the records from scratch consistent
with the case where update the batch header "in place". Thanks to
edenhill who found the issue while testing librdkafka.
The reason our tests don’t catch this is that we rely on the maxTimestamp
to compute the record level timestamps if log append time is used.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#3177 from ijuma/set-base-sequence-for-log-append-time
Without it, it's possible that the assertion is checked before the exception
is thrown in the callback.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Closes#3182 from ijuma/fix-controller-failover-flakiness
Author: Mario Molina <mmolimar@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Damian Guy <damian.guy@gmail.com>, Michael G. Noll <michael@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3017 from mmolimar/KAFKA-5218
Adjust "importance level" and add explanation to the docs.
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Eno Thereska <eno.thereska@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
Closes#2855 from mjsax/minor-improve-streams-config-parameters
Also introduce TopicConfig.
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#3120 from cmccabe/KAFKA-5265
This should be less flaky as it has a higher timeout. I also increased the timeout
in a couple of other tests that had a very low (100 ms) timeouts.
The failure would manifest itself as:
```text
java.net.SocketTimeoutException
at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:229)
at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
at org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:85)
at kafka.network.BlockingChannel.readCompletely(BlockingChannel.scala:129)
at kafka.network.BlockingChannel.receive(BlockingChannel.scala:120)
at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:100)
at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:84)
at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SimpleConsumer.scala:133)
at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:133)
at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:133)
at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply$mcV$sp(SimpleConsumer.scala:132)
at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:132)
at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:132)
at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:131)
at kafka.api.test.ProducerCompressionTest.testCompression(ProducerCompressionTest.scala:97)
```
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Closes#3178 from ijuma/producer-compression-test-flaky
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Apurva Mehta <apurva@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#3175 from hachikuji/KAFKA-5349
It sometimes fails in Jenkins like:
```text
java.lang.AssertionError: IllegalStateException was not thrown
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at kafka.controller.ControllerFailoverTest.testHandleIllegalStateException(ControllerFailoverTest.scala:86)
```
I ran it locally 100 times with no failure.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Closes#3176 from ijuma/improve-controller-failover-assert
Return UNSUPPORTED_MESSAGE_FORMAT in handleWriteTxnMarkers when a topic is not the correct message format.
Remove any TopicPartitions that have same error from those waiting for markers
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3152 from dguy/kafka-5308
Handle` rocksdb.config.setter` being set as a class name or class
instance.
Author: Tommy Becker <tobecker@tivo.com>
Author: Tommy Becker <twbecker@gmail.com>
Reviewers: Matthias J. Sax, Damian Guy, Guozhang Wang
Closes#3155 from twbecker/KAFKA-5334
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Apurva Mehta <apurva@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3161 from hachikuji/KAFKA-5251
- reuse decompression buffers in consumer Fetcher
- switch lz4 input stream to operate directly on ByteBuffers
- avoids performance impact of catching exceptions when reaching the end of legacy record batches
- more tests with both compressible / incompressible data, multiple
blocks, and various other combinations to increase code coverage
- fixes bug that would cause exception instead of invalid block size
for invalid incompressible blocks
- fixes bug if incompressible flag is set on end frame block size
Overall this improves LZ4 decompression performance by up to 40x for small batches.
Most improvements are seen for batches of size 1 with messages on the order of ~100B.
We see at least 2x improvements for for batch sizes of < 10 messages, containing messages < 10kB
This patch also yields 2-4x improvements on v1 small single message batches for other compression types.
Full benchmark results can be found here
https://gist.github.com/xvrl/05132e0643513df4adf842288be86efd
Author: Xavier Léauté <xavier@confluent.io>
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2967 from xvrl/kafka-5150
Also update the test to be simpler since we can use a mock event to simulate the issue
more easily (thanks Jun for the suggestion). This should fix two issues:
1. A transient test failure due to a NPE in ControllerFailoverTest.testMetadataUpdate:
```text
Caused by: java.lang.NullPointerException
at kafka.controller.ControllerBrokerRequestBatch.addUpdateMetadataRequestForBrokers(ControllerChannelManager.scala:338)
at kafka.controller.KafkaController.sendUpdateMetadataRequest(KafkaController.scala:975)
at kafka.controller.ControllerFailoverTest.testMetadataUpdate(ControllerFailoverTest.scala:141)
```
The test was creating an additional thread and it does not seem like it was doing the
appropriate synchronization (perhaps this became more of an issue after we changed
the Controller to be single-threaded and changed the locking)
2. Setting `activeControllerId.set(-1)` in `triggerControllerMove` causes `Reelect` not to invoke `onControllerResignation`. Among other things, this causes an `IllegalStateException` to be thrown when `KafkaScheduler.startup` is invoked for the second time without the corresponding `shutdown`. We now simply call `onControllerResignation` as part of `triggerControllerMove`.
Finally, I included a few clean-ups:
1. No longer update the broker state in `onControllerFailover`. This is no longer needed
since we removed the `RunningAsController` state (KAFKA-3761).
2. Trivial clean-ups in KafkaController
3. Removed unused parameter in `ZkUtils.getPartitionLeaderAndIsrForTopics`
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jun Rao <junrao@gmail.com>
Closes#2935 from ijuma/on-controller-resignation-if-trigger-controller-move
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Closes#3169 from rajinisivaram/MINOR-producer-metrics
Here is the sketch of this proposal:
1. When it is time to send the txn markers, only look for the leader node of the partition once instead of retrying, and if that information is not available, it means the partition is highly likely been removed since it was in the cache before. In this case, we just remove the partition from the metadata object and skip putting into the corresponding queue, and if all partitions' leader broker are non-available, complete this delayed operation to proceed to write the complete txn log entry.
2. If the leader id is unknown from the cache but the corresponding node object with the listener name is not available, it means that the leader is likely unavailable right now. Put it into a separate queue and let sender thread retry fetching its metadata again each time upon draining the queue.
One caveat of this approach is the delete-and-recreate case, and the argument is that since all the messages are deleted anyways when deleting the topic-partition, it does not matter whether the markers are on the log partitions or not.
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Apurva Mehta <apurva@confluent.io>, Damian Guy <damian.guy@gmail.com>, Jason Gustafson <jason@confluent.io>
Closes#3130 from guozhangwang/K5202-handle-topic-deletion
KAFKA-4603 the command parsed error
Using "new OptionParser" might result in parse error
Change all the OptionParser constructor in Kafka into "new OptionParser(false)"
Author: xinlihua <xin.lihua1@zte.com.cn>
Author: unknown <00067310@A23338408.zte.intra>
Author: auroraxlh <xin.lihua1@zte.com.cn>
Author: xin <xin.lihua1@zte.com.cn>
Reviewers: Damian Guy, Guozhang Wang
Closes#2349 from auroraxlh/fix_OptionParser_bug
ByteBufferOutputStream improvements:
* Document pitfalls
* Improve efficiency when dealing with direct byte buffers
* Improve handling of buffer expansion
* Be consistent about using `limit` instead of `capacity`
* Add constructors that allocate the internal buffer
Other minor changes:
* Fix log warning to specify correct Kafka version
* Clean-ups
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#3166 from ijuma/minor-kafka-5316-follow-ups
In the original implementation, console-consumer fails to honor `--value-deserializer` config.
Author: amethystic <huxi_2b@hotmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#3100 from amethystic/KAFKA-5278
Keep track of when a transaction has begun by setting a flag, `transactionStarted` when a successfull `AddPartitionsToTxnResponse` or `AddOffsetsToTxnResponse` had been received. If an `AbortTxnRequest` about to be sent and `transactionStarted` is false, don't send the request and transition the state to `READY`
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Apurva Mehta <apurva@confluent.io>, Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>
Closes#3126 from dguy/kafka-5260
There is a Misspell in Annotations of ResetIntegrationTest.
Author: hejiefang <he.jiefang@zte.com.cn>
Reviewers: Matthias J. Sax, Guozhang Wang
Closes#3159 from hejiefang/KAFKA-5338
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Closes#3142 from hachikuji/KAFKA-5316
Author: Vahid Hashemian <vahidhashemian@us.ibm.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#3147 from vahidhashemian/minor/remove_unsed_method_parameter_simpleaclauthorizer
Autogenerate docs for the Consumer Fetcher's metrics. This is a smaller subset of the original PR https://github.com/apache/kafka/pull/1202.
CC ijuma benstopford hachikuji
Author: James Cheng <jylcheng@yahoo.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Guozhang Wang <wangguoz@gmail.com>
Closes#2993 from wushujames/fetcher_metrics_docs
dguy , mjsax Please review the PR and let me know your comments.
Author: umesh chaudhary <umesh9794@gmail.com>
Reviewers: Bill Bejeck, Matthias J. Sax, Guozhang Wang
Closes#3099 from umesh9794/mylocal
Add check in `KafkaApis` that the inter broker protocol version is at least `KAFKA_0_11_0_IV0`, i.e., supporting transactions
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Closes#3103 from dguy/kafka-5128
- introduces a new thread state DEAD
- ignores DEAD threads when querying
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Damian Guy, Eno Thereska, Guozhang Wang
Closes#3140 from mjsax/kafka-5309-stores-not-queryable
The previous code did not handle this correctly if a batch was
compacted more than once.
Also add test case for duplicate check after log cleaning and
improve various comments.
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#3145 from hachikuji/minor-improve-base-sequence-docs
The basic idea is that exactly three collections, ie. `pendingRequests`, `newPartitionsToBeAddedToTransaction`, and `partitionsInTransaction` are accessed from the context of application threads. The first two are modified from the application threads, and the last is read from those threads.
So to make the `TransactionManager` truly thread safe, we have to ensure that all accesses to these three members are done in a synchronized block. I inspected the code, and I believe this patch puts the synchronization in all the correct places.
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#3132 from apurvam/KAFKA-5147-transaction-manager-synchronization-fixes
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Jun Rao <junrao@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#3137 from rajinisivaram/KAFKA-5320