- fixed consumer group dead condition
- disabled state store cache
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes #2056 from mjsax/KAFKA-4058-instableResetToolTest
This PR makes a couple of enhancements to the `--describe` option of `ConsumerGroupCommand`:
1. Listing members with no assigned partitions.
2. Showing the member id along with the owner of each partition (owner is supposed to be the logical application id and all members in the same group are supposed to set the same owner).
3. Printing a warning indicating whether ZooKeeper based or new consumer API based information is being reported.
It also adds unit tests to verify the added functionality.
Note: The third request on the corresponding JIRA (listing active offsets for empty groups of new consumers) is not implemented as part of this PR, and has been moved to its own JIRA (KAFKA-3853).
Author: Vahid Hashemian <vahidhashemian@us.ibm.com>
Reviewers: Jun Rao <junrao@gmail.com>, Jason Gustafson <jason@confluent.io>
Closes #1336 from vahidhashemian/KAFKA-3144
Author: Alexey Ozeritsky <aozeritsky@yandex-team.ru>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Closes #2023 from resetius/AbstractFetcherManager-shutdown-speedup
Author: Ishita Mandhan <imandha@us.ibm.com>
Reviewers: Vahid Hashemian <vahidhashemian@us.ibm.com>, Jason Gustafson <jason@confluent.io>
Closes #2013 from imandhan/KAFKA-refactor
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Closes #2019 from hachikuji/KAFKA-4298
Author: Konstantine Karantasis <konstantine@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Closes #1995 from kkonstantine/KAFKA-4254-Update-producers-metadata-before-failing-on-non-existent-partition
Remove topics under the /admin/delete_topics path in ZooKeeper if topic deletion is disabled; such topics should never be enqueued for deletion.
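For illustration, a minimal sketch of that cleanup using the plain ZooKeeper client API (not the controller's actual code; the class and method names here are hypothetical):
```java
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

public class DeleteTopicsPurge {
    // If topic deletion is disabled, remove any queued deletions instead of
    // leaving them pending under /admin/delete_topics.
    static void purgeQueuedDeletions(ZooKeeper zk) throws KeeperException, InterruptedException {
        for (String topic : zk.getChildren("/admin/delete_topics", false))
            zk.delete("/admin/delete_topics/" + topic, -1); // -1 matches any node version
    }
}
```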
Author: MayureshGharat <gharatmayuresh15@gmail.com>
Reviewers: Joel Koshy <jjkoshy.w@gmail.com>, Jiangjie Qin <becket.qin@gmail.com>, Jun Rao <junrao@gmail.com>
Closes #846 from MayureshGharat/kafka-3175
Test expects all records to be published successfully, which cannot be guaranteed with acks=0 since failures are not retried.
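For context, a hedged snippet showing the producer setting at play (topic name and bootstrap address are placeholders):
```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class AcksExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // acks=0 is fire-and-forget: no broker acknowledgement, failed sends are
        // not retried, so a test cannot assert that every record was published.
        props.put(ProducerConfig.ACKS_CONFIG, "0");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("test-topic", "key", "value"));
        }
    }
}
```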
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Ben Stopford <benstopford@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes #1991 from rajinisivaram/KAFKA-4265
Author: Jiangjie Qin <becket.qin@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Closes #1993 from becketqin/KAFKA-4274
We recently switched `SystemTimer` to use `hiResClockMs` (based on `nanoTime`), but we
were still using `System.currentTimeMillis` in the test. That would sometimes mean
that we would measure elapsed time as lower than expected.
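A minimal sketch of the fix's idea (names are illustrative): elapsed time should be read from the same monotonic clock the timer uses.
```java
public class ElapsedTimeExample {
    // Equivalent of a hiResClockMs-style reading: monotonic, immune to wall-clock jumps.
    static long hiResClockMs() {
        return System.nanoTime() / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        long start = hiResClockMs();          // not System.currentTimeMillis()
        Thread.sleep(100);                    // the operation under test
        long elapsedMs = hiResClockMs() - start;
        System.out.println("elapsed: " + elapsedMs + " ms");
    }
}
```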
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Jason Gustafson <jason@confluent.io>
Closes #1890 from ijuma/fix-test-request-satisfaction-transient-failure
To prevent the test from completing without throttling before the config change takes effect, produce more messages in the test.
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Ben Stopford <benstopford@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes #1982 from rajinisivaram/KAFKA-4262
Grow the cleaner's read and write buffers up to the maximum message size of the log being cleaned, when the topic's max message size is larger than the broker's default config.
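A minimal sketch of the idea (not the actual LogCleaner code; names are hypothetical):
```java
import java.nio.ByteBuffer;

public class CleanerBufferSketch {
    // Grow a cleaner buffer to the topic's max message size when that exceeds
    // the broker-default capacity the buffer was created with.
    static ByteBuffer growIfNeeded(ByteBuffer buffer, int topicMaxMessageSize) {
        if (buffer.capacity() < topicMaxMessageSize)
            return ByteBuffer.allocate(topicMaxMessageSize);
        return buffer;
    }
}
```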
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Jiangjie Qin <becket.qin@gmail.com>, Jason Gustafson <jason@confluent.io>
Closes #1758 from rajinisivaram/KAFKA-4019
Reopening of https://github.com/apache/kafka/pull/1428
Author: Edoardo Comar <ecomar@uk.ibm.com>
Author: Mickael Maison <mickael.maison@gmail.com>
Reviewers: Grant Henke <granthenke@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Closes #1908 from edoardocomar/KAFKA-3396
Author: Ben Stopford <benstopford@gmail.com>
Reviewers: Jun Rao <junrao@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes #1932 from benstopford/KAFKA-4225-over-KAFKA-4216
Splits the throttled replica configuration (the list of which replicas should be throttled for each topic) into two: one for the leader throttle and one for the follower throttle.
So:
quota.replication.throttled.replicas
=>
quota.leader.replication.throttled.replicas & quota.follower.replication.throttled.replicas
Author: Ben Stopford <benstopford@gmail.com>
Reviewers: Jun Rao <junrao@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes #1906 from benstopford/KAFKA-4216-seperate-leader-and-follower-throttled-replica-lists
Author: Jiangjie Qin <becket.qin@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Closes #1897 from becketqin/KAFKA-4194
Run quota tests which expect throttling only until the first produce/consume request is throttled.
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes #1902 from rajinisivaram/KAFKA-4209
Terminate topic purgatory thread in AdminManager during server shutdown to avoid threads being left around in unit tests.
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes #1927 from rajinisivaram/KAFKA-4227
This small PR pulls ThrottledReplicationRateLimit out of KafkaConfig and puts it in a class that defines Dynamic Configs. Client configs are also placed in this class and validation is added.
Author: Ben Stopford <benstopford@gmail.com>
Reviewers: Jun Rao <junrao@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes #1864 from benstopford/KAFKA-4177
Simple JIRA which alters two things:
1. kafka-reassign-partitions --verify prints "Throttle was removed" regardless of whether a throttle was applied. It should only print this if the value was actually changed.
2. --verify should raise an error if the --throttle argument is passed (check --generate too).
To test this I extracted all validation logic into a separate method and added a test which covers the majority of combinations. The validation logic was retained as is, other than implementing (2) and adding validation to the --broker-list option, which you can currently apply to any of the main actions (where it is ignored). Requirement 1 was tested manually (as it's just a println).
Testing:
- Build passes locally.
- System test reassign_partitions_test.py also passes.
Author: Ben Stopford <benstopford@gmail.com>
Reviewers: Jun Rao <junrao@gmail.com>
Closes #1896 from benstopford/KAFKA-4200
This patch adds a proper warning message and the necessary doc updates for changing the default partition assignment strategy of Mirror Maker from range to round robin. The actual switch would occur as part of a major release cycle (to be scheduled).
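Until the default changes, users can opt in explicitly. A hedged example for the new-consumer configuration (the config key and assignor class are real; passing it via MirrorMaker's consumer config file is the usual route):
```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.RoundRobinAssignor;

public class MirrorMakerAssignorExample {
    public static void main(String[] args) {
        // Normally set in the consumer.properties file passed to MirrorMaker;
        // shown here as code for clarity.
        Properties consumerProps = new Properties();
        consumerProps.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
                RoundRobinAssignor.class.getName());
        System.out.println(consumerProps);
    }
}
```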
Author: Vahid Hashemian <vahidhashemian@us.ibm.com>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes #1499 from vahidhashemian/KAFKA-3831
There is a corner-case bug where, if the controller and a broker receiving a
new replica are bounced at the same time during partition reassignment, the
reassignment fails.
The cause of this bug is a block of code in the KafkaController which
fails the reassignment if aliveNewReplicas != newReplicas, i.e. if
some of the new replicas are offline at the time a controller fails
over.
The fix is to have the controller listen for ISR change events even for
new replicas which are not alive when the controller boots up. Once the
said replicas come online, they will be in the ISR set, and the new
controller will detect this, and then mark the reassignment as
successful.
Interestingly, the block of code in question was introduced in
KAFKA-990, where a concern about this exact scenario was raised :)
This bug was revealed in the system tests in https://github.com/apache/kafka/pull/1904.
The relevant tests will be enabled in either this or a follow-up PR once PR-1904 is merged.
Thanks to junrao for identifying the issue and providing the patch.
Author: Apurva Mehta <apurva.1618@gmail.com>
Reviewers: Jun Rao <junrao@gmail.com>
Closes #1910 from apurvam/KAFKA-4214
Author: Arun Mahadevan <aiyer@hortonworks.com>
Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes #1376 from arunmahadevan/cons-consumer-fix
Technically this does not strictly adhere to RFC-952; however, it is valid for domain names, URLs, and URIs, so we should loosen the requirements a tad.
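For illustration, a hedged sketch of a loosened check (this pattern is illustrative only, not Kafka's actual validation): it accepts characters such as underscores that appear in real-world host names even though RFC-952 is stricter.
```java
import java.util.regex.Pattern;

public class HostnameCheckExample {
    // Illustrative only: accept letters, digits, dots, hyphens, and underscores,
    // even though RFC-952 does not permit all of these.
    static final Pattern HOST = Pattern.compile("[a-zA-Z0-9._-]+");

    public static void main(String[] args) {
        System.out.println(HOST.matcher("my_host-1.example.com").matches()); // true
    }
}
```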
Author: Ryan Pridgeon <ryan.n.pridgeon@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes #1856 from rnpridgeon/KAFKA-3719
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Guozhang Wang <wangguoz@gmail.com>
Closes #1859 from hachikuji/KAFKA-3590
We had a number of failures recently due to these timeouts being too low. It's a particular problem if multiple forks are used while running the tests.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes #1889 from ijuma/increase-zk-timeout-in-tests
The build is unstable, so it's hard to validate this change. Of the various builds up until 11am BST, the test ran twice and passed twice.
Author: Ben Stopford <benstopford@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes #1873 from benstopford/KAFKA-4184
Author: Jiangjie Qin <becket.qin@gmail.com>
Reviewers: Jun Rao <junrao@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Closes #1852 from becketqin/KAFKA-4148
We added the `reporters` parameter as part of KIP-78: Cluster Id.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes #1878 from ijuma/add-kafka-startable-overload-for-compat
Changed the `lowerBound` argument reference in the summary comment of the `translateOffset` method to match the actual argument name: `startingFilePosition`.
Author: Luke Zaparaniuk <luke.zaparaniuk@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes #1876 from lukezaparaniuk/patch-1
Implementation and tests for secure quotas at <user> and <user, client-id> levels as described in KIP-55. Also adds dynamic default quotas for <client-id>, <user> and <user-client-id>. For each client connection, the most specific quota matching the connection is used, with user quota taking precedence over client-id quota.
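A minimal sketch of the precedence rule, assuming hypothetical lookup maps (this is not the broker's implementation, and the real hierarchy includes per-level defaults as well):
```java
import java.util.Map;

public class QuotaPrecedenceSketch {
    // Most specific match wins; user-level quotas take precedence over client-id quotas.
    static double resolve(Map<String, Double> userClientQuotas, // key: "user/clientId"
                          Map<String, Double> userQuotas,       // key: user
                          Map<String, Double> clientQuotas,     // key: clientId
                          String user, String clientId, double defaultQuota) {
        Double q = userClientQuotas.get(user + "/" + clientId); // <user, client-id>
        if (q == null) q = userQuotas.get(user);                // <user>
        if (q == null) q = clientQuotas.get(clientId);          // <client-id>
        return q != null ? q : defaultQuota;                    // dynamic default
    }
}
```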
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Jun Rao <junrao@gmail.com>
Closes #1753 from rajinisivaram/KAFKA-3492
This PR implements KIP-78: Cluster Identifiers [(link)](https://cwiki.apache.org/confluence/display/KAFKA/KIP-78%3A+Cluster+Id#KIP-78:ClusterId-Overview) and includes the following changes:
1. Changes to broker code
- generate cluster id and store it in Zookeeper
- update protocol to add cluster id to metadata request and response
- add ClusterResourceListener interface, ClusterResource class and ClusterMetadataListeners utility class (see the sketch after this list)
- send ClusterResource events to the metric reporters
2. Changes to client code
- update Cluster and Metadata code to support cluster id
- update clients for sending ClusterResource events to interceptors, (de)serializers and metric reporters
3. Integration tests for interceptors, (de)serializers and metric reporters for clients and for protocol changes and metric reporters for broker.
4. System tests for upgrading from previous versions.
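As a usage sketch of the listener interface added above (the interface and its `onUpdate` callback are real; the serializer subclass here is illustrative):
```java
import org.apache.kafka.common.ClusterResource;
import org.apache.kafka.common.ClusterResourceListener;
import org.apache.kafka.common.serialization.StringSerializer;

// A serializer that is also notified of cluster metadata: once the client
// learns the cluster id, onUpdate is invoked with it.
public class ClusterAwareSerializer extends StringSerializer implements ClusterResourceListener {
    @Override
    public void onUpdate(ClusterResource clusterResource) {
        System.out.println("Serializing for cluster: " + clusterResource.clusterId());
    }
}
```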
Author: Sumit Arrawatia <sumit.arrawatia@gmail.com>
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jun Rao <junrao@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes #1830 from arrawatia/kip-78
This applies to Replication Quotas, based on KIP-73 [(link)](https://cwiki.apache.org/confluence/display/KAFKA/KIP-73+Replication+Quotas), originally motivated by KAFKA-1464.
System Tests Run: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/544/
**This first PR demonstrates the approach**.
**_Overview of Change_**
The guts of this change are relatively small. Throttling occurs on both leader and follower sides. A single class tracks the throttled throughput in and out of each broker (**_ReplicationQuotaManager_**).
On the follower side, the Follower Throttled Rate is calculated as fetch responses arrive. Then, before the next fetch request is sent, we check to see if the quota is violated, removing throttled partitions from the request if it is. This is all encapsulated in a few lines of code in the **_ReplicaFetcherThread_**. There is existing code to handle temporal back off, if the request ends up being empty.
On the leader side it's a little more complex. When a fetch request arrives in the leader, it is built, partition by partition, in **_ReplicaManager.readFromLocalLog_**. As we put each partition into the fetch response, we check if the total size fits in the current quota. If the quota is exceeded, the partition will not be added to the fetch response. Importantly, we don't increase the quota at this point; we just check to see if the bytes will fit.
Now, if there aren't enough bytes to send the response immediately, which is common if we're catching up and throttled, then the request will be put in purgatory. I've added some simple code to **_DelayedFetch_** to handle throttled partitions (throttled partitions are checked against the quota, rather than the messages available in the log).
When the delayed fetch completes and exits purgatory, _**ReplicaManager.readFromLocalLog**_ will be called again. This is why _**ReplicaManager.readFromLocalLog**_ does not actually increase the quota; it just checks whether enough bytes are available for a partition.
Finally, when there are enough bytes to be sent, or the delayed fetch times out, the response will be sent. Before it is sent the throttled-outbound-rate is increased, based on the size of throttled partitions being sent. This is at the end of _**KafkaApis.handleFetchRequest**_, exactly where client quotas are recorded.
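To make the check-then-record split concrete, a hypothetical sketch (stand-in types and names, not the actual ReplicaManager/KafkaApis code):
```java
import java.util.List;

public class ThrottledFetchSketch {
    record PartitionData(String topicPartition, boolean throttled, long sizeInBytes) {}

    long quotaBytesPerWindow = 1_000_000; // illustrative quota
    long bytesRecordedThisWindow = 0;

    // Building the response (the readFromLocalLog analogue): throttled partitions
    // are included only while they still fit under the quota. This is a check,
    // not a record, because a delayed fetch may rebuild the response later.
    long buildResponse(List<PartitionData> candidates) {
        long responseBytes = 0;
        for (PartitionData p : candidates) {
            if (p.throttled()
                    && bytesRecordedThisWindow + responseBytes + p.sizeInBytes() > quotaBytesPerWindow)
                continue; // skip: adding this partition would violate the quota
            responseBytes += p.sizeInBytes();
        }
        return responseBytes;
    }

    // Recording happens once, when the response is actually sent
    // (the handleFetchRequest analogue).
    void onResponseSent(long throttledBytesSent) {
        bytesRecordedThisWindow += throttledBytesSent;
    }
}
```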
There is an acceptance test which asserts the whole throttling process stabilises on the desired value. This covers a number of use cases including many-to-many replication. See **_ReplicationQuotaTest_**.
Note:
It should be noted that this protocol can over-request. The request is built based on the quota at time t1 (_ReplicaManager.readFromLocalLog_). The bytes in the response are recorded at time t2 (end of _KafkaApis.handleFetchRequest_), where t2 > t1. For this reason I originally included an OverRequestedRate as a JMX metric, but testing has not revealed any obvious issue. Over-requesting is quickly compensated by subsequent requests, stabilising close to the quota value.
_**Main stuff left to do:**_
- The fetch size is currently unbounded. This will be addressed in KIP-74, but we need to ensure requests don't go beyond the throttle window.
- There are two failures showing up in the system tests on this branch: StreamsSmokeTest.test_streams (which looks like it fails regularly) and OffsetValidationTest.test_broker_rolling_bounce (which I need to look into)
_**Stuff left to do that could be deferred:**_
- Add the extra metrics specified in the KIP.
- There are no system tests.
- There is no validation for the cluster size / throttle combination that could lead to ISR dropouts.
Author: Ben Stopford <benstopford@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Apurva Mehta <apurva@confluent.io>, Jun Rao <junrao@gmail.com>
Closes #1776 from benstopford/rep-quotas-v2
Author: Dong Lin <lindong28@gmail.com>
Reviewers: Joel Koshy <jjkoshy.w@gmail.com>, Jiangjie Qin <becket.qin@gmail.com>
Closes #1851 from lindong28/KAFKA-4158
Now uses LogSegment.largestTimestamp to determine the age of a segment's messages.
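A minimal sketch of the retention decision this enables (names hypothetical, not the actual Log code):
```java
public class RetentionByRecordTime {
    // A segment expires based on the newest record timestamp it contains,
    // rather than the file's last-modified time.
    static boolean shouldDelete(long largestTimestampMs, long nowMs, long retentionMs) {
        return nowMs - largestTimestampMs > retentionMs;
    }
}
```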
Author: Eric Wasserman <eric.wasserman@gmail.com>
Reviewers: Jun Rao <junrao@gmail.com>
Closes #1794 from ewasserman/feat-1981