This PR implements KIP-78: Cluster Identifiers [(link)](https://cwiki.apache.org/confluence/display/KAFKA/KIP-78%3A+Cluster+Id#KIP-78:ClusterId-Overview) and includes the following changes:
1. Changes to broker code
- generate cluster id and store it in Zookeeper
- update protocol to add cluster id to metadata request and response
- add ClusterResourceListener interface, ClusterResource class and ClusterMetadataListeners utility class
- send ClusterResource events to the metric reporters
2. Changes to client code
- update Cluster and Metadata code to support cluster id
- update clients for sending ClusterResource events to interceptors, (de)serializers and metric reporters
3. Integration tests covering interceptors, (de)serializers and metric reporters on the client side, and protocol changes and metric reporters on the broker side.
4. System tests for upgrading from previous versions.
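To illustrate the "generate cluster id" step, here is a minimal sketch. It assumes the id is a random UUID encoded as URL-safe base64 without padding, as KIP-78 proposes; the class and method names (`ClusterIdSketch`, `generate`) are hypothetical and are not the broker's actual code.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

final class ClusterIdSketch {
    // Encode the 16 bytes of a random UUID as URL-safe base64 without padding.
    // 16 bytes always encode to 22 characters.
    static String generate() {
        UUID uuid = UUID.randomUUID();
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(uuid.getMostSignificantBits());
        buf.putLong(uuid.getLeastSignificantBits());
        return Base64.getUrlEncoder().withoutPadding().encodeToString(buf.array());
    }

    public static void main(String[] args) {
        System.out.println(generate());
    }
}
```

The URL-safe alphabet keeps the id usable in paths and metric names without escaping.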
Author: Sumit Arrawatia <sumit.arrawatia@gmail.com>
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jun Rao <junrao@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes #1830 from arrawatia/kip-78
This applies to Replication Quotas based on KIP-73 [(link)](https://cwiki.apache.org/confluence/display/KAFKA/KIP-73+Replication+Quotas), originally motivated by KAFKA-1464.
System Tests Run: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/544/
**This first PR demonstrates the approach**.
**_Overview of Change_**
The guts of this change are relatively small. Throttling occurs on both leader and follower sides. A single class tracks the throttled throughput in and out of each broker (**_ReplicationQuotaManager_**).
On the follower side, the Follower Throttled Rate is calculated as fetch responses arrive. Then, before the next fetch request is sent, we check to see if the quota is violated, removing throttled partitions from the request if it is. This is all encapsulated in a few lines of code in the **_ReplicaFetcherThread_**. There is existing code to handle temporal back off, if the request ends up being empty.
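The follower-side step can be sketched roughly as follows. This is an illustration only: `FollowerThrottleSketch` and `partitionsToFetch` are hypothetical names, partitions are represented as plain strings, and the quota check is reduced to a boolean rather than the rate tracked by ReplicationQuotaManager.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class FollowerThrottleSketch {
    // Returns the partitions to include in the next fetch request.
    // When the quota is currently violated, throttled partitions are left out;
    // unthrottled partitions are always fetched.
    static List<String> partitionsToFetch(List<String> all, Set<String> throttled, boolean quotaExceeded) {
        List<String> result = new ArrayList<>();
        for (String partition : all) {
            if (quotaExceeded && throttled.contains(partition)) {
                continue;
            }
            result.add(partition);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList("t-0", "t-1", "t-2");
        Set<String> throttled = new HashSet<>(Arrays.asList("t-1"));
        System.out.println(partitionsToFetch(all, throttled, true));
    }
}
```

If every partition is throttled and the quota is exceeded, the result is empty, which is the case the existing back-off code handles.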
On the leader side it's a little more complex. When a fetch request arrives in the leader, the response is built, partition by partition, in **_ReplicaManager.readFromLocalLog_**. As we put each partition into the fetch response, we check if the total size fits in the current quota. If the quota is exceeded, the partition will not be added to the fetch response. Importantly, we don't record the bytes against the quota at this point; we just check to see if they will fit.
Now, if there aren't enough bytes to send the response immediately, which is common if we're catching up and throttled, then the request will be put in purgatory. I've added some simple code to **_DelayedFetch_** to handle throttled partitions (throttled partitions are checked against the quota, rather than the messages available in the log).
When the delayed fetch completes and exits purgatory, _**ReplicaManager.readFromLocalLog**_ will be called again. This is why _**ReplicaManager.readFromLocalLog**_ does not actually record usage against the quota; it just checks whether enough bytes are available for a partition.
Finally, when there are enough bytes to be sent, or the delayed fetch times out, the response will be sent. Before it is sent the throttled-outbound-rate is increased, based on the size of throttled partitions being sent. This is at the end of _**KafkaApis.handleFetchRequest**_, exactly where client quotas are recorded.
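The separation between checking the quota (possibly several times per request) and recording against it (once, just before sending) can be sketched as below. This is a simplified illustration: `LeaderQuotaSketch` is a hypothetical class that uses a plain byte budget for a single window, whereas the real quota is a rate measured over time.

```java
final class LeaderQuotaSketch {
    private final long limitBytes; // bytes allowed in the current window (simplifying assumption)
    private long usedBytes = 0;

    LeaderQuotaSketch(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    // Check only. Safe to call while building a response and again after purgatory,
    // because it never changes the recorded usage.
    boolean fits(long bytes) {
        return usedBytes + bytes <= limitBytes;
    }

    // Record usage. Called once per response, just before it is sent.
    void record(long bytes) {
        usedBytes += bytes;
    }

    long used() {
        return usedBytes;
    }

    public static void main(String[] args) {
        LeaderQuotaSketch quota = new LeaderQuotaSketch(100);
        System.out.println(quota.fits(60)); // first read
        System.out.println(quota.fits(60)); // second read after purgatory, usage unchanged
        quota.record(60);                   // response actually sent
        System.out.println(quota.fits(60)); // now over budget
    }
}
```

Because `fits` has no side effects, calling the read path twice for the same request does not double-count the bytes.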
There is an acceptance test which asserts the whole throttling process stabilises on the desired value. This covers a number of use cases including many-to-many replication. See **_ReplicationQuotaTest_**.
Note:
It should be noted that this protocol can over-request. The request is built based on the quota at time t1 (_ReplicaManager.readFromLocalLog_). The bytes in the response are recorded at time t2 (end of _KafkaApis.handleFetchRequest_), where t2 > t1. For this reason I originally included an OverRequestedRate as a JMX metric, but testing has not revealed any obvious issue. Over-requesting is quickly compensated by subsequent requests, stabilising close to the quota value.
_**Main stuff left to do:**_
- The fetch size is currently unbounded. This will be addressed in KIP-74, but we need to make sure that change keeps requests from going beyond the throttle window.
- There are two failures showing up in the system tests on this branch: StreamsSmokeTest.test_streams (which looks like it fails regularly) and OffsetValidationTest.test_broker_rolling_bounce (which I need to look into)
_**Stuff left to do that could be deferred:**_
- Add the extra metrics specified in the KIP.
- There are no system tests.
- There is no validation for the cluster size / throttle combination that could lead to ISR dropouts
Author: Ben Stopford <benstopford@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Apurva Mehta <apurva@confluent.io>, Jun Rao <junrao@gmail.com>
Closes #1776 from benstopford/rep-quotas-v2
Author: Dong Lin <lindong28@gmail.com>
Reviewers: Joel Koshy <jjkoshy.w@gmail.com>, Jiangjie Qin <becket.qin@gmail.com>
Closes #1851 from lindong28/KAFKA-4158
Now uses LogSegment.largestTimestamp to determine age of segment's messages.
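As a rough illustration, assuming the segment's age is measured from its largest message timestamp and compared against a time-based retention setting, the check might look like this. `SegmentAgeSketch` and its parameter names are hypothetical.

```java
final class SegmentAgeSketch {
    // A segment counts as expired when its newest message (by timestamp)
    // is older than the retention period.
    static boolean isExpired(long largestTimestampMs, long nowMs, long retentionMs) {
        return nowMs - largestTimestampMs > retentionMs;
    }

    public static void main(String[] args) {
        System.out.println(isExpired(1_000L, 5_000L, 3_000L));
    }
}
```

Basing the age on message timestamps rather than on file modification time means the result does not change when segment files are copied or touched.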
Author: Eric Wasserman <eric.wasserman@gmail.com>
Reviewers: Jun Rao <junrao@gmail.com>
Closes #1794 from ewasserman/feat-1981
Change the console producer's default acks to 1 and update the acks docs. Also added the -1 config to the acks docs since that question comes up often. ijuma and vahidhashemian, does this look reasonable to you?
Author: Dustin Cote <dustin@confluent.io>
Reviewers: Vahid Hashemian <vahidhashemian@us.ibm.com>, Ismael Juma <ismael@juma.me.uk>
Closes #1795 from cotedm/KAFKA-3129
- use AdminTool to check for active consumer group
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes #1767 from mjsax/kafka-4058-trunk
Get channel remote address before calling `channel.close`
Author: Tao Xiao <xiaotao183@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes #1826 from xiaotao183/KAFKA-4129
Typically this error condition is caused by topic-level configuration issues, so it is useful to include which topic partition was reset for operator use when debugging the root cause.
Author: Dana Powers <dana.powers@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes #1801 from dpkp/log_topic_partition_reset_dirty_offset
Author: Jiangjie Qin <becket.qin@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Closes #1809 from becketqin/KAFKA-4099
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes #1807 from hachikuji/KAFKA-4103
Set the print-data-log option when offset-decoder is set. hachikuji, we had talked about this one before; does this change look ok to you?
Author: Dustin Cote <dustin@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes #1797 from cotedm/KAFKA-4062
They don't require access to `ZkClient`.
Also include a few obvious clean-ups in `ZKUtils`:
* Remove redundant rethrows and braces
* Use named arguments for booleans
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Gwen Shapira <cshapi@gmail.com>
Closes #1775 from ijuma/move-some-zk-utils-methods-to-companion-object
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes #1786 from hachikuji/hotfix-ctrlchannelmgr-verbose-logging
Change cleanup.policy to accept a comma-separated list of valid policies.
Updated LogCleaner.CleanerThread to also run deletion for any topics configured with compact,delete.
Ensure Log.deleteSegments only runs when delete is enabled.
Added integration and unit tests to cover the new option.
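A minimal sketch of how a comma-separated `cleanup.policy` value could be parsed and validated is shown below. `CleanupPolicySketch` is a hypothetical helper, and trimming whitespace around each entry is an assumption made for the example, not something the PR states.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

final class CleanupPolicySketch {
    private static final Set<String> VALID = new LinkedHashSet<>(Arrays.asList("compact", "delete"));

    // Split the configured value on commas and reject anything that is not a known policy.
    static Set<String> parse(String value) {
        Set<String> policies = new LinkedHashSet<>();
        for (String part : value.split(",")) {
            String policy = part.trim(); // assumption: surrounding whitespace is ignored
            if (!VALID.contains(policy)) {
                throw new IllegalArgumentException("Invalid cleanup policy: '" + part + "'");
            }
            policies.add(policy);
        }
        return policies;
    }

    // Deletion should only run when "delete" is one of the configured policies.
    static boolean deleteEnabled(String value) {
        return parse(value).contains("delete");
    }

    public static void main(String[] args) {
        System.out.println(parse("compact,delete"));
        System.out.println(deleteEnabled("compact"));
    }
}
```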
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Grant Henke <granthenke@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Closes #1742 from dguy/kafka-4015
handled by adding a catch all for any unhandled exception. Because the jira specifically mentions the InvalidReplicationFactor exception, a test was added for that specific case.
Author: Grant Henke <granthenke@gmail.com>
Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes #1739 from granthenke/create-errors
Use System.nanoTime instead of System.currentTimeMillis in broker timer tasks to cope with changes to wall-clock time.
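To show why a monotonic clock helps here, the sketch below computes and checks a timer deadline on the scale of Java's `System.nanoTime()`, which is unaffected by wall-clock adjustments. `MonotonicTimerSketch` and its methods are hypothetical and are not the broker's timer code.

```java
import java.util.concurrent.TimeUnit;

final class MonotonicTimerSketch {
    // Deadline on the System.nanoTime() scale, delayMs milliseconds after nowNanos.
    static long deadlineAfter(long nowNanos, long delayMs) {
        return nowNanos + TimeUnit.MILLISECONDS.toNanos(delayMs);
    }

    // Compare by subtraction so the result stays correct even if the
    // nanoTime values wrap around the range of a long.
    static boolean isExpired(long deadlineNanos, long nowNanos) {
        return nowNanos - deadlineNanos >= 0;
    }

    public static void main(String[] args) throws InterruptedException {
        long deadline = deadlineAfter(System.nanoTime(), 10);
        System.out.println(isExpired(deadline, System.nanoTime()));
        Thread.sleep(20);
        System.out.println(isExpired(deadline, System.nanoTime()));
    }
}
```

A deadline based on `System.currentTimeMillis()` could fire early or late if the system clock is stepped (for example by NTP); the `nanoTime` version depends only on elapsed time.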
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Gwen Shapira
Closes #1768 from rajinisivaram/KAFKA-4051
As discussed in https://issues.apache.org/jira/browse/KAFKA-3894, this PR makes the log cleaner do a "partial" clean on a segment, whereby it builds a partial offset map up to a particular offset in a segment. When cleaning resumes, it continues from the next dirty offset, which can now be located in the middle of a segment.
Prior to this PR, a segment with too many keys could crash the log cleaner thread, because the cleaner required that at least one whole segment fit in the offset map.
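The idea of a partial offset map can be sketched as follows: index keys until the map is full, then report the offset at which the next cleaning pass should resume. This is a simplified illustration with a hypothetical `PartialOffsetMapSketch` class, string keys, and a plain `HashMap` standing in for the cleaner's fixed-size offset map.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PartialOffsetMapSketch {
    // Records key -> latest offset until the map holds maxKeys distinct keys.
    // Returns the next dirty offset: the offset of the first record that could
    // not be indexed, or the end offset if every record fit.
    static long build(List<String> keys, long baseOffset, int maxKeys, Map<String, Long> map) {
        for (int i = 0; i < keys.size(); i++) {
            String key = keys.get(i);
            boolean isNewKey = !map.containsKey(key);
            if (isNewKey && map.size() >= maxKeys) {
                return baseOffset + i; // map is full: stop in the middle of the segment
            }
            map.put(key, baseOffset + i);
        }
        return baseOffset + keys.size();
    }

    public static void main(String[] args) {
        Map<String, Long> map = new HashMap<>();
        long next = build(Arrays.asList("a", "b", "a", "c", "d"), 100L, 2, map);
        System.out.println(next + " " + map);
    }
}
```

Instead of failing when a segment holds more keys than the map can take, the build stops early and the returned offset marks where to continue.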
Author: Tom Crayford <tcrayford@googlemail.com>
Reviewers: Jun Rao <junrao@gmail.com>
Closes #1725 from tcrayford/dont_crash_log_cleaner_thread_if_segment_overflows_buffer
junrao Could you take a look when you get a chance? Thanks.
Author: Jiangjie Qin <becket.qin@gmail.com>
Reviewers: Jun Rao <junrao@gmail.com>
Closes #1769 from becketqin/KAFKA-3163-follow-up
Add an optional configuration for the SecureRandom PRNG implementation, with the default behavior being the same (use the default implementation in the JDK/JRE).
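A minimal sketch of this behaviour, assuming the configured value is passed straight to `SecureRandom.getInstance` and that an unset value means "use the JDK default". `SecureRandomSketch` and `create` are hypothetical names.

```java
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

final class SecureRandomSketch {
    // Use the named PRNG implementation when one is configured;
    // otherwise keep the default behaviour of the JDK/JRE.
    static SecureRandom create(String implementation) {
        if (implementation == null || implementation.isEmpty()) {
            return new SecureRandom();
        }
        try {
            return SecureRandom.getInstance(implementation);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown SecureRandom implementation: " + implementation, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(create(null).getAlgorithm());
        System.out.println(create("SHA1PRNG").getAlgorithm());
    }
}
```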
Author: Todd Palino <Todd Palino>
Reviewers: Grant Henke <granthenke@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Joel Koshy <jjkoshy@gmail.com>, Jiangjie Qin <becket.qin@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
Closes #1747 from toddpalino/trunk
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>, Ismael Juma <ismael@juma.me.uk>, Guozhang Wang <wangguoz@gmail.com>
Closes #1627 from hachikuji/KAFKA-3888
Author: Manikumar Reddy O <manikumar.reddy@gmail.com>
Reviewers: Sriharsha Chintalapani <harsha@hortonworks.com>, Jun Rao <junrao@gmail.com>
Closes #1723 from omkreddy/KAFKA-4035
Author: Grant Henke <granthenke@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Closes #1730 from granthenke/test-delete
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes #1720 from hachikuji/KAFKA-4034
Author: Grant Henke <granthenke@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Ewen Cheslack-Postava <me@ewencp.org>, Jun Rao <junrao@gmail.com>
Closes #1616 from granthenke/delete-wire-new
A small fix to check null before using the reference
Author: Som Sahu <sosahu@microsoft.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes #1718 from soumyajit-sahu/nullCheckForDirectBufferCleaner
It previously hardcoded it.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Grant Henke <granthenke@gmail.com>, Jason Gustafson <jason@confluent.io>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes #1613 from ijuma/kafka-3954-consumer-internal-topics-from-broker
Also include a few minor clean-ups.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Manikumar reddy O <manikumar.reddy@gmail.com>, Grant Henke <granthenke@gmail.com>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes #1623 from ijuma/fix-zk-inconsistent-security-check
ijuma
Author: dan norwood <norwood@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes #1713 from norwood/add-security-protocol-option-for-fetch
ijuma said that it would make sense to split out this work from KAFKA-3234, since KAFKA-3234 had both a mechanical change (generating docs) as well as a change requiring discussion (deprecating/renaming config options).
jjkoshy, I hope you don't mind that I took over this work. It's been 3 months since the last activity on KAFKA-3234, so I thought it would be okay to take over.
This work is essentially the first 5-6 commits from Joel's https://github.com/apache/kafka/pull/907. However, since I'm not very experienced with git, I didn't do a direct merge/rebase, but instead largely hand-merged it. I did some minor cleanup. All credit goes to Joel, all blame goes to me. :)
For reference, I attached the auto-generated configuration.html file (as a PDF, because GitHub won't let me attach HTML).
[configuration.pdf](https://github.com/apache/kafka/files/323901/configuration.pdf)
This is my first time writing Scala, so let me know if there are any changes needed.
I don't know who is the right person to review this. ijuma, can you help me redirect this to the appropriate person? Thanks.
Author: James Cheng <jylcheng@yahoo.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Joel Koshy <jjkoshy@gmail.com>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes #1527 from wushujames/generate_topic_docs
1. The IllegalStateException is actually thrown from testCloseWithZeroTimeoutFromSenderThread() due to a bug. We call producer.close() in the callback. Once the first callback is called, producing records in the callback will hit the IllegalStateException. This only pollutes the output, but doesn't fail the test. I fixed this by only calling producer.send() in the first callback.
2. It's not clear which test throws TimeoutException and it's not reproducible locally. One thing is that the error message in TimeoutException is misleading, since the timeout is not necessarily due to metadata. Improved this by making the error message in TimeoutException clearer.
3. It's not clear what actually failed testSendNonCompressedMessageWithCreateTime(). One thing I found is that since we set the linger time to MAX_LONG and are sending small messages, those produced messages won't be drained until we call producer.close(10000L, TimeUnit.MILLISECONDS). Normally, 10 secs should be enough for the records to be sent. My only hypothesis is that since SSL is more expensive, occasionally, 10 secs is still not enough. So, I bumped up the timeout from 10 secs to 20 secs.
Author: Jun Rao <junrao@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes #1703 from junrao/kafka-3875
The `acks` config that is provided to the console producer with `request-required-acks` accepts `all`, `-1`, `0`, `1` as valid options (`all` and `-1` being interchangeable). Currently, the console producer expects an integer for this input, which makes `all` an invalid input. This PR fixes the issue by changing the input type to String.
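One way to interpret such a string-typed `acks` value is sketched below. `AcksSketch` and `parseAcks` are hypothetical; the PR only states that the option type changes to String, so the normalisation shown here (mapping `all` to `-1` and rejecting anything outside `-1`, `0`, `1`) is an assumption for illustration.

```java
final class AcksSketch {
    // Map the textual acks setting to its numeric form; "all" and "-1" are equivalent.
    static int parseAcks(String value) {
        String v = value.trim();
        if (v.equals("all")) {
            return -1;
        }
        int acks = Integer.parseInt(v); // NumberFormatException for non-numeric input
        if (acks < -1 || acks > 1) {
            throw new IllegalArgumentException("Invalid acks value: " + value);
        }
        return acks;
    }

    public static void main(String[] args) {
        System.out.println(parseAcks("all"));
        System.out.println(parseAcks("1"));
    }
}
```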
Author: Vahid Hashemian <vahidhashemian@us.ibm.com>
Reviewers: Manikumar reddy O <manikumar.reddy@gmail.com>, Grant Henke <granthenke@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes #1618 from vahidhashemian/KAFKA-3945
Moved the streams application reset tool from tools to core.
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Damian Guy <damian.guy@gmail.com>, Guozhang Wang <wangguoz@gmail.com>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes #1685 from mjsax/moveResetTool
(cherry picked from commit f2405a73ea)
Signed-off-by: Ewen Cheslack-Postava <me@ewencp.org>
When resetting the first dirty offset to the log start offset, we currently log an ERROR which makes users think the log cleaner has a problem and maybe has exited. We should log a WARN instead to avoid alarming the users.
Author: Dustin Cote <dustin@confluent.io>
Reviewers: Gwen Shapira
Closes #1691 from cotedm/minorlogcleanerlogging
…ceful shutdown
The patch is pretty simple and the justification is explained in https://issues.apache.org/jira/browse/KAFKA-3924
I could not find Andrew Olson, who seems to be the contributor of this part of the code, in github so I am not sure whom I should ask to review the patch.
This contribution is my original work and I license the work to the project under the project's open source license.
Author: Maysam Yabandeh <myabandeh@dropbox.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Andrew Olson <andrew.olson@cerner.com>, Jun Rao <junrao@gmail.com>
Closes #1634 from maysamyabandeh/KAFKA-3924
Also:
* Introduce a blocking variant to be used by `FileMessageSet.append`
* Add tests
* Minor clean-ups
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jun Rao <junrao@gmail.com>
Closes #1669 from ijuma/kafka-3996-byte-buffer-message-set-write-to-non-blocking
They are now consistent with `waitUntilTrue`.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes #1665 from ijuma/increase-default-wait-until-time
Avoids leaking native memory and hence crashing brokers on bootup due to
running out of memory.
Seeing as `messageFormat > 0` always reads the full compressed message
set and is the default going forward, we can use that behaviour to
always close the compressor when calling `deepIterator`.
Author: Tom Crayford <tcrayford@googlemail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes #1660 from tcrayford/dont_leak_native_memory_round_2
Add additional information to Acceptor debug message upon connection acceptance
Author: rnpridgeon <ryan.n.pridgeon@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes #1648 from rnpridgeon/trunk
5 seconds is probably enough when running tests locally, but
doesn't seem to be so for Jenkins when it is overloaded.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes #1589 from ijuma/increase-default-wait-time-for-wait-until-true
Currently, logs involving PartitionState are not very helpful.
```
Broker 449 cached leader info org.apache.kafka.common.requests.UpdateMetadataRequest$PartitionState@3285d64a for partition <topic>-<partition> in response to UpdateMetadata request sent by controller 356 epoch 138 with correlation id 0
TRACE state.change.logger: Broker 449 received LeaderAndIsr request org.apache.kafka.common.requests.LeaderAndIsrRequest$PartitionState@66d6a8eb correlation id 3 from controller 356 epoch 138 for partition [<topic>,<partition>]
```
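The log lines above show Java's default `Object.toString()` output (class name plus hash code). A minimal sketch of the usual remedy, overriding `toString()` so the fields appear in the log, is given below. `PartitionStateSketch` and its three fields are hypothetical stand-ins and are not the real `PartitionState` class.

```java
import java.util.Arrays;
import java.util.List;

final class PartitionStateSketch {
    final int leader;
    final int leaderEpoch;
    final List<Integer> isr;

    PartitionStateSketch(int leader, int leaderEpoch, List<Integer> isr) {
        this.leader = leader;
        this.leaderEpoch = leaderEpoch;
        this.isr = isr;
    }

    // Print the field values instead of the default ClassName@hash form.
    @Override
    public String toString() {
        return "PartitionState(leader=" + leader
            + ", leaderEpoch=" + leaderEpoch
            + ", isr=" + isr + ")";
    }

    public static void main(String[] args) {
        System.out.println(new PartitionStateSketch(1, 5, Arrays.asList(1, 2)));
    }
}
```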
Author: Ashish Singh <asingh@cloudera.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes #1609 from SinghAsDev/partitionState
…broker id
This is because the id passed into the MetadataCache is the value from the config before the real broker id is generated.
Author: Grant Henke <granthenke@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes #1632 from granthenke/metadata-id