Author: Jiangjie Qin <becket.qin@gmail.com>
Reviewers: Jun Rao <junrao@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Closes#1852 from becketqin/KAFKA-4148
The contribution is my original work and I license the work to the project under the project's open source license.
Author: Elias Levy <fearsome.lucidity@gmail.com>
Reviewers: Damian Guy, Matthias J. Sax, Guozhang Wang
Closes#1846 from eliaslevy/KAFKA-4153
When the consumer is not subscribed to any topic or, in the case of manual assignment, is not assigned any partition, calling `poll()` should raise an exception.
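A minimal sketch of the resulting behaviour (configuration values are illustrative; the exception shown here is assumed to be an `IllegalStateException`):
```
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PollWithoutSubscription {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "example-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.poll(100);   // neither subscribe() nor assign() has been called
        } catch (IllegalStateException e) {
            System.err.println("poll() without a subscription or assignment: " + e.getMessage());
        }
    }
}
```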
Author: Vahid Hashemian <vahidhashemian@us.ibm.com>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#1839 from vahidhashemian/KAFKA-4135
Include a few clean-ups (also in the producer section), mention deprecation plans, and reorder the sections so that the new consumer documentation appears before the old consumers'.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#1880 from ijuma/remove-beta-from-new-consumer-documentation
We added the `reporters` parameter as part of KIP-78: Cluster Id.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#1878 from ijuma/add-kafka-startable-overload-for-compat
It is cleaner to check for the optional flag and default value just once, in the `convertToConnect()` function.
It also helps address an issue with conversions for logical type schemas that have default values and null as the included value. That test case is _probably_ not an issue in practice, since when using the `JsonConverter` to serialize a missing field with a default value, it will serialize the default value for the field. But in the face of JSON data streaming in from a topic, being [generous on input, strict on output](http://tedwise.com/2009/05/27/generous-on-input-strict-on-output) seems best.
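A minimal sketch of the kind of centralized check described here (not the actual `JsonConverter` code; `convertNull` is a hypothetical helper): a null value is only accepted when the schema supplies a default or is optional.
```
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.errors.DataException;

static Object convertNull(Schema schema) {
    if (schema == null)
        return null;                        // schemaless data: null is simply null
    if (schema.defaultValue() != null)
        return schema.defaultValue();       // substitute the schema's default value
    if (schema.isOptional())
        return null;                        // optional field: null is a legal value
    throw new DataException("Invalid null value for required " + schema.type() + " field");
}
```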
Author: Shikhar Bhushan <shikhar@confluent.io>
Reviewers: Randall Hauch <rhauch@gmail.com>, Jason Gustafson <jason@confluent.io>
Closes#1872 from shikhar/kafka-4183
Details in KIP-55.
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Jun Rao <junrao@gmail.com>
Closes#1847 from rajinisivaram/KAFKA-4079
During rebalance operations the Cluster object gets set to Cluster.empty(), which can result in NPEs when performing certain operations on StreamsMetadataState. StreamsMetadataState should instead throw a StreamsException when the Cluster is empty, since it has not yet been (re-)initialized.
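A minimal sketch of such a guard (the method name is hypothetical): every public `StreamsMetadataState` operation would first verify that the metadata has been (re-)initialized after the rebalance.
```
import org.apache.kafka.common.Cluster;
import org.apache.kafka.streams.errors.StreamsException;

void validateMetadataInitialized(final Cluster clusterMetadata) {
    // Cluster.empty() carries no topics, so an empty topic set means we are mid-rebalance.
    if (clusterMetadata == null || clusterMetadata.topics().isEmpty()) {
        throw new StreamsException("KafkaStreams metadata is not initialized yet; a rebalance may be in progress");
    }
}
```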
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Eno Thereska, Guozhang Wang
Closes#1845 from dguy/streams-meta-hotfix
Standby tasks should be assigned per consumer, not per process.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Eno Thereska, Guozhang Wang
Closes#1862 from dguy/kafka-4175
The logic in `verifyCanGetByKey` was incorrect. It was
```
windowState.size() < keys.length &&
countState.size() < keys.length &&
System.currentTimeMillis() < timeout
```
but should be (with `&&` the loop stops waiting as soon as either store reaches the expected number of keys, even though the other store may still be incomplete):
```
(windowState.size() < keys.length || countState.size() < keys.length) && System.currentTimeMillis() < timeout
```
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#1879 from dguy/minor-fix-test
Changed the lowerBound argument reference in the summary comment of the translateOffset method to match the actual argument name: startingFilePosition.
Author: Luke Zaparaniuk <luke.zaparaniuk@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#1876 from lukezaparaniuk/patch-1
While trying to build the source and publish it to an internal Maven repo, I ran into an issue that I explain in the mailing list discussion here: https://www.mail-archive.com/devkafka.apache.org/msg56359.html.
The commit here updates the README.md to note that the GRADLE_USER_HOME environment variable plays a role in deciding which file the Maven configs should be added to.
Author: Jaikiran Pai <jaikiran.pai@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#1837 from jaikiran/readme-update-grade-user-home
…t.test_replica_lags
Author: Grant Henke <granthenke@gmail.com>
Reviewers: Ashish Singh <asingh@cloudera.com>, Ismael Juma <ismael@juma.me.uk>
Closes#1849 from granthenke/replica-verification-fix
Implementation and tests for secure quotas at <user> and <user, client-id> levels as described in KIP-55. Also adds dynamic default quotas for <client-id>, <user> and <user, client-id>. For each client connection, the most specific quota matching the connection is used, with user quota taking precedence over client-id quota.
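A self-contained illustration of that precedence (the lookup map and key paths are hypothetical, not the broker's implementation): the most specific entry matching the connection wins, and user entries are consulted before client-id entries.
```
import java.util.HashMap;
import java.util.Map;

public class QuotaPrecedenceSketch {
    // Returns the first quota found, trying the most specific entity first.
    static Long resolve(Map<String, Long> quotas, String user, String clientId) {
        String[] order = {
            "/users/" + user + "/clients/" + clientId,   // <user, client-id>
            "/users/" + user,                            // <user>
            "/clients/" + clientId,                      // <client-id>
            "/clients/<default>"                         // dynamic default
        };
        for (String key : order) {
            Long quota = quotas.get(key);
            if (quota != null)
                return quota;
        }
        return null;                                     // no quota configured
    }

    public static void main(String[] args) {
        Map<String, Long> quotas = new HashMap<>();
        quotas.put("/users/alice", 1_000_000L);
        quotas.put("/clients/app-1", 500_000L);
        // The user quota takes precedence over the client-id quota:
        System.out.println(resolve(quotas, "alice", "app-1"));   // 1000000
    }
}
```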
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Jun Rao <junrao@gmail.com>
Closes#1753 from rajinisivaram/KAFKA-3492
This PR implements KIP-78: Cluster Identifiers [(link)](https://cwiki.apache.org/confluence/display/KAFKA/KIP-78%3A+Cluster+Id#KIP-78:ClusterId-Overview) and includes the following changes:
1. Changes to broker code
- generate cluster id and store it in Zookeeper
- update protocol to add cluster id to metadata request and response
- add ClusterResourceListener interface, ClusterResource class and ClusterMetadataListeners utility class (see the sketch after this list)
- send ClusterResource events to the metric reporters
2. Changes to client code
- update Cluster and Metadata code to support cluster id
- update clients for sending ClusterResource events to interceptors, (de)serializers and metric reporters
3. Integration tests covering interceptors, (de)serializers and metric reporters on the client side, and protocol changes and metric reporters on the broker side.
4. System tests for upgrading from previous versions.
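As a usage illustration of the `ClusterResourceListener` interface added here, a component such as a serializer (the class below is hypothetical) can receive the cluster id once the client learns it from a metadata response:
```
import java.util.Map;
import org.apache.kafka.common.ClusterResource;
import org.apache.kafka.common.ClusterResourceListener;
import org.apache.kafka.common.serialization.Serializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class ClusterAwareStringSerializer implements Serializer<String>, ClusterResourceListener {
    private final StringSerializer delegate = new StringSerializer();
    private volatile String clusterId = "(unknown)";

    @Override
    public void onUpdate(ClusterResource clusterResource) {
        // Invoked by the client once the cluster id is known.
        clusterId = clusterResource.clusterId();
    }

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        delegate.configure(configs, isKey);
    }

    @Override
    public byte[] serialize(String topic, String data) {
        return delegate.serialize(topic, data);
    }

    @Override
    public void close() {
        delegate.close();
    }
}
```
Per the changes above, interceptors, (de)serializers and metric reporters that implement this interface receive the callback; components that do not implement it are unaffected.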
Author: Sumit Arrawatia <sumit.arrawatia@gmail.com>
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jun Rao <junrao@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#1830 from arrawatia/kip-78
The `JsonConverter` class has `LogicalTypeConverter` implementations for Date, Time, Timestamp, and Decimal, but these implementations fail when the input literal value (deserialized from the message) is null.
Test cases were added to check for these cases, and these failed before the `LogicalTypeConverter` implementations were fixed to consider whether the schema has a default value or is optional, similarly to how the `JsonToConnectTypeConverter` implementations do this. Once the fixes were made, the new tests pass.
Author: Randall Hauch <rhauch@gmail.com>
Reviewers: Shikhar Bhushan <shikhar@confluent.io>, Jason Gustafson <jason@confluent.io>
Closes#1867 from rhauch/kafka-4183
This is joint work between dguy and enothereska. The work implements KIP-63. Overview of main changes:
- New byte-based cache that acts as a buffer for any persistent store and for forwarding changes downstream (see the configuration sketch after this list).
- Changes to the record forwarding path: previously a record was processed end-to-end within a task; now it may be buffered in a processor node while other records complete in the task.
- Cleanup of state stores and decoupling of the cache from the state store and from forwarding.
- More than 80 new unit and integration tests.
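A short configuration sketch, assuming the cache size is controlled by the `StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG` setting introduced with this work:
```
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-example");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
// Bytes available for buffering writes to state stores and for forwarding
// records downstream; 0 effectively disables the cache.
props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 10 * 1024 * 1024L);
// Buffered records are also flushed and forwarded at least every commit interval.
props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 30 * 1000L);
```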
Author: Damian Guy <damian.guy@gmail.com>
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Matthias J. Sax, Guozhang Wang
Closes#1752 from enothereska/KAFKA-3776-poc
This applies to Replication Quotas based on KIP-73 [(link)](https://cwiki.apache.org/confluence/display/KAFKA/KIP-73+Replication+Quotas), originally motivated by KAFKA-1464.
System Tests Run: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/544/
**This first PR demonstrates the approach**.
**_Overview of Change_**
The guts of this change are relatively small. Throttling occurs on both leader and follower sides. A single class tracks the throttled throughput in and out of each broker (**_ReplicationQuotaManager_**).
On the follower side, the Follower Throttled Rate is calculated as fetch responses arrive. Then, before the next fetch request is sent, we check to see if the quota is violated, removing throttled partitions from the request if it is. This is all encapsulated in a few lines of code in the **_ReplicaFetcherThread_**. There is existing code to handle temporal back off, if the request ends up being empty.
On the leader side it's a little more complex. When a fetch request arrives at the leader, it is built, partition by partition, in **_ReplicaManager.readFromLocalLog_**. As we put each partition into the fetch response, we check if the total size fits in the current quota. If the quota is exceeded, the partition will not be added to the fetch response. Importantly, we don't increase the quota at this point; we just check to see whether the bytes would fit.
Now, if there aren't enough bytes to send the response immediately, which is common if we're catching up and throttled, then the request will be put in purgatory. I've added some simple code to **_DelayedFetch_** to handle throttled partitions (throttled partitions are checked against the quota, rather than the messages available in the log).
When the delayed fetch completes and exits purgatory, _**ReplicaManager.readFromLocalLog**_ will be called again. This is why _**ReplicaManager.readFromLocalLog**_ does not actually increase the quota; it just checks whether enough bytes are available for a partition.
Finally, when there are enough bytes to be sent, or the delayed fetch times out, the response will be sent. Before it is sent, the throttled-outbound rate is increased based on the size of the throttled partitions being sent. This happens at the end of _**KafkaApis.handleFetchRequest**_, exactly where client quotas are recorded.
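A hypothetical, simplified model of that flow (none of these names are the broker's): throttled partitions are only included while the quota still has room, and the throttled-outbound rate is only recorded when the response is sent.
```
import java.util.ArrayList;
import java.util.List;

public class LeaderThrottleSketch {
    static class QuotaTracker {
        private final long quotaBytes;
        private long recordedBytes;                        // bytes recorded in the current window
        QuotaTracker(long quotaBytes) { this.quotaBytes = quotaBytes; }
        boolean isQuotaExceeded() { return recordedBytes > quotaBytes; }
        void record(long bytes) { recordedBytes += bytes; }
    }

    public static void main(String[] args) {
        QuotaTracker quota = new QuotaTracker(1_000_000L); // e.g. 1 MB per window
        long[] throttledPartitionBytes = {400_000L, 500_000L, 300_000L};

        // Building the response: check the quota, but do not record anything yet.
        List<Long> included = new ArrayList<>();
        for (long bytes : throttledPartitionBytes) {
            if (quota.isQuotaExceeded()) {
                continue;                                  // leave this throttled partition out
            }
            included.add(bytes);
        }

        // Sending the response: only now is the throttled-outbound rate increased.
        for (long bytes : included) {
            quota.record(bytes);
        }
        System.out.println("Included " + included.size() + " throttled partitions");
    }
}
```
Because nothing is recorded while the response is being built, a single pass can over-request, which is exactly the behaviour discussed in the note below; later requests see the updated rate and compensate.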
There is an acceptance test which asserts the whole throttling process stabilises on the desired value. This covers a number of use cases including many-to-many replication. See **_ReplicationQuotaTest_**.
Note:
It should be noted that this protocol can over-request. The request is built based on the quota at time t1 (_ReplicaManager.readFromLocalLog_). The bytes in the response are recorded at time t2 (end of _KafkaApis.handleFetchRequest_), where t2 > t1. For this reason I originally included an OverRequestedRate as a JMX metric, but testing has not revealed any obvious issue. Over-requesting is quickly compensated for by subsequent requests, stabilising close to the quota value.
_**Main stuff left to do:**_
- The fetch size is currently unbounded. This will be addressed in KIP-74, but we need to ensure that change keeps requests from going beyond the throttle window.
- There are two failures showing up in the system tests on this branch: StreamsSmokeTest.test_streams (which looks like it fails regularly) and OffsetValidationTest.test_broker_rolling_bounce (which I need to look into)
_**Stuff left to do that could be deferred:**_
- Add the extra metrics specified in the KIP.
- There are no system tests.
- There is no validation for the cluster size / throttle combination that could lead to ISR dropouts
Author: Ben Stopford <benstopford@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Apurva Mehta <apurva@confluent.io>, Jun Rao <junrao@gmail.com>
Closes#1776 from benstopford/rep-quotas-v2
Fix for bug outlined in KAFKA-4131
Author: bbejeck <bbejeck@gmail.com>
Reviewers: Damian Guy, Guozhang Wang
Closes#1843 from bbejeck/KAFKA-4131_mulitple_regex_consumers_cause_npe
Set NUM_STREAM_THREADS_CONFIG = 1 in SmokeTestClient, as we get locking issues when NUM_STREAM_THREADS_CONFIG > 1 and we have standby tasks, i.e., replicas. This is because a standby task can be assigned to the same KafkaStreams instance as the active task, and the state directory is then already locked.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Eno Thereska, Guozhang Wang
Closes#1861 from dguy/fix-smoketest
Followed the same naming pattern as the producer sender thread.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jason Gustafson
Closes#1854 from ijuma/heartbeat-thread-name
Author: Dong Lin <lindong28@gmail.com>
Reviewers: Joel Koshy <jjkoshy.w@gmail.com>, Jiangjie Qin <becket.qin@gmail.com>
Closes#1851 from lindong28/KAFKA-4158
Here is the patch on GitHub, ijuma.
Acquiring the consumer lock (the single-thread access control) requires that the consumer be open. I changed the closed variable to be volatile so that another thread's writes will be visible to the reading thread.
Previously there was an additional check of whether the consumer was closed after the lock was acquired; this check is no longer necessary.
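A minimal sketch of the pattern (field and method names are illustrative, not the exact `KafkaConsumer` internals): `closed` is volatile, so a `close()` performed by one thread is visible to a thread subsequently calling `acquire()`.
```
import java.util.ConcurrentModificationException;
import java.util.concurrent.atomic.AtomicLong;

public class SingleThreadAccess {
    private static final long NO_CURRENT_THREAD = -1L;

    private volatile boolean closed = false;               // volatile: writes are visible across threads
    private final AtomicLong currentThread = new AtomicLong(NO_CURRENT_THREAD);

    void acquire() {
        if (closed)
            throw new IllegalStateException("This consumer has already been closed.");
        long threadId = Thread.currentThread().getId();
        if (threadId != currentThread.get() && !currentThread.compareAndSet(NO_CURRENT_THREAD, threadId))
            throw new ConcurrentModificationException("Not safe for multi-threaded access");
        // No second "is it closed?" check is needed here once `closed` is volatile.
    }

    void release() {
        currentThread.set(NO_CURRENT_THREAD);
    }

    void close() {
        closed = true;
    }
}
```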
This is my original work and I license it to the project under the project's open source license.
Author: Tim Brooks <tim@uncontended.net>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#1637 from tbrooks8/KAFKA-2311
A couple of the tests may transiently fail in QueryableStateIntegrationTest as they are not catching InvalidStateStoreException. This exception is expected during rebalance.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Eno Thereska, Guozhang Wang
Closes#1840 from dguy/minor-fix
Now uses LogSegment.largestTimestamp to determine the age of a segment's messages.
Author: Eric Wasserman <eric.wasserman@gmail.com>
Reviewers: Jun Rao <junrao@gmail.com>
Closes#1794 from ewasserman/feat-1981
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Dan Norwood <norwood@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#1821 from hachikuji/KAFKA-3807
This PR changes topic subscription semantics so a change in subscription does not immediately cause a rebalance.
Instead, the next poll or the next scheduled metadata refresh will update the assigned partitions.
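An illustrative usage sketch of the new semantics (consumer construction and imports omitted; `poll(long)` is the consumer API of this era):
```
consumer.subscribe(Arrays.asList("topic-a"));
ConsumerRecords<String, String> first = consumer.poll(1000);   // joins the group, gets an assignment

// Changing the subscription no longer triggers an immediate rebalance...
consumer.subscribe(Arrays.asList("topic-a", "topic-b"));

// ...the group is only rejoined, and the assignment updated, on the next poll
// (or the next scheduled metadata refresh).
ConsumerRecords<String, String> second = consumer.poll(1000);
```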
Author: Vahid Hashemian <vahidhashemian@us.ibm.com>
Reviewers: Jason Gustafson
Closes#1726 from vahidhashemian/KAFKA-4033
The verification in verifyGreaterOrEqual was incorrect. It was failing when a new key was found.
Set the TimeWindow to a large value so that all windowed results fall in a single window.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#1833 from dguy/minor-test-fix
Changelogs of window stores now configure `cleanup.policy=compact,delete`, with `retention.ms` set to the window maintainMs plus `StreamsConfig.WINDOW_STORE_CHANGE_LOG_ADDITIONAL_RETENTION_MS_CONFIG`.
StoreChangeLogger produces messages with `context.timestamp()`.
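An illustrative view of the resulting changelog topic configuration (the values are made up; only the config keys and the retention arithmetic follow the description above):
```
import java.util.HashMap;
import java.util.Map;

long maintainMs = 24 * 60 * 60 * 1000L;            // the window store's maintain period
long additionalRetentionMs = 24 * 60 * 60 * 1000L; // StreamsConfig.WINDOW_STORE_CHANGE_LOG_ADDITIONAL_RETENTION_MS_CONFIG

Map<String, String> changelogConfig = new HashMap<>();
changelogConfig.put("cleanup.policy", "compact,delete");
changelogConfig.put("retention.ms", String.valueOf(maintainMs + additionalRetentionMs));
```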
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Eno Thereska, Guozhang Wang
Closes#1792 from dguy/kafka-3595
Mark the store as open after the DB has been restored from the changelog.
Only add the store to the map in ProcessorStateManager post restore.
Make RocksDBWindowStore.Segment override openDB(..) as it needs to mark the Segment as open.
Throw InvalidStateStoreException if any stores in a KafkaStreams instance are not available.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Eno Thereska, Guozhang Wang
Closes#1824 from dguy/kafka-4123
Update the documentation for Kafka Connect distributed mode's config.storage.topic, offset.storage.topic, and status.storage.topic configuration values to indicate that all three should refer to compacted topics.
Author: Mathieu Fenniak <mathieu.fenniak@replicon.com>
Reviewers: Jason Gustafson
Closes#1832 from mfenniak/kafka-connect-topic-docs
Change the console producer's default acks to 1 and update the acks docs. Also added the -1 setting to the acks docs, since that question comes up often.
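For context, the same setting on the Java producer (a minimal sketch; the comments summarize the acks semantics the docs describe):
```
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
// acks=1  : the leader acknowledges the write without waiting for followers (the console producer's new default)
// acks=all: equivalent to acks=-1, the leader waits for the full in-sync replica set (strongest guarantee)
// acks=0  : the producer does not wait for any acknowledgement
props.put(ProducerConfig.ACKS_CONFIG, "1");
```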
Author: Dustin Cote <dustin@confluent.io>
Reviewers: Vahid Hashemian <vahidhashemian@us.ibm.com>, Ismael Juma <ismael@juma.me.uk>
Closes#1795 from cotedm/KAFKA-3129
- use AdminTool to check for active consumer group
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#1767 from mjsax/kafka-4058-trunk
The commit here changes the log level of a log message from WARN to DEBUG.
As noted in the mail discussion here https://www.mail-archive.com/devkafka.apache.org/msg56035.html, in a pretty straightforward/typical and valid setup the broker logs get flooded with the following message:
[2016-09-02 08:07:13,773] WARN SSL peer is not authenticated, returning ANONYMOUS instead (org.apache.kafka.common.network.SslTransportLayer)
Author: Jaikiran Pai <jaikiran.pai@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#1825 from jaikiran/ssl-log-level
Also add tests and make `Crc32.update` perform the same argument checks as
`java.util.zip.CRC32`.
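For reference, a sketch of the argument checks that `java.util.zip.CRC32.update(byte[], int, int)` performs and that `Crc32.update` is made to mirror:
```
static void checkUpdateArgs(byte[] b, int off, int len) {
    if (b == null)
        throw new NullPointerException();
    if (off < 0 || len < 0 || off > b.length - len)
        throw new ArrayIndexOutOfBoundsException();
}
```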
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Gwen Shapira
Closes#1672 from ijuma/record-is-valid-should-be-more-robust
Get the channel's remote address before calling `channel.close`.
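A minimal sketch of the fix pattern (the helper is hypothetical, not the actual selector code): read the peer address before `close()`, since a closed channel can no longer report it.
```
import java.io.IOException;
import java.net.SocketAddress;
import java.nio.channels.SocketChannel;

static String closeAndDescribe(SocketChannel channel) throws IOException {
    SocketAddress remoteAddress = channel.getRemoteAddress();  // capture before closing
    channel.close();
    return "Closed connection to " + remoteAddress;
}
```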
Author: Tao Xiao <xiaotao183@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#1826 from xiaotao183/KAFKA-4129
Typically this error condition is caused by topic-level configuration issues, so it is useful to include which topic partition was reset, to help operators debug the root cause.
Author: Dana Powers <dana.powers@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#1801 from dpkp/log_topic_partition_reset_dirty_offset