This PR now justs removes the check in TaskPairs.hasNewPair that was causing the task assignment issue.
This was done as we need to further refine task assignment strategy and this approach needs to include the statefulness of tasks and is best done in one pass vs taking a "patchy" approach.
Updated current tests and ran locally
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
An untimely wakeup can cause ConsumerCoordinator.onJoinComplete to throw a WakeupException before completion. On the next poll(), it will be retried, but this leads to an underflow error because the buffer containing the assignment data will already have been advanced. The solution is to duplicate the buffer passed to onJoinComplete.
Reviewers: Konstantine Karantasis <konstantine@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
ZooKeeper client from version 3.4.13 doesn't handle connections to localhost very well. If ZooKeeper is started on 127.0.0.1 on a machine that has both ipv4 and ipv6 and a client is created using localhost rather than the IP address in the connection string, ZooKeeper client attempts to connect to ipv4 or ipv6 randomly with a fixed one second backoff if connection fails. Use 127.0.0.1 instead of localhost in streams tests to avoid intermittent test failures due to ZK client connection timeouts if ipv6 is chosen in consecutive address selections. Also add note to upgrade docs for 2.0.0.
Reviewers: Ismael Juma <github@juma.me.uk>, Matthias J. Sax <matthias@confluent.io>
This has always been an issue, but the recent upgrade to ZooKeeper
3.4.13 means it is also an issue when an unresolvable ZK
address is used, causing some tests to leak threads.
The change in behaviour in ZK 3.4.13 is that no exception is thrown
from the ZooKeeper constructor in case of an unresolvable address.
Instead, ZooKeeper tries to re-resolve the address hoping it becomes
resolvable again. We eventually throw a
`ZooKeeperClientTimeoutException`, which is similar to the case
where the the address is resolvable but ZooKeeper is not
reachable.
Reviewers: Ismael Juma <ismael@juma.me.uk>
When there are many inactive partitions in the cluster, we observed constant churn of URP in the cluster even if follower can catch up with leader's byte-in-rate because leader broker frequently moves replicas of inactive partitions out of ISR. This PR mitigates this issue by not moving replica out of ISR if follower's LEO == leader's LEO.
Author: Zhanxiang (Patrick) Huang <hzxa21@hotmail.com>
Reviewers: Dong Lin <lindong28@gmail.com>
Closes#5412 from hzxa21/KAFKA-7152
After successful completion of KafkaProducer#close, it is possible that an application calls KafkaProducer#send. If the send is invoked for a topic for which we do not have any metadata, the producer will block until `max.block.ms` elapses - we do not expect to receive any metadata update in this case because Sender (and NetworkClient) has already exited. It is only when RecordAccumulator#append is invoked that we notice that the producer has already been closed and throw an exception. If `max.block.ms` is set to Long.MaxValue (or a sufficiently high value in general), the producer could block awaiting metadata indefinitely.
This patch makes sure `Metadata#awaitUpdate` periodically checks if the network client has been closed, and if so bails out as soon as possible.
And change KafkaController to use the newly introduced method.
Also remove redundant `InZk` postfixes from `registerBrokerInZk` and
`updateBrokerInfoInZk`.
As `checkedEphemeralCreate` is not used outside of `KafkaZkClient`
any longer, reduce its visibility.
ControllerIntegrationTest already covers this functionality well, it validates the
refactor.
Reviewers: Ismael Juma <ismael@juma.me.uk>
The log line says `ms`, but the actual value could represent a
different time unit depending on what the user provided.
Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Updated the 2.0 document for changed quota behaviors.
Author: Jon Lee <jonlee@linkedin.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Dong Lin <lindong28@gmail.com>
Closes#5384 from jonlee2/KAFKA-7177
A recent change from `new FileOutputStream` to `Files.newOutputStream` missed the `CREATE` flag (which is necessary in addition to `APPEND`).
Reviewers: Ismael Juma <ismael@juma.me.uk>
In system tests, it is useful to have the thread dumps if a broker cannot be stopped using SIGTERM.
Reviewers: Xavier Léauté <xl+github@xvrl.net>, Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Currently, if a consumer group never commits offsets, ConsumerGroupCommand will not include it in the describe output even if the member assignment is valid. Instead, the tool should be able to describe the group information showing empty current_offset and LAG.
Reviewers: Sriharsha Chintalapani <sriharsha@apache.org>, Vahid Hashemian <vahidhashemian@us.ibm.com>, Jason Gustafson <jason@confluent.io>
SslTransportLayer currently closes the SSL engine and logs a warning if close_notify message canot be sent because the remote end closed its connection. This tends to fill up broker logs, especially when using clients which close connections immediately. Since this log entry is not very useful anyway, it would be better to log at debug level.
Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>
1. At the beginning of assign, we first check that all the non-repartition source topics are included in the metadata. If not, we log an error at the leader and set an error in the Assignment userData bytes, indicating that leader cannot complete assignment and the error code would indicate the root cause of it.
2. Upon receiving the assignment, if the error is not NONE the streams will shutdown itself with a log entry re-stating the root cause interpreted from the error code.
Author: tedyu <yuzhihong@gmail.com>
Reviewers: Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>
Closes#5322 from tedyu/trunk
The SASL/OAUTHBEARER client response as currently implemented in OAuthBearerSaslClient sends the valid gs2-header "n,," but then sends the "auth" key and value immediately after it.
This does not conform to the specification because there is no %x01 after the gs2-header, no %x01 after the auth value, and no terminating %x01. Fixed this and the parsing of the client response in
OAuthBearerSaslServer, which currently allows the malformed text. Also updated to accept and ignore unknown properties as required by the spec.
Reviewers: Stanislav Kozlovski <familyguyuser192@windowslive.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
1. Remove MinTimestampTracker and its TimestampTracker interface.
2. In RecordQueue, keep track of the head record (deserialized) while put the rest raw bytes records in the fifo queue, the head record as well as the partition timestamp will be updated accordingly.
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
- Replace adminZkClient.createOrUpdateTopicPartitionAssignmentPathInZK calls with TestUtils.createTopic wherever applicable
- Replace adminZkClient.createTopic calls with TestUtils.createTopic wherever applicable
- Move non-deprecated tests to other test classes and deprecate AdminTest.scala
- Remove duplicate tests between AdminTest and AdminZkClientTest
Author: Manikumar Reddy <manikumar.reddy@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Dong Lin <lindong28@gmail.com>
Closes#5303 from omkreddy/topiccreate
SSL `close_notify` from broker connection close was processed as a handshake failure in clients while unwrapping the message if a handshake is in progress. Updated to handle this as a retriable IOException rather than a non-retriable SslAuthenticationException to avoid authentication exceptions in clients during rolling restart of brokers.
Reviewers: Ismael Juma <ismael@juma.me.uk>
The default server.properties file now contains the log.dirs setting and not log.dir anymore.
Co-authored-by: Mickael Maison <mickael.maison@gmail.com>
Co-authored-by: Katherine Farmer <kfarme3@uk.ibm.com>
Reviewers: Manikumar Reddy O <manikumar.reddy@gmail.com>, Sriharsha Chintalapani <sriharsha@apache.org>
…egal char and generates InvalidTopicException
If config parameter max.block.ms config parameter is set to a non-zero value,
KafkaProducer.send() blocks for the max.block.ms time if topic name has illegal
char or is invalid.
Wrote a unit test that verifies the appropriate exception is returned when
performing a get on the returned future by KafkaProducer.send().
Author: Ahmed Al Mehdi <aalmehdi@aalmehdi-ld1.linkedin.biz>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Joel Koshy <jjkoshy@gmail.com>, Manikumar Reddy O <manikumar.reddy@gmail.com>
Closes#5247 from ahmedha/KAFKA-5098
This PR uses bulk loading for recovering RocksDBWindowStore, same as RocksDBStore.
Reviewers: Boyang Chen <bchen11@outlook.com>, Shawn Nguyen <shnguyen@pinterest.com>, Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
This includes a fix for ZOOKEEPER-2184 (Zookeeper Client
should re-resolve hosts when connection attempts fail), which
fixes KAFKA-4041.
Updated a couple of tests as unresolvable addresses are now
retried until the connection timeout. Cleaned up tests a little.
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
This setting allows specifying a chroot so we documented it.
Co-authored-by: Mickael Maison <mickael.maison@gmail.com>
Co-authored-by: Katherine Farmer <kfarme3@uk.ibm.com>
Reviewers: Manikumar Reddy O <manikumar.reddy@gmail.com>, Jason Gustafson <jason@confluent.io>
Add some additional size validation to prevent overflows when using `FileRecords`.
Reviewers: Dhruvil Shah <dhruvil@confluent.io>, Ismael Juma <ismael@juma.me.uk>
If inter.broker.protocol.version is 2.0-IV1 or newer. Also fixed ListOffsetRequest
so that v2 is used, if applicable.
Added a unit test which verifies that we use the latest version of the various
requests by default. Included a few minor tweaks to make testing easier.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>