If the only producer state left in the log is a transaction marker, then we do not know the next expected sequence number. This can happen if there is a call to DeleteRecords which arrives prior to the writing of the marker. Currently we raise an OutOfOrderSequence error when this happens, but this is treated as a fatal error by the producer. Raising UnknownProducerId instead allows the producer to check for truncation using the last acknowledged sequence number and reset if possible.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
* Updated TestLogCleaning tool to use Java consumer and rename as LogCompactionTester.
* Enabled the log cleaner in every system test.
* Removed configs from "kafka.properties" with default values and `socket.receive.buffer.bytes`
as the override did not seem necessary.
* Updated `kafka.py` logic to handle duplicates between `kafka.properties` and `server_prop_overrides`.
* Updated Gradle build so that classes from `kafka-clients` test jar can be used in
system tests.
Reviewers: Colin Patrick McCabe <colin@cmccabe.xyz>, Ismael Juma <ismael@juma.me.uk>
Author: radai-rosenblatt <radai.rosenblatt@gmail.com>
Reviewers: Dhruvil Shah <dhruvil@confluent.io>, Ismael Juma <github@juma.me.uk>, Dong Lin <lindong28@gmail.com>
Closes#5221 from radai-rosenblatt/metadata-adventures
Pre-initialization of clients in IntegrationTestHarness is a cause of significant confusion and has resulted in a bunch of inconsistent client creation patterns. This patch requires test cases to create needed clients explicitly and makes the creation logic more consistent.
Reviewers: Manikumar Reddy O <manikumar.reddy@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
Increase record size and use compression for downconversion metrics test to ensure that conversion time is above 1ms to avoid transient test failures.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Rajini Sivaram <rajinisivaram@googlemail.com>
Since ConsumerFetcherThread has been removed, we have
an opportunity to simplify the *FetcherThread classes. This
is an unambitious first step which removes the now unneeded
`PartitionData` indirection.
Currently, we skip the steps to make a replica a follower if the leader does not change, including truncating the follower log if necessary. This can cause problems if the follower has missed one or more leader updates. Change the logic to only skip the steps if the new epoch is the same or one greater than the old epoch. Tested with unit tests that verify the behavior of `Partition` and that show log truncation when the follower's log is ahead of the leader's, the follower has missed an epoch update, and the follower receives a `LeaderAndIsrRequest` making it a follower.
Reviewers: Stanislav Kozlovski <familyguyuser192@windowslive.com>, Jason Gustafson <jason@confluent.io>
ACL updates currently get `(currentAcls, currentVersion)` for the resource from ZK and do a conditional update using `(currentAcls+newAcl, currentVersion)`. This supports concurrent atomic updates if the resource path already exists in ZK. If the path doesn't exist, we currently do a conditional createOrUpdate using `(newAcl, -1)`. But `-1` has a special meaning in ZooKeeper for update operations - it means match any version. So two brokers adding acls using `(newAcl1, -1)` and `(newAcl2, -1)` will result in one broker creating the path and setting newAcl1, while the other broker can potentially update the path with `(newAcl2, -1)`, losing newAcl1. The timing window is very small, but we have seen intermittent failures in `SimpleAclAuthorizerTest.testHighConcurrencyModificationOfResourceAcls` as a result of this window.
This commit fixes the version used for conditional updates in ZooKeeper. It also replaces the confusing `ZkVersion.NoVersion=-1` used for `set(any-version)` and `get(return not-found)` with `ZkVersion.MatchAnyVersion` for `set(any-version)` and `ZkVersion.UnknownVersion` for `get(return not-found)` to avoid the return value from `get` matching arbitrary values in `set`.
Summary:
1. Revert GroupMetadata.members to private
2. Add back a wrongly removed comment
3. In GroupMetadata.remove(), update supportedProtocols and awaitingJoinCallbackMembers, only when the remove succeeded
Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>, Sriharsha Chintalapani <sriharsha@apache.org>
We currently do a lot of bookkeeping for timeouts which is both error-prone and distracting. This patch adds a new `Timer` class to simplify this logic and control unnecessary calls to system time. In particular, this helps with nested timeout operations. The consumer has been updated to use the new class.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Guozhang Wang <wangguoz@gmail.com>
Use delivery timeout instead of retries when possible and remove various TODOs associated with completion of KIP-91.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Guozhang Wang <wangguoz@gmail.com>
By waiting until server1 has joined the ISR before shutting down server2
Rerun the test method many times after the code change, and there is no flakiness any more.
Author: Lucas Wang <luwang@linkedin.com>
Reviewers: Mayuresh Gharat <gharatmayuresh15@gmail.com>, Dong Lin <lindong28@gmail.com>
Closes#5387 from gitlw/fixing_flacky_logrecevorytest
KAFKA-6432: Make index lookup more cache friendly
For each topic-partition, Kafka broker maintains two indices: one for message offset, one for message timestamp. By default, a new index entry is appended to each index for every 4KB messages. The lookup of the indices is a simple binary search. The indices are mmaped files, and cached by Linux page cache.
Both consumer fetch and follower fetch have to do an offset lookup, before accessing the actual message data. The simple binary search algorithm used for looking up the index is not cache friendly, and may cause page faults even on high QPS topic-partitions.
In a normal Kafka broker, all the follower fetch requests, and most consumer fetch requests should only look up the last few entries of the index. We can make the index lookup more cache friendly, by searching in the last one or two pages of the index first.
Reviewers: Colin Patrick McCabe <colin@cmccabe.xyz>, Guozhang Wang <wangguoz@gmail.com>, Ted Yu <yuzhihong@gmail.com>, Ismael Juma <github@juma.me.uk>, Sriharsha Chintalapani <sriharsha@apache.org>
This has always been an issue, but the recent upgrade to ZooKeeper
3.4.13 means it is also an issue when an unresolvable ZK
address is used, causing some tests to leak threads.
The change in behaviour in ZK 3.4.13 is that no exception is thrown
from the ZooKeeper constructor in case of an unresolvable address.
Instead, ZooKeeper tries to re-resolve the address hoping it becomes
resolvable again. We eventually throw a
`ZooKeeperClientTimeoutException`, which is similar to the case
where the the address is resolvable but ZooKeeper is not
reachable.
Reviewers: Ismael Juma <ismael@juma.me.uk>
When there are many inactive partitions in the cluster, we observed constant churn of URP in the cluster even if follower can catch up with leader's byte-in-rate because leader broker frequently moves replicas of inactive partitions out of ISR. This PR mitigates this issue by not moving replica out of ISR if follower's LEO == leader's LEO.
Author: Zhanxiang (Patrick) Huang <hzxa21@hotmail.com>
Reviewers: Dong Lin <lindong28@gmail.com>
Closes#5412 from hzxa21/KAFKA-7152
After successful completion of KafkaProducer#close, it is possible that an application calls KafkaProducer#send. If the send is invoked for a topic for which we do not have any metadata, the producer will block until `max.block.ms` elapses - we do not expect to receive any metadata update in this case because Sender (and NetworkClient) has already exited. It is only when RecordAccumulator#append is invoked that we notice that the producer has already been closed and throw an exception. If `max.block.ms` is set to Long.MaxValue (or a sufficiently high value in general), the producer could block awaiting metadata indefinitely.
This patch makes sure `Metadata#awaitUpdate` periodically checks if the network client has been closed, and if so bails out as soon as possible.
And change KafkaController to use the newly introduced method.
Also remove redundant `InZk` postfixes from `registerBrokerInZk` and
`updateBrokerInfoInZk`.
As `checkedEphemeralCreate` is not used outside of `KafkaZkClient`
any longer, reduce its visibility.
ControllerIntegrationTest already covers this functionality well, it validates the
refactor.
Reviewers: Ismael Juma <ismael@juma.me.uk>
Currently, if a consumer group never commits offsets, ConsumerGroupCommand will not include it in the describe output even if the member assignment is valid. Instead, the tool should be able to describe the group information showing empty current_offset and LAG.
Reviewers: Sriharsha Chintalapani <sriharsha@apache.org>, Vahid Hashemian <vahidhashemian@us.ibm.com>, Jason Gustafson <jason@confluent.io>
- Replace adminZkClient.createOrUpdateTopicPartitionAssignmentPathInZK calls with TestUtils.createTopic wherever applicable
- Replace adminZkClient.createTopic calls with TestUtils.createTopic wherever applicable
- Move non-deprecated tests to other test classes and deprecate AdminTest.scala
- Remove duplicate tests between AdminTest and AdminZkClientTest
Author: Manikumar Reddy <manikumar.reddy@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Dong Lin <lindong28@gmail.com>
Closes#5303 from omkreddy/topiccreate
This includes a fix for ZOOKEEPER-2184 (Zookeeper Client
should re-resolve hosts when connection attempts fail), which
fixes KAFKA-4041.
Updated a couple of tests as unresolvable addresses are now
retried until the connection timeout. Cleaned up tests a little.
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
This setting allows specifying a chroot so we documented it.
Co-authored-by: Mickael Maison <mickael.maison@gmail.com>
Co-authored-by: Katherine Farmer <kfarme3@uk.ibm.com>
Reviewers: Manikumar Reddy O <manikumar.reddy@gmail.com>, Jason Gustafson <jason@confluent.io>
If inter.broker.protocol.version is 2.0-IV1 or newer. Also fixed ListOffsetRequest
so that v2 is used, if applicable.
Added a unit test which verifies that we use the latest version of the various
requests by default. Included a few minor tweaks to make testing easier.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
ZooKeeper listener for change notifications should be created before loading the ACL cache to avoid timing window if acls are modified when broker is starting up.
Reviewers: Jun Rao <junrao@gmail.com>, Ismael Juma <ismael@confluent.io>
Use KafkaPrincipal objects for authorization in `SimpleAclAuthorizer` so that comparison with super.users and ACLs instantiated from Strings work. Previously, it would compare two different classes `KafkaPrincipal` and the custom class, which would always return false because of the implementation of `KafkaPrincipal#equals`.
Do not update LogReadResult after it is initially populated when returning fetches immediately (i.e. without hitting the purgatory). This was done in #3954 as an optimization so that the followers get the potentially updated high watermark. However, since many things can happen (like deleting old segments and advancing log start offset) between initial creation of LogReadResult and the update, we can hit issues like log start offset in fetch response being higher than the last offset in fetched records.
Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>
This patch removes the need to build up producer state when the log is using V0 / V1 message format where we did not have idempotent and transactional producers yet.
Also fixes a small issue where we incorrectly reported the offset index corrupt if the last offset in the index is equal to the base offset of the segment.
They rely on finalizers (before Java 11), which create
unnecessary GC load. The alternatives are as easy to
use and don't have this issue.
Also use FileChannel directly instead of retrieving
it from RandomAccessFile whenever possible
since the indirection is unnecessary.
Finally, add a few try/finally blocks.
Reviewers: Colin Patrick McCabe <colin@cmccabe.xyz>, Rajini Sivaram <rajinisivaram@googlemail.com>
NoSuchElementException will be thrown if ReplicaAlterDirThread replaces the current replica with future replica right before the request handler thread executes `futureReplica.log.get.dir.getParent` in the ReplicaManager.alterReplicaLogDirs(). The solution is to grab the partition lock when request handler thread attempts to check the destination log directory of the future replica.
Author: Dong Lin <lindong28@gmail.com>
Reviewers: Jun Rao <junrao@gmail.com>
Closes#5081 from lindong28/KAFKA-6949