Currently, we skip the steps to make a replica a follower if the leader does not change, including truncating the follower log if necessary. This can cause problems if the follower has missed one or more leader updates. Change the logic to only skip the steps if the new epoch is the same or one greater than the old epoch. Tested with unit tests that verify the behavior of `Partition` and that show log truncation when the follower's log is ahead of the leader's, the follower has missed an epoch update, and the follower receives a `LeaderAndIsrRequest` making it a follower.
Reviewers: Stanislav Kozlovski <familyguyuser192@windowslive.com>, Jason Gustafson <jason@confluent.io>
ACL updates currently get `(currentAcls, currentVersion)` for the resource from ZK and do a conditional update using `(currentAcls+newAcl, currentVersion)`. This supports concurrent atomic updates if the resource path already exists in ZK. If the path doesn't exist, we currently do a conditional createOrUpdate using `(newAcl, -1)`. But `-1` has a special meaning in ZooKeeper for update operations - it means match any version. So two brokers adding acls using `(newAcl1, -1)` and `(newAcl2, -1)` will result in one broker creating the path and setting newAcl1, while the other broker can potentially update the path with `(newAcl2, -1)`, losing newAcl1. The timing window is very small, but we have seen intermittent failures in `SimpleAclAuthorizerTest.testHighConcurrencyModificationOfResourceAcls` as a result of this window.
This commit fixes the version used for conditional updates in ZooKeeper. It also replaces the confusing `ZkVersion.NoVersion=-1` used for `set(any-version)` and `get(return not-found)` with `ZkVersion.MatchAnyVersion` for `set(any-version)` and `ZkVersion.UnknownVersion` for `get(return not-found)` to avoid the return value from `get` matching arbitrary values in `set`.
Summary:
1. Revert GroupMetadata.members to private
2. Add back a wrongly removed comment
3. In GroupMetadata.remove(), update supportedProtocols and awaitingJoinCallbackMembers, only when the remove succeeded
Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>, Sriharsha Chintalapani <sriharsha@apache.org>
We currently do a lot of bookkeeping for timeouts which is both error-prone and distracting. This patch adds a new `Timer` class to simplify this logic and control unnecessary calls to system time. In particular, this helps with nested timeout operations. The consumer has been updated to use the new class.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Guozhang Wang <wangguoz@gmail.com>
Use delivery timeout instead of retries when possible and remove various TODOs associated with completion of KIP-91.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Guozhang Wang <wangguoz@gmail.com>
By waiting until server1 has joined the ISR before shutting down server2
Rerun the test method many times after the code change, and there is no flakiness any more.
Author: Lucas Wang <luwang@linkedin.com>
Reviewers: Mayuresh Gharat <gharatmayuresh15@gmail.com>, Dong Lin <lindong28@gmail.com>
Closes#5387 from gitlw/fixing_flacky_logrecevorytest
KAFKA-6432: Make index lookup more cache friendly
For each topic-partition, Kafka broker maintains two indices: one for message offset, one for message timestamp. By default, a new index entry is appended to each index for every 4KB messages. The lookup of the indices is a simple binary search. The indices are mmaped files, and cached by Linux page cache.
Both consumer fetch and follower fetch have to do an offset lookup, before accessing the actual message data. The simple binary search algorithm used for looking up the index is not cache friendly, and may cause page faults even on high QPS topic-partitions.
In a normal Kafka broker, all the follower fetch requests, and most consumer fetch requests should only look up the last few entries of the index. We can make the index lookup more cache friendly, by searching in the last one or two pages of the index first.
Reviewers: Colin Patrick McCabe <colin@cmccabe.xyz>, Guozhang Wang <wangguoz@gmail.com>, Ted Yu <yuzhihong@gmail.com>, Ismael Juma <github@juma.me.uk>, Sriharsha Chintalapani <sriharsha@apache.org>
This has always been an issue, but the recent upgrade to ZooKeeper
3.4.13 means it is also an issue when an unresolvable ZK
address is used, causing some tests to leak threads.
The change in behaviour in ZK 3.4.13 is that no exception is thrown
from the ZooKeeper constructor in case of an unresolvable address.
Instead, ZooKeeper tries to re-resolve the address hoping it becomes
resolvable again. We eventually throw a
`ZooKeeperClientTimeoutException`, which is similar to the case
where the the address is resolvable but ZooKeeper is not
reachable.
Reviewers: Ismael Juma <ismael@juma.me.uk>
When there are many inactive partitions in the cluster, we observed constant churn of URP in the cluster even if follower can catch up with leader's byte-in-rate because leader broker frequently moves replicas of inactive partitions out of ISR. This PR mitigates this issue by not moving replica out of ISR if follower's LEO == leader's LEO.
Author: Zhanxiang (Patrick) Huang <hzxa21@hotmail.com>
Reviewers: Dong Lin <lindong28@gmail.com>
Closes#5412 from hzxa21/KAFKA-7152
After successful completion of KafkaProducer#close, it is possible that an application calls KafkaProducer#send. If the send is invoked for a topic for which we do not have any metadata, the producer will block until `max.block.ms` elapses - we do not expect to receive any metadata update in this case because Sender (and NetworkClient) has already exited. It is only when RecordAccumulator#append is invoked that we notice that the producer has already been closed and throw an exception. If `max.block.ms` is set to Long.MaxValue (or a sufficiently high value in general), the producer could block awaiting metadata indefinitely.
This patch makes sure `Metadata#awaitUpdate` periodically checks if the network client has been closed, and if so bails out as soon as possible.
And change KafkaController to use the newly introduced method.
Also remove redundant `InZk` postfixes from `registerBrokerInZk` and
`updateBrokerInfoInZk`.
As `checkedEphemeralCreate` is not used outside of `KafkaZkClient`
any longer, reduce its visibility.
ControllerIntegrationTest already covers this functionality well, it validates the
refactor.
Reviewers: Ismael Juma <ismael@juma.me.uk>
Currently, if a consumer group never commits offsets, ConsumerGroupCommand will not include it in the describe output even if the member assignment is valid. Instead, the tool should be able to describe the group information showing empty current_offset and LAG.
Reviewers: Sriharsha Chintalapani <sriharsha@apache.org>, Vahid Hashemian <vahidhashemian@us.ibm.com>, Jason Gustafson <jason@confluent.io>
- Replace adminZkClient.createOrUpdateTopicPartitionAssignmentPathInZK calls with TestUtils.createTopic wherever applicable
- Replace adminZkClient.createTopic calls with TestUtils.createTopic wherever applicable
- Move non-deprecated tests to other test classes and deprecate AdminTest.scala
- Remove duplicate tests between AdminTest and AdminZkClientTest
Author: Manikumar Reddy <manikumar.reddy@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Dong Lin <lindong28@gmail.com>
Closes#5303 from omkreddy/topiccreate
This includes a fix for ZOOKEEPER-2184 (Zookeeper Client
should re-resolve hosts when connection attempts fail), which
fixes KAFKA-4041.
Updated a couple of tests as unresolvable addresses are now
retried until the connection timeout. Cleaned up tests a little.
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
This setting allows specifying a chroot so we documented it.
Co-authored-by: Mickael Maison <mickael.maison@gmail.com>
Co-authored-by: Katherine Farmer <kfarme3@uk.ibm.com>
Reviewers: Manikumar Reddy O <manikumar.reddy@gmail.com>, Jason Gustafson <jason@confluent.io>
If inter.broker.protocol.version is 2.0-IV1 or newer. Also fixed ListOffsetRequest
so that v2 is used, if applicable.
Added a unit test which verifies that we use the latest version of the various
requests by default. Included a few minor tweaks to make testing easier.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
ZooKeeper listener for change notifications should be created before loading the ACL cache to avoid timing window if acls are modified when broker is starting up.
Reviewers: Jun Rao <junrao@gmail.com>, Ismael Juma <ismael@confluent.io>
Use KafkaPrincipal objects for authorization in `SimpleAclAuthorizer` so that comparison with super.users and ACLs instantiated from Strings work. Previously, it would compare two different classes `KafkaPrincipal` and the custom class, which would always return false because of the implementation of `KafkaPrincipal#equals`.
Do not update LogReadResult after it is initially populated when returning fetches immediately (i.e. without hitting the purgatory). This was done in #3954 as an optimization so that the followers get the potentially updated high watermark. However, since many things can happen (like deleting old segments and advancing log start offset) between initial creation of LogReadResult and the update, we can hit issues like log start offset in fetch response being higher than the last offset in fetched records.
Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>
This patch removes the need to build up producer state when the log is using V0 / V1 message format where we did not have idempotent and transactional producers yet.
Also fixes a small issue where we incorrectly reported the offset index corrupt if the last offset in the index is equal to the base offset of the segment.
They rely on finalizers (before Java 11), which create
unnecessary GC load. The alternatives are as easy to
use and don't have this issue.
Also use FileChannel directly instead of retrieving
it from RandomAccessFile whenever possible
since the indirection is unnecessary.
Finally, add a few try/finally blocks.
Reviewers: Colin Patrick McCabe <colin@cmccabe.xyz>, Rajini Sivaram <rajinisivaram@googlemail.com>
NoSuchElementException will be thrown if ReplicaAlterDirThread replaces the current replica with future replica right before the request handler thread executes `futureReplica.log.get.dir.getParent` in the ReplicaManager.alterReplicaLogDirs(). The solution is to grab the partition lock when request handler thread attempts to check the destination log directory of the future replica.
Author: Dong Lin <lindong28@gmail.com>
Reviewers: Jun Rao <junrao@gmail.com>
Closes#5081 from lindong28/KAFKA-6949
Currently the check whether a user as a super user in `SimpleAclAuthorizer` is performed only after all other ACLs have been evaluated. Since all requests from a super user are granted we don't really need to apply the ACLs. This patch returns true if the user is a super user before checking ACLs, thus bypassing the needless evaluation effort.
Reviewers: Andy Coates <big-andy-coates@users.noreply.github.com>, Manikumar Reddy <manikumar.reddy@gmail.com>, Jason Gustafson <jason@confluent.io>
Wait for produce to fail before updating listener to avoid send succeeding after the listener update. Also use different topics in tests with connection failures where one is expected to fail and the other is expected to succeed.
Reviewers: Ismael Juma <ismael@juma.me.uk>
Leftover threads doing network I/O can interfere with subsequent tests. Add missing shutdown in tests and include admin client in the check for leftover threads.
Reviewers: Anna Povzner <anna@confluent.io>, Dhruvil Shah <dhruvil@confluent.io>, Ismael Juma <ismael@juma.me.uk>, Manikumar Reddy O <manikumar.reddy@gmail.com>
This patch contains the improved offset expiration semantics proposed in KIP-211. Committed offsets will not be expired as long as a group is active. Once all members have left the group, then offsets will be expired after the timeout configured by `offsets.retention.minutes`. Note that the optimization for early expiration of unsubscribed topics will be implemented in a separate patch.
This is an unexpected exception so `UnknownServerException`
is thrown back to the client.
This is a minimal change to make the behaviour match `ZkUtils`.
This is better, but one could argue that it's not perfect. A more
sophisticated approach can be tackled separately.
Added a concurrent test that fails without this change.
Reviewers: Jun Rao <junrao@gmail.com>
For metadata request version 6 and above, use a different error code to indicate missing listener on leader broker to enable diagnosis of listener configuration issues.
Reviewers: Ismael Juma <ismael@juma.me.uk>
Also:
- Remove exceptions in `kafka.common` that are no longer used.
- Keep `kafka.common.KafkaException` as it's still used by `ZkUtils`,
`kafka.admin.AdminClient` and `kafka.security.auth` classes and
we would like to maintain compatibility for now.
- Add deprecated annotation to `kafka.admin.AdminClient`. The scaladoc
stated that the class is deprecated, but the annotation was missing.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
A broker with multiple log dirs will die on startup if dir.getCanonicalPath() throws
IOException for one of the log dirs. We should mark such log directory as offline
instead and the broker should start if there is a healthy log dir.
Reviewers: Ismael Juma <ismael@juma.me.uk>
This patch fixes the following issues in the log splitting logic added to address KAFKA-6264:
1. We were not handling the case when all messages in the segment overflowed the index. In this case, there is only one resulting segment following the split.
2. There was an off-by-one error in the recovery logic when completing a swap operation which caused an unintended segment deletion.
Additionally, this patch factors out of `splitOverflowedSegment` a method to write to a segment using from with an instance of `FileRecords`. This allows for future reuse and isolated testing.
Reviewers: Dhruvil Shah <dhruvil@confluent.io>, Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
- Removed Scala consumers (`SimpleConsumer` and `ZooKeeperConsumerConnector`)
and their tests.
- Removed Scala request/response/message classes.
- Removed any mention of new consumer or new producer in the code
with the exception of MirrorMaker where the new.consumer option was
never deprecated so we have to keep it for now. The non-code
documentation has not been updated either, that will be done
separately.
- Removed a number of tools that only made sense in the context
of the Scala consumers (see upgrade notes).
- Updated some tools that worked with both Scala and Java consumers
so that they only support the latter (see upgrade notes).
- Removed `BaseConsumer` and related classes apart from `BaseRecord`
which is used in `MirrorMakerMessageHandler`. The latter is a pluggable
interface so effectively public API.
- Removed `ZkUtils` methods that were only used by the old consumers.
- Removed `ZkUtils.registerBroker` and `ZKCheckedEphemeral` since
the broker now uses the methods in `KafkaZkClient` and no-one else
should be using that method.
- Updated system tests so that they don't use the Scala consumers except
for multi-version tests.
- Updated LogDirFailureTest so that the consumer offsets topic would
continue to be available after all the failures. This was necessary for it
to work with the Java consumer.
- Some multi-version system tests had not been updated to include
recently released Kafka versions, fixed it.
- Updated findBugs and checkstyle configs not to refer to deleted
classes and packages.
Reviewers: Dong Lin <lindong28@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
Support configuration of dynamic broker configs in ZooKeeper before starting brokers using ConfigCommand. This will allow password configs to be encrypted and stored in ZooKeeper, without requiring clear passwords in server.properties to bootstrap the broker first.
Reviewers: Jason Gustafson <jason@confluent.io>