#5253 broke standby restoration for windowed stores.
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
1. Extend `isWindowStore` to also consider session stores (a sketch of the idea follows below).
2. Extend the existing unit test accordingly.
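A minimal sketch of the idea (the helper's name and location are illustrative, not the actual Streams internals):

```java
// Hypothetical sketch: treat both window and session stores as "windowed",
// since session stores are also segmented and their standby/restore handling
// must follow the same path.
import org.apache.kafka.streams.processor.StateStore;
import org.apache.kafka.streams.state.SessionStore;
import org.apache.kafka.streams.state.WindowStore;

final class StoreTypeCheck {
    static boolean isWindowStore(final StateStore store) {
        // Previously only WindowStore was recognized here.
        return store instanceof WindowStore || store instanceof SessionStore;
    }
}
```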
Reviewers: John Roesler <john@confluent.io>, Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
See also KIP-319.
Replace number-of-segments parameters with segment-interval-ms parameters in various places. The latter was always the parameter that several components needed, and we accidentally supplied the former because it was the one available.
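As a rough illustration only (the actual conversion is defined by KIP-319 and the store suppliers, and is not reproduced here), the legacy number-of-segments value relates to the interval the components actually need roughly as follows:

```java
// Illustrative only: derive a segment interval from the legacy number-of-segments
// parameter, assuming the segments evenly cover the retention period.
static long segmentIntervalMs(final long retentionMs, final int numSegments) {
    return Math.max(1L, retentionMs / numSegments);
}
```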
Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
The ZooKeeper listener for change notifications should be created before loading the ACL cache, to avoid a timing window if ACLs are modified while the broker is starting up.
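A sketch of the intended startup ordering (method names are hypothetical; the actual change is in `SimpleAclAuthorizer`'s configuration path):

```java
// Illustrative ordering only: subscribe to change notifications before loading
// the cache, so ACL modifications made during startup are still observed.
final class AclAuthorizerStartup {
    void startup() {
        registerAclChangeListener(); // create the ZooKeeper notification listener first
        loadAclCache();              // then load current ACLs; concurrent changes arrive as notifications
    }

    private void registerAclChangeListener() { /* hypothetical: start the ZK change listener */ }
    private void loadAclCache() { /* hypothetical: read ACLs from ZooKeeper into the cache */ }
}
```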
Reviewers: Jun Rao <junrao@gmail.com>, Ismael Juma <ismael@confluent.io>
KAFKA-6986: Export Admin Client metrics through Stream Threads
We already export producer and consumer metrics through the KafkaStreams class:
#4998
It makes sense to also export the Admin client metrics.
I didn't add a separate unit test case for this. Let me know if one is needed.
This is my first contribution, so feel free to point out any mistakes I made.
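For reference, all of these client metrics are surfaced through the same call; a minimal usage sketch:

```java
// Minimal sketch: read the metrics exposed by a KafkaStreams instance, which
// after this change also include the embedded admin client's metrics.
import java.util.Map;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;
import org.apache.kafka.streams.KafkaStreams;

final class StreamsMetricsDump {
    static void print(final KafkaStreams streams) {
        final Map<MetricName, ? extends Metric> metrics = streams.metrics();
        for (final Map.Entry<MetricName, ? extends Metric> entry : metrics.entrySet()) {
            System.out.println(entry.getKey().group() + "/" + entry.getKey().name()
                    + " = " + entry.getValue().metricValue());
        }
    }
}
```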
Reviewers: Guozhang Wang <wangguoz@gmail.com>
There are cases where the broker returns an unresolvable address (e.g. a broker inside a Docker network while the client is outside) and the client does not log any information about why it is timing out, since the default log level does not print DEBUG messages.
Changing this log level makes troubleshooting easier in such circumstances. It does not change the logs shown on transient failures such as a broker going down.
Use KafkaPrincipal objects for authorization in `SimpleAclAuthorizer` so that comparison with super.users and with ACLs instantiated from Strings works. Previously, it would compare two different classes, `KafkaPrincipal` and the custom class, which always returned false because of the implementation of `KafkaPrincipal#equals`.
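A sketch of the idea (not the exact authorizer code): normalize the session principal into a plain `KafkaPrincipal` before comparing it against super.users and ACL entries.

```java
// Illustrative only: KafkaPrincipal#equals requires the other object to be exactly
// a KafkaPrincipal, so instances of a custom principal class never compare equal.
// Re-wrapping into a plain KafkaPrincipal makes the comparison behave as intended.
import java.security.Principal;
import org.apache.kafka.common.security.auth.KafkaPrincipal;

final class PrincipalNormalizer {
    static KafkaPrincipal toKafkaPrincipal(final Principal principal) {
        if (principal instanceof KafkaPrincipal) {
            final KafkaPrincipal kp = (KafkaPrincipal) principal;
            return new KafkaPrincipal(kp.getPrincipalType(), kp.getName());
        }
        return new KafkaPrincipal(KafkaPrincipal.USER_TYPE, principal.getName());
    }
}
```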
We need to ensure that the last poll time is always updated when the user call poll(Duration). This patch fixes a bug in the new KIP-266 timeout behavior which would cause this to be skipped if the coordinator could not be found while the consumer was in an active group.
Note that I've also fixed some type inconsistencies for various timeouts.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Currently, the create time is interpreted as an integer.
This PR makes the tool accept long values.
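A minimal illustration of the difference (the tool's actual option parsing is not shown): epoch-millisecond timestamps do not fit in an int.

```java
// Illustrative: parse the create-time argument as a long (epoch milliseconds).
// A value like 1530000000000 overflows Integer.parseInt but fits in a long.
final long createTime = Long.parseLong("1530000000000");
```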
Reviewers: Manikumar Reddy O <manikumar.reddy@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
1. Rename the metrics scope of the RocksDB window and session stores; also update the store metrics accordingly, with guidance on how they correlate to the metrics scope.
2. Add the missing total metrics for per-thread, per-task, per-node and per-store sensors.
Reviewers: John Roesler <john@confluent.io>, Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
Before KIP-266, consumer.poll(0) would call updateAssignmentMetadataIfNeeded(Long.MAX_VALUE), which makes sure that the rebalance is definitely completed, i.e. both onPartitionsRevoked and onPartitionsAssigned are called within this poll(0). After KIP-266, however, it is possible that only onPartitionsRevoked will be called if the timeout elapses. Hence we need to double-check that the state is still PARTITIONS_ASSIGNED after the consumer.poll(duration) call.
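A sketch of the consequence for callers (illustrative, not the Streams-internal fix itself): a single bounded poll may return before the rebalance has completed, so completion has to be re-checked rather than assumed.

```java
// Illustrative only: with the KIP-266 bounded poll, onPartitionsRevoked may have
// fired while onPartitionsAssigned has not, so verify that an assignment actually
// exists instead of assuming the rebalance finished within one poll call.
import java.time.Duration;
import org.apache.kafka.clients.consumer.Consumer;

final class PollUntilAssigned {
    static void pollUntilAssigned(final Consumer<byte[], byte[]> consumer) {
        while (consumer.assignment().isEmpty()) {
            consumer.poll(Duration.ofMillis(100)); // may return before the rebalance completes
        }
    }
}
```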
Reviewers: Ted Yu <yuzhihong@gmail.com>, Matthias J. Sax <matthias@confluent.io>
Previously, the connection-creation metric only accounted for connections opened by the broker. This change extends it to also account for received (accepted) connections.
Do not update LogReadResult after it is initially populated when returning fetches immediately (i.e. without hitting the purgatory). This was done in #3954 as an optimization so that the followers get the potentially updated high watermark. However, since many things can happen (like deleting old segments and advancing log start offset) between initial creation of LogReadResult and the update, we can hit issues like log start offset in fetch response being higher than the last offset in fetched records.
Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Added two additional test cases to quota_test.py, which exercise combinations of brokers and clients with different throttling behaviors. More specifically,
1. clients with new throttling behavior (i.e., post-KIP-219) and brokers with old throttling behavior (i.e., pre-KIP-219)
2. clients with old throttling behavior and brokers with new throttling behavior
Author: Jon Lee <jonlee@linkedin.com>
Author: Dong Lin <lindong28@gmail.com>
Reviewers: Dong Lin <lindong28@gmail.com>
Closes #5294 from jonlee2/kafka-6944
This patch removes the need to build up producer state when the log is using the V0 / V1 message format, where idempotent and transactional producers did not yet exist.
It also fixes a small issue where we incorrectly reported the offset index as corrupt if the last offset in the index is equal to the base offset of the segment.
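A sketch of the gating idea (not the actual log-loading code): producer state only exists from the V2 message format onward, so rebuilding it can be skipped for older formats.

```java
// Illustrative only: idempotent/transactional producer state was introduced with
// message format V2, so logs still on V0/V1 have nothing to rebuild.
import org.apache.kafka.common.record.RecordBatch;

final class ProducerStateCheck {
    static boolean needsProducerStateRebuild(final byte messageFormatVersion) {
        return messageFormatVersion >= RecordBatch.MAGIC_VALUE_V2;
    }
}
```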
FileInputStream and FileOutputStream rely on finalizers
(before Java 11), which create unnecessary GC load.
The alternatives are as easy to use and don't have
this issue.
Also use FileChannel directly instead of retrieving
it from RandomAccessFile whenever possible
since the indirection is unnecessary.
Finally, add a few try/finally blocks.
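For illustration, the kind of replacement involved (not a diff of the actual call sites; NIO equivalents assumed):

```java
// Illustrative replacements only.
import java.io.IOException;
import java.io.InputStream;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

final class FileApiExamples {
    static void examples() throws IOException {
        final Path path = Paths.get("example.log");

        // Instead of new RandomAccessFile(file, "rw").getChannel():
        try (FileChannel channel = FileChannel.open(path,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            channel.size();
        }

        // Instead of new FileInputStream(file), which relies on a finalizer pre-Java 11:
        try (InputStream in = Files.newInputStream(path)) {
            in.read();
        }
    }
}
```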
Reviewers: Colin Patrick McCabe <colin@cmccabe.xyz>, Rajini Sivaram <rajinisivaram@googlemail.com>
NoSuchElementException will be thrown if ReplicaAlterDirThread replaces the current replica with the future replica right before the request handler thread executes `futureReplica.log.get.dir.getParent` in ReplicaManager.alterReplicaLogDirs(). The solution is to grab the partition lock when the request handler thread attempts to check the destination log directory of the future replica.
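A schematic of the race and the fix (purely illustrative; the real code is Scala in `ReplicaManager.alterReplicaLogDirs()` and uses the partition's own lock):

```java
// Illustrative only: check the future replica's log dir while holding the same lock
// that the replica-movement thread takes when it swaps the future replica in, so the
// replica cannot disappear between the presence check and the dereference.
import java.util.Optional;
import java.util.concurrent.locks.ReentrantLock;

final class FutureReplicaDirCheck {
    private final ReentrantLock partitionLock = new ReentrantLock(); // the swapping thread takes this lock too
    private Optional<String> futureReplicaParentDir = Optional.empty();

    String destinationDirOrNull() {
        partitionLock.lock();
        try {
            // Without the lock, the Optional could be swapped to empty between an
            // isPresent() check and a get(), throwing NoSuchElementException.
            return futureReplicaParentDir.orElse(null);
        } finally {
            partitionLock.unlock();
        }
    }
}
```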
Author: Dong Lin <lindong28@gmail.com>
Reviewers: Jun Rao <junrao@gmail.com>
Closes #5081 from lindong28/KAFKA-6949
Currently the check whether a user is a super user in `SimpleAclAuthorizer` is performed only after all other ACLs have been evaluated. Since all requests from a super user are granted, we don't really need to apply the ACLs at all. This patch returns true if the user is a super user before checking ACLs, thus bypassing the needless evaluation effort.
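Schematically (not the actual Scala), the evaluation order becomes:

```java
// Illustrative only: short-circuit for super users before any ACL matching.
import java.util.Set;
import java.util.function.Supplier;
import org.apache.kafka.common.security.auth.KafkaPrincipal;

final class SuperUserShortCircuit {
    private final Set<KafkaPrincipal> superUsers;

    SuperUserShortCircuit(final Set<KafkaPrincipal> superUsers) {
        this.superUsers = superUsers;
    }

    boolean authorize(final KafkaPrincipal principal, final Supplier<Boolean> aclEvaluation) {
        if (superUsers.contains(principal)) {
            return true; // super users are always allowed; ACLs are never evaluated
        }
        return aclEvaluation.get(); // regular ACL matching only for non-super users
    }
}
```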
Reviewers: Andy Coates <big-andy-coates@users.noreply.github.com>, Manikumar Reddy <manikumar.reddy@gmail.com>, Jason Gustafson <jason@confluent.io>
Wait for produce to fail before updating listener to avoid send succeeding after the listener update. Also use different topics in tests with connection failures where one is expected to fail and the other is expected to succeed.
Reviewers: Ismael Juma <ismael@juma.me.uk>
Enforce window retention times strictly (see the example sketch below):
* records for windows that have expired are dropped
* queries for timestamps old enough to be expired are immediately answered with null
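For example (a sketch against the windowed aggregation API of this version line; setting retention via `until` is an assumption about the builder used):

```java
// Illustrative: a 1-minute window retained for 30 minutes. Under strict
// enforcement, records whose window is older than the retention period are
// dropped on write, and fetches for such old windows return null.
import java.util.concurrent.TimeUnit;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.TimeWindows;

final StreamsBuilder builder = new StreamsBuilder();
final KStream<String, String> input = builder.stream("input-topic");
input.groupByKey()
     .windowedBy(TimeWindows.of(TimeUnit.MINUTES.toMillis(1))
                            .until(TimeUnit.MINUTES.toMillis(30)))
     .count();
```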
Reviewers: Bill Bejeck <bill@confluent.io>, Damian Guy <damian@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Leftover threads doing network I/O can interfere with subsequent tests. Add missing shutdown in tests and include admin client in the check for leftover threads.
Reviewers: Anna Povzner <anna@confluent.io>, Dhruvil Shah <dhruvil@confluent.io>, Ismael Juma <ismael@juma.me.uk>, Manikumar Reddy O <manikumar.reddy@gmail.com>
This patch contains the improved offset expiration semantics proposed in KIP-211. Committed offsets will not be expired as long as a group is active. Once all members have left the group, then offsets will be expired after the timeout configured by `offsets.retention.minutes`. Note that the optimization for early expiration of unsubscribed topics will be implemented in a separate patch.
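For example, keeping committed offsets for 7 days after the group becomes empty (a broker-side setting, normally placed in server.properties; shown here as a plain properties assignment for illustration):

```java
// Illustrative only: broker-side retention for committed offsets of empty groups.
// 10080 minutes = 7 days.
import java.util.Properties;

final Properties brokerProps = new Properties();
brokerProps.put("offsets.retention.minutes", "10080");
```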
This is an unexpected exception so `UnknownServerException`
is thrown back to the client.
This is a minimal change to make the behaviour match `ZkUtils`.
This is better, but one could argue that it's not perfect. A more
sophisticated approach can be tackled separately.
Added a concurrent test that fails without this change.
Reviewers: Jun Rao <junrao@gmail.com>
Maven Central dropped support for all TLS versions
except TLS 1.2, so dependency resolution fails if
Gradle builds run with JDK 7. 2.0 and trunk
require JDK 8, but every other version is
affected.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Significant refactor of Segments to use stream-time as the basis of segment expiration.
Previously Segments assumed that the current record time was representative of stream time.
In the event of a "future" event (one whose record time is greater than the stream time), this
would inappropriately drop live segments. Now, Segments will provision the new segment
to house the future event and drop old segments only after they expire.
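A simplified sketch of the new expiration rule (names are illustrative; the real logic lives in Segments):

```java
// Illustrative only: segments are addressed by recordTimestamp / segmentInterval,
// and a segment expires based on the maximum timestamp observed so far (stream
// time), not the timestamp of the current record. A "future" record simply
// advances stream time and gets its own segment; it no longer drops live segments.
final class SegmentExpirationSketch {
    private final long segmentIntervalMs;
    private final long retentionMs;
    private long observedStreamTimeMs = Long.MIN_VALUE;

    SegmentExpirationSketch(final long segmentIntervalMs, final long retentionMs) {
        this.segmentIntervalMs = segmentIntervalMs;
        this.retentionMs = retentionMs;
    }

    long segmentIdFor(final long recordTimestampMs) {
        observedStreamTimeMs = Math.max(observedStreamTimeMs, recordTimestampMs);
        return recordTimestampMs / segmentIntervalMs;
    }

    boolean isExpired(final long segmentId) {
        // Drop a segment only once its last possible timestamp falls outside the
        // retention window relative to stream time.
        final long segmentEndMs = (segmentId + 1) * segmentIntervalMs - 1;
        return segmentEndMs < observedStreamTimeMs - retentionMs;
    }
}
```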
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
For metadata request version 6 and above, use a different error code to indicate missing listener on leader broker to enable diagnosis of listener configuration issues.
Reviewers: Ismael Juma <ismael@juma.me.uk>
Also:
- Remove exceptions in `kafka.common` that are no longer used.
- Keep `kafka.common.KafkaException` as it's still used by `ZkUtils`,
`kafka.admin.AdminClient` and `kafka.security.auth` classes and
we would like to maintain compatibility for now.
- Add deprecated annotation to `kafka.admin.AdminClient`. The scaladoc
stated that the class is deprecated, but the annotation was missing.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
A broker with multiple log dirs will die on startup if dir.getCanonicalPath() throws
IOException for one of the log dirs. We should mark such a log directory as offline
instead, and the broker should start as long as there is at least one healthy log dir.
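A sketch of the intended handling (illustrative; the real code lives in the broker's log manager and log-dir failure handling):

```java
// Illustrative only: resolve each configured log dir, marking a dir offline on
// IOException instead of failing startup, as long as at least one dir is healthy.
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class LogDirResolution {
    static List<String> resolveLogDirs(final List<File> configuredDirs) {
        final List<String> online = new ArrayList<>();
        for (final File dir : configuredDirs) {
            try {
                online.add(dir.getCanonicalPath());
            } catch (final IOException e) {
                // mark this directory offline rather than killing the broker
            }
        }
        if (online.isEmpty()) {
            throw new IllegalStateException("No healthy log directories available");
        }
        return online;
    }
}
```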
Reviewers: Ismael Juma <ismael@juma.me.uk>
This patch fixes the following issues in the log splitting logic added to address KAFKA-6264:
1. We were not handling the case when all messages in the segment overflowed the index. In this case, there is only one resulting segment following the split.
2. There was an off-by-one error in the recovery logic when completing a swap operation which caused an unintended segment deletion.
Additionally, this patch factors a method out of `splitOverflowedSegment` for writing to a segment from an instance of `FileRecords`. This allows for future reuse and isolated testing.
Reviewers: Dhruvil Shah <dhruvil@confluent.io>, Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>