Metadata image classes such as MetadataImage, ClusterImage, FeaturesImage, and so forth contain
numerous sub-images. This PR adds a structured way of traversing those sub-images. This is useful
for the metadata shell, and also for implementing toString functions.
In both cases, the previous solution was suboptimal. The metadata shell was previously implemented
in an ad-hoc way by mutating text-based tree nodes when records were replayed. This was difficult
to keep in sync with changes to the record types (for example, we forgot to do this for SCRAM). It
was also pretty low-level, being done at a level below that of the image classes. For toString, it
was previously difficult to keep the implementations consistent, and also to support both redacted
and non-redacted output.
The metadata shell directory was getting crowded since we never had submodules for it. This PR
creates glob/, command/, node/, and state/ directories to keep things better organized.
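Below is a minimal sketch of the kind of uniform node interface such traversal could be built on; the names (`MetadataNode`, `childNames`, `child`) are illustrative assumptions rather than the exact API added in this PR.

```java
import java.util.Collection;

// Illustrative sketch only: a uniform node interface lets both the metadata
// shell and the toString implementations walk MetadataImage sub-images the
// same way, instead of each mutating or formatting its own tree.
public interface MetadataNode {
    // The names of the sub-images or entries directly under this node.
    Collection<String> childNames();

    // Looks up a child node by name, returning null if it does not exist.
    MetadataNode child(String name);
}
```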
This change includes
- Recognize fetch requests with out-of-range local log offsets
- Add fetch implementation for the data lying in the range of [logStartOffset, localLogStartOffset]
- Add a new purgatory for async remote read requests which are served through a specific thread pool
We have an extended version of remote fetch that can fetch from multiple remote partitions in parallel, which we will raise as a followup PR.
A few tests for the newly introduced changes are added in this PR. There are additional tests for these scenarios in 2.8.x; we will refactor them against the trunk changes and add them in followup PRs.
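As a rough illustration of the async remote-read pattern above, here is a hedged sketch in which reads below the local log start offset are handed to a dedicated thread pool and the pending fetch is parked until the read completes; the class and method names are hypothetical, not the actual broker internals.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: offsets in [logStartOffset, localLogStartOffset) are
// served by reading from remote storage on a separate thread pool, while the
// original fetch request waits (as it would in a purgatory) for completion.
public class RemoteFetchSketch {
    private final ExecutorService remoteFetchPool = Executors.newFixedThreadPool(4);

    public CompletableFuture<byte[]> fetchRemote(long offset) {
        // The future stands in for the purgatory entry: the fetch response is
        // completed (or times out) when the remote read finishes.
        return CompletableFuture.supplyAsync(() -> readFromRemoteStorage(offset), remoteFetchPool);
    }

    private byte[] readFromRemoteStorage(long offset) {
        // Placeholder for locating and reading the remote segment that covers
        // the requested offset via the tiered storage plugin.
        return new byte[0];
    }
}
```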
Other contributors:
Kamal Chandraprakash <kchandraprakash@uber.com> - Further improvements and adding a few tests
Luke Chen <showuon@gmail.com> - Added a few test cases for these changes.
PS: This functionality is pulled out of internal branches that contain other functionality related to this feature in 2.8.x. We are not pulling in all of the changes at once because that would make the PR huge and burdensome to review; the remaining pieces also need other metrics, minor enhancements (including perf), and minor test changes. We will try to have followup PRs to cover all of those.
Reviewers: Jun Rao <junrao@gmail.com>, Alexandre Dupriez <alexandre.dupriez@gmail.com>, Divij Vaidya <diviv@amazon.com>, Jorge Esteban Quilcate Otoya <quilcate.jorge@gmail.com>
Rename handleSnapshot to handleLoadSnapshot to make it explicit that it is handling snapshot load,
not generation.
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Jason Gustafson <jason@confluent.io>
This test was reported as flaky on the CI server.
When connecting to a multi-homed machine using RMI, the RMI registry may return the wrong address to the client, causing the connection to the RMI server to time out.
This change explicitly sets the hostname returned to clients in the remote stub object.
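For context, a hedged sketch of one common way to pin the address embedded in RMI stubs is shown below; the `java.rmi.server.hostname` property must be set before the RMI server exports its remote objects, and the exact mechanism used by this change may differ.

```java
// Sketch: force the hostname embedded in exported RMI stubs so multi-homed
// machines hand clients a reachable address. The host value is hypothetical.
public class RmiHostnameExample {
    public static void main(String[] args) {
        String advertisedHost = "broker1.internal.example.com"; // hypothetical host
        System.setProperty("java.rmi.server.hostname", advertisedHost);
        // ... start the JMX/RMI server after this point so the stubs carry
        // advertisedHost instead of an address chosen by the registry.
    }
}
```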
Reviewers: Luke Chen <showuon@gmail.com>, vamossagar12 <sagarmeansocean@gmail.com>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, hudeqi <16120374@bjtu.edu.cn>, Christo Lolov <christololov@gmail.com>
Despite the documentation, these collections are limited to 10.
Reviewers: Josep Prat <josep.prat@aiven.io>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>
All dependency upgrades in this PR are minor upgrades with backward-compatible changes. Note that no major versions of dependencies have been changed, keeping this a low-risk change. No code changes are required for any of these dependencies.
Reviewers: Luke Chen <showuon@gmail.com>
This patch introduces `GenericGroup`, which rewrites `GroupMetadata` in Java. A `GenericGroup` is basically a group using the current rebalance protocol in the new group coordinator.
Reviewers: Ritika Reddy <rreddy@confluent.io>, Christo Lolov <lolovc@amazon.com>, David Jacot <djacot@confluent.io>
1. Add a hint for the "BROKER_LOGGER" case in the switch in ConfigResourceExistenceChecker; otherwise it falls through to the default branch and is deleted directly. I don't know whether adding a hint is better than deleting directly.
2. Delete some unused variables and methods.
3. Add the `@Test` annotation to a unit test method where it had been forgotten.
Reviewers: dengziming <dengziming1993@gmail.com>
The KRaft controller returns empty finalized features in `ApiVersionResponse`. The brokers are not affected by this, so the problem currently has no impact, but it is worth fixing to avoid unexpected problems.
There is also a bunch of confusing methods in `ApiVersionResponse` that are only used in test code. I moved them to TestUtils to make the code clearer, and to force everyone to pass in the correct parameters instead of the default zero parameters, for example, empty supported features and empty finalized features.
Reviewers: Luke Chen <showuon@gmail.com>
The VersionedKeyValueToBytesStoreAdapter#isOpen API accidentally returns the value of inner.persistent() when it should return inner.isOpen().
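A minimal sketch of the fix, with the surrounding class simplified and field names illustrative:

```java
// Simplified stand-in for the adapter: isOpen() must delegate to the wrapped
// store's isOpen(), not persistent().
class VersionedAdapterSketch {
    interface Store {
        boolean persistent();
        boolean isOpen();
    }

    private final Store inner;

    VersionedAdapterSketch(Store inner) {
        this.inner = inner;
    }

    public boolean isOpen() {
        return inner.isOpen(); // previously returned inner.persistent()
    }
}
```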
Reviewers: Matthias J. Sax <mjsax@apache.org>, Luke Chen <showuon@gmail.com>, Bruno Cadonna <cadonna@apache.org>, Victoria Xia <victoria.xia@confluent.io>
This patch adds the RangeAssignor on the server for KIP-848. This range assignor is very different from the old client-side implementation. We added functionality to make assignments sticky while also inheriting crucial properties of the range assignor, such as facilitating joins and distributing the partitions of a topic roughly equally amongst its subscribers.
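As a rough illustration, here is a sketch of the basic per-topic range quota computation (each of the N subscribed members gets numPartitions / N partitions, and the first numPartitions % N members get one extra); the stickiness layered on top by the server-side assignor is omitted, and the class is purely illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of range-style quotas for a single topic.
public class RangeQuotaSketch {
    public static Map<String, List<Integer>> assign(List<String> members, int numPartitions) {
        Map<String, List<Integer>> assignment = new HashMap<>();
        int quota = numPartitions / members.size();
        int extra = numPartitions % members.size();
        int partition = 0;
        for (int i = 0; i < members.size(); i++) {
            int count = quota + (i < extra ? 1 : 0);   // first 'extra' members get one more
            List<Integer> owned = new ArrayList<>();
            for (int j = 0; j < count; j++) {
                owned.add(partition++);
            }
            assignment.put(members.get(i), owned);
        }
        return assignment;
    }
}
```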
Reviewers: Philip Nee <philipnee@gmail.com>, Jeff Kim <jeff.kim@confluent.io>, David Jacot <djacot@confluent.io>
`KafkaBasedLog` is a widely used utility class that provides a generic implementation of a shared, compacted log of records in a Kafka topic. It isn't in Connect's public API, but it has been used outside of Connect, and we try to preserve backward compatibility whenever possible. KAFKA-14455 modified the two overloaded void `KafkaBasedLog::send` methods to return a `Future`. While this change is source compatible, it isn't binary compatible. We can restore backward compatibility simply by renaming the new Future-returning send methods and reinstating the older send methods to delegate to the newer ones.
This refactoring changes no functionality other than restoring the older methods.
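A hedged sketch of the delegation pattern described above is shown below; the types are simplified and the name of the Future-returning method is an assumption, not necessarily the one used in the PR.

```java
import java.util.concurrent.Future;

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.RecordMetadata;

// Sketch: the renamed Future-returning method does the work, and the original
// void signatures are reinstated as thin delegates for binary compatibility.
public abstract class KafkaBasedLogSketch<K, V> {

    // New method returning a Future (the name here is illustrative).
    public abstract Future<RecordMetadata> sendWithReceipt(K key, V value, Callback callback);

    // Reinstated old signatures: binary-compatible because they are void again.
    public void send(K key, V value, Callback callback) {
        sendWithReceipt(key, value, callback);
    }

    public void send(K key, V value) {
        sendWithReceipt(key, value, null);
    }
}
```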
Reviewers: Randall Hauch <rhauch@gmail.com>
This patch adds the `CurrentAssignmentBuilder` class which encapsulates the reconciliation engine of the consumer group protocol. Given the current state of a member and a desired or target assignment state, the state machine takes the necessary steps to converge the member to its desired state.
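As a rough illustration of the reconciliation idea, here is a sketch of the core set difference between a member's owned partitions and its target assignment; the real `CurrentAssignmentBuilder` drives this across epochs and member states, which is omitted here.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch: partitions to revoke are owned-but-not-target, and
// partitions to assign are target-but-not-owned.
public class ReconciliationSketch {
    public static Set<Integer> toRevoke(Set<Integer> owned, Set<Integer> target) {
        Set<Integer> revoke = new HashSet<>(owned);
        revoke.removeAll(target);
        return revoke;
    }

    public static Set<Integer> toAssign(Set<Integer> owned, Set<Integer> target) {
        Set<Integer> assign = new HashSet<>(target);
        assign.removeAll(owned);
        return assign;
    }
}
```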
Reviewers: Ritika Reddy <rreddy@confluent.io>, Calvin Liu <caliu@confluent.io>, Jeff Kim <jeff.kim@confluent.io>, Justine Olshan <jolshan@confluent.io>
#13557 introduced a utils method to close executors silently. This PR leverages that method to close executors in the Connect runtime, removing the duplicated code that previously existed for closing them.
Note that there are a few more executors used in the Connect runtime whose close methods don't follow this pattern of shutdown, await, and shutdown-now. Some of them, like the executor in Worker, have additional logic, so those places are left unchanged.
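For reference, a hedged sketch of the shutdown pattern the shared utility encapsulates (orderly shutdown, bounded wait, then forced shutdown); the actual method introduced in #13557 may differ in name and signature.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch of the shutdown/await/shutdownNow pattern used for executors.
public class ExecutorShutdownSketch {
    public static void shutdownQuietly(ExecutorService executor, long timeout, TimeUnit unit) {
        if (executor == null) {
            return;
        }
        executor.shutdown();
        try {
            if (!executor.awaitTermination(timeout, unit)) {
                executor.shutdownNow();
            }
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }
}
```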
---------
Co-authored-by: Sagar Rao <sagarrao@Sagars-MacBook-Pro.local>
Reviewers: Daniel Urban <durban@cloudera.com>, Yash Mayya <yash.mayya@gmail.com>, Viktor Somogyi-Vass <viktorsomogyi@gmail.com>
Motivation
PlaintextAdminIntegrationTest fails in a flaky manner with the following trace (e.g. in this build):
org.opentest4j.AssertionFailedError: expected: <false> but was: <true>
at org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
at org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
at org.junit.jupiter.api.AssertFalse.failNotFalse(AssertFalse.java:63)
at org.junit.jupiter.api.AssertFalse.assertFalse(AssertFalse.java:36)
at org.junit.jupiter.api.AssertFalse.assertFalse(AssertFalse.java:31)
at org.junit.jupiter.api.Assertions.assertFalse(Assertions.java:228)
at kafka.api.PlaintextAdminIntegrationTest.testElectUncleanLeadersForOnePartition(PlaintextAdminIntegrationTest.scala:1583)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
The std output doesn't contain useful information that we could use to debug the cause of the failure. This is because the test currently only validates whether an exception occurred and fails when one is present; it does not print what the exception is.
Change
1. Make the test a bit more robust by waiting for server startup.
2. Fail the test with the actual unexpected exception which will help in debugging the cause of failure.
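As a small illustration of the second point, here is a hedged sketch of failing with the captured exception rather than a bare boolean assertion; the helper and names are simplified, not the exact test code.

```java
import java.util.Optional;

import static org.junit.jupiter.api.Assertions.fail;

// Sketch: surface the actual cause in the test report instead of
// "expected: <false> but was: <true>".
public class FailWithCauseSketch {
    static void assertNoUnexpectedException(Optional<Throwable> maybeError) {
        maybeError.ifPresent(error -> fail("Unexpected exception during unclean leader election", error));
    }
}
```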
Reviewers: Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
Motivation
Reading/writing the protocol buffer varInt32 and varInt64 (also called varLong in our code base) is in the hot path of data-plane code in Apache Kafka. We read multiple varInts (and varLongs) per record, so even a minor performance change can translate into a larger overall benefit.
In this PR, we only update varInt32 encoding/decoding.
Changes
This change uses loop unrolling and reduces repeated calculations. Based on the empirical results from the benchmark, the code has been modified to pick the best-performing implementation.
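To illustrate the technique (not the exact merged code), here is a hedged sketch of an unrolled unsigned varint32 encoder: each possible encoded length gets an explicit branch, avoiding per-iteration loop bookkeeping.

```java
import java.nio.ByteBuffer;

// Sketch of loop-unrolled unsigned varint32 encoding.
public class VarintSketch {
    public static void writeUnsignedVarint(int value, ByteBuffer buf) {
        if ((value & (~0 << 7)) == 0) {
            buf.put((byte) value);
        } else if ((value & (~0 << 14)) == 0) {
            buf.put((byte) ((value & 0x7F) | 0x80));
            buf.put((byte) (value >>> 7));
        } else if ((value & (~0 << 21)) == 0) {
            buf.put((byte) ((value & 0x7F) | 0x80));
            buf.put((byte) (((value >>> 7) & 0x7F) | 0x80));
            buf.put((byte) (value >>> 14));
        } else if ((value & (~0 << 28)) == 0) {
            buf.put((byte) ((value & 0x7F) | 0x80));
            buf.put((byte) (((value >>> 7) & 0x7F) | 0x80));
            buf.put((byte) (((value >>> 14) & 0x7F) | 0x80));
            buf.put((byte) (value >>> 21));
        } else {
            buf.put((byte) ((value & 0x7F) | 0x80));
            buf.put((byte) (((value >>> 7) & 0x7F) | 0x80));
            buf.put((byte) (((value >>> 14) & 0x7F) | 0x80));
            buf.put((byte) (((value >>> 21) & 0x7F) | 0x80));
            buf.put((byte) (value >>> 28));
        }
    }
}
```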
Results
Performance has been evaluated using JMH benchmarks on JDK 17.0.6. Various implementations have been added to the benchmark, and benchmarking has been done for different sizes of varints and varlongs. The benchmarks for the various implementations are in ByteUtilsBenchmark.java.
Reviewers: Ismael Juma <mlists@juma.me.uk>, Luke Chen <showuon@gmail.com>, Alexandre Dupriez <alexandre.dupriez@gmail.com>
When log cleaning is shut down, it doesn't remove the metrics that were registered to `KafkaYammerMetrics.defaultRegistry()`, which has one instance per server. The log cleaner's lifecycle is tied to the lifecycle of `LogManager`, so there is no scenario in which the log cleaner is shut down but the broker is not. Broker shutdown closes the `jmxReporter`, so there is no actual metric leak here. The motivation for this change is to "do the right thing" from a code-hygiene perspective.
Reviewers: Manyanda Chitimbo <manyanda.chitimbo@gmail.com>, Kirk True <kirk@mustardgrain.com>, David Jacot <djacot@confluent.io>
This patch adds support for handling metadata snapshots while in dual-write mode. Prior to this change, if the active
controller loaded a snapshot, it would get out of sync with the ZK state.
In order to reconcile the snapshot state with ZK, several methods were added to scan through the metadata in ZK to
compute differences with the MetadataImage. Since this introduced a lot of code, I opted to split out a lot of methods
from ZkMigrationClient into their own client interfaces, such as TopicMigrationClient, ConfigMigrationClient, and
AclMigrationClient. Each of these has some iterator method that lets the caller examine the ZK state in a single pass
and without using too much memory.
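A hedged sketch of the iterator style described above; the interface and method names are assumptions, not the exact client interfaces added here.

```java
// Sketch: each migration client exposes a single-pass visitor so the caller
// can diff ZK entries against the MetadataImage as they stream by, without
// materializing all of ZK state in memory.
public interface TopicMigrationClientSketch {
    interface TopicVisitor {
        void visitTopic(String topicName, int numPartitions);
    }

    // Walks every topic currently stored in ZooKeeper exactly once.
    void iterateTopics(TopicVisitor visitor);
}
```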
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Luke Chen <showuon@gmail.com>
When creating the QuorumController, log whether ZK migration is enabled.
When applying a feature level record which sets the metadata version, log the metadata version enum
rather than the numeric feature level.
Improve the logging when we replay snapshots in QuorumController. Log both the beginning and the
end of replay.
When TRACE is enabled, log every record that is replayed in QuorumController. Since some records
may contain sensitive information, create RecordRedactor to assist in logging only what is safe to
put in the log4j file.
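A hedged sketch of the idea behind RecordRedactor; the real class works off the generated record schemas, and this simplified version is only illustrative.

```java
// Sketch: produce a loggable string for each replayed record while masking
// record types that may contain secrets (for example SCRAM credentials).
public class RecordRedactorSketch {
    public String toLoggableString(Object record) {
        // Hypothetical check: records known to carry credentials are masked.
        if (record.getClass().getSimpleName().contains("ScramCredential")) {
            return record.getClass().getSimpleName() + "(<redacted>)";
        }
        return record.toString();
    }
}
```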
Add logging to ControllerPurgatory. Successful completions are logged at DEBUG; failures are logged
at INFO, and additions are logged at TRACE.
Remove SnapshotReason.java, SnapshotReasonTest.java, and QuorumController#generateSnapshotScheduled. They are dead code now that snapshot generation has moved to org.apache.kafka.image.publisher.SnapshotGenerator.
Reviewers: David Arthur <mumrah@gmail.com>, José Armando García Sancio <jsancio@apache.org>
Also modifies verification to only add a partition for verification if it is transactional.
When verifying, we look at all the transactional producer IDs and throw INVALID_RECORD on the request if any of them differ.
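A hedged sketch of that check; this is a simplification of the real verification path, not the actual code.

```java
import java.util.Collection;

import org.apache.kafka.common.protocol.Errors;

// Sketch: all transactional producer ids observed in a request must match,
// otherwise the request is rejected with INVALID_RECORD.
public class ProducerIdCheckSketch {
    public static Errors checkProducerIds(Collection<Long> transactionalProducerIds) {
        return transactionalProducerIds.stream().distinct().count() > 1
            ? Errors.INVALID_RECORD
            : Errors.NONE;
    }
}
```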
Reviewers: Kirk True <ktrue@confluent.io>, Artem Livshits <alivshits@confluent.io>, Jason Gustafson <jason@confluent.io>
Producers used to throw a fatal error upon a failed initProducerId, which can be caused by authorization errors. In that case, the user would need to instantiate a new producer.
This PR makes authorization errors non-fatal so that the user can retry until the permission issue is fixed by an admin.
Here we first transition the producer to the ABORTABLE state, then to the UNINITIALIZED state (so that the producer is recoverable). Upon the subsequent send, the producer will transition to INITIALIZING and attempt to send another InitProducerIdRequest.
Reviewers: Kirk True <ktrue@confluent.io>, David Jacot <djacot@confluent.io>, Jason Gustafson <jason@confluent.io>, Justine Olshan <jolshan@confluent.io>
The integration tests seem to create an unnecessarily large number of threads. This reduces the number of threads created per integration test harness broker.
Reviewers: Luke Chen <showuon@gmail.com>, Justine Olshan <jolshan@confluent.io>
The generated message types are missing a range check for the case when the tagged version range is a subset of
the flexible version range. This causes the tagged field count, which is computed correctly, to conflict with the
number of tags serialized.
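A hedged sketch of the missing check: a tagged field should only contribute to the tagged-field count (and be serialized) when the writer's version falls inside the field's tagged version range, not merely inside the message's flexible version range. The names below are illustrative of the generated code, not copied from it.

```java
// Sketch: only count a tagged field when the version is within its tagged
// range and the value differs from the default.
public class TaggedFieldCountSketch {
    public static int countTaggedField(short version, short taggedLow, short taggedHigh,
                                       boolean fieldIsDefault, int numTaggedFields) {
        if (version >= taggedLow && version <= taggedHigh && !fieldIsDefault) {
            return numTaggedFields + 1;
        }
        return numTaggedFields;
    }
}
```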
Reviewers: Colin P. McCabe <cmccabe@apache.org>