Fix a bug where multiple remove records could be generated for a single ACL. Previously, this happened
if the user submitted multiple filters to deleteAcls, and more than one matched.
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Jason Gustafson <jason@confluent.io>
This patch adds a new case class `FetchParams` which encapsulates the parameters of the fetch request. It then uses this class in `DelayedFetch` directly instead of `FetchMetadata`. The intent is to reduce the number of things we need to change whenever we need to pass through new parameters. The patch also cleans up `ReplicaManagerTest` for more consistent usage.
Reviewers: David Jacot <djacot@confluent.io>
We can append/subtract multiple config values in kraft mode using the `IncrementalAlterConfig` RPC. For example: append/subtract topic config "cleanup.policy" with value="delete,compact" will end up treating "delete,compact" as a value not 2 values. This patch fixes the problem. Additionally, it update the zk logic to correctly handle duplicate additions.
Reviewers: Akhilesh Chaganti <akhileshchg@users.noreply.github.com>, Jason Gustafson <jason@confluent.io>
When a client connects to a SSL listener using PLAINTEXT security protocol, after the TCP connection is setup, the client considers the channel setup is complete. In reality the channel setup is not complete yet. The client then resets reconnect exponential backoff and issues API version request. Since the broker expects SSL handshake, the API version request will cause the connection to disconnect. Client reconnects without exponential backoff since it has been reset.
This commit removes the reset of reconnect exponential backoff when sending API version request. In the good case where the channel setup is complete, reconnect exponential backoff will be reset when the node becomes ready, which is after getting the API version response. Inter-broker clients which do not send API version request and go directly to ready state continue to reset backoff before any successful requests.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
This patch refactors the `Partition.makeLeader` and `Partition.makeFollower` to be robust to all partition updates from the KRaft metadata log. Particularly, it ensures the following invariants:
- A partition update is accepted if the partition epoch is equal or newer. The partition epoch is updated by the AlterPartition path as well so we accept an update from the metadata log with the same partition epoch in order to fully update the partition state.
- The leader epoch state offset is only updated when the leader epoch is bumped.
- The follower states are only updated when the leader epoch is bumped.
- Fetchers are only restarted when the leader epoch is bumped. This was already the case but this patch adds unit tests to prove/maintain it.
In the mean time, the patch unifies the state change logs to be similar in both ZK and KRaft world.
Reviewers: Jason Gustafson <jason@confluent.io>
The goals here include:
1. Create an overloaded variant of the IncrementalCooperativeAssignor::performTaskAssignment method that is more testing friendly
2. Simplify the parameter list for the IncrementalCooperativeAssignor::handleLostAssignments method, which in turn simplifies the logic for testing this class
3. Capture repeated Java 8 streams logic in simple, reusable, easily-verifiable utility methods added to the ConnectUtils class
Reviewers: Luke Chen <showuon@gmail.com>
The design is described in detail in KIP-794
https://cwiki.apache.org/confluence/display/KAFKA/KIP-794%3A+Strictly+Uniform+Sticky+Partitioner.
Implementation notes:
The default partitioning logic is moved to the BuiltInPartitioner class
(there is one object per topic). The object keeps track of how many
bytes are produced per-partition and once the amount exceeds batch.size,
switches to the next partition (note that partition switch decision is
decoupled from batching). The object also keeps track of probability
weights that are based on the queue sizes (the larger the queue size
is the less chance for the next partition to be chosen). The queue
sizes are calculated in the RecordAccumulator in the `ready` method,
the method already enumerates all partitions so we just add some extra
logic into the existing O(N) method. The partition switch decision may
take O(logN), where N is the number partitions per topic, but it happens
only once per batch.size (and the logic is avoided when all queues are
of equal size). Produce bytes accounting logic is lock-free.
When partitioner.availability.timeout.ms is non-0, RecordAccumulator
keeps stats on "node latency" which is defined as the difference between
the last time the node had a batch waiting to be send and the last time
the node was ready to take a new batch. If this difference exceeds
partitioner.availability.timeout.ms we don't switch to that partition
until the node is ready.
Reviewers: Jun Rao <junrao@gmail.com>
A prior commit accidentally changed the javadoc for RecordContext.
In reality, it is not reachable from api.Processor, only Processor.
Reviewers: Guozhang Wang <guozhang@apache.org>
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Reviewers: Tom Bentley <tbentley@redhat.com>, Hector Geraldino <hgeraldino@bloomberg.net>, Andrew Eugene Choi <andrew.choi@uwaterloo.ca>
Time ordered session store implementation. I introduced AbstractRocksDBTimeOrderedSegmentedBytesStore to make it generic for RocksDBTimeOrderedSessionSegmentedBytesStore and RocksDBTimeOrderedSegmentedBytesStore.
A few minor follow-up changes:
1. Avoid extra byte array allocation for fixed upper/lower range serialization.
2. Rename some class names to be more consistent.
Authored-by: Hao Li <1127478+lihaosky@users.noreply.github.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com.com>, John Roesler <vvcephei@apache.org>
This PR adds the default implementation of the state updater. The implementation only implements adding active tasks to the state updater.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
This PR tries to avoid the reinitialization of the leader epoch cache
and the partition metadata if the corresponding replica is being deleted.
With this change, the asyncDelete method can run more efficiently,
which means a StopReplica request with many partitions to be deleted can be
processed more quickly.
Reviewers: David Jacot <djacot@confluent.io>, Jun Rao <junrao@gmail.com>
The KRaft implementation of the `CreatePartitions` ignores the `validateOnly` flag in the
request and creates the partitions if the validations are successful. Fixed the behavior
not to create partitions upon validation if the `validateOnly` flag is true.
Reviewers: Divij Vaidya <divijvaidya13@gmail.com>, dengziming <dengziming1993@gmail.com>, Jason Gustafson <jason@confluent.io>
These tests belongs to ApiVersionsResponseTest, and accidentally copied them to MetadataVersionTest when working on #12072.
Reviewer: Luke Chen <showuon@gmail.com>
This is a copy PR of #11896, authored by @lihaosky (Hao Li): Initial implementation to emit final for TimeWindowedKStreamImpl. This PR is on top of #12030
Author: Hao Li
Reviewers: John Roesler <vvcephei@apache.org>
Refactoring ApiVersion to MetadataVersion to support both old IBP versioning and new KRaft versioning (feature flags)
for KIP-778.
IBP versions are now encoded as enum constants and explicitly prefixed w/ IBP_ instead of KAFKA_, and having a
LegacyApiVersion vs DefaultApiVersion was not necessary and replaced with appropriate parsing rules for extracting
the correct shortVersions/versions.
Co-authored-by: David Arthur <mumrah@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, David Arthur <mumrah@gmail.com>, dengziming <dengziming1993@gmail.com>, Colin P. McCabe <cmccabe@apache.org>
Fix ClientQuotasRequestTest.testAlterClientQuotasBadIp so that it uses actually unresolvable hostnames.
The previous choices "ip" and "abc-123" are now resolvable.
Reviewers: David Jacot <djacot@confluent.io>, Andrew Choi <andrew.choi@uwaterloo.ca>, Divij Vaidya <divijvaidya13@gmail.com>
This patch fixes some strangeness and inconsistency in the messages written by `TransactionalMessageCopier` to stdout. Here is a sample of two messages.
Progress message:
```
{"consumed":33000,"stage":"ProcessLoop","totalProcessed":33000,"progress":"copier-0","time":"2022/04/24 05:40:31:649","remaining":333}
```
The `transactionalId` is set to the value of the `progress` key.
And a shutdown message:
```
{"consumed":33333,"shutdown_complete":"copier-0","totalProcessed":33333,"time":"2022/04/24 05:40:31:937","remaining":0}
```
The `transactionalId` this time is set to the `shutdown_complete` key and there is no `stage` key.
In this patch, we change the following:
1. Use a separate key for the `transactionalId`.
2. Drop the `progress` and `shutdown_complete` keys.
3. Use `stage=ShutdownComplete` in the shutdown message.
4. Modify `transactional_message_copier.py` system test service accordingly.
Reviewers: David Arthur <mumrah@gmail.com>
Conceptually, the ordering is defined by the producer id, producer epoch
and the sequence number. This set should generally only have entries
for the same producer id and epoch, but there is one case where
we can have conflicting `remove` calls and hence we add this as
a temporary safe fix.
We'll follow-up with a fix that ensures the original intended invariant.
Reviewers: Jason Gustafson <jason@confluent.io>, David Jacot <djacot@confluent.io>, Luke Chen <showuon@gmail.com>
We added the metadata wait time in total blocked time (#11805). But we added it in the critical path of send which is called per-record, whereas metadata refresh only happens rarely. This way the cost of time.nanos becomes unnecessarily significant as we call it twice per record.
This PR moves the call to inside the waitOnMetadata callee and only when we do need to wait for a metadata refresh round-trip (i.e. we are indeed blocking).
Reviewers: Matthias J. Sax <matthias@confluent.io>
The html document generation has some errors in it, specifically related to protocols. The two issues identified and resolved are:
* Missing </tbody> closing tags added
* Invalid usage of a <p> tag as a wrapper element for <table> elements. Changed the <p> tag to be a <div>.
Tested by running ./gradlew siteDocsTar and observing that the output was properly formed.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Since we have changed the `AlterIsr` API to `AlterPartition`, it makes sense to rename `AlterIsrManager` as well and some of the associated classes.
Reviewers: dengziming <dengziming1993@gmail.com>, David Jacot <djacot@confluent.io>
The bug was introduced in #11689 that an additional onAcknowledgement was made using the InterceptorCallback class. This is undesirable since onSendError will attempt to call onAcknowledgement once more.
Reviewers: Jun Rao <junrao@gmail.com>
Adding KRaft and ZK params to ConfigCommandIntegrationTest wherever appropriate.
Reviewers: Kvicii <42023367+Kvicii@users.noreply.github.com>, dengziming <dengziming1993@gmail.com>, José Armando García Sancio <jsancio@users.noreply.github.com>
This patch refactors kafka.cluster.Replica, it usages and tests. This is part of the work in KAFKA-13790.
Reviewers: Jason Gustafson <jason@confluent.io>
Currently we validate recovery state before checking leader epoch in `KafkaController`. It seems more intuitive to validate leader epoch first since the leader might be working with stale state, which is what we do in KRaft. This patch fixes this and adds a couple additional validations to make the behavior consistent.
Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>
The wildcard * in command without wrapped by single quote will be replaced into the file name under the current folder by bash. So we need to wrap with single quote. Update the doc and command option description.
Reviewers: dengziming <dengziming1993@gmail.com>, Luke Chen <showuon@gmail.com>
This patch does some initial cleanups in the context of KAFKA-13790. Mainly, it renames `ZkVersion` field to `PartitionEpoch` in the `LeaderAndIsrRequest`, the `LeaderAndIsr` and the `Partition`.
Reviewers: Jason Gustafson <jason@confluent.io>, dengziming <dengziming1993@gmail.com>
This PR fixes a case where we were unable to place on fenced brokers In KRaft mode. Specifically,
if we had a broker registration in the metadata log, but no associated heartbeat, previously the
HeartbeatManager would not track the fenced broker. This PR fixes this by adding this logic to the
metadata log replay path in ClusterControlManager.
Reviewers: David Arthur <mumrah@gmail.com>, dengziming <dengziming1993@gmail.com>
When LeaderRecoveryState was added to the PartitionChangeRecord, the
check for being a noop was not updated. This commit fixes that and
improves the associated test to avoid this oversight in the future.
Reviewers: Colin Patrick McCabe <cmccabe@apache.org>
* Fix UP-TO-DATE check in `create*VersionFile` tasks
`create*VersionFile` tasks explicitly declared output UP-TO-DATE status
as being false. This change properly sets the inputs to
`create*VersionFile` tasks to the `commitId` and `version` values and
sets `receiptFile` locally rather than in an extra property.
* Enable output caching for `process*Messages` tasks
`process*Messages` tasks did not have output caching enabled. This
change enables that caching, as well as setting a property name and
RELATIVE path sensitivity.
* Fix existing Gradle deprecations
Replaces `JavaExec#main` with `JavaExec#mainClass`
Replaces `Report#destination` with `Report#outputLocation`
Adds a `generator` configuration to projects that need to resolve
the `generator` project (rather than referencing the runtimeClasspath
of the `generator` project from other project contexts.
Reviewers: Mickael Maison <mickael.maison@gmail.com>
In drainBatchesForOneNode method, there's possibility causing some partitions in a node will never get picked. Fix this issue by maintaining a drainIndex for each node.
Reviewers: Luke Chen <showuon@gmail.com>, RivenSun <91005273+RivenSun2@users.noreply.github.com>