`ZkUtils` is not used by the broker, has been deprecated since
2.0.0 and it was never intended as a public API. We should
remove it along with `AdminUtils` methods that rely on it.
Reviewers: David Arthur <mumrah@gmail.com>
* KAFKA-8106:Reducing the allocation and copying of ByteBuffer when logValidator do validation.
* KAFKA-8106:Reducing the allocation and copying of ByteBuffer when logValidator do validation.
* github comments
* use batch.skipKeyValueIterator
* cleanups
* no need to skip kv for uncompressed iterator
* checkstyle fixes
* fix findbugs
* adding unit tests
* reuse decompression buffer; and using streaming iterator
* checkstyle
* add unit tests
* remove reusing buffer supplier
* fix unit tests
* add unit tests
* use streaming iterator
* minor refactoring
* rename
* github comments
* github comments
* reuse buffer at DefaultRecord caller
* some further optimization
* major refactoring
* further refactoring
* update comment
* github comments
* minor fix
* add jmh benchmarks
* update jmh
* github comments
* minor fix
* github comments
It has been deprecated since 0.11.0, it was never meant as a publicly
supported API and people should use
`org.apache.kafka.clients.admin.AdminClient` instead. Its presence
causes confusion and people still use them accidentally at times.
`BrokerApiVersionsCommand` uses one method that is not available
in `org.apache.kafka.clients.admin.AdminClient`, we inline it for now.
Reviewers: David Arthur <mumrah@gmail.com>, Ismael Juma <ismael@juma.me.uk>
The AllReplicas was only printing remote replica ids. This change prints
all ids, including local one.
Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>, Jason Gustafson <jason@confluent.io>
Kafka should not NPE while loading a deleted partition dir with no log segments. This patch ensures that there will always be at least one segment after initialization.
Co-authored-by: Edoardo Comar <ecomar@uk.ibm.com>
Co-authored-by: Mickael Maison <mickael.maison@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Cancel PeriodicProducerExpirationCheck when closing a Log instance, to avoid a memory leak. Add a method to KafkaScheduler to make this possible.
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Ismael Juma <ismael@juma.me.uk>
This patch simplifies the controller election API. We were passing `LeaderIsrAndControllerEpoch` into the election utilities even though we just needed `LeaderAndIsr`. We also remove some unneeded collection copies `doElectLeaderForPartitions`.
Reviewers: Ismael Juma <ismael@juma.me.uk>
A `Partition` object contain one or many `Replica` objects. These replica
objects in turn can have the "log" if the replica corresponds to the
local node. All the code in Partition or ReplicaManager peek into
replica object to fetch the log if they need to operate on that. As
replica object can represent a local replica or a remote one, this
lead to a bunch of "if-else" code in log fetch and offset update code.
NOTE: In addition to a "log" that is in use during normal operation, if
an alter log directory command is issued, we also create a future log
object. This object catches up with local log and then we switch the log
directory. So temporarily a Partition can have two local logs. Before
this change both logs are inside replica objects.
This change is an attempt to untangle this relationship. In particular
it moves `Log` from `Replica` into `Partition`. So a partition contains
a local log to which all writes go and possibly a "future log" if the
partition is being moved between directories. Additionally, it maintains
a list of remote replicas for offset and "caught up time" data that it uses
for replication protocol.
Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>, Jason Gustafson <jason@confluent.io>
Makes HostedPartition a sealed trait and make all of the match cases explicit.
Reviewers: Vikas Singh <soondenana@users.noreply.github.com>, Jason Gustafson <jason@confluent.io>
This PR fixes a bug in static group membership. Previously we limit the `member.id` replacement in JoinGroup to only cases when the group is in Stable. This is error-prone and could potentially allow duplicate consumers reading from the same topic. For example, imagine a case where two unknown members join in the `PrepareRebalance` stage at the same time.
The PR fixes the following things:
1. Replace `member.id` at any time we see a known static member rejoins group with unknown member.id
2. Immediately fence any ongoing join/sync group callback to early terminate the duplicate member.
3. Clearly handle Dead/Empty cases as exceptional.
4. Return old leader id upon static member leader rejoin to avoid trivial member assignment being triggered.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>
Currently we load the high watermark checkpoint separately for every replica that we load. This patch makes this loading logic lazy and caches the loaded map while a LeaderAndIsr request is being handled.
Reviewers: Jun Rao <junrao@gmail.com>
We have seen this failing recently due to the follower error:
```
java.util.NoSuchElementException: None.get
at scala.None$.get(Option.scala:366)
at scala.None$.get(Option.scala:364)
at kafka.admin.PreferredReplicaLeaderElectionCommandTest.getLeader(PreferredReplicaLeaderElectionCommandTest.scala:101)
at kafka.admin.PreferredReplicaLeaderElectionCommandTest.testNoopElection(PreferredReplicaLeaderElectionCommandTest.scala:240)
```
We need to wait for the leader to be available.
Reviewers: David Arthur <mumrah@gmail.com>
We see this failure from time to time:
```
java.lang.AssertionError: expected:<1> but was:<0>
at org.junit.Assert.fail(Assert.java:89)
at org.junit.Assert.failNotEquals(Assert.java:835)
at org.junit.Assert.assertEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:633)
at kafka.api.TransactionsTest.testFencingOnTransactionExpiration(TransactionsTest.scala:512)
```
The cause is probably that we are using `consumeRecordsFor` which has no expectation on the number of records to fetch and a timeout of just 1s. This patch changes the code to use `consumeRecords` and the default 15s timeout.
Note we have also fixed a bug in the test case itself, which was using the wrong topic for the second write, which meant it could never have failed in the anticipated way anyway.
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Gwen Shapira
Closes#6905 from hachikuji/fix-flaky-transaction-test
This patch adds test cases for the leader election command added in KIP-460.
Reviewers: Vikas Singh, David Arthur <mumrah@gmail.com>, Jason Gustafson <jason@confluent.io>
This patch checks for errors handling a fetch request before updating follower state. Previously we were unsafely passing the failed `LogReadResult` with most fields set to -1 into `Replica` to update follower state. Additionally, this patch attempts to improve the test coverage for ISR shrinking and expansion logic in `Partition`.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
The Dead state in the coordinator is used for groups which are either pending deletion or migration to a new coordinator. Currently requests received while in this state result in an UNKNOWN_MEMBER_ID which causes consumers to reset the memberId. This is a problem for KIP-345 since it can cause an older member to fence a newer member. This patch changes the error code returned in this state to COORDINATOR_NOT_AVAILABLE, which causes the consumer to rediscover the coordinator, but not reset the memberId.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>
This commit makes three changes:
- Adds a constructor for NewTopic(String, Optional<Integer>, Optional<Short>)
which allows users to specify Optional.empty() for numPartitions or
replicationFactor in order to use the broker default.
- Changes AdminManager to accept -1 as valid options for replication
factor and numPartitions (resolving to broker defaults).
- Makes --partitions and --replication-factor optional arguments when creating
topics using kafka-topics.sh.
- Adds a dependency on scalaJava8Compat library to make it simpler to
convert Scala Option to Java Optional
Reviewers: Ismael Juma <ismael@juma.me.uk>, Ryanne Dolan <ryannedolan@gmail.com>, Jason Gustafson <jason@confluent.io>
The test was failing because other test cases are leaving some metrics in the registry. This patch cleans the registry before and after test runs.
Reviewers: Ismael Juma <ismael@juma.me.uk>
I think it's better just to make single-batch as a universal requirement for all versions for compressed messages, and for V2 and beyond uncompressed messages as well.
Reviewers: Jason Gustafson <jason@confluent.io>
We have two fields `highWatermarkCheckPointThreadStarted` and `hwThreadInitialized` which appear to be serving the same purpose. This patch gets rid of `hwThreadInitialized`.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
When cleaning transactional data, we need to keep track of which transactions still have data associated with them so that we do not remove the markers. We had logic to do this, but the state was not being carried over when beginning cleaning for a new set of segments. This could cause the cleaner to incorrectly believe a transaction marker was no longer needed. The fix here carries the transactional state between groups of segments to be cleaned.
Reviewers: Dhruvil Shah <dhruvil@confluent.io>, Viktor Somogyi <viktorsomogyi@gmail.com>, Jason Gustafson <jason@confluent.io>
In the olden days, OffsetForLeaderEpoch was exclusively an inter-broker protocol and
required Cluster level permission. With KIP-320, clients can use this API as well and
so we lowered the required permission to Topic Describe. The only way the client can
be sure that the new permissions are in use is to require version 3 of the protocol
which was bumped for 2.3. If the broker does not support this version, we skip the
validation and revert to the old behavior.
Additionally, this patch fixes a problem with the newly added replicaId field when
parsed from older versions which did not have it. If the field was not present, then
we used the consumer's sentinel value, but this would limit the range of visible
offsets by the high watermark. To get around this problem, this patch adds a
separate "debug" sentinel similar to APIs like Fetch and ListOffsets.
Reviewers: Ismael Juma <ismael@juma.me.uk>
This patch attempts to simplify the interaction between Partition and the various components from `ReplicaManager`. This is primarily to make unit testing easier. I have also tried to eliminate the OfflinePartition sentinel which has always been unsafe.
Reviewers: Boyang Chen <bchen11@outlook.com>, David Arthur <mumrah@gmail.com>
Remove created metrics when shutting down `ControllerEventManager`. This fixes transient failures in `ControllerEventManagerTest.testEventQueueTime` and is generally good hygiene.
Reviewers: José Armando García Sancio <jsancio@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Author: Manikumar Reddy <manikumar.reddy@gmail.com>
Reviewers: Jun Rao <junrao@gmail.com>, Jason Gustafson <jason@confluent.io>
Closes#5041 from omkreddy/KAFKA-3143
This changes the field "generationid" to "generationId" to be consistent with other uses.
Reviewers: Shaobo Liu <lambda.tencent@gmail.com>, Jason Gustafson <jason@confluent.io>
For static members join/rejoin, we encode the current timestamp in the new member.id. The format looks like group.instance.id-timestamp.
During consumer/broker interaction logic (Join, Sync, Heartbeat, Commit), we shall check the whether group.instance.id is known on group. If yes, we shall match the member.id stored on static membership map with the request member.id. If mismatching, this indicates a conflict consumer has used same group.instance.id, and it will receive a fatal exception to shut down.
Right now the only missing part is the system test. Will work on it offline while getting the major logic changes reviewed.
Reviewers: Ryanne Dolan <ryannedolan@gmail.com>, Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
This patch includes API changes for follower fetching per [KIP-392](https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica) as well as the consumer implementation. After this patch, consumers will continue to fetch only from the leader, since the broker implementation to select an alternate read replica is not included here.
Adds new `client.rack` consumer configuration property is added which allows the consumer to indicate its rack. This is just an arbitrary string to indicate some relative location, it doesn't have to actually represent a physical rack. We are keeping the naming consistent with the broker property (`broker.rack`).
FetchRequest now includes `rack_id` which can optionally be specified by the consumer. FetchResponse includes an optional `preferred_read_replica` field for each partition in the response. OffsetForLeaderEpochRequest also adds new `replica_id` field which is similar to the same field in FetchRequest.
When the consumer sees a `preferred_read_replica` in a fetch response, it will use the Node with that ID for the next fetch.
Reviewers: Jason Gustafson <jason@confluent.io>
Deletion of a large number of topics can cause a ton of log spam. In a test case on 2.2, deletion of 50 topics with 100 partitions each caused about 158 Mb of data in the controller log. With the improvements to batch StopReplica and the patch here, we reduce that to about 1.5 Mb.
Kudos to gwenshap for spotting these spammy messages.
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Gwen Shapira
Closes#6738 from hachikuji/remove-verbose-topic-deletion-log-message
The replica fetcher thread is terminated in case a partition crashes which leads to under replication. This behavior can be improved by dropping the failed partition. The thread can continue monitoring the rest of the partitions. If all partitions of a thread have failed, the thread would be shut down. This is documented in KIP-461: https://cwiki.apache.org/confluence/display/KAFKA/KIP-461+-+Improve+Replica+Fetcher+behavior+at+handling+partition+failure.
Reviewers: Jun Rao <junrao@gmail.com>, Jason Gustafson <jason@confluent.io>
To make static consumer group members more persistent, we want to avoid kicking out unjoined members through rebalance timeout. Essentially we allow static members to participate in a rebalance using their old subscription without sending a JoinGroup. The only catch is that an unjoined static member might be the current group leader, and we may need to elect a different leader.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Jason Gustafson <jason@confluent.io>