Makes HostedPartition a sealed trait and make all of the match cases explicit.
Reviewers: Vikas Singh <soondenana@users.noreply.github.com>, Jason Gustafson <jason@confluent.io>
This PR fixes a bug in static group membership. Previously we limit the `member.id` replacement in JoinGroup to only cases when the group is in Stable. This is error-prone and could potentially allow duplicate consumers reading from the same topic. For example, imagine a case where two unknown members join in the `PrepareRebalance` stage at the same time.
The PR fixes the following things:
1. Replace `member.id` at any time we see a known static member rejoins group with unknown member.id
2. Immediately fence any ongoing join/sync group callback to early terminate the duplicate member.
3. Clearly handle Dead/Empty cases as exceptional.
4. Return old leader id upon static member leader rejoin to avoid trivial member assignment being triggered.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>
Currently we load the high watermark checkpoint separately for every replica that we load. This patch makes this loading logic lazy and caches the loaded map while a LeaderAndIsr request is being handled.
Reviewers: Jun Rao <junrao@gmail.com>
We have seen this failing recently due to the follower error:
```
java.util.NoSuchElementException: None.get
at scala.None$.get(Option.scala:366)
at scala.None$.get(Option.scala:364)
at kafka.admin.PreferredReplicaLeaderElectionCommandTest.getLeader(PreferredReplicaLeaderElectionCommandTest.scala:101)
at kafka.admin.PreferredReplicaLeaderElectionCommandTest.testNoopElection(PreferredReplicaLeaderElectionCommandTest.scala:240)
```
We need to wait for the leader to be available.
Reviewers: David Arthur <mumrah@gmail.com>
We see this failure from time to time:
```
java.lang.AssertionError: expected:<1> but was:<0>
at org.junit.Assert.fail(Assert.java:89)
at org.junit.Assert.failNotEquals(Assert.java:835)
at org.junit.Assert.assertEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:633)
at kafka.api.TransactionsTest.testFencingOnTransactionExpiration(TransactionsTest.scala:512)
```
The cause is probably that we are using `consumeRecordsFor` which has no expectation on the number of records to fetch and a timeout of just 1s. This patch changes the code to use `consumeRecords` and the default 15s timeout.
Note we have also fixed a bug in the test case itself, which was using the wrong topic for the second write, which meant it could never have failed in the anticipated way anyway.
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Gwen Shapira
Closes#6905 from hachikuji/fix-flaky-transaction-test
This patch adds test cases for the leader election command added in KIP-460.
Reviewers: Vikas Singh, David Arthur <mumrah@gmail.com>, Jason Gustafson <jason@confluent.io>
This patch checks for errors handling a fetch request before updating follower state. Previously we were unsafely passing the failed `LogReadResult` with most fields set to -1 into `Replica` to update follower state. Additionally, this patch attempts to improve the test coverage for ISR shrinking and expansion logic in `Partition`.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
The Dead state in the coordinator is used for groups which are either pending deletion or migration to a new coordinator. Currently requests received while in this state result in an UNKNOWN_MEMBER_ID which causes consumers to reset the memberId. This is a problem for KIP-345 since it can cause an older member to fence a newer member. This patch changes the error code returned in this state to COORDINATOR_NOT_AVAILABLE, which causes the consumer to rediscover the coordinator, but not reset the memberId.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>
This commit makes three changes:
- Adds a constructor for NewTopic(String, Optional<Integer>, Optional<Short>)
which allows users to specify Optional.empty() for numPartitions or
replicationFactor in order to use the broker default.
- Changes AdminManager to accept -1 as valid options for replication
factor and numPartitions (resolving to broker defaults).
- Makes --partitions and --replication-factor optional arguments when creating
topics using kafka-topics.sh.
- Adds a dependency on scalaJava8Compat library to make it simpler to
convert Scala Option to Java Optional
Reviewers: Ismael Juma <ismael@juma.me.uk>, Ryanne Dolan <ryannedolan@gmail.com>, Jason Gustafson <jason@confluent.io>
The test was failing because other test cases are leaving some metrics in the registry. This patch cleans the registry before and after test runs.
Reviewers: Ismael Juma <ismael@juma.me.uk>
I think it's better just to make single-batch as a universal requirement for all versions for compressed messages, and for V2 and beyond uncompressed messages as well.
Reviewers: Jason Gustafson <jason@confluent.io>
We have two fields `highWatermarkCheckPointThreadStarted` and `hwThreadInitialized` which appear to be serving the same purpose. This patch gets rid of `hwThreadInitialized`.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
When cleaning transactional data, we need to keep track of which transactions still have data associated with them so that we do not remove the markers. We had logic to do this, but the state was not being carried over when beginning cleaning for a new set of segments. This could cause the cleaner to incorrectly believe a transaction marker was no longer needed. The fix here carries the transactional state between groups of segments to be cleaned.
Reviewers: Dhruvil Shah <dhruvil@confluent.io>, Viktor Somogyi <viktorsomogyi@gmail.com>, Jason Gustafson <jason@confluent.io>
In the olden days, OffsetForLeaderEpoch was exclusively an inter-broker protocol and
required Cluster level permission. With KIP-320, clients can use this API as well and
so we lowered the required permission to Topic Describe. The only way the client can
be sure that the new permissions are in use is to require version 3 of the protocol
which was bumped for 2.3. If the broker does not support this version, we skip the
validation and revert to the old behavior.
Additionally, this patch fixes a problem with the newly added replicaId field when
parsed from older versions which did not have it. If the field was not present, then
we used the consumer's sentinel value, but this would limit the range of visible
offsets by the high watermark. To get around this problem, this patch adds a
separate "debug" sentinel similar to APIs like Fetch and ListOffsets.
Reviewers: Ismael Juma <ismael@juma.me.uk>
This patch attempts to simplify the interaction between Partition and the various components from `ReplicaManager`. This is primarily to make unit testing easier. I have also tried to eliminate the OfflinePartition sentinel which has always been unsafe.
Reviewers: Boyang Chen <bchen11@outlook.com>, David Arthur <mumrah@gmail.com>
Remove created metrics when shutting down `ControllerEventManager`. This fixes transient failures in `ControllerEventManagerTest.testEventQueueTime` and is generally good hygiene.
Reviewers: José Armando García Sancio <jsancio@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Author: Manikumar Reddy <manikumar.reddy@gmail.com>
Reviewers: Jun Rao <junrao@gmail.com>, Jason Gustafson <jason@confluent.io>
Closes#5041 from omkreddy/KAFKA-3143
This changes the field "generationid" to "generationId" to be consistent with other uses.
Reviewers: Shaobo Liu <lambda.tencent@gmail.com>, Jason Gustafson <jason@confluent.io>
For static members join/rejoin, we encode the current timestamp in the new member.id. The format looks like group.instance.id-timestamp.
During consumer/broker interaction logic (Join, Sync, Heartbeat, Commit), we shall check the whether group.instance.id is known on group. If yes, we shall match the member.id stored on static membership map with the request member.id. If mismatching, this indicates a conflict consumer has used same group.instance.id, and it will receive a fatal exception to shut down.
Right now the only missing part is the system test. Will work on it offline while getting the major logic changes reviewed.
Reviewers: Ryanne Dolan <ryannedolan@gmail.com>, Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
This patch includes API changes for follower fetching per [KIP-392](https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica) as well as the consumer implementation. After this patch, consumers will continue to fetch only from the leader, since the broker implementation to select an alternate read replica is not included here.
Adds new `client.rack` consumer configuration property is added which allows the consumer to indicate its rack. This is just an arbitrary string to indicate some relative location, it doesn't have to actually represent a physical rack. We are keeping the naming consistent with the broker property (`broker.rack`).
FetchRequest now includes `rack_id` which can optionally be specified by the consumer. FetchResponse includes an optional `preferred_read_replica` field for each partition in the response. OffsetForLeaderEpochRequest also adds new `replica_id` field which is similar to the same field in FetchRequest.
When the consumer sees a `preferred_read_replica` in a fetch response, it will use the Node with that ID for the next fetch.
Reviewers: Jason Gustafson <jason@confluent.io>
Deletion of a large number of topics can cause a ton of log spam. In a test case on 2.2, deletion of 50 topics with 100 partitions each caused about 158 Mb of data in the controller log. With the improvements to batch StopReplica and the patch here, we reduce that to about 1.5 Mb.
Kudos to gwenshap for spotting these spammy messages.
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Gwen Shapira
Closes#6738 from hachikuji/remove-verbose-topic-deletion-log-message
The replica fetcher thread is terminated in case a partition crashes which leads to under replication. This behavior can be improved by dropping the failed partition. The thread can continue monitoring the rest of the partitions. If all partitions of a thread have failed, the thread would be shut down. This is documented in KIP-461: https://cwiki.apache.org/confluence/display/KAFKA/KIP-461+-+Improve+Replica+Fetcher+behavior+at+handling+partition+failure.
Reviewers: Jun Rao <junrao@gmail.com>, Jason Gustafson <jason@confluent.io>
To make static consumer group members more persistent, we want to avoid kicking out unjoined members through rebalance timeout. Essentially we allow static members to participate in a rebalance using their old subscription without sending a JoinGroup. The only catch is that an unjoined static member might be the current group leader, and we may need to elect a different leader.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Jason Gustafson <jason@confluent.io>
SslFactory: split the part of SslFactory that creates SSLEngine instances into SslEngineBuilder. When (re)configuring, we simply create a new SslEngineBuilder. This allows us to make all the builder fields immutable. It also simplifies the logic for reconfiguring. Because we sometimes need to test old SslEngine instances against new ones, being able to use both the old and the new builder at once is useful.
Create an enum named SslClientAuth which encodes the possible values for ssl.client.auth. This will simplify the handling of this configuration.
SslTransportLayer#maybeProcessHandshakeFailure should treat an SSLHandshakeException with a "Received fatal alert" message as a handshake error (and therefore an authentication error.)
SslFactoryTest: add some line breaks for very long lines.
ConfigCommand#main: when terminating the command due to an uncaught exception, log the exception using debug level in slf4j, in addition to printing it to stderr. This makes it easier to debug failing junit tests, where stderr may not be kept, or may be reordered with respect to other slf4j messages. The use of debug level is consistent with how we handle other types of exceptions in ConfigCommand#main.
StateChangeLogMerger#main: spell out the full name of scala.io.Source rather than abbreviating it as io.Source. This makes it clearer that it is part of the Scala standard library. It also avoids compiler errors when other libraries whose groupId starts with "io" are used in the broker.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Enable reconfiguration of SSL keystores and truststores in client-side channel builders used by brokers for controller, transaction coordinator and replica fetchers. This enables brokers using TLS mutual authentication for inter-broker listener to use short-lived certs that may be updated before expiry without restarting brokers.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
`ConsumerTopicCreationTest` relied on `KafkaConsumer#poll` to send a `MetadataRequest` within 100ms to verify if a topic is auto created or not. This is brittle and does not guarantee if the request made it to the broker or was processed successfully. This PR fixes the flaky test by adding another topic; we wait until we consume a previously produced record to this topic. This ensures MetadataRequest was processed and we could then check if the topic we're interested in was created or not.
Reviewers: Boyang Chen <bchen11@outlook.com>, Jason Gustafson <jason@confluent.io>
MINOR: update documentation for the log cleaner max compaction lag feature (KIP-354) implemented in KAFKA-7321
Author: Xiongqi Wu <xiowu@linkedin.com>
Reviewer: Joel Koshy <jjkoshy@gmail.com>
The main problem we are trying to solve here is the batching of StopReplica requests and the lack of test coverage for `ControllerChannelManager`. Addressing the first problem was straightforward, but the second problem required quite a bit of work because of the dependence on `KafkaController` for all of the events. It seemed to make sense to separate the events from the processing of events so that we could remove this dependence and improve testability. With the refactoring, I was able to add test cases covering most of the logic in `ControllerChannelManager` including the generation of requests and the expected response handling logic. Note that I have not actually changed any of the event handling logic in `KafkaController`.
While refactoring this logic, I found that the event queue time metric was not being correctly computed. The problem is that many of the controller events were singleton objects which inherited the `enqueueTimeMs` field from the `ControllerEvent` trait. This would never get updated, so queue time would be skewed.
Reviewers: Jun Rao <junrao@gmail.com>
KAFKA-7321: Add a Maximum Log Compaction Lag (KIP-354)
Records become eligible for compaction after the specified time interval.
Author: Xiongqi Wu <xiowu@linkedin.com>
Reviewer: Joel Koshy <jjkoshy@gmail.com>
The log cleaner attempts to preserve the last entry for each producerId in order to ensure that sequence/epoch state is not lost. The current validation checks only the last sequence number for each producerId in order to decide whether a batch should be retained. There are two problems with this:
1. Sequence numbers are not unique alone. It is the tuple of sequence number and epoch which is uniquely defined.
2. The group coordinator always writes batches beginning with sequence number 0, which means there could be many batches which have the same sequence number.
The complete fix for the second issue would probably add proper sequence number bookkeeping in the coordinator. For now, we have left the coordinator implementation unchanged and changed the cleaner logic to use the last offset written by a producer instead of the last sequence number.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Because of how conversions between Java collections and Scala collections work, ImplicitLinkedHashMultiSet objects were being treated as unordered in some contexts where they shouldn't be. This broke JOIN_GROUP handling.
This patch renames ImplicitLinkedHashMultiSet to ImplicitLinkedHashMultCollection. The order of Collection objects will be preserved when converting to scala. Adding Set and List "views" to the Collection gives us a more elegant way of accessing that functionality when needed.
Reviewers: Colin P. McCabe <cmccabe@apache.org>