The controller maintains state across `ControllerContext`, `PartitionStateMachine`, `ReplicaStateMachine`, and `TopicDeletionManager`. None of this state is actually isolated from the rest. For example, topics undergoing deletion are intertwined with the partition and replica states. As a consequence of this, each of these components tends to be dependent on all the rest, which makes testing and reasoning about the system difficult. This is a first step toward untangling all the state. This patch moves it all into `ControllerContext` and removes many of the circular dependencies. So far, this is mostly a direct translation, but in the future we can add additional validation in `ControllerContext` to make sure that state is maintained consistently.
Additionally, this patch adds several mock objects to enable easier testing: `MockReplicaStateMachine` and `MockPartitionStateMachine`. These have simplified logic for updating the current state. This is used to create some new test cases for `TopicDeletionManager`.
Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>, Jun Rao <junrao@gmail.com>
We suspect the problem might be a race condition after broker startup where the consumer has yet to find the coordinator and rebalance. The fix here rolls all the brokers first and then waits for the expected exception.
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Gwen Shapira
Closes#6608 from hachikuji/KAFKA-7965
This patch fixes a bug in the append logic which can cause duplicate offsets to be appended to the log when the append to the transaction index fails. Rather than incrementing the log end offset after the index append, we do it immediately after the records are written to the log. If the index append later fails, we do two things:
1) We ensure that the last stable offset cannot advance. This guarantees that the aborted data will not be returned to the user until the transaction index contains the corresponding entry.
2) We skip updating the end offset of the producer state. When recovering the log, we will have to reprocess the log and write the index entries.
Reviewers: Jun Rao <junrao@gmail.com>
We should include partition/offset information when we raise exceptions during producer state validation. This saves a lot of the discovery work to figure out where the problem occurred. This patch also includes a new test case to verify additional coordinator fencing cases.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Most of the time, the group coordinator runs on broker 1. Occasionally the group coordinator will be placed on broker 2. If that's the case, the loop starting at line 320 have no chance to check and update `kickedOutConsumerIdx`. A quick fix is to safely do another round of loop to ensure `kickedOutConsumerIdx` always be checked after the last broker restart.
Reviewers: Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Jason Gustafson <jason@confluent.io>
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Viktor Somogyi <viktorsomogyi@gmail.com>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Rajini Sivaram <rajinisivaram@googlemail.com>, Ismael Juma <ismael@juma.me.uk>
* Describe/Delete/Reset offsets on multiple consumer groups at a time (including each group by repeating `--group` parameter)
* Describe/Delete/Reset offsets on ALL consumer groups at a time (add new `--all-groups` option similar to `--all-topics`)
* Reset plan CSV file generation reworked: structure updated to support multiple consumer groups and make sure that CSV file generation is done properly since there are no restrictions on consumer group names and symbols like commas and quotes are allowed.
* Extending data output table format by adding `GROUP` column for all `--describe` queries
This patch updates the InitProducerId request API to use the generated sources. It also fixes a small bug in the DescribeAclsRequest class where we were using the wrong api key.
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Colin McCabe <cmccabe@apache.org>
ConsumerBounceTest redundantly executes a couple test cases which were included in the abstract class `BaseConsumerTest`. We should try to keep a cleaner separation of testing logic and utility logic so that this does not happen (the build time is long enough without doing unnecessary work). This PR moves the cluster initialization and consumer utilities out of BaseConsumerTest and into a new class AbstractConsumerTest. We then let ConsumerBounceTest extend AbstractConsumerTest.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
This PR should help address the flakiness in the ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup test (https://issues.apache.org/jira/browse/KAFKA-7965). I tested this locally and have verified it significantly reduces flakiness - 25/25 tests now pass. Running the test 25 times in trunk, I'd get `18/25` passes.
It does so by reusing the less-flaky consumer integration testing functionality inside `BaseConsumerTest`. Most notably, the test now makes use of the `ConsumerAssignmentPoller` class - each consumer now polls non-stop rather than the more batch-oriented polling we had in `ConsumerBounceTest#waitForRebalance()`.
Reviewers: Jason Gustafson <jason@confluent.io>
Ensure that modification time is checked against the file used to create the SSLContext that is in-use so that SSLContext is updated whenever file is modified and a config update request is received.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
This change adds waits for metadata updates after killing the broker in order to make the tests more stable.
Author: Viktor Somogyi-Vass <viktorsomogyi@gmail.com>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Closes#6505 from viktorsomogyi/flaky-min-isr-test
A broken can have more than one instance of ZooKeeperClient. For example, SimpleAclAuthorizer creates a separate ZooKeeperClient instance when configured.
This commit makes it possible to optionally specify the name for the ZooKeeperClient instance. The name is specified only for a broker's ZooKeeperClient instances, but not for commands' and tests'.
Reviewers: Jun Rao <junrao@gmail.com>
`map` was being used to convert `Iterable[Integer]` to `Iterable[Int`]. That
operation represented 11% of total CPU time measured under load for us.
We also expect a positive impact on GC.
Reviewers: Joel Koshy <jjkoshy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
- The flaky failure is caused by the fact that the main thread sometimes issues DescribeConsumerGroup request before the consumer assignment takes effect. Added a latch to make sure such situation is not going to happen.
Author: huxihx <huxi_2b@hotmail.com>
Author: huxi <huxi_2b@hotmail.com>
Author: Manikumar Reddy <manikumar.reddy@gmail.com>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Closes#6441 from huxihx/KAFKA-8098
This patch adds additional DEBUG statements in AbstractIndex.scala, OffsetIndex.scala, and TimeIndex.scala. It also changes the logging on append from DEBUG to TRACE to make DEBUG logging less disruptive, and it ensures that exceptions raised from index classes include file/offset information.
Reviewers: Jason Gustafson <jason@confluent.io>
Shutdown session expiry thread prior to closing ZooKeeper client to ensure that new clients are not created by the expiry thread and left active when returning from ZooKeeperClient.close().
Reviewers: Ismael Juma <ismael@juma.me.uk>
We verify that ZK clients are closed in tests since these can affect subsequent tests and that makes it hard to debug test failures. But because of changes to ZooKeeper client, we were checking the wrong thread name. The thread name used now is <creatorThreadName>-EventThread where creatorThreadName varies depending on the test. Fixed ZooKeeperTestHarness to check this format and fixed tests which were leaving ZK clients behind. Also added a test to make sure we can detect changes to the thread name when we update ZK clients in future.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Manikumar Reddy <manikumar.reddy@gmail.com>
https://issues.apache.org/jira/browse/KAFKA-7813
Running the JMX tool without --object-name parameter, results in a NullPointerException.
*More detailed description of your change,
if necessary. The PR title and PR message become
the squashed commit message, so use a separate
comment to ping reviewers.*
*Summary of testing strategy (including rationale)
for the feature or bug fix. Unit and/or integration
tests are expected for any behaviour change and
system tests should be considered for larger changes.*
Author: huxihx <huxi_2b@hotmail.com>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes#6139 from huxihx/KAFKA-7813
The use of consumer.poll() made the test flaky since in some cases, it doesn't wait for coordinator connection.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Adds a new listener config `max.connections` to limit the number of active connections on each listener. The config may be prefixed with listener prefix. This limit may be dynamically reconfigured without restarting the broker.
This is one of the PRs for KIP-402 (https://cwiki.apache.org/confluence/display/KAFKA/KIP-402%3A+Improve+fairness+in+SocketServer+processors). Note that this is currently built on top of PR #6022
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Gwen Shapira <cshapi@gmail.com>
Closes#6034 from rajinisivaram/KAFKA-7730-max-connections
Just a minor cleanup to use Java 8 lambdas vs anonymous classes in this test.
I ran all tests in the streams test suite
Reviewers: Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>
When unclean leader election is enabled dynamically on brokers, we notify controller of the update before updating KafkaConfig. When processing this event, controller's decision to elect unclean leaders is based on the current KafkaConfig, so there is a small timing window when the controller may not elect unclean leader because KafkaConfig of the server was not yet updated. The commit fixes this timing window by using the existing BrokerReconfigurable interface used by other classes which rely on the current value of KafkaConfig.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
To keep align with the way it handles the offset topic, TopicCommand should not be able to alter transaction topic partition count.
Author: huxihx <huxi_2b@hotmail.com>
Reviewers: Viktor Somogyi <viktorsomogyi@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Manikumar Reddy <manikumar.reddy@gmail.com>
Closes#6109 from huxihx/KAFKA-7801
DynamicBrokerReconfigurationTest.testAddRemoveSaslListeners removes a listener, waits for the config to be propagated to all brokers and then validates that connections to the removed listener fail. But there is a small timing window between config update and Processor shutdown. Before validating that connections to a removed listener fail, this commit waits for all metrics of the removed listener to be deleted, ensuring that the Processors of the listener have been shutdown.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
*Handle OptionException while parsing options when using console producer and print usage before die.*
Author: Suman BN <sumannewton@gmail.com>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Closes#6386 from sumannewton/console-producer-parse-printusage
After the 2.1 release, if the broker hasn't been upgrade to the latest inter-broker protocol version, the committed offsets stored in the __consumer_offset topic will get cleaned up way earlier than it should be when the offsets are loaded back from the __consumer_offset topic in GroupCoordinator, which will happen during leadership transition or after broker bounce. This patch fixes the bug by setting expireTimestamp to None if it is the default value after loading v1 offset records from __consumer_offsets.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Test currently checks that there were at least 5 polls when 5 connections are established with connectionQueueSize=1. But we could be doing the check just after the 5th connection before the 5th poll, so updated the check to verify that there were at least 4 polls.
Reviewers: Ismael Juma <ismael@juma.me.uk>
Metadata may be updated from the background thread, so we need to protect access to SubscriptionState. This patch restructures the metadata handling so that we only check pattern subscriptions in the foreground. Additionally, it improves the following:
1. SubscriptionState is now the source of truth for the topics that will be fetched. We had a lot of messy logic previously to try and keep the the topic set in Metadata consistent with the subscription, so this simplifies the logic.
2. The metadata needs for the producer and consumer are quite different, so it made sense to separate the custom logic into separate extensions of Metadata. For example, only the producer requires topic expiration.
3. We've always had an edge case in which a metadata change with an inflight request may cause us to effectively miss an expected update. This patch implements a separate version inside Metadata which is bumped when the needed topics changes.
4. This patch removes the MetadataListener, which was the cause of https://issues.apache.org/jira/browse/KAFKA-7764.
Reviewers: David Arthur <mumrah@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
Added additional description for the "time" parameter for GetOffsetShell which adds " No offset is returned if timestamp provided is greater than recently committed record timestamp." in the description.
Author: KartikVK <karthikkalaghatgi123@gmail.com>
Reviewers: huxi <huxi_2b@hotmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
Closes#6357 from Kartikvk1996/kartik-branch
Disable forceSync in EmbeddedZookeeper.
Increase ZK tick to allow longer maxSessionTimeout in tests.
Increase ZK client session timeout in tests.
Handle transient ZK session expiration exception in test utils for createTopic.
Author: Jun Rao <junrao@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#6354 from junrao/KAFKA-8018