* Fixed bug in Struct.equals where we returned prematurely and added tests
* Update RequestResponseTest to check that `equals` and `hashCode` of
the struct is the same after serialization/deserialization only when possible.
* Use `Objects.equals` and `Long.hashCode` to simplify code
* Removed deprecated usages of `JUnitTestSuite`
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Implements KIP-361 to provide a consumer configuration to specify whether subscribing or assigning a non-existent topic would result in it being automatically created or not.
Reviewers: Jason Gustafson <jason@confluent.io>
This patch refactors the implementation of the --version option and moves it into the default command options. This has the benefit of automatically including it in the usage output of the command line tools. Several tools had to be manually updated because they did not use the common options.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>
This patch ensures that the log end offset of each partition is initialized consistently with the checkpointed log start offset.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
During a partial message format upgrade, it is possible for the message format to flap between new and old versions. If we detect that data appended to the log is on an old format, we can clear the leader epoch cache so that we revert to truncation by high watermark. Once the upgrade completes and all replicas are on the same format, we will append to the epoch cache as usual. Note this is related to KAFKA-7897, which handles message format downgrades through configuration.
Reviewers: Jun Rao <junrao@gmail.com>
This patch fixes a bug in the sending of transactional requests. We need to call `KafkaClient.send` with an updated current time. Failing to do so can result in an `IllegalStateExcepton` which leaves the producer effectively dead since the in-flight correlation id has been set, but no request has been sent. To avoid the same problem in the future, we update the in flight correlationId only after sending the request.
Reviewers: Matthias J. Sax <matthias@confluent.io>, Apurva Mehta <apurva@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
This is the first diff for the implementation of JoinGroup logic for static membership. The goal of this diff contains:
* Add group.instance.id to be unique identifier for consumer instances, provided by end user;
Modify group coordinator to accept JoinGroupRequest with/without static membership, refactor the logic for readability and code reusability.
* Add client side support for incorporating static membership changes, including new config for group.instance.id, apply stream thread client id by default, and new join group exception handling.
* Increase max session timeout to 30 min for more user flexibility if they are inclined to tolerate partial unavailability than burdening rebalance.
* Unit tests for each module changes, especially on the group coordinator logic. Crossing the possibilities like:
6.1 Dynamic/Static member
6.2 Known/Unknown member id
6.3 Group stable/unstable
6.4 Leader/Follower
The rest of the 345 change will be broken down to 4 separate diffs:
* Avoid kicking out members through rebalance.timeout, only do the kick out through session timeout.
* Changes around LeaveGroup logic, including version bumping, broker logic, client logic, etc.
* Admin client changes to add ability to batch remove static members
* Deprecate group.initial.rebalance.delay
Reviewers: Liquan Pei <liquanpei@gmail.com>, Stanislav Kozlovski <familyguyuser192@windowslive.com>, Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
The controller maintains state across `ControllerContext`, `PartitionStateMachine`, `ReplicaStateMachine`, and `TopicDeletionManager`. None of this state is actually isolated from the rest. For example, topics undergoing deletion are intertwined with the partition and replica states. As a consequence of this, each of these components tends to be dependent on all the rest, which makes testing and reasoning about the system difficult. This is a first step toward untangling all the state. This patch moves it all into `ControllerContext` and removes many of the circular dependencies. So far, this is mostly a direct translation, but in the future we can add additional validation in `ControllerContext` to make sure that state is maintained consistently.
Additionally, this patch adds several mock objects to enable easier testing: `MockReplicaStateMachine` and `MockPartitionStateMachine`. These have simplified logic for updating the current state. This is used to create some new test cases for `TopicDeletionManager`.
Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>, Jun Rao <junrao@gmail.com>
We suspect the problem might be a race condition after broker startup where the consumer has yet to find the coordinator and rebalance. The fix here rolls all the brokers first and then waits for the expected exception.
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Gwen Shapira
Closes#6608 from hachikuji/KAFKA-7965
This patch fixes a bug in the append logic which can cause duplicate offsets to be appended to the log when the append to the transaction index fails. Rather than incrementing the log end offset after the index append, we do it immediately after the records are written to the log. If the index append later fails, we do two things:
1) We ensure that the last stable offset cannot advance. This guarantees that the aborted data will not be returned to the user until the transaction index contains the corresponding entry.
2) We skip updating the end offset of the producer state. When recovering the log, we will have to reprocess the log and write the index entries.
Reviewers: Jun Rao <junrao@gmail.com>
We should include partition/offset information when we raise exceptions during producer state validation. This saves a lot of the discovery work to figure out where the problem occurred. This patch also includes a new test case to verify additional coordinator fencing cases.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Most of the time, the group coordinator runs on broker 1. Occasionally the group coordinator will be placed on broker 2. If that's the case, the loop starting at line 320 have no chance to check and update `kickedOutConsumerIdx`. A quick fix is to safely do another round of loop to ensure `kickedOutConsumerIdx` always be checked after the last broker restart.
Reviewers: Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Jason Gustafson <jason@confluent.io>
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Viktor Somogyi <viktorsomogyi@gmail.com>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Rajini Sivaram <rajinisivaram@googlemail.com>, Ismael Juma <ismael@juma.me.uk>
* Describe/Delete/Reset offsets on multiple consumer groups at a time (including each group by repeating `--group` parameter)
* Describe/Delete/Reset offsets on ALL consumer groups at a time (add new `--all-groups` option similar to `--all-topics`)
* Reset plan CSV file generation reworked: structure updated to support multiple consumer groups and make sure that CSV file generation is done properly since there are no restrictions on consumer group names and symbols like commas and quotes are allowed.
* Extending data output table format by adding `GROUP` column for all `--describe` queries
This patch updates the InitProducerId request API to use the generated sources. It also fixes a small bug in the DescribeAclsRequest class where we were using the wrong api key.
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Colin McCabe <cmccabe@apache.org>
ConsumerBounceTest redundantly executes a couple test cases which were included in the abstract class `BaseConsumerTest`. We should try to keep a cleaner separation of testing logic and utility logic so that this does not happen (the build time is long enough without doing unnecessary work). This PR moves the cluster initialization and consumer utilities out of BaseConsumerTest and into a new class AbstractConsumerTest. We then let ConsumerBounceTest extend AbstractConsumerTest.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
This PR should help address the flakiness in the ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup test (https://issues.apache.org/jira/browse/KAFKA-7965). I tested this locally and have verified it significantly reduces flakiness - 25/25 tests now pass. Running the test 25 times in trunk, I'd get `18/25` passes.
It does so by reusing the less-flaky consumer integration testing functionality inside `BaseConsumerTest`. Most notably, the test now makes use of the `ConsumerAssignmentPoller` class - each consumer now polls non-stop rather than the more batch-oriented polling we had in `ConsumerBounceTest#waitForRebalance()`.
Reviewers: Jason Gustafson <jason@confluent.io>
Ensure that modification time is checked against the file used to create the SSLContext that is in-use so that SSLContext is updated whenever file is modified and a config update request is received.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
This change adds waits for metadata updates after killing the broker in order to make the tests more stable.
Author: Viktor Somogyi-Vass <viktorsomogyi@gmail.com>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Closes#6505 from viktorsomogyi/flaky-min-isr-test
A broken can have more than one instance of ZooKeeperClient. For example, SimpleAclAuthorizer creates a separate ZooKeeperClient instance when configured.
This commit makes it possible to optionally specify the name for the ZooKeeperClient instance. The name is specified only for a broker's ZooKeeperClient instances, but not for commands' and tests'.
Reviewers: Jun Rao <junrao@gmail.com>
`map` was being used to convert `Iterable[Integer]` to `Iterable[Int`]. That
operation represented 11% of total CPU time measured under load for us.
We also expect a positive impact on GC.
Reviewers: Joel Koshy <jjkoshy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
- The flaky failure is caused by the fact that the main thread sometimes issues DescribeConsumerGroup request before the consumer assignment takes effect. Added a latch to make sure such situation is not going to happen.
Author: huxihx <huxi_2b@hotmail.com>
Author: huxi <huxi_2b@hotmail.com>
Author: Manikumar Reddy <manikumar.reddy@gmail.com>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Closes#6441 from huxihx/KAFKA-8098
This patch adds additional DEBUG statements in AbstractIndex.scala, OffsetIndex.scala, and TimeIndex.scala. It also changes the logging on append from DEBUG to TRACE to make DEBUG logging less disruptive, and it ensures that exceptions raised from index classes include file/offset information.
Reviewers: Jason Gustafson <jason@confluent.io>
Shutdown session expiry thread prior to closing ZooKeeper client to ensure that new clients are not created by the expiry thread and left active when returning from ZooKeeperClient.close().
Reviewers: Ismael Juma <ismael@juma.me.uk>
We verify that ZK clients are closed in tests since these can affect subsequent tests and that makes it hard to debug test failures. But because of changes to ZooKeeper client, we were checking the wrong thread name. The thread name used now is <creatorThreadName>-EventThread where creatorThreadName varies depending on the test. Fixed ZooKeeperTestHarness to check this format and fixed tests which were leaving ZK clients behind. Also added a test to make sure we can detect changes to the thread name when we update ZK clients in future.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Manikumar Reddy <manikumar.reddy@gmail.com>
https://issues.apache.org/jira/browse/KAFKA-7813
Running the JMX tool without --object-name parameter, results in a NullPointerException.
*More detailed description of your change,
if necessary. The PR title and PR message become
the squashed commit message, so use a separate
comment to ping reviewers.*
*Summary of testing strategy (including rationale)
for the feature or bug fix. Unit and/or integration
tests are expected for any behaviour change and
system tests should be considered for larger changes.*
Author: huxihx <huxi_2b@hotmail.com>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes#6139 from huxihx/KAFKA-7813