When the consumer's fetch request is throttled by the KIP-219 mechanism,
it receives an empty fetch response. The consumer should not log this
as an error.
Reviewers: Jason Gustafson <jason@confluent.io>
Since 2.3, there are 5 new APIs: ElectPreferredLeaders, IncrementalAlterConfigs, AlterPartitionReassignments, DescribePartitionReassignments and OffsetDelete
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
This patch fixes a brittle expectation on the `toString` implementation coming from `Set`. This was failing on jenkins with the following error:
```
java.lang.AssertionError: expected:<Some(producerId:1334,producerEpoch:0,state=Ongoing,partitions=Set(topic-0),txnLastUpdateTimestamp=0,txnTimeoutMs=1000)> but was:<Some(producerId:1334,producerEpoch:0,state=Ongoing,partitions=HashSet(topic-0),txnLastUpdateTimestamp=0,txnTimeoutMs=1000)>
```
Instead we convert the collection to a string directly.
Reviewers: Boyang Chen <boyang@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
* Warnings
1. `kafka/core/src/test/scala/integration/kafka/server/DelayedFetchTest.scala:110: local val partition in method testReplicaNotAvailable is never used`
2. `kafka/core/src/test/scala/unit/kafka/admin/ConfigCommandTest.scala:527: local val alterResourceName in method verifyAlterBrokerConfig is never used`
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Fixes an NPE when UserData in a member's subscription is null and adds similar checks for transaction log parser. Also modifies the output logic so that we show the keys of tombstones for both group and transaction metadata.
Reviewers: David Arthur <mumrah@gmail.com>
This patch removes a spammy log message in the controller which is printed every time the leader imbalance ratio is checked. It is unhelpful because preferred leaders are generally deterministic and is spammy because it includes _every_ partition in the cluster.
Reviewers: Jonathan Santilli <jonathansantilli@users.noreply.github.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
In some system tests a Streams app is started and then prints a message to stdout, which the system test waits for to confirm the node has successfully been brought up. It then greps for certain log messages in a retriable loop.
But waiting on the Streams app to start/print to stdout does not mean the log file has been created yet, so the grep may return an error. Although this occurs in a retriable loop it is assumed that grep will not fail, and the result is piped to wc and then blindly converted to an int in the python function, which fails since the error message is a string (throws ValueError)
We should catch the ValueError and return a 0 so it can try again rather than immediately crash
Reviewers: Bill Bejeck <bbejeck@gmail.com>, John Roesler <vvcephei@users.noreply.github.com>, Guozhang Wang <wangguoz@gmail.com>
The upgrade system test correctly rolls by upgrading the broker and
leaving the IBP, and then rolling again with the latest IBP version.
Unfortunately, this is not sufficient to pick up many problems in our IBP
gating as we charge through the rolls and after the second roll all of
the brokers will rejoin the ISR and the test will be treated as a
success.
This test adds two new checks:
1. We wait for the ISR to stabilize for all partitions. This is best
practice during rolls, and is enough to tell us if a broker hasn't
rejoined after each roll.
2. We check the broker logs for some common protocol errors. This is a
fail safe as it's possible for the test to be successful even if some
protocols are incompatible and the ISR is rejoined.
Reviewers: Nikhil Bhatia <nikhil@confluent.io>, Jason Gustafson <jason@confluent.io>
The Deserializer interface has two methods, one that gives access to the headers and one that does not. ConsoleConsumer.scala only calls the latter method. It would be nice if it were to call the default method that provides header access, so that custom serde that depends on headers becomes possible.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
The v3 JoinGroup logic does not properly complete the initial heartbeat for new members, which then expires after the static 5 minute timeout if the member does not rejoin. The core issue is in the `shouldKeepAlive` method, which returns false when it should return true because of an inconsistent timeout check.
Reviewers: Jason Gustafson <jason@confluent.io>
Similar to KAFKA-5258 which move all partition and replica state transition rules into their states, we move the group state transition rules into their states.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Allow transaction metadata to be reloaded, even if it already exists as of a previous epoch. This helps with cases where a previous become-follower transition failed to unload corresponding metadata.
Reviewers: Jun Rao <junrao@gmail.com>, Jason Gustafson <jason@confluent.io>
Allow ConfigCommand to handle more operations without using direct ZooKeeper access, as described by KIP-543. Also allow specifying entity type and name via a single flag-- again, as the KIP describes.
Reviewers: Colin P. McCabe <cmccabe@apache.org>
Author: Manikumar Reddy <manikumar.reddy@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Magnus Edenhill <magnus@edenhill.se>, Colin Patrick McCabe <cmccabe@apache.org>
Closes#7818 from omkreddy/zk-note
Adds support for TLSv1.3 in SslTransportLayer. Note that TLSv1.3 is only enabled from Java 11 onwards, so we test the code only when running with Java11 and above.
Tests run on this PR:
- SslTransportLayerTest: This covers testing of our SslTransportLayer and all tests are run with TLSv1.3 when running with Java 11. These tests are also run with TLSv1.2 for all Java versions.
- SslFactoryTest: Also run with TLSv1.3 on Java 11 onwards in addition to TLSv1.2 for all Java versions.
- SslEndToEndAuthorizationTest - Run only with TLSv1.3 on Java 11 onwards and only with TLSv1.2 on earlier Java versions. We have other versions of this test which use SSL that continue to be with TLSv1.2 on Java 11 to avoid reducing test coverage for TLSv1.2
Additional testing for done for TLSv1.3:
- Most tests that use SSL use TestSslUtils.DEFAULT_TLS_PROTOCOL_FOR_TESTS which is set to TLSv1.2. I have run all clients and core tests with DEFAULT_TLS_PROTOCOL_FOR_TESTS=TLSv1.3 with Java 11.
- Ran a few system tests locally with TKSv1.3
Reviewers: Ismael Juma <ismael@juma.me.uk>, Manikumar Reddy <manikumar.reddy@gmail.com>
Fixed a small typo in the main README file ("They are also are printed" --> "They are also printed").
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
`KafkaAdminClientTest` contains many code repetitions which could be removed. This PR removes most of the boiler plate code.
Reviewers: Jason Gustafson <jason@confluent.io>
The replication throttle in `testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress` was not setting the quota on the partition correctly, so the test case was not working as expected. This patch fixes the problem and also fixes a bug in `waitForTopicCreated` which caused the function to always wait for the full timeout.
Reviewers: Ismael Juma <ismael@juma.me.uk>
This is part1 of a series of PRs for task management cleanup:
1. Primarily cleanup MockRecordCollectors: remove unnecessary anonymous inheritance but just consolidate on the NoOpRecordCollector -> renamed to MockRecordCollector. Most relevant changes are unit tests that would be relying on this MockRecordCollector.
2. Let StandbyContextImpl#recordCollector() to return null instead of returning a no-op collector, since in standby tasks we should ALWAYS bypass the logging logic and only use the inner store for restoreBatch. Returning null helps us to realize this assertion failed as NPE as early as possible whereas a no-op collector just hides the bug.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
The create topic api do not work with older version of the api. It can be reproduced by trying to create a topic with kafka-topics.sh from 2.3. It timeouts.
b94c7f4 has added a check which raises an exception if a field has been set to a non-default value unless the field is marked as "ignorable".
The fields added in the version 5 of the response are always set regardless of the version used by the client. If an older version is used, an exception is thrown during the serialization because the fields have non-default values. We should either not set the fields for older versions in the api layer or mark them as ignorable. I have chosen the later in this case because it looks cleaner.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>, Mickael Maison <mickael.maison@gmail.com>
Collect and expose the KIP-511 client name and version information the clients now provide to the server as part of ApiVersionsRequests. Also refactor how we handle selector metrics by creating a ChannelMetadataRegistry class. This will make it easier for various parts of the networking code to modify channel metrics.
Reviewers: Colin P. McCabe <cmccabe@apache.org>
Fixes case where multiple children merged from a key-changing node causes an NPE.
Reviewers: Matthias J. Sax <mjsax@apache.org>, Boyang Chen <boyang@confluent.io>
Provide custom configs to ChannelBuilders in addition to parsed `values()`
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Andrew Choi <andchoi@linkedin.com
When sun.security.jgss.native=true, we currently create a new server credential for every new client connection and add to the private credential set of the server's Subject. This is expensive and can result in an unbounded number of private credentials in the Subject used by the broker. The PR creates a credential immediately after login so that a single credential can be reused.
Reviewers: Guozhang Wang <wangguoz@gmail.com