Also document the parameters of `ReplicaAssignment` in `ControllerContext`.
Reviewers: Colin Patrick McCabe <cmccabe@apache.org>, Jason Gustafson <jason@confluent.io>
This change updates the CreatePartitions request and response api objects
to use the generated protocol classes.
Reviewers: Jason Gustafson <jason@confluent.io>
Since `retry` expects a `by name` parameter, passing a nullary function
causes assertion failures to be ignored. I verified this by introducing
a failing assertion before/after the fix.
I checked other `retry` usages and they looked correct.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
This PR fixes the regression introduced in 2.4 from 2 refactoring PRs:
#7249#7419
The bug was introduced by having a logical path leading numPartitionsCandidate to be 0, which is assigned to numPartitions and later being checked by setNumPartitions. In the subsequent check we will throw illegal argument if the numPartitions is 0.
This bug is both impacting new 2.4 application and upgrades to 2.4 in certain types of topology. The example in original JIRA was imported as a new integration test to guard against such regression. We also verify that without the bug fix application will still fail by running this integration test.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
The current coordinator loading logic causes an infinite loop when there is a gap between the last record in the log and the log end offset. This is possible because of compaction if the active segment is empty. The patch fixes the problem by breaking from the loading loop when a read from the log returns no additional data.
Reviewers: Jason Gustafson <jason@confluent.io>
The close method calls `Thread.join` to wait for AdminClient thread to
die, but if the close is called in the api completion callback, `join`
waits forever, as the thread calling `join` is same as the thread it
wants to wait to die.
This change checks for this condition and skips the join. The thread
will then return to main loop, where it will check for this condition
and exit.
Reviewers: Jason Gustafson <jason@confluent.io>
Added re-direct for new page and added link to Developer Guide page
Reviewers: Matthias J. Sax <mjsax@apache.org>, Sophie Blee-Goldman <sophie@confluent.io>,
The log context is useful when debugging applications which have multiple clients. This patch propagates the context to the channel builders and the SASL authenticator.
Reviewers: Ron Dagostino <rndgstn@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
* Adjust build and documentation.
* Use lambda syntax for SAM types in `core`, `streams-scala` and
`connect-runtime` modules.
* Remove `runnable` and `newThread` from `CoreUtils` as lambda
syntax for SAM types make them unnecessary.
* Remove stale comment in `FunctionsCompatConversions`,
`KGroupedStream`, `KGroupedTable' and `KStream` about Scala 2.11,
the conversions are needed for Scala 2.12 too.
* Deprecate `org.apache.kafka.streams.scala.kstream.Suppressed`
and use `org.apache.kafka.streams.kstream.Suppressed` instead.
* Use `Admin.create` instead of `AdminClient.create`. Static methods
in Java interfaces can be invoked since Scala 2.12. I noticed that
MirrorMaker 2 uses `AdminClient.create`, but I did not change them
as Connectors have restrictions on newer client APIs.
* Improve efficiency in a few `Gauge` implementations by avoiding
unnecessary intermediate collections.
* Remove pointless `Option.apply` in `ZookeeperClient`
`SessionState` metric.
* Fix unused import/variable and other compiler warnings.
* Reduce visibility of some vals/defs.
Reviewers: Manikumar Reddy <manikumar@confluent.io>, Guozhang Wang <wangguoz@gmail.com>, Gwen Shapira <gwen@confluent.io>
When the consumer's fetch request is throttled by the KIP-219 mechanism,
it receives an empty fetch response. The consumer should not log this
as an error.
Reviewers: Jason Gustafson <jason@confluent.io>
Since 2.3, there are 5 new APIs: ElectPreferredLeaders, IncrementalAlterConfigs, AlterPartitionReassignments, DescribePartitionReassignments and OffsetDelete
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
This patch fixes a brittle expectation on the `toString` implementation coming from `Set`. This was failing on jenkins with the following error:
```
java.lang.AssertionError: expected:<Some(producerId:1334,producerEpoch:0,state=Ongoing,partitions=Set(topic-0),txnLastUpdateTimestamp=0,txnTimeoutMs=1000)> but was:<Some(producerId:1334,producerEpoch:0,state=Ongoing,partitions=HashSet(topic-0),txnLastUpdateTimestamp=0,txnTimeoutMs=1000)>
```
Instead we convert the collection to a string directly.
Reviewers: Boyang Chen <boyang@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
* Warnings
1. `kafka/core/src/test/scala/integration/kafka/server/DelayedFetchTest.scala:110: local val partition in method testReplicaNotAvailable is never used`
2. `kafka/core/src/test/scala/unit/kafka/admin/ConfigCommandTest.scala:527: local val alterResourceName in method verifyAlterBrokerConfig is never used`
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Fixes an NPE when UserData in a member's subscription is null and adds similar checks for transaction log parser. Also modifies the output logic so that we show the keys of tombstones for both group and transaction metadata.
Reviewers: David Arthur <mumrah@gmail.com>
This patch removes a spammy log message in the controller which is printed every time the leader imbalance ratio is checked. It is unhelpful because preferred leaders are generally deterministic and is spammy because it includes _every_ partition in the cluster.
Reviewers: Jonathan Santilli <jonathansantilli@users.noreply.github.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
In some system tests a Streams app is started and then prints a message to stdout, which the system test waits for to confirm the node has successfully been brought up. It then greps for certain log messages in a retriable loop.
But waiting on the Streams app to start/print to stdout does not mean the log file has been created yet, so the grep may return an error. Although this occurs in a retriable loop it is assumed that grep will not fail, and the result is piped to wc and then blindly converted to an int in the python function, which fails since the error message is a string (throws ValueError)
We should catch the ValueError and return a 0 so it can try again rather than immediately crash
Reviewers: Bill Bejeck <bbejeck@gmail.com>, John Roesler <vvcephei@users.noreply.github.com>, Guozhang Wang <wangguoz@gmail.com>
The upgrade system test correctly rolls by upgrading the broker and
leaving the IBP, and then rolling again with the latest IBP version.
Unfortunately, this is not sufficient to pick up many problems in our IBP
gating as we charge through the rolls and after the second roll all of
the brokers will rejoin the ISR and the test will be treated as a
success.
This test adds two new checks:
1. We wait for the ISR to stabilize for all partitions. This is best
practice during rolls, and is enough to tell us if a broker hasn't
rejoined after each roll.
2. We check the broker logs for some common protocol errors. This is a
fail safe as it's possible for the test to be successful even if some
protocols are incompatible and the ISR is rejoined.
Reviewers: Nikhil Bhatia <nikhil@confluent.io>, Jason Gustafson <jason@confluent.io>
The Deserializer interface has two methods, one that gives access to the headers and one that does not. ConsoleConsumer.scala only calls the latter method. It would be nice if it were to call the default method that provides header access, so that custom serde that depends on headers becomes possible.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
The v3 JoinGroup logic does not properly complete the initial heartbeat for new members, which then expires after the static 5 minute timeout if the member does not rejoin. The core issue is in the `shouldKeepAlive` method, which returns false when it should return true because of an inconsistent timeout check.
Reviewers: Jason Gustafson <jason@confluent.io>
Similar to KAFKA-5258 which move all partition and replica state transition rules into their states, we move the group state transition rules into their states.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Allow transaction metadata to be reloaded, even if it already exists as of a previous epoch. This helps with cases where a previous become-follower transition failed to unload corresponding metadata.
Reviewers: Jun Rao <junrao@gmail.com>, Jason Gustafson <jason@confluent.io>
Allow ConfigCommand to handle more operations without using direct ZooKeeper access, as described by KIP-543. Also allow specifying entity type and name via a single flag-- again, as the KIP describes.
Reviewers: Colin P. McCabe <cmccabe@apache.org>
Author: Manikumar Reddy <manikumar.reddy@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Magnus Edenhill <magnus@edenhill.se>, Colin Patrick McCabe <cmccabe@apache.org>
Closes#7818 from omkreddy/zk-note
Adds support for TLSv1.3 in SslTransportLayer. Note that TLSv1.3 is only enabled from Java 11 onwards, so we test the code only when running with Java11 and above.
Tests run on this PR:
- SslTransportLayerTest: This covers testing of our SslTransportLayer and all tests are run with TLSv1.3 when running with Java 11. These tests are also run with TLSv1.2 for all Java versions.
- SslFactoryTest: Also run with TLSv1.3 on Java 11 onwards in addition to TLSv1.2 for all Java versions.
- SslEndToEndAuthorizationTest - Run only with TLSv1.3 on Java 11 onwards and only with TLSv1.2 on earlier Java versions. We have other versions of this test which use SSL that continue to be with TLSv1.2 on Java 11 to avoid reducing test coverage for TLSv1.2
Additional testing for done for TLSv1.3:
- Most tests that use SSL use TestSslUtils.DEFAULT_TLS_PROTOCOL_FOR_TESTS which is set to TLSv1.2. I have run all clients and core tests with DEFAULT_TLS_PROTOCOL_FOR_TESTS=TLSv1.3 with Java 11.
- Ran a few system tests locally with TKSv1.3
Reviewers: Ismael Juma <ismael@juma.me.uk>, Manikumar Reddy <manikumar.reddy@gmail.com>
Fixed a small typo in the main README file ("They are also are printed" --> "They are also printed").
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
`KafkaAdminClientTest` contains many code repetitions which could be removed. This PR removes most of the boiler plate code.
Reviewers: Jason Gustafson <jason@confluent.io>
The replication throttle in `testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress` was not setting the quota on the partition correctly, so the test case was not working as expected. This patch fixes the problem and also fixes a bug in `waitForTopicCreated` which caused the function to always wait for the full timeout.
Reviewers: Ismael Juma <ismael@juma.me.uk>
This is part1 of a series of PRs for task management cleanup:
1. Primarily cleanup MockRecordCollectors: remove unnecessary anonymous inheritance but just consolidate on the NoOpRecordCollector -> renamed to MockRecordCollector. Most relevant changes are unit tests that would be relying on this MockRecordCollector.
2. Let StandbyContextImpl#recordCollector() to return null instead of returning a no-op collector, since in standby tasks we should ALWAYS bypass the logging logic and only use the inner store for restoreBatch. Returning null helps us to realize this assertion failed as NPE as early as possible whereas a no-op collector just hides the bug.
Reviewers: Guozhang Wang <wangguoz@gmail.com>