This PR changes the TxnOffsetCommit protocol to auto-generated types, and add more unit test coverage to the plain OffsetCommit protocol.
Reviewers: Jason Gustafson <jason@confluent.io>
The `CompletedFetch` and `PartitionRecords` types are somewhat redundant. This PR consolidates the two types.
Reviewers: Jason Gustafson <jason@confluent.io>
This patch ensures that relevant system configurations are included in exception messages when zk security validation fails.
Reviewers: Vikas Singh <soondenana@users.noreply.github.com>, José Armando García Sancio <jsancio@users.noreply.github.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
New Java Authorizer API and a new out-of-the-box authorizer (AclAuthorizer) that implements the new interface.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Splits the existing StickyAssignor logic into an AbstractStickyAssignor class, which is extended by the existing (eager) StickyAssignor and by the new CooperativeStickyAssignor which supports incremental cooperative rebalancing.
There is no actual change to the logic -- most methods from StickyAssignor were moved to AbstractStickyAssignor to be shared with CooperativeStickyAssignor, and the abstract MemberData memberData(Subscription) method converts the Subscription to the embedded list of owned partitions for each assignor.
The "generation" logic is left in, however this is always Optional.empty() for the CooperativeStickyAssignor as onPartitionsLost should always be called when a generation is missed.
Reviewers: Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Author: asutosh936 <asutosh.pandya@hotmail.com>
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>, Vahid Hashemian <vahid.hashemian@gmail.com>, Jason Gustafson <jason@confluent.io>
Closes#7141 from asutosh936/KAFKA-8698
This patch fixes a bug in the handling of MESSAGE_TOO_LARGE errors. The large batch is split, the smaller batches are re-added to the accumulator, and the batch is deallocated, but it was not removed from the list of in-flight batches. When the batch was eventually expired from the in-flight batches, the producer would try to deallocate it a second time, causing an error. This patch changes the behavior to correctly remove the batch from the list of in-flight requests.
Reviewers: Luke Stephenson, Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Refactors the DescribeDelegationToken to use the generated RPC classes.
Author: Viktor Somogyi-Vass <viktorsomogyi@gmail.com>
Author: Viktor Somogyi <viktorsomogyi@gmail.com>
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
Closes#7154 from viktorsomogyi/refactor-describe-dt
Credit to @lbradstreet for profiling the producer with a large number of partitions.
Cache `topicMetadata`, `brokers` and `controller` in the `MetadataResponse`
the first time it's needed avoid unnecessary recomputation. We were previously
computing`brokersMap` 4 times per partition in one code path that was invoked from
multiple places. This is a regression introduced via a42f16f980 and first released
in 2.3.0.
The `Cluster` constructor became significantly more allocation heavy due to
2c44e77e2f20, first released in 2.2.0. Replaced `merge` calls with more verbose,
but more efficient code. Added a test to verify that the returned collections are
unmodifiable.
Add `topicAuthorizedOperations` and `clusterAuthorizedOperations` to
`MetadataResponse` and remove `data()` method.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Lucas Bradstreet <lucasbradstreet@gmail.com>, Colin P. McCabe <cmccabe@confluent.io>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Justine Olshan <jolshan@confluent.io>
Add the AlterPartitionReassignments and ListPartitionReassignments APIs. Also remove an unused methodlength suppression for KafkaAdminClient.
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Viktor Somogyi <viktorsomogyi@gmail.com>
The RecordAccumulator ready calls `leaderFor` unnecessarily when the ProducerBatch
queue is empty. When producing to many partitions, the queue is often empty and the
`leaderFor` call can be expensive in comparison. Remove the unnecessary call.
Reviewers: Ismael Juma <ismael@juma.me.uk>
When calling readLogToEnd(), the KafkaBasedLog worker thread should catch TimeoutException and log a warning, which can occur if brokers are unavailable, otherwise the worker thread terminates.
Includes an enhancement to MockConsumer that allows simulating exceptions not just when polling but also when querying for offsets, which is necessary for testing the fix.
Author: Paul Whalen <pgwhalen@gmail.com>
Reviewers: Randall Hauch <rhauch@gmail.com>, Arjun Satish <arjun@confluent.io>, Ryanne Dolan <ryannedolan@gmail.com>
Since `Metrics` was constructed with `enableExpiration=false`, this was
not a source of flakiness given the current implementation. This could
change in the future, so good to follow the class contract.
Included a few clean-ups with regards to redundant casts and type parameters
as well as usage of try with resources for inline usage of `Metrics`.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Avoids calling both `containsKey` and `get` from isMuted, when only a single get is necessary.
Also avoid calling `remove` unless necessary. This could be a reduction of map operations
from 3 to 1.
isMuted showed up as a hotspot in profiling when using the producer with high numbers of partitions.
Reviewers: Ismael Juma <ismael@juma.me.uk>
Connector validation fails if an alias is used for the converter since the validation for that is done via `ConfigDef.validateAll(...)`, which in turn invokes `Class.forName(...)` on the alias. Even though the class is successfully loaded by the DelegatingClassLoader, some Java implementations will refuse to return a class from `Class.forName(...)` whose name differs from the argument provided.
This commit alters `ConfigDef.parseType(...)` to first invoke `ClassLoader.loadClass(...)` on the class using our class loader in order to get a handle on the actual class object to be loaded, then invoke `Class.forName(...)` with the fully-qualified class name of the to-be-loaded class and return the result. The invocation of `Class.forName(...)` is necessary in order to allow static initialization to take place; simply calling `ClassLoader.loadClass(...)` is insufficient.
Also corrected a unit test that relied upon the old behavior.
Author: Chris Egerton <chrise@confluent.io>
Reviewers: Robert Yokota <rayokota@gmail.com>, Randall Hauch <rhauch@gmail.com>
Author: Mickael Maison <mickael.maison@gmail.com>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Viktor Somogyi <viktorsomogyi@gmail.com>
Closes#7038 from mimaison/KAFKA-8598
1. Add onPartitionsLost into the RebalanceListener, which will be triggered when the consumer found that the generation is reset due to fatal errors in response handling.
2. Semantical behavior change: with COOPERATIVE protocol, if the revoked / lost partitions are empty, do not trigger the corresponding callback at all. For added partitions though, even if it is empty we would still trigger the callback as a way to notify the rebalance event; with EAGER protocol, revoked / assigned callbacks are always triggered.
The ordering of the callback would be the following:
a. Callback onPartitionsRevoked / onPartitionsLost triggered.
b. Update the assignment (both revoked and added).
c. Callback onPartitionsAssigned triggered.
In this way we are assured that users can still access the partitions being revoked, whereas they can also access the partitions being added.
3. Semantical behavior change (KAFKA-4600): if the rebalance listener throws an exception, pass it along all the way to the consumer.poll caller, but still completes the rest of the actions. Also, the newly assigned partitions list does not gets affected with exception thrown since it is just for notifying the users.
4. Semantical behavior change: the ConsumerCoordinator would not try to modify assignor's returned assignments, instead it will validate that assignments and set the error code accordingly: if there are overlaps between added / revoked partitions, it is a fatal error and would be communicated to all members to stop; if revoked is not empty, it is an error indicate re-join; otherwise, it is normal.
5. Minor: with the error code removed from the Assignment, ConsumerCoordinator will request re-join if the revoked partitions list is not empty.
6. Updated ConsumerCoordinatorTest accordingly. Also found a minor bug in MetadataUpdate that removed topic would still be retained with null value of num.partitions.
6. Updated a few other flaky tests that are exposed due to this change.
Reviewers: John Roesler <vvcephei@users.noreply.github.com>, A. Sophie Blee-Goldman <sophie@confluent.io>, Jason Gustafson <jason@confluent.io>
Author: Mickael Maison <mickael.maison@gmail.com>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Viktor Somogyi <viktorsomogyi@gmail.com>
Closes#7098 from mimaison/KAFKA-8599
LogConfig.getConfigValue would throw a NoSuchElementException if any log
config was defined without a server default mapping.
Added a unit test for `getConfigValue` and a sanity test for
`toHtml`/`toRst`/`toEnrichedRst`, which were previously not exercised during
the test suite.
Reviewers: Jason Gustafson <jason@confluent.io>, José Armando García Sancio <jsancio@users.noreply.github.com>
<!--
Is there any breaking changes? If so this is a major release, make sure '#major' is in at least one
commit message to get CI to bump the major. This will prevent automatic down stream dependency
bumping / consuming. For more information about semantic versioning see: https://semver.org/
Suggested PR template: Fill/delete/add sections as needed. Optionally delete any commented block.
-->
What
----
<!--
Briefly describe **what** you have changed and **why**.
Optionally include implementation strategy.
-->
References
----------
[**KIP-412**](https://cwiki.apache.org/confluence/display/KAFKA/KIP-412%3A+Extend+Admin+API+to+support+dynamic+application+log+levels)
[**KAFKA-7800**](https://issues.apache.org/jira/browse/KAFKA-7800)
[**Discussion Thread**](http://mail-archives.apache.org/mod_mbox/kafka-dev/201901.mbox/%3CCANZZNGyeVw8q%3Dx9uOQS-18wL3FEmnOwpBnpJ9x3iMLdXY3gEug%40mail.gmail.com%3E)
[**Vote Thread**](http://mail-archives.apache.org/mod_mbox/kafka-dev/201902.mbox/%3CCANZZNGzpTJg5YX1Gpe5S%3DHSr%3DXGvmxvYLTdA3jWq_qwH-UvorQ%40mail.gmail.com%3E)
<!--
Copy&paste links: to Jira ticket, other PRs, issues, Slack conversations...
For code bumps: link to PR, tag or GitHub `/compare/master...master`
-->
Test&Review
------------
Test cases covered:
* DescribeConfigs
* Alter the log level with and without validateOnly, validate the results with DescribeConfigs
Open questions / Follow ups
--------------------------
If you're a reviewer, I'd appreciate your thoughts on these questions I have open:
1. Should we add synchronization to the Log4jController methods? - Seems like we don't get much value from it
2. Should we instantiate a new Log4jController instead of it having static methods? - All operations are stateless, so I thought static methods would do well
3. A logger which does not have a set value returns "null" (as seen in the unit tests). Should we just return the Root logger's level?
Author: Stanislav Kozlovski <familyguyuser192@windowslive.com>
Reviewers: Gwen Shapira
Closes#6903 from stanislavkozlovski/KAFKA-7800-dynamic-log-levels-admin-ap
This is an updated implementation of #5844 by @MayureshGharat (with Mayuresh's permission). As described in the original ticket:
> Today when we call KafkaConsumer.poll(), it will fetch data from Kafka asynchronously and is put in to a local buffer (completedFetches).
>
> If now we pause some TopicPartitions and call KafkaConsumer.poll(), we might throw away any buffered data that we might have in the local buffer for these TopicPartitions. Generally, if an application is calling pause on some TopicPartitions, it is likely to resume those TopicPartitions in near future, which would require KafkaConsumer to re-issue a fetch for the same data that it had buffered earlier for these TopicPartitions. This is a wasted effort from the application's point of view.
This patch fixes the problem by retaining the paused data in the completed fetches queue, essentially moving it to the back on each call to `fetchedRecords`.
Reviewers: Jason Gustafson <jason@confluent.io>
Implement KIP-480, which specifies that the default partitioner should use a "sticky" partitioning strategy for records that have a null key.
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Lucas Bradstreet <lucasbradstreet@gmail.com>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Jun Rao <junrao@gmail.com>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>
Follow up to new PartitionAssignor interface merged in 7108 is merged
Adds a PartitionAssignorAdapter class to maintain backwards compatibility
Reviewers: Boyang Chen <boyang@confluent.io>, Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Add a new exception, NoReassignmentInProgressException. Modify LeaderAndIsrRequest to include the AddingRepicas and RemovingReplicas fields. Add the ListPartitionReassignments and AlterPartitionReassignments RPCs.
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Viktor Somogyi <viktorsomogyi@gmail.com>
This patch is part of KIP-345. We are aiming to support batch leave group request issued from admin client. This diff is the first effort to bump leave group request version.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>
Main changes of this PR:
* Deprecate old consumer.internal.PartitionAssignor and add public consumer.ConsumerPartitionAssignor with all OOTB assignors migrated to new interface
* Refactor assignor's assignment/subscription related classes for easier to evolve API
* Removed version number from classes as it is only needed for serialization/deserialization
* Other previously-discussed cleanup included in this PR:
* Remove Assignment.error added in pt 1
* Remove ConsumerCoordinator#adjustAssignment added in pt 2
Reviewers: Boyang Chen <boyang@confluent.io>, Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
* Clean up one redundant and one misplaced metric
* Clarify the relationship among these metrics to avoid future confusion
Reviewers: Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
This is a bug fix PR to resolve errors introduced in https://github.com/apache/kafka/pull/6188. The PR fixes 2 things:
1. throttle time should be set on version >= 1 instead of version >= 2
2. `getErrorResponse` should set throwable exception within LeaveGroupResponseData
The patch also adds more unit tests to guarantee correctness for leave group protocol.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>
This patch changes maybeSendTransactionalRequest to handle both sending and polling transactional requests (and renames it to maybeSendAndPollTransactionalRequest), and skips the call to poll if no request is actually sent. It also removes the inner loop inside maybeSendAndPollTransactionalRequest and relies on the main Sender loop for retries.
Reviewers: Jason Gustafson <jason@confluent.io>
Api exception types usually have a single argument constructor which accepts the exception message. However, some types actually use this constructor to initialize a field. This inconsistency has led to some cases where exception messages were being incorrectly passed to these constructors and interpreted incorrectly. For example, this leads to confusing messages like the following in the log when we hit a GROUP_MAX_SIZE_REACHED error:
```
Attempt to join group failed due to fatal error: Consumer group The consumer group has reached its max size. already has the configured ...
```
This patch fixes the problem by changing these constructors so that the exception message is provided consistently. This affected `GroupAuthorizationException`, `TopicAuthorizationException`, and `GroupMaxSizeReachedException`.
Reviewers: Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Ismael Juma <ismael@juma.me.uk>
The two digit formatting we use in `Utils.formatBytes` depends on the english locale. If run from a different locale (e.g. German), the test case fails. This patch uses english explicitly.
Reviewers: Lee Dongjin <dongjin@apache.org>, Jason Gustafson <jason@confluent.io>
Ensure that Producer#send() throws topic metadata exceptions only for the topic being sent to and not for other topics in the producer's metadata instance. Also removes topics from consumer's metadata when a topic is removed using manual assignment.
Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>
This patch changes the name of the `Resources` field of AlterConfigsResponse to `Responses`. This makes it consistent with AlterConfigsResponse, which has a differently-named but structurally-identical field. Tested with unit tests.
Reviewers: Jason Gustafson <jason@confluent.io>
This patch fixes a compatibility breaking `MemberDescription` constructor change in #6957. It also updates `equals` and `hashCode` for the new `groupInstanceId` field that was added in the same patch.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
The rebalance timeout was added to the JoinGroup protocol in version 1. Prior to 2.3,
we handled version 0 JoinGroup requests by setting the rebalance timeout to be equal
to the session timeout. We lost this logic when we converted the API to use the
generated schema definition (#6419) which uses the default value of -1. The impact
of this is that the group rebalance timeout becomes 0, so rebalances finish immediately
after we enter the PrepareRebalance state and kick out all old members. This causes
consumer groups to enter an endless rebalance loop. This patch restores the old
behavior.
Reviewers: Ismael Juma <ismael@juma.me.uk>