Corrected the AbstractHerder to correctly identify task configs that contain variables for externalized secrets. The original method incorrectly used `matcher.matches()` instead of `matcher.find()`. The former method expects the entire string to match the regex, whereas the second one can find a pattern anywhere within the input string (which fits this use case more correctly).
Added unit tests to cover various cases of a config with externalized secrets, and updated system tests to cover case where config value contains additional characters besides secret that requires regex pattern to be found anywhere in the string (as opposed to complete match).
Author: Arjun Satish <arjun@confluent.io>
Reviewer: Randall Hauch <rhauch@gmail.com>
The assignReplicasToBrokers method has helpful but a large unformatted
javadoc comment that results in a big blob in generated html. This
change formats the comment so that generated javadoc is nice.
Reviewers: Stanislav Kozlovski <familyguyuser192@windowslive.com>, Jason Gustafson <jason@confluent.io>
Since `Metrics` was constructed with `enableExpiration=false`, this was
not a source of flakiness given the current implementation. This could
change in the future, so good to follow the class contract.
Included a few clean-ups with regards to redundant casts and type parameters
as well as usage of try with resources for inline usage of `Metrics`.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Avoids calling both `containsKey` and `get` from isMuted, when only a single get is necessary.
Also avoid calling `remove` unless necessary. This could be a reduction of map operations
from 3 to 1.
isMuted showed up as a hotspot in profiling when using the producer with high numbers of partitions.
Reviewers: Ismael Juma <ismael@juma.me.uk>
Connector validation fails if an alias is used for the converter since the validation for that is done via `ConfigDef.validateAll(...)`, which in turn invokes `Class.forName(...)` on the alias. Even though the class is successfully loaded by the DelegatingClassLoader, some Java implementations will refuse to return a class from `Class.forName(...)` whose name differs from the argument provided.
This commit alters `ConfigDef.parseType(...)` to first invoke `ClassLoader.loadClass(...)` on the class using our class loader in order to get a handle on the actual class object to be loaded, then invoke `Class.forName(...)` with the fully-qualified class name of the to-be-loaded class and return the result. The invocation of `Class.forName(...)` is necessary in order to allow static initialization to take place; simply calling `ClassLoader.loadClass(...)` is insufficient.
Also corrected a unit test that relied upon the old behavior.
Author: Chris Egerton <chrise@confluent.io>
Reviewers: Robert Yokota <rayokota@gmail.com>, Randall Hauch <rhauch@gmail.com>
We recently introduced a bunch of flaky tests in the AdminClientIntegrationTest.
These tests are failing very frequently. We should ignore the tests in order to
make the build stable until we have a fix.
Reviewers: Ismael Juma <ismael@juma.me.uk>
The introduction of KIP-480: Sticky Producer Partitioner had the
side effect that generateAndProduceMessages can often write
messages to a lower number of partitions to improve batching.
testDescribeLogDirsRequest (and potentially other tests) relies
on the messages being written somewhat uniformly to the topic
partitions. We fix the issue by including a monotonically
increasing key in the produced messages.
I also included a couple of minor clean-ups I noticed while
debugging the issue.
The test failed very frequently when executed locally before the
change and it passed 100 times consecutively after the change.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Author: Mickael Maison <mickael.maison@gmail.com>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Viktor Somogyi <viktorsomogyi@gmail.com>
Closes#7038 from mimaison/KAFKA-8598
1. Add onPartitionsLost into the RebalanceListener, which will be triggered when the consumer found that the generation is reset due to fatal errors in response handling.
2. Semantical behavior change: with COOPERATIVE protocol, if the revoked / lost partitions are empty, do not trigger the corresponding callback at all. For added partitions though, even if it is empty we would still trigger the callback as a way to notify the rebalance event; with EAGER protocol, revoked / assigned callbacks are always triggered.
The ordering of the callback would be the following:
a. Callback onPartitionsRevoked / onPartitionsLost triggered.
b. Update the assignment (both revoked and added).
c. Callback onPartitionsAssigned triggered.
In this way we are assured that users can still access the partitions being revoked, whereas they can also access the partitions being added.
3. Semantical behavior change (KAFKA-4600): if the rebalance listener throws an exception, pass it along all the way to the consumer.poll caller, but still completes the rest of the actions. Also, the newly assigned partitions list does not gets affected with exception thrown since it is just for notifying the users.
4. Semantical behavior change: the ConsumerCoordinator would not try to modify assignor's returned assignments, instead it will validate that assignments and set the error code accordingly: if there are overlaps between added / revoked partitions, it is a fatal error and would be communicated to all members to stop; if revoked is not empty, it is an error indicate re-join; otherwise, it is normal.
5. Minor: with the error code removed from the Assignment, ConsumerCoordinator will request re-join if the revoked partitions list is not empty.
6. Updated ConsumerCoordinatorTest accordingly. Also found a minor bug in MetadataUpdate that removed topic would still be retained with null value of num.partitions.
6. Updated a few other flaky tests that are exposed due to this change.
Reviewers: John Roesler <vvcephei@users.noreply.github.com>, A. Sophie Blee-Goldman <sophie@confluent.io>, Jason Gustafson <jason@confluent.io>
Author: Mickael Maison <mickael.maison@gmail.com>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Viktor Somogyi <viktorsomogyi@gmail.com>
Closes#7098 from mimaison/KAFKA-8599
2.9.9.1 and 2.9.9.2 include security fixes while 2.9.9.3 fixes a regression
introduced in 2.9.9.2.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Renames method names in StreamsMetricsImpl to make them consistent.
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
LogConfig.getConfigValue would throw a NoSuchElementException if any log
config was defined without a server default mapping.
Added a unit test for `getConfigValue` and a sanity test for
`toHtml`/`toRst`/`toEnrichedRst`, which were previously not exercised during
the test suite.
Reviewers: Jason Gustafson <jason@confluent.io>, José Armando García Sancio <jsancio@users.noreply.github.com>
[JIRA](https://issues.apache.org/jira/browse/KAFKA-6263)
- Add metrics to provide visibility for how long group metadata and transaction metadata take to load in order to understand some inactivity seen in the consumer groups
- Tests include mocking load times by creating a delay after each are loaded and ensuring the measured JMX metric is as it should be
Author: anatasiavela <anastasiavela@berkeley.edu>
Reviewers: Gwen Shapira, Jason Gustafson
Closes#7045 from anatasiavela/KAFKA-6263
Reviews: A. Sophie Blee-Goldman <sophie@confluent.io>, Bill Bejeck <bill@confluent.io>, Bruno Cadonna <bruno@confluent.io>, Boyang Chen <boyang@confluent.io>, Guozhang Wang <guozhang@confluent.io>
<!--
Is there any breaking changes? If so this is a major release, make sure '#major' is in at least one
commit message to get CI to bump the major. This will prevent automatic down stream dependency
bumping / consuming. For more information about semantic versioning see: https://semver.org/
Suggested PR template: Fill/delete/add sections as needed. Optionally delete any commented block.
-->
What
----
<!--
Briefly describe **what** you have changed and **why**.
Optionally include implementation strategy.
-->
References
----------
[**KIP-412**](https://cwiki.apache.org/confluence/display/KAFKA/KIP-412%3A+Extend+Admin+API+to+support+dynamic+application+log+levels)
[**KAFKA-7800**](https://issues.apache.org/jira/browse/KAFKA-7800)
[**Discussion Thread**](http://mail-archives.apache.org/mod_mbox/kafka-dev/201901.mbox/%3CCANZZNGyeVw8q%3Dx9uOQS-18wL3FEmnOwpBnpJ9x3iMLdXY3gEug%40mail.gmail.com%3E)
[**Vote Thread**](http://mail-archives.apache.org/mod_mbox/kafka-dev/201902.mbox/%3CCANZZNGzpTJg5YX1Gpe5S%3DHSr%3DXGvmxvYLTdA3jWq_qwH-UvorQ%40mail.gmail.com%3E)
<!--
Copy&paste links: to Jira ticket, other PRs, issues, Slack conversations...
For code bumps: link to PR, tag or GitHub `/compare/master...master`
-->
Test&Review
------------
Test cases covered:
* DescribeConfigs
* Alter the log level with and without validateOnly, validate the results with DescribeConfigs
Open questions / Follow ups
--------------------------
If you're a reviewer, I'd appreciate your thoughts on these questions I have open:
1. Should we add synchronization to the Log4jController methods? - Seems like we don't get much value from it
2. Should we instantiate a new Log4jController instead of it having static methods? - All operations are stateless, so I thought static methods would do well
3. A logger which does not have a set value returns "null" (as seen in the unit tests). Should we just return the Root logger's level?
Author: Stanislav Kozlovski <familyguyuser192@windowslive.com>
Reviewers: Gwen Shapira
Closes#6903 from stanislavkozlovski/KAFKA-7800-dynamic-log-levels-admin-ap
* Adds RocksDBMetrics class that provides methods to get sensors from the Kafka metrics registry and to setup the sensors to record RocksDB metrics
* Extends StreamsMetricsImpl with functionality to add the required metrics to the sensors.
Reviewers: Boyang Chen <boyang@confluent.io>, Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, John Roesler <vvcephei@users.noreply.github.com>, Guozhang Wang <wangguoz@gmail.com>
This is an updated implementation of #5844 by @MayureshGharat (with Mayuresh's permission). As described in the original ticket:
> Today when we call KafkaConsumer.poll(), it will fetch data from Kafka asynchronously and is put in to a local buffer (completedFetches).
>
> If now we pause some TopicPartitions and call KafkaConsumer.poll(), we might throw away any buffered data that we might have in the local buffer for these TopicPartitions. Generally, if an application is calling pause on some TopicPartitions, it is likely to resume those TopicPartitions in near future, which would require KafkaConsumer to re-issue a fetch for the same data that it had buffered earlier for these TopicPartitions. This is a wasted effort from the application's point of view.
This patch fixes the problem by retaining the paused data in the completed fetches queue, essentially moving it to the back on each call to `fetchedRecords`.
Reviewers: Jason Gustafson <jason@confluent.io>
Implement KIP-480, which specifies that the default partitioner should use a "sticky" partitioning strategy for records that have a null key.
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Lucas Bradstreet <lucasbradstreet@gmail.com>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Jun Rao <junrao@gmail.com>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>
Reviewers: Boyang Chen <boyang@confluent.io>, Guozhang Wang <guozhang@confluent.io>, Bill Bejeck <bill@confluent.io>, A. Sophie Blee-Goldman <sophie@confluent.io>
Add a new RandomComponentPayloadGenerator that gives a payload based on random selection of another PayloadGenerator. Additionally, add an example that uses a non-default PayloadGenerator configuration to TROGDOR.md.
Reviewers: Colin P. McCabe <cmccabe@apache.org>
Follow up to new PartitionAssignor interface merged in 7108 is merged
Adds a PartitionAssignorAdapter class to maintain backwards compatibility
Reviewers: Boyang Chen <boyang@confluent.io>, Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Better clarify the auto leader rebalance config documentation to reflect additional leader.imbalance.per.broker.percentage config description.
Reviewers: Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Jun Rao <junrao@gmail.com>
This patch contains a few small cleanups to make topic describe logic a little clearer. It also fixes a minor inconsistency in the output between the --zookeeper and --bootstrap-server behavior when the leader is unknown. Previously we printed -1 when --zookeeper was used, and now we print "none." The patch consolidates the output logic for describing topics and partitions in order to avoid inconsistencies like this in the future.
Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>, Guozhang Wang <wangguoz@gmail.com>
To avoid transient system test failures, tolerate a small amount of data loss due to truncation in upgrade system tests using older message format prior to KIP-101, where data loss was possible.
Reviewers: Ismael Juma <ismael@juma.me.uk>
Although we currently cache offset metadata for the high watermark and last stable offset, we don't use it when reading from the log. Instead we always look it up from the index. This patch pushes fetch isolation into `Log.read` so that we are able to reuse the cached offset metadata.
Reviewers: Jun Rao <junrao@gmail.com>
Currently the Metadata response returns an empty ISR if there is no active leader. The behavior is inconsistent since other fields such as the replica list and offline replicas are included. This patch changes the behavior to return the current known ISR. This fixes a problem with the topic describe command which fails to report ISR when a leader is offline.
Reviewers: Jason Gustafson <jason@confluent.io>
Add a new exception, NoReassignmentInProgressException. Modify LeaderAndIsrRequest to include the AddingRepicas and RemovingReplicas fields. Add the ListPartitionReassignments and AlterPartitionReassignments RPCs.
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Viktor Somogyi <viktorsomogyi@gmail.com>
And remove redundant call. Closing ZKDatabase is necessary to allow the data
directory to be deleted when running on Windows.
Reviewers: Ismael Juma <ismael@juma.me.uk>
This patch is part of KIP-345. We are aiming to support batch leave group request issued from admin client. This diff is the first effort to bump leave group request version.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>
The bug is that we accidentally used the current state timestamp for the group instead of the real current time. When a group is first loaded, this timestamp is not initialized, so this resulted in a `NoSuchElementException`. Additionally this violated the intended uniqueness of the memberId, which could have broken the group instance fencing. Fix is made and unit test to make sure the timestamp is properly encoded within the returned member.id.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>
Main changes of this PR:
* Deprecate old consumer.internal.PartitionAssignor and add public consumer.ConsumerPartitionAssignor with all OOTB assignors migrated to new interface
* Refactor assignor's assignment/subscription related classes for easier to evolve API
* Removed version number from classes as it is only needed for serialization/deserialization
* Other previously-discussed cleanup included in this PR:
* Remove Assignment.error added in pt 1
* Remove ConsumerCoordinator#adjustAssignment added in pt 2
Reviewers: Boyang Chen <boyang@confluent.io>, Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Minor refactoring of the places where we delete log segments, to ensure we always remove the in-memory metadata of the segment before performing physical deletion.
Reviewers: Ismael Juma <ismael@juma.me.uk>, NIkhil Bhatia <rite2nikhil@gmail.com>, Jun Rao <junrao@gmail.com>
* Clean up one redundant and one misplaced metric
* Clarify the relationship among these metrics to avoid future confusion
Reviewers: Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
LogManager#getOrCreateLog() selects a log dir for the new replica from
_liveLogDirs, if disk failure is discovered at this point, before
LogDirFailureHandler finds out, try using other log dirs before failing
the operation.
Reviewers: Anna Povzner <anna@confluent.io>, Jason Gustafson <jason@confluent.io>