* Part of KIP-572
* If a TimeoutException happens during restore of active tasks, or updating standby tasks, we need to trigger task.timeout.ms timeout.
Reviewers: John Roesler <john@confluent.io>
This PR implements a basic CLI tool for feature versioning system. The KIP-584 write up has been updated to suit this PR. The following is implemented in this PR:
--describe:
Describe supported and finalized features.
Usage: $> ./bin/kafka-features.sh --bootstrap-server host1:port1, host2:port2 --describe [--from-controller] [--command-config <path_to_java_properties_file>]
Optionally, use the --from-controller option to get features from the controller.
--upgrade-all:
Upgrades all features known to the tool to their highest max version levels.
Usage: $> ./bin/kafka-features.sh --bootstrap-server host1:port1, host2:port2 --upgrade-all [--dry-run] [--command-config <path_to_java_properties_file>]
Optionally, use the --dry-run CLI option to preview the feature updates without actually applying them.
--downgrade-all:
Downgrades existing finalized features to the highest max version levels known to this tool.
Usage: $> ./bin/kafka-features.sh --bootstrap-server host1:port1, host2:port2 --downgrade-all [--dry-run] [--command-config <path_to_java_properties_file>].
Optionally, use the --dry-run CLI option to preview the feature updates without actually applying them.
Reviewers: Boyang Chen <boyang@confluent.io>, Jun Rao <junrao@gmail.com>
Trigger task reconfiguration when:
- topic-partitions are created or deleted on source cluster
- topic-partitions are missing on target cluster
Authors: Mickael Maison <mickael.maison@gmail.com>, Edoardo Comar <ecomar@uk.ibm.com>
Reviewer: Randall Hauch <rhauch@gmail.com>
Other than a Stack Overflow comment (see https://stackoverflow.com/a/61738065) by Colin Patrick McCabe and a proposed design note on KIP-117 wiki, there is no source that verifies the thread-safety of KafkaAdminClient.
This patch updates JavaDoc of KafkaAdminClient to clarify its thread-safety.
Reviewers: Tom Bentley <tbentley@redhat.com>, Chia-Ping Tsai <chia7712@gmail.com>
Connect should not always add an error to configuration values in validation results that don't have a `ConfigKey` defined in the connector's `ConfigDef`, and any errors on such configuration values included by the connector should be counted in the total number of errors. Added more unit tests for `AbstractHerder.generateResult(...)`.
Author: Randall Hauch <rhauch@gmail.com>
Reviewer: Konstantine Karantasis <konstantine@confluent.io>
In 2.7, we added lastFetchedEpoch to fetch requests and divergingEpoch to fetch responses. We are not using these for truncation yet, but in order to use these for truncation with IBP 2.7 onwards in the next release, we should make sure that we handle these in all the supporting classes even in 2.7.
Reviewers: Jason Gustafson <jason@confluent.io>
Currently, we pass multiple object reference (AdminClient,TaskManager, and a few more) into StreamsPartitionAssignor. Furthermore, we (miss)use TaskManager#mainConsumer() to get access to the main consumer (we need to do this, to avoid a cyclic dependency).
This PR unifies how object references are passed into a single ReferenceContainer class to
- not "miss use" the TaskManager as reference container
- unify how object references are passes
Note: we need to use a reference container to avoid cyclic dependencies, instead of using a config for each passed reference individually.
Reviewers: John Roesler <john@confluent.io>
ErrantRecordReporter uses a RetryWithToleranceOperator instance, which is necessarily stateful, having a ProcessingContext of which there's supposed to be one per task. That ProcessingContext is used by both RetryWithToleranceOperator.executeFailed() and execute(), so it's not enough to just synchronize executeFailed().
So make all public methods of RetryWithToleranceOperator synchronized so that RetryWithToleranceOperator is now threadsafe.
Tested with the addition of a multithreaded test case that fails consistently if the methods are not properly synchronized.
Reviewers: Konstantine Karantasis <k.karantasis@gmail.com>
When a coordinator module is being elected / resigned, our log entry is usually associated with a background scheduler on loading / unloading entries and hence it is unclear at the exact time when the election or resignation happens, and we have to then compare with the KafkaAPI's log entry for leaderAndISR / StopReplica to infer the actual time. I think add a couple new log entries indicating the exact time when it happens is helpful.
Reviewers: Boyang Chen <boyang@confluent.io>, Lee Dongjin <dongjin@apache.org>, Bruno Cadonna <bruno@confluent.io>
The leader epoch field is added in version 4, and the auto-generated protocol code would throw unsupported version exception if the field is set to any non-default values for version < 4. This would cause older versioned clients to never receive list-offset results.
Reviewers: Boyang Chen <boyang@confluent.io>
When using an error tracking system, two error log messages result into two different alerts.
It's best to group the logs and have one error with all the information.
For example when using with Sentry, this double line of log.error will create 2 different Issues. One can merge the issues but it will be simpler to have a single error log line.
Signed-off-by: Benoit Maggi <benoit.maggi@gmail.com>
Reviewers: Ewen Cheslack-Postava <me@ewencp.org>, Konstantine Karantasis <k.karantasis@gmail.com>
Fix a bug that was introduced by change 86013dc that resulted in incorrect behavior when
deleting through an iterator.
The bug is that the hash table relies on a denseness invariant... if you remove something,
you might have to move some other things. Calling removeElementAtSlot will do this.
Calling removeFromList is not enough.
Reviewers: Jason Gustafson <jason@confluent.io>
Before `AlterIsr` which was introduced in KIP-497, the controller would register watches in Zookeeper for each reassigning partition so that it could be notified immediately when the ISR was expanded and the reassignment could be completed. This notification is not needed with the latest IBP when `AlterIsr` is enabled because the controller will execute all ISR changes itself.
There is one subtle detail. If we are in the middle of a roll in order to bump the IBP, then it is possible for the controller to be on the latest IBP while some of the brokers are still on the older one. In this case, the brokers on the older IBP will not send `AlterIsr`, but we can still rely on the delayed notification through the `isr_notifications` path to complete reassignments. This seems like a reasonable tradeoff since it should be a short window before the roll is completed.
Reviewers: David Jacot <djacot@confluent.io>, Jun Rao <junrao@gmail.com>
Changes the Connect `ReplaceField` SMT's configuration properties, deprecating and replacing `blacklist` with `exclude`, and `whitelist` with `include`. The old configurations are still allowed (ensuring backward compatibility), but warning messages are written to the log to suggest users change to `include` and `exclude`.
This is part of KIP-629.
Author: Xavier Léauté <xvrl@apache.org>
Reviewer: Randall Hauch <rhauch@gmail.com>
rename internal classes, methods, and related constants for KIP-629
Author: Xavier Léauté <xvrl@apache.org>
Reviewers: Gwen Shapira
Closes#9400 from xvrl/rename-topic-includelist
Nodes that are materialized should not forward requests to `enableSendingOldValues` to parent nodes, as they themselves can handle fulfilling this request. However, some instances of `KTableProcessorSupplier` were still forwarding requests to parent nodes, which was causing unnecessary materialization of table sources.
The following instances of `KTableProcessorSupplier` have been updated to not forward `enableSendingOldValues` to parent nodes if they themselves are materialized and can handle sending old values downstream:
* `KTableFilter`
* `KTableMapValues`
* `KTableTransformValues`
Other instances of `KTableProcessorSupplier` have not be modified for reasons given below:
* `KTableSuppressProcessorSupplier`: though it has a `storeName` field, it didn't seem right for this to handle sending old values itself. Its only job is to suppress output.
* `KTableKTableAbstractJoin`: doesn't have a store name, i.e. it is never materialized, so can't handle the call itself.
* `KTableKTableJoinMerger`: table-table joins already have materialized sources, which are sending old values. It would be an unnecessary performance hit to have this class do a lookup to retrieve the old value from its store.
* `KTableReduce`: is always materialized and already handling the call without forwarding
* `KTableAggregate`: is always materialized and already handling the call without forwarding
Reviewer: Matthias J. Sax <matthias@confluent.io>
Implements KIP-478 for the test-utils module:
* adds mocks of the new ProcessorContext and StateStoreContext
* adds tests that all stores and store builders are usable with the new mock
* adds tests that the new Processor api is usable with the new mock
* updates the demonstration Processor to the new api
Reviewers: Guozhang Wang <guozhang@apache.org>
This patch adds missing flush logic to `KafkaRaftClient`. The initial flushing behavior is simplistic. We guarantee that the leader will not replicate above the last flushed offset and we guarantee that the follower will not fetch data above its own flush point. More sophisticated flush behavior is proposed in KAFKA-10526.
We have also extended the simulation test so that it covers flush behavior. When a node is shutdown, all unflushed data is lost. We were able to confirm that the monotonic high watermark invariant fails without the added `flush` calls.
This patch also piggybacks a fix to the `TestRaftServer` implementation. The initial check-in contained a bug which caused `RequestChannel` to fail sending responses because the disabled APIs did not have metrics registered. As a result of this, it is impossible to elect leaders.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
In LeaderEpochFileCacheTest.scala, code is identical for `shouldNotResetEpochHistoryHeadIfUndefinedPassed` and `shouldNotResetEpochHistoryTailIfUndefinedPassed`. Seems `truncateFromStart` should be invoked in `shouldNotResetEpochHistoryHeadIfUndefinedPassed` instead of `truncateFromEnd`.
In KIP-455, we changed the behavior of the reassignment tool so that the `--additional` flag is required in order to use the command to alter the throttle. This patch improves the documentation to make this clearer and adds some integration tests to validate the behavior.
This patch also contains a few minor code quality improvements:
- Factor out a helper `calculateCurrentMoveMap` from `calculateMoveMap` to compute the current move map, which makes the logic easier to follow
- Rename `calculateMoveMap` to `calculateProposedMoveMap` to make intention clearer
- Split `modifyBrokerThrottles` into two methods `modifyLogDirThrottle` and `modifyInterBrokerThrottle`
- Move logic to compute leader and follower throttles into a new method `modifyReassignmentThrottle`, which takes it out of the execution path when log dir throttles are changed
- Minor stylistic improvements such as replacing `.map.flatten` with `.flatMap`
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Colin P. McCabe <cmccabe@apache.org>, Ismael Juma <ismael@juma.me.uk>
Previously, we would throw a confusing error, "the store has migrated,"
when users ask for a store that is not in the topology at all, or when the
type of the store doesn't match the QueryableStoreType parameter.
Adds an up-front check that the requested store is registered and also
a better error message when the QueryableStoreType parameter
doesn't match the store's type.
Reviewers: Guozhang Wang <guozhang@apache.org>
https://issues.apache.org/jira/browse/KAFKA-10584
In Scala, we prefer sealed traits over Enumeration since the former gives you exhaustiveness checking. With Scala Enumeration, you don't get a warning if you add a new value that is not handled in a given pattern match.
This PR avoids a performance issue with DeleteRecords when a partition directory contains high numbers of files. Previously, DeleteRecords would iterate the partition directory searching for producer state snapshot files. With this change, the iteration is removed in favor of keeping a 1:1 mapping between producer state snapshot file and segment file. A segment files corresponding producer state snapshot file is now deleted when the segment file is deleted.
Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
This PR adds missing broker ACLs required to create topics and SCRAM credentials when ACLs are enabled for a system test. This PR also adds support for using PLAINTEXT as the inter broker security protocol when using SCRAM from the client in a system test with a secured cluster-- without this it would always be necessary to set both the inter-broker and client mechanisms to a SCRAM mechanism. Also contains some refactoring to make assumptions clearer.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
While this is not technically part of KIP-629, I believe this makes our codebase more inclusive as well.
cc gwenshap
Author: Xavier Léauté <xvrl@apache.org>
Reviewers: Gwen Shapira
Closes#9398 from xvrl/neutral-term
This replaces code and comment occurrences as described in the KIP
Author: Xavier Léauté <xvrl@apache.org>
Reviewers: Gwen Shapira, Mickael Maison
Closes#9366 from xvrl/kafka-10571
Add additional warning logs and improve existing log messages for `FileNotFoundException` and if /tmp is used as state directory.
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Matthias J. Sax <matthias@confluent.io>
StreamThreadStateStoreProvider excessive loop over calling internalTopologyBuilder.topicGroups(), which is synchronized, thus causing significant performance degradation to the caller, especially when store has many partitions.
Reviewers: John Roesler <vvcephei@apache.org>, Guozhang Wang <wangguoz@gmail.com>
In this PR, I have addressed the review comments from @chia7712 in #9001 which were provided after #9001 was merged. The changes are made mainly to KafkaAdminClient:
Improve error message in updateFeatures api when feature name is empty.
Propagate top-level error message in updateFeatures api.
Add an empty-parameter variety for describeFeatures api.
Minor documentation updates to @param and @return to make these resemble other apis.
Reviewers: Chia-Ping Tsai chia7712@gmail.com, Jun Rao junrao@gmail.com