The goals for this small diff are:
1. Give user guidance if they want to relax commit timeout threshold
2. Indicate the code path where timeout exception was caught
Reviewers: John Roesler <john@confluent.io>, Guozhang Wang <guozhang@confluent.io>
Reviewers: Bill Bejeck <bill@confluent.io>, Boyang Chen <boyang@confluent.io>, Bruno Cadonna <bruno@confluent.io>, Guozhang Wang <guozhang@confluent.io>
When Connect forwards a REST request from one worker to another, the Authorization header was not forwarded. This commit changes the Connect framework to include the Authorization header when forwarding requests to other workers.
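A minimal sketch of the behavior, not the actual Connect internals: the forwarding helper below is hypothetical and simply copies the incoming Authorization header onto the forwarded request when one is present.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical forwarding helper (not the actual Connect RestClient): copy the
// Authorization header of the incoming request onto the forwarded request.
public final class ForwardWithAuthSketch {

    public static HttpRequest buildForwardedRequest(URI targetWorker,
                                                    String jsonBody,
                                                    String authorizationHeader) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(targetWorker)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody));
        if (authorizationHeader != null) {
            // Previously this header was dropped, so follow-up requests against a
            // secured REST endpoint failed authentication.
            builder.header("Authorization", authorizationHeader);
        }
        return builder.build();
    }
}
```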
Author: Hai-Dang Dam <damquanghaidang@gmail.com>
Reviewers: Robert Yokota <rayokota@gmail.com>, Randall Hauch <rhauch@gmail.com>
* StreamsMetricsImpl wraps the Kafka Streams' metrics registry and provides logic to create
and register sensors and their corresponding metrics. An example for such logic can be found in
threadLevelSensor(). Furthermore, StreamsMetricsImpl keeps track of the sensors on the
different levels of an application, i.e., thread, task, etc., and provides logic to remove sensors per
level, e.g., removeAllThreadLevelSensors(). There is one StreamsMetricsImpl object per
application instance.
* ThreadMetrics contains only static methods that specify all built-in thread-level sensors and
metrics and provide logic to register and retrieve those thread-level sensors, e.g., commitSensor().
* From anywhere inside the code base with access to StreamsMetricsImpl, thread-level sensors can be accessed by using ThreadMetrics (see the sketch after this list).
* ThreadMetrics does not inherit from StreamsMetricsImpl anymore.
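A simplified sketch of this division of responsibilities; the names mirror the description above, but the interfaces and signatures are illustrative, not the actual internal API.

```java
// Illustrative only: simplified stand-ins for StreamsMetricsImpl and Sensor.
interface StreamsMetricsRegistry {
    SensorHandle threadLevelSensor(String threadId, String sensorName);
    void removeAllThreadLevelSensors(String threadId);
}

interface SensorHandle {
    void addRateAndTotal(String rateName, String totalName);
    void record(long value);
}

// ThreadMetrics-style class: only static methods that define and register the
// built-in thread-level sensors through the registry wrapper.
final class ThreadMetricsSketch {
    private ThreadMetricsSketch() { }

    static SensorHandle commitSensor(String threadId, StreamsMetricsRegistry metrics) {
        SensorHandle sensor = metrics.threadLevelSensor(threadId, "commit");
        sensor.addRateAndTotal("commit-rate", "commit-total");
        return sensor;
    }
}
```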
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, John Roesler <john@confluent.io>, Guozhang Wang <guozhang@confluent.io>
Temporarily restore the SslFactory.sslContext() function, which some connectors use. This function is not a public API and it will be removed eventually. For now, we will mark it as deprecated.
Restart tasks on reconfiguration under incremental cooperative rebalancing, and keep execution paths separate for config updates between eager and cooperative rebalancing. Include the group generation in the log message when the worker receives its assignment.
Author: Konstantine Karantasis <konstantine@confluent.io>
Reviewer: Randall Hauch <rhauch@gmail.com>
According to KIP-297 a parameter is passed to ConfigProvider with syntax "config.providers.{name}.param.{param-name}". Currently AbstractConfig allows parameters of the format "config.providers.{name}.{param-name}". With this fix AbstractConfig will be consistent with KIP-297 syntax.
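For illustration, a small sketch of the two key layouts; the provider name ("vault") and its class are hypothetical placeholders.

```java
import java.util.Properties;

public class ConfigProviderKeysExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        // "vault" and VaultConfigProvider are hypothetical placeholders.
        props.put("config.providers", "vault");
        props.put("config.providers.vault.class", "com.example.VaultConfigProvider");
        // KIP-297 syntax, which AbstractConfig now expects:
        //   config.providers.{name}.param.{param-name}
        props.put("config.providers.vault.param.url", "https://vault.example.com");
        // The format AbstractConfig previously accepted (inconsistent with KIP-297):
        //   config.providers.vault.url=https://vault.example.com
        System.out.println(props);
    }
}
```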
Reviewers: Robert Yokota <rayokota@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
Since the originals map passed to the AbstractConfig constructor may be immutable, avoid updating this map while resolving indirect config variables. Instead, a new ResolvingMap instance is now used to store resolved configs.
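A simplified illustration of the approach (not the actual ResolvingMap code): resolved values are written into a fresh map, leaving the originals untouched.

```java
import java.util.HashMap;
import java.util.Map;

public final class ResolveIntoNewMapSketch {

    // Resolved values go into a new map; the (possibly immutable) originals map
    // is only read, never modified.
    public static Map<String, Object> resolve(Map<String, ?> originals) {
        Map<String, Object> resolved = new HashMap<>(originals);
        for (Map.Entry<String, ?> entry : originals.entrySet()) {
            Object value = entry.getValue();
            if (value instanceof String && ((String) value).startsWith("${")) {
                resolved.put(entry.getKey(), lookupIndirectValue((String) value));
            }
        }
        return resolved;
    }

    // Stand-in for the real ConfigProvider lookup.
    private static Object lookupIndirectValue(String variable) {
        return variable;
    }
}
```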
Reviewers: Randall Hauch <rhauch@gmail.com>, Boyang Chen <bchen11@outlook.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
When the restored record value is null, we are in danger of an NPE during the restoration phase.
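A hedged sketch of the kind of guard involved: a null restored value is a tombstone and should delete the entry rather than be dereferenced (the in-memory map here is for illustration only).

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.streams.processor.StateRestoreCallback;

public class NullSafeRestoreCallback implements StateRestoreCallback {

    private final Map<String, byte[]> store = new HashMap<>();

    @Override
    public void restore(byte[] key, byte[] value) {
        String k = new String(key, StandardCharsets.UTF_8); // illustration only
        if (value == null) {
            store.remove(k);           // treat null as a delete, do not deserialize it
        } else {
            store.put(k, value);
        }
    }
}
```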
Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <guozhang@confluent.io>, Matthias J. Sax <matthias@confluent.io>
Sub-task required to allow defining custom processor names with the Kafka Streams DSL (KIP-307):
- overload methods for stateless operations to accept a Named parameter (filter, filterNot, map, mapValues, foreach, peek, branch, transform, transformValues, flatTransform)
- overload the process method to accept a Named parameter
- overload the join/leftJoin/outerJoin methods (see the usage sketch below)
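A small usage sketch of the new overloads (topic names are placeholders):

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Named;

public class NamedOverloadsExample {
    public static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> input = builder.stream("input-topic");
        input.filter((key, value) -> value != null, Named.as("drop-nulls"))
             .mapValues(value -> value.toUpperCase(), Named.as("uppercase-values"))
             .to("output-topic");
        return builder.build();
    }
}
```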
Reviewers: John Roesler <john@confluent.io>, Boyang Chen <boyang@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
See also #6684
KTable processors must be supplied with a KTableProcessorSupplier, which in turn requires implementing a ValueGetter, for use with joins and groupings.
For suppression, a correct view only includes the previously emitted values (not the currently buffered ones). This change therefore also pushes the Change value type into the suppression buffer's interface, so that the buffer can capture the prior value upon first buffering (which is also the previously emitted value).
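For orientation, a simplified view of the contract described above; the real interfaces live in the kstream internals package and differ in detail.

```java
import org.apache.kafka.streams.processor.ProcessorContext;

// Simplified stand-ins, illustrative only: a value getter lets joins and groupings
// read the table's current value for a key; for suppression, that view must reflect
// only the previously emitted value, not the currently buffered one.
interface ValueGetterSketch<K, V> {
    void init(ProcessorContext context);
    V get(K key);
}

interface TableProcessorSupplierSketch<K, V> {
    ValueGetterSketch<K, V> valueGetter();
}
```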
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Guozhang Wang <guozhang@confluent.io>
Remove processedKeys / processedValues / processedWithTimestamps as they are already covered by processed.
Reviewers: Matthias J. Sax <mjsax@apache.org>, John Roesler <john@confluent.io>, Boyang Chen <boyang@confluent.io>
It is possible for the offset of a partition to be changed while we are in the middle of validation. If the OffsetForLeaderEpoch request is in-flight and the offset changes, we need to redo the validation after it returns. We had a check for this situation previously, but it was only checking if the current leader epoch had changed. This patch fixes this and moves the validation into `SubscriptionState`, where it can be protected with a lock.
Additionally, this patch adds test cases for the SubscriptionState validation API. We also fix a small bug in the handling of broker downgrades: basically, we should skip validation if the latest metadata does not include leader epoch information.
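An illustrative sketch of the idea (not the actual SubscriptionState code): the position captured when the OffsetForLeaderEpoch request was sent must still be the current position when the response arrives, and the check runs under the lock.

```java
// Illustrative only: validation completes only if the position captured when the
// request was sent is still the current position when the response arrives.
class ValidationSketch {
    private final Object lock = new Object();
    private long position;              // current fetch position
    private boolean awaitingValidation; // true while an OffsetForLeaderEpoch request is in flight
    private long positionUnderValidation;

    void startValidation() {
        synchronized (lock) {
            awaitingValidation = true;
            positionUnderValidation = position;
        }
    }

    /** Returns true if validation completed, false if it must be redone. */
    boolean maybeCompleteValidation(long validatedEndOffset) {
        synchronized (lock) {
            if (!awaitingValidation || position != positionUnderValidation) {
                return false; // the position changed while the request was in flight: re-validate
            }
            awaitingValidation = false;
            // ... truncation detection against validatedEndOffset would go here
            return true;
        }
    }
}
```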
Reviewers: David Arthur <mumrah@gmail.com>
Fix KAFKA-8187: State store record loss across multiple reassignments when using standby tasks.
Do not let the thread transition to RUNNING until all tasks (including standby tasks) are ready.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bbejeck@gmail.com>
Now that we can configure RocksDB to bound the total memory, we should include docs describing how, as well as touch on some options to consider when taking advantage of this feature.
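A sketch in the spirit of those docs, assuming the RocksDB WriteBufferManager/LRUCache APIs available to Kafka Streams at the time; the sizes are illustrative.

```java
import java.util.Map;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Cache;
import org.rocksdb.LRUCache;
import org.rocksdb.Options;
import org.rocksdb.WriteBufferManager;

// Bound the total RocksDB memory of an instance with a single shared cache and
// write buffer manager (static, so every store on the instance shares them).
public class BoundedMemoryRocksDBConfig implements RocksDBConfigSetter {

    private static final long TOTAL_OFF_HEAP_MEMORY = 512 * 1024 * 1024L;
    private static final long TOTAL_MEMTABLE_MEMORY = 128 * 1024 * 1024L;

    private static final Cache CACHE = new LRUCache(TOTAL_OFF_HEAP_MEMORY);
    private static final WriteBufferManager WRITE_BUFFER_MANAGER =
            new WriteBufferManager(TOTAL_MEMTABLE_MEMORY, CACHE);

    @Override
    public void setConfig(final String storeName, final Options options, final Map<String, Object> configs) {
        final BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
        tableConfig.setBlockCache(CACHE);                // block cache draws from the shared cache
        tableConfig.setCacheIndexAndFilterBlocks(true);  // charge index/filter blocks to it as well
        options.setTableFormatConfig(tableConfig);
        options.setWriteBufferManager(WRITE_BUFFER_MANAGER); // memtables are also charged to the cache
    }
}
```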
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jim Galasyn <jim.galasyn@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
A lot of confusion seems to have arisen from the StreamsBuilder#addGlobalStore(...ProcessorSupplier) method. Users have assumed they can safely use this to transform records before populating their global state store; unfortunately this results in corrupted data because, on restore, the records are read directly from the source topic (which serves as the changelog), bypassing the custom processor.
We should probably provide a means to do this at some point, but for the time being we should clarify the proper use of #addGlobalStore as it currently functions.
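A sketch of the usage that is safe today: the supplied processor copies records into the global store as-is, since any transformation would be bypassed on restore (store name and type parameters are placeholders).

```java
import org.apache.kafka.streams.processor.AbstractProcessor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;

// Intended use of addGlobalStore: write records into the global store unmodified,
// because on restore the store is populated directly from the source topic and
// this processor is bypassed.
public class GlobalStoreUpdater extends AbstractProcessor<String, String> {

    private final String storeName;
    private KeyValueStore<String, String> store;

    public GlobalStoreUpdater(final String storeName) {
        this.storeName = storeName;
    }

    @Override
    @SuppressWarnings("unchecked")
    public void init(final ProcessorContext context) {
        super.init(context);
        store = (KeyValueStore<String, String>) context.getStateStore(storeName);
    }

    @Override
    public void process(final String key, final String value) {
        // No transformation here: anything done to the record would be lost on restore.
        store.put(key, value);
    }
}
```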
Reviewers: Matthias J. Sax <mjsax@apache.org>, Bruno Cadonna <bruno@confluent.io>
When a poll that resets the offsets to the beginning is followed by a seekToEnd and a position call, the "reset to earliest" triggered by poll can override the "reset to latest" initiated by seekToEnd in a very delicate way:
1. Both requests have been issued and have returned to the client side (the listOffsetResponse has arrived).
2. In Fetcher.resetOffsetIfNeeded(TopicPartition, Long, OffsetData), the thread scheduler may run the heartbeat thread first with the "reset to earliest" call, overriding the offset to the earliest one and setting the SubscriptionState to that position.
3. The scheduler then continues with the application thread's "reset to latest" call, which is discarded because the "reset to earliest" already set the position - the wrong one.
4. The blocking position call returns the earliest offset instead of the latest, even though the latest was expected.
The fix makes SubscriptionState synchronized so that we can verify that the reset is expected while holding the lock.
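For reference, the client-side call sequence that exposed the race (broker address, group, and topic are placeholders); with the fix, position() reflects the seekToEnd().

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class SeekToEndAfterPollExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "example-group");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        TopicPartition tp = new TopicPartition("example-topic", 0);
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(Collections.singletonList(tp));
            consumer.poll(Duration.ofMillis(0));                // may kick off a "reset to earliest"
            consumer.seekToEnd(Collections.singletonList(tp));  // requests a "reset to latest"
            long end = consumer.position(tp);                   // with the fix, this is the latest offset
            System.out.println("position after seekToEnd: " + end);
        }
    }
}
```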
Reviewers: Jason Gustafson <jason@confluent.io>
The test was failing because other test cases were leaving some metrics in the registry. This patch cleans the registry before and after test runs.
Reviewers: Ismael Juma <ismael@juma.me.uk>
I think it's better just to make single-batch a universal requirement for compressed messages across all versions, and for uncompressed messages in V2 and beyond as well.
Reviewers: Jason Gustafson <jason@confluent.io>
As the title suggests, this unit test is just a double check. No need to push it into 2.3.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <mjsax@apache.org>
As we are planning to add more supporting features for rebalancing under static membership, we need to make sure the behavior of `group.instance.id` is consistent throughout the whole stack. This patch ensures that the default value is null in the JoinGroup response.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>
We have two fields `highWatermarkCheckPointThreadStarted` and `hwThreadInitialized` which appear to be serving the same purpose. This patch gets rid of `hwThreadInitialized`.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Authorized operations must be null when talking to a pre-KIP-430 broker.
If we present this as the empty set instead, it is impossible for clients
to know if they have no permissions, or are talking to an old broker.
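A sketch of how a client can distinguish the two cases, assuming the KIP-430 Admin API shape (authorized operations requested via the describe options):

```java
import java.util.Collections;
import java.util.Set;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.ConsumerGroupDescription;
import org.apache.kafka.clients.admin.DescribeConsumerGroupsOptions;
import org.apache.kafka.common.acl.AclOperation;

// null: broker is too old to report authorized operations;
// empty set: broker reported that the caller has no permissions.
public class AuthorizedOperationsExample {
    public static void describe(AdminClient admin, String groupId) throws Exception {
        ConsumerGroupDescription description = admin
                .describeConsumerGroups(Collections.singletonList(groupId),
                        new DescribeConsumerGroupsOptions().includeAuthorizedOperations(true))
                .all().get()
                .get(groupId);

        Set<AclOperation> operations = description.authorizedOperations();
        if (operations == null) {
            System.out.println("Broker is pre-KIP-430: authorized operations unavailable");
        } else if (operations.isEmpty()) {
            System.out.println("Caller has no permissions on this group");
        } else {
            System.out.println("Authorized operations: " + operations);
        }
    }
}
```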
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
The consumer should await api version information before determining whether the broker supports offset validation. In KAFKA-8422, we skip the validation if we don't have api version information, which means we always skip validation the first time we connect to a node. This bug was detected by the failing system test `tests/client/truncation_test.py`. The test passes again with this fix.
Reviewers: Ismael Juma <ismael@juma.me.uk>
When cleaning transactional data, we need to keep track of which transactions still have data associated with them so that we do not remove the markers. We had logic to do this, but the state was not being carried over when beginning cleaning for a new set of segments. This could cause the cleaner to incorrectly believe a transaction marker was no longer needed. The fix here carries the transactional state between groups of segments to be cleaned.
Reviewers: Dhruvil Shah <dhruvil@confluent.io>, Viktor Somogyi <viktorsomogyi@gmail.com>, Jason Gustafson <jason@confluent.io>
In the olden days, OffsetForLeaderEpoch was exclusively an inter-broker protocol and
required Cluster level permission. With KIP-320, clients can use this API as well and
so we lowered the required permission to Topic Describe. The only way the client can
be sure that the new permissions are in use is to require version 3 of the protocol
which was bumped for 2.3. If the broker does not support this version, we skip the
validation and revert to the old behavior.
Additionally, this patch fixes a problem with the newly added replicaId field when
parsed from older versions which did not have it. If the field was not present, then
we used the consumer's sentinel value, but this would limit the range of visible
offsets by the high watermark. To get around this problem, this patch adds a
separate "debug" sentinel similar to APIs like Fetch and ListOffsets.
Reviewers: Ismael Juma <ismael@juma.me.uk>
An API call for consumer groups must send a FindCoordinatorRequest to find the consumer group coordinator, and then send a follow-up request to that node. But the coordinator might move after the FindCoordinatorRequest but before the follow-up request is sent. In that case we currently fail.
This change fixes that by detecting this error and then retrying. This fixes listConsumerGroupOffsets, deleteConsumerGroups, and describeConsumerGroups.
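An illustrative sketch of the retry (not the actual AdminClient internals); findCoordinator and sendOffsetFetch are hypothetical helpers standing in for the two round trips.

```java
import java.util.Map;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.Node;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.errors.NotCoordinatorException;

public abstract class CoordinatorRetrySketch {

    private static final int MAX_ATTEMPTS = 5;

    public Map<TopicPartition, OffsetAndMetadata> listOffsetsWithRetry(String groupId) {
        for (int attempt = 1; ; attempt++) {
            Node coordinator = findCoordinator(groupId);      // FindCoordinatorRequest
            try {
                return sendOffsetFetch(coordinator, groupId); // follow-up request
            } catch (NotCoordinatorException e) {
                // The coordinator moved between the two requests; rediscover and retry.
                if (attempt >= MAX_ATTEMPTS) throw e;
            }
        }
    }

    // Hypothetical helpers for illustration only.
    protected abstract Node findCoordinator(String groupId);
    protected abstract Map<TopicPartition, OffsetAndMetadata> sendOffsetFetch(Node coordinator, String groupId);
}
```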
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Boyang Chen <bchen11@outlook.com>
The old docs here used a now deprecated method to set the block cache size. In switching over to the new one we would now need to construct a Cache object and therefore also need to close it, so this is a good opportunity to demonstrate the RocksDBConfigSetter#close method that will need to be implemented by users.
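A sketch along the lines of the updated example: build the Cache explicitly instead of calling the deprecated size setter, and release it in the new close() hook (the cache size is illustrative).

```java
import java.util.Map;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Cache;
import org.rocksdb.LRUCache;
import org.rocksdb.Options;

public class CustomRocksDBConfig implements RocksDBConfigSetter {

    // One cache per store instance; it must be closed to free the native memory.
    private final Cache cache = new LRUCache(50 * 1024 * 1024L);

    @Override
    public void setConfig(final String storeName, final Options options, final Map<String, Object> configs) {
        final BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
        tableConfig.setBlockCache(cache);   // replaces the deprecated block cache size setter
        options.setTableFormatConfig(tableConfig);
    }

    @Override
    public void close(final String storeName, final Options options) {
        cache.close();  // the RocksDBConfigSetter#close hook exists for exactly this purpose
    }
}
```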
Reviewers: Guozhang Wang <wangguoz@gmail.com>