When poll is called which resets the offsets to the beginning, followed by a seekToEnd and a position, it could happen that the "reset to earliest" call in poll overrides the "reset to latest" initiated by seekToEnd in a very delicate way:
1. both request has been issued and returned to the client side (listOffsetResponse has happened)
2. in Fetcher.resetOffsetIfNeeded(TopicPartition, Long, OffsetData) the thread scheduler could prefer the heartbeat thread with the "reset to earliest" call, overriding the offset to the earliest and setting the SubscriptionState with that position.
3. The thread scheduler continues execution of the thread (application thread) with the "reset to latest" call and discards it as the "reset to earliest" already set the position - the wrong one.
4. The blocking position call returns with the earliest offset instead of the latest, despite it wasn't expected.
The fix makes SubscriptionState synchronized so that we can verify that the reset is expected while holding the lock.
Reviewers: Jason Gustafson <jason@confluent.io>
The test was failing because other test cases are leaving some metrics in the registry. This patch cleans the registry before and after test runs.
Reviewers: Ismael Juma <ismael@juma.me.uk>
I think it's better just to make single-batch as a universal requirement for all versions for compressed messages, and for V2 and beyond uncompressed messages as well.
Reviewers: Jason Gustafson <jason@confluent.io>
As title suggests, this unit test is just a double check. No need to push in 2.3
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <mjsax@apache.org>
As we are planning to add on more supporting features for rebalancing under static membership, we need to make sure the behavior for `group.instance.id` is consistent throughout the whole stack. This patch ensures that the default value is null in the JoinGroup response.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>
We have two fields `highWatermarkCheckPointThreadStarted` and `hwThreadInitialized` which appear to be serving the same purpose. This patch gets rid of `hwThreadInitialized`.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Authorized operations must be null when talking to a pre-KIP-430 broker.
If we present this as the empty set instead, it is impossible for clients
to know if they have no permissions, or are talking to an old broker.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
The consumer should await api version information before determining whether the broker supports offset validation. In KAFKA-8422, we skip the validation if we don't have api version information, which means we always skip validation the first time we connect to a node. This bug was detected by the failing system test `tests/client/truncation_test.py`. The test passes again with this fix.
Reviewers: Ismael Juma <ismael@juma.me.uk>
When cleaning transactional data, we need to keep track of which transactions still have data associated with them so that we do not remove the markers. We had logic to do this, but the state was not being carried over when beginning cleaning for a new set of segments. This could cause the cleaner to incorrectly believe a transaction marker was no longer needed. The fix here carries the transactional state between groups of segments to be cleaned.
Reviewers: Dhruvil Shah <dhruvil@confluent.io>, Viktor Somogyi <viktorsomogyi@gmail.com>, Jason Gustafson <jason@confluent.io>
In the olden days, OffsetForLeaderEpoch was exclusively an inter-broker protocol and
required Cluster level permission. With KIP-320, clients can use this API as well and
so we lowered the required permission to Topic Describe. The only way the client can
be sure that the new permissions are in use is to require version 3 of the protocol
which was bumped for 2.3. If the broker does not support this version, we skip the
validation and revert to the old behavior.
Additionally, this patch fixes a problem with the newly added replicaId field when
parsed from older versions which did not have it. If the field was not present, then
we used the consumer's sentinel value, but this would limit the range of visible
offsets by the high watermark. To get around this problem, this patch adds a
separate "debug" sentinel similar to APIs like Fetch and ListOffsets.
Reviewers: Ismael Juma <ismael@juma.me.uk>
An API call for consumer groups must send a FindCoordinatorRequest to find the consumer group coordinator, and then send a follow-up request to that node. But the coordinator might move after the FindCoordinatorRequest but before the follow-up request is sent. In that case we currently fail.
This change fixes that by detecting this error and then retrying. This fixes listConsumerGroupOffsets, deleteConsumerGroups, and describeConsumerGroups.
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Boyang Chen <bchen11@outlook.com>
The old docs here used a now deprecated method to set the block cache size. In switching over to the new one we would now need to construct a Cache object and therefore also need to close it, so this is a good opportunity to demonstrate the RocksDBConfigSetter#close method that will need to be implemented by users.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
These are important to ensure we don't break compatibility.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Gwen Shapira
Closes#6794 from ijuma/update-version-compat-tests
This patch attempts to simplify the interaction between Partition and the various components from `ReplicaManager`. This is primarily to make unit testing easier. I have also tried to eliminate the OfflinePartition sentinel which has always been unsafe.
Reviewers: Boyang Chen <bchen11@outlook.com>, David Arthur <mumrah@gmail.com>
Because of how config values are converted into strings in the `AbstractHerder.validateClientOverrides()` method after being validated by the client override policy, an exception is thrown if the value returned by the policy isn't already parsed as the type expected by the client `ConfigDef`. The fix here involves parsing client override properties before passing them to the override policy.
A unit test is added to ensure that several different types of configs are validated properly by the herder.
Author: Chris Egerton <chrise@confluent.io>
Reviewers: Magesh Nandakumar <magesh.n.kumar@gmail.com>, Randall Hauch <rhauch@gmail.com>
When consumer coordinator realize the subscription may have changed, today we check again against the joinedSubscription within handleAssignmentMismatch. This checking however is a bit fishy and over-kill as well. It's better just simplifying it to always request re-join.
The joinedSubscription object itself however still need to be maintained for potential augment to avoid extra re-joining the group.
Since testOutdatedCoordinatorAssignment already cover the normal case we also remove the other invalidAssignment test case.
Reviewers: Jason Gustafson <jason@confluent.io>
Prior to this change, the next commit time advances
_each_ time a commit happens -- including when a commit happens
because it was requested by the `Task`. When a `Task` requests a
commit several times, the clock advances far into the future
which prevents expected periodic commits from happening.
This commit changes the behavior, we reset `nextCommit` relative
to the time of the commit.
Reviewers: Jason Gustafson <jason@confluent.io>
Remove created metrics when shutting down `ControllerEventManager`. This fixes transient failures in `ControllerEventManagerTest.testEventQueueTime` and is generally good hygiene.
Reviewers: José Armando García Sancio <jsancio@gmail.com>, Ismael Juma <ismael@juma.me.uk>
As title states. We plan to merge this to both trunk and 2.3 if it could fix the stream system tests globally.
Reference implementation: #6673
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <mjsax@apache.org>
KIP-345 and KIP-392 introduced a couple breaking changes for old versions of bumped protocols. This patch fixes them.
Reviewers: Colin Patrick McCabe <cmccabe@confluent.io>, Ismael Juma <ismael@juma.me.uk>, Boyang Chen <bchen11@outlook.com>, Guozhang Wang <wangguoz@gmail.com>
Reviewers: Jason Gustafson <jason@confluent.io>, Colin Patrick McCabe <cmccabe@confluent.io>, Andrew Olson <aolson1@cerner.com>, José Armando García Sancio <jsancio@users.noreply.github.com>
Author: Manikumar Reddy <manikumar.reddy@gmail.com>
Reviewers: Jun Rao <junrao@gmail.com>, Jason Gustafson <jason@confluent.io>
Closes#5041 from omkreddy/KAFKA-3143
Some versions of OpenJDK 11 do not properly handle external javadocs links referencing previous Java versions. See: https://bugs.openjdk.java.net/browse/JDK-8212233.
Failure symptom:
`> Task :connect:api:javadoc
javadoc: error - The code being documented uses modules but the packages defined in https://docs.oracle.com/javase/8/docs/api/ are in the unnamed module.
1 error`
This PR conditionally sets the Java api docs link for the affected Gradle tasks. I verified that the links render correctly in the generated documentation when building with `1.8.0_181` and `11.0.3`. For example, in `build/docs/javadoc/org/apache/kafka/connect/source/SourceTask.html` the hyperlink to `java.nio.channels.Selector` points to a valid page on Oracle's site in both cases.
Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>, Jason Gustafson <jason@confluent.io>
Return a copy of the ConfigDef in Client Configs. Related to KIP-458.
Author: Magesh Nandakumar <magesh.n.kumar@gmail.com
Reviewer: Randall Hauch <rhauch@gmail.com>
* Remove deprecated class Slf4jRequestLog: use Slf4jRequestLogWriter, CustomRequestLog instread.
1. Remove '@SuppressWarnings("deprecation")' from RestServer#initializeResources, JsonRestServer#start.
2. Remove unused JsonRestServer#httpRequest.
* Fix deprecated class usage: SslContextFactory -> SslContextFactory.[Server, Client]
1. Split SSLUtils#createSslContextFactory into SSLUtils#create[Server, Client]SideSslContextFactory: each method instantiates SslContextFactory.[Server, Client], respectively.
2. SSLUtils#configureSslContextFactoryAuthentication is called from SSLUtils#createServerSideSslContextFactory only.
3. Update SSLUtilsTest following splittion; for client-side SSL Context Factory, SslContextFactory#get[Need, Want]ClientAuth is always false. (SSLUtilsTest#testCreateClientSideSslContextFactory)
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
- Make endpoint validation configurable on SslEngineBuilder when creating an engine
- Disable endpoint validation for engines created for inter-broker SSL validation since it is unsafe to use `localhost`
- Use empty hostname in validation engine to ensure tests fail if validation is re-enabled by mistake
- Add tests to verify inter-broker SSL validation
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
When we close a task and EOS is enabled we should always close the producer regardless if the task is in a zombie state (the broker fenced the producer) or not.
I've added tests that fail without this change.
Reviewers: Matthias J. Sax <mjsax@apache.org>, Jason Gustafson <jason@confluent.io>