toString() functions must not throw a NullPointerException. read() functions
must properly translate a negative array length to a null field.
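A minimal sketch of the intended decoding behavior (the helper and class names here are hypothetical, not the actual protocol schema code):

```java
import java.nio.ByteBuffer;

public final class NullableArrayExample {
    // Reads a nullable INT32 array: a negative length encodes a null field.
    static int[] readNullableIntArray(ByteBuffer buf) {
        int length = buf.getInt();
        if (length < 0)
            return null; // negative length means the field is null, not an error
        int[] values = new int[length];
        for (int i = 0; i < length; i++)
            values[i] = buf.getInt();
        return values;
    }

    // toString-style code must tolerate the null field instead of throwing.
    static String describe(int[] values) {
        return values == null ? "null" : java.util.Arrays.toString(values);
    }
}
```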
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
The goal of this task is to implement an integration test for the Kafka Streams metrics.
We have to check two things:
1. After the Streams application is started, all metrics from the different levels (thread, task, processor, store, cache) are correctly created and report the recorded values.
2. When the Streams application is shut down, all metrics are correctly de-registered and removed. A rough sketch of both checks follows.
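The checks can be expressed against the public KafkaStreams#metrics() map; the metric group names below are illustrative and depend on the Streams version:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;
import org.apache.kafka.streams.KafkaStreams;

final class StreamsMetricsCheck {
    // Group names are illustrative; the exact names depend on the Streams version.
    static final List<String> LEVEL_GROUPS = Arrays.asList(
            "stream-metrics", "stream-task-metrics",
            "stream-processor-node-metrics", "stream-record-cache-metrics");

    // After start(): every level should have at least one registered metric.
    static void assertMetricsRegistered(KafkaStreams streams) {
        Map<MetricName, ? extends Metric> metrics = streams.metrics();
        for (String group : LEVEL_GROUPS) {
            if (metrics.keySet().stream().noneMatch(n -> n.group().equals(group)))
                throw new AssertionError("no metrics registered for group " + group);
        }
    }

    // After close(): metrics for those levels should be gone again.
    static void assertMetricsRemoved(KafkaStreams streams) {
        for (MetricName n : streams.metrics().keySet()) {
            if (LEVEL_GROUPS.contains(n.group()))
                throw new AssertionError("metric not de-registered: " + n);
        }
    }
}
```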
Reviewers: John Roesler <john@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
TopicDescription and ConsumerGroupDescription in org.apache.kafka.clients.admin are part of the public API, so we should retain the existing public constructor. Changed the new constructor that takes authorized operations to be package-private, to avoid maintaining more public constructors, since we only expect the admin client to use it.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Adds a new listener config `max.connections` to limit the number of active connections on each listener. The config may be prefixed with the listener prefix. This limit may be dynamically reconfigured without restarting the broker.
This is one of the PRs for KIP-402 (https://cwiki.apache.org/confluence/display/KAFKA/KIP-402%3A+Improve+fairness+in+SocketServer+processors). Note that this is currently built on top of PR #6022
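Because the limit is dynamically reconfigurable, it can be updated at runtime; a hedged sketch using the Java AdminClient (assumes a client version that supports incrementalAlterConfigs):

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

final class MaxConnectionsExample {
    static void setMaxConnections(AdminClient admin, String brokerId) throws Exception {
        ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, brokerId);
        // The listener-prefixed form, e.g. "listener.name.internal.max.connections",
        // limits only that listener; the un-prefixed form applies to each listener.
        AlterConfigOp op = new AlterConfigOp(
                new ConfigEntry("max.connections", "512"), AlterConfigOp.OpType.SET);
        Map<ConfigResource, Collection<AlterConfigOp>> updates =
                Collections.singletonMap(broker, Collections.singletonList(op));
        admin.incrementalAlterConfigs(updates).all().get();
    }
}
```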
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Gwen Shapira <cshapi@gmail.com>
Closes #6034 from rajinisivaram/KAFKA-7730-max-connections
Users have reported (KAFKA-7565) that when consumer poll wakeup is used,
it is possible to receive fetch responses that don't match the copied topic
partitions collection for the session when the fetch request was created.
This commit improves the error handling here by throwing an
IllegalStateException instead of a NullPointerException, and by
generating a message for the exception that includes a bit more
information.
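An illustrative sketch of the improved error handling (names are hypothetical; the real check lives in the fetcher's response handling):

```java
import java.util.Set;
import org.apache.kafka.common.TopicPartition;

final class FetchSessionCheck {
    // Fail fast with a descriptive IllegalStateException when a fetch response
    // contains a partition the session did not request, instead of letting a
    // later lookup blow up with a bare NullPointerException.
    static void verifyResponsePartition(Set<TopicPartition> sessionPartitions,
                                        TopicPartition responsePartition) {
        if (!sessionPartitions.contains(responsePartition)) {
            throw new IllegalStateException("Missing partition data for " + responsePartition
                    + "; session partitions are " + sessionPartitions);
        }
    }
}
```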
Reviewers: Jason Gustafson <jason@confluent.io>
There is a small timing window where `time.sleep(retryBackoff)` will get executed before the admin client adds the retry request to the queue. Added a check to wait until the retry call is added to the queue in AdminClient.
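The added wait follows the usual polling pattern from Kafka's test utilities; a self-contained sketch (the actual test inspects the AdminClient's internal call queue):

```java
import java.util.function.BooleanSupplier;

final class WaitUtil {
    // Block until the condition holds (e.g. the retry call is visible in the
    // queue), or fail after a deadline, instead of racing time.sleep(retryBackoff).
    static void waitForCondition(BooleanSupplier condition, long timeoutMs, String message)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!condition.getAsBoolean()) {
            if (System.currentTimeMillis() > deadline)
                throw new AssertionError(message);
            Thread.sleep(10); // poll in small increments
        }
    }
}
```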
Author: Manikumar Reddy <manikumar.reddy@gmail.com>
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Closes #6418 from omkreddy/KAFKA-7939
Metadata may be updated from the background thread, so we need to protect access to SubscriptionState. This patch restructures the metadata handling so that we only check pattern subscriptions in the foreground. Additionally, it improves the following:
1. SubscriptionState is now the source of truth for the topics that will be fetched. We had a lot of messy logic previously to try and keep the topic set in Metadata consistent with the subscription, so this simplifies the logic.
2. The metadata needs for the producer and consumer are quite different, so it made sense to separate the custom logic into separate extensions of Metadata. For example, only the producer requires topic expiration.
3. We've always had an edge case in which a metadata change with an inflight request may cause us to effectively miss an expected update. This patch implements a separate version inside Metadata which is bumped when the needed topics change (see the sketch after this list).
4. This patch removes the MetadataListener, which was the cause of https://issues.apache.org/jira/browse/KAFKA-7764.
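A small sketch of the versioning idea from point 3 (field and method names are illustrative, not the actual Metadata internals):

```java
final class VersionedNeeds {
    private int requestVersion = 0;   // bumped whenever the needed topics change
    private int lastSentVersion = -1; // version captured when a request went out

    synchronized void onNeededTopicsChanged() {
        requestVersion++; // an inflight response for an older version is now stale
    }

    synchronized void onRequestSent() {
        lastSentVersion = requestVersion;
    }

    // A response only satisfies the current needs if nothing changed in between;
    // otherwise another metadata request must be sent.
    synchronized boolean responseIsCurrent() {
        return lastSentVersion == requestVersion;
    }
}
```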
Reviewers: David Arthur <mumrah@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
In KAFKA-5503, we added a check on the `running` flag in the loop inside maybeWaitForProducerId. This handles a concurrent call to Sender close() while we attempt to get the producer id, and avoids blocking indefinitely when the producer is shutting down.
This created a corner case where the Sender thread gets blocked if a producer id reset races with a call to Sender close(). The fix is to check the `forceClose` flag in the loop inside maybeWaitForProducerId instead of the `running` flag.
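The shape of the fix, sketched with illustrative fields (the real loop also sends the InitProducerId request and handles errors):

```java
final class SenderSketch {
    private volatile boolean forceClose;    // set on forced shutdown
    private volatile boolean hasProducerId; // set once InitProducerId succeeds

    // The loop now exits on forceClose rather than the running flag, so a
    // producer-id reset racing with Sender close() cannot spin here forever.
    void maybeWaitForProducerId() throws InterruptedException {
        while (!forceClose && !hasProducerId) {
            // ... send InitProducerId to a ready broker and await the response ...
            Thread.sleep(100); // placeholder for the real request/response cycle
        }
    }
}
```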
Reviewers: Jason Gustafson <jason@confluent.io>
SelectorTest.testCloseConnectionInClosingState sends and receives messages to get the channel into a state with staged receives, and then waits for the idle timeout to close the channel. When run with SSL, the channel may have buffered bytes that prevent it from being closed. Updated the test to wait until there are no buffered bytes as well.
Reviewers: Ismael Juma <ismael@juma.me.uk>
The test uses 100 ms as connections.max.reauth.ms and checks that a second re-authentication doesn't occur within the hard-coded 1 second minimum interval. But since the interval is small, we cannot guarantee that the time between the two checks stays under 1 second. Change the test to use MockTime so that we can control the time.
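With MockTime, sleeps advance a virtual clock deterministically instead of depending on the wall clock:

```java
import org.apache.kafka.common.utils.MockTime;

final class MockTimeExample {
    public static void main(String[] args) {
        MockTime time = new MockTime();
        long start = time.milliseconds();
        // Unlike Thread.sleep, this advances the clock exactly, so the test can
        // assert precisely where it sits relative to the 1 second minimum
        // re-authentication interval.
        time.sleep(100); // connections.max.reauth.ms elapses
        time.sleep(900); // lands exactly on the 1 s boundary, never past it by accident
        System.out.println(time.milliseconds() - start); // prints 1000
    }
}
```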
Reviewers: Ron Dagostino <rndgstn@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
* Fix for KAFKA-7974: Avoid calling disconnect() when not connecting
* Resolve host only when currentAddress() is called
Moves away from automatically resolving the host when the connection entry is constructed, which can leave ClusterConnectionStates in a confused state.
Instead, resolution is done on demand, ensuring that the entry in the connection list is present even if resolution fails (a sketch of this pattern follows the list).
* Add Javadoc to ClusterConnectionStates.connecting()
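A sketch of the on-demand resolution pattern (simplified; the real ClusterConnectionStates also re-resolves and rotates through addresses on reconnect):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.List;

final class LazyResolutionSketch {
    // The connection entry stores only the hostname; addresses are resolved
    // lazily, so a failed DNS lookup never prevents the entry itself from
    // existing in the connection list.
    static final class ConnectionState {
        final String host;
        private List<InetAddress> addresses; // null until first resolution
        private int addressIndex;

        ConnectionState(String host) {
            this.host = host; // no resolution here
        }

        InetAddress currentAddress() throws UnknownHostException {
            if (addresses == null) {
                addresses = Arrays.asList(InetAddress.getAllByName(host));
                addressIndex = 0;
            }
            return addresses.get(addressIndex);
        }
    }
}
```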
Refactors the various maps used in TransactionManager into one map to simplify bookkeeping of inflight batches, offsets and sequence numbers.
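Conceptually, the refactor replaces parallel per-partition maps with a single map to one state object per partition; a hypothetical sketch:

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.common.TopicPartition;

final class TxnBookkeepingSketch {
    // One entry per partition instead of parallel maps for batches, offsets
    // and sequence numbers (field names here are illustrative, not the real ones).
    static final class PartitionState {
        int nextSequence;
        long lastAckedOffset = -1L;
        int inflightBatches;
    }

    private final Map<TopicPartition, PartitionState> partitions = new HashMap<>();

    PartitionState stateFor(TopicPartition tp) {
        return partitions.computeIfAbsent(tp, ignored -> new PartitionState());
    }
}
```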
Reviewers: Jason Gustafson <jason@confluent.io>
* KAFKA-7962: StickyAssignor: throws NullPointerException during assignments if topic is deleted
https://issues.apache.org/jira/browse/KAFKA-7962
Consumer using StickyAssignor throws NullPointerException if a subscribed topic was removed.
* addressed vahidhashemian's comments
* lower NPath Complexity
* added a unit test
Whenever the consumer coordinator sends a response that doesn't match the consumer's subscription, we should check whether the subscription has changed. If it has, we can ignore the assignment and request a rebalance. Otherwise, we can throw an exception as before.
Testing strategy: create a mocked client that first sends an assignment response that doesn't match the client subscription followed by an assignment response that does match the client subscription.
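An illustrative decision helper for this logic (names are hypothetical; the real code lives in the consumer coordinator's assignment handling):

```java
import java.util.Set;

final class AssignmentCheckSketch {
    // Decide what to do with an assignment that does not match the subscription
    // snapshot taken when the JoinGroup was sent.
    static boolean shouldRequestRebalance(Set<String> assignedTopics,
                                          Set<String> subscriptionAtJoin,
                                          Set<String> currentSubscription) {
        if (currentSubscription.containsAll(assignedTopics))
            return false; // assignment is consistent, proceed normally
        if (!currentSubscription.equals(subscriptionAtJoin))
            return true;  // subscription changed mid-rebalance: ignore and rejoin
        throw new IllegalStateException("Assigned topics " + assignedTopics
                + " do not match the subscription " + currentSubscription);
    }
}
```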
Reviewers: Jason Gustafson <jason@confluent.io>
Currently, commitTransaction and abortTransaction wait indefinitely for the respective operation to be completed. This patch uses the producer's max block time to limit the time that we will wait. If the timeout elapses, we raise a TimeoutException, which allows the user to either close the producer or retry the operation.
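From the caller's side, the new behavior looks like this (a usage sketch; the timeout is governed by the producer's max.block.ms setting):

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.errors.TimeoutException;

final class CommitWithTimeout {
    // commitTransaction gives up after the max block time and throws
    // TimeoutException; the operation may still complete on the broker,
    // so the caller can either retry or close the producer.
    static void commitOrClose(KafkaProducer<String, String> producer) {
        try {
            producer.commitTransaction();
        } catch (TimeoutException e) {
            // Either retry commitTransaction() or abandon the producer entirely.
            producer.close();
        }
    }
}
```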
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>
Use of `MetadataRequest.isAllTopics` is not consistently defined for all versions of the api. For v0, it evaluates to false. This patch makes the behavior consistent for all versions.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Since we are logging offset resets and such at info level, it makes sense to use the same level for subscriptions and assignments.
Reviewers: Ismael Juma <ismael@juma.me.uk>
Fail produce requests using zstd until the inter.broker.protocol.version is high enough that all replicas are guaranteed to support it. Otherwise, followers receive an `UNSUPPORTED_COMPRESSION_TYPE` error when fetching zstd-compressed data and ISRs shrink.
Reviewers: Jason Gustafson <jason@confluent.io>
Fix the following situations, where pending members (members that have a member id but have not yet joined the group) can cause rebalance operations to fail:
- In AbstractCoordinator, a pending consumer should be allowed to leave.
- A rebalance operation must successfully complete if a pending member either joins or times out.
- During a rebalance operation, a pending member must be able to leave a group.
Reviewers: Boyang Chen <bchen11@outlook.com>, Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
When debugging KafkaConsumer production issues, it's pretty
useful to have log entries related to seeking and committed
offset retrieval enabled by default. These are currently present,
but only when debug logging is enabled. Change them to `info`.
Also included a minor code simplification and a slight improvement
to an exception message.
Reviewers: Jason Gustafson <jason@confluent.io>
It used to preallocate an array of responses and then complete each response from the original collection sequentially. The problem was that the original collection could have been modified (by another thread completing a response) while this was happening.
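An illustrative fix for that race: take a snapshot of the collection before completing, so concurrent completions cannot shift the iteration (the types here are stand-ins for the actual response objects):

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.CompletableFuture;

final class SnapshotCompleteSketch {
    // Iterate over a snapshot taken up front, so responses completed
    // concurrently by another thread cannot change the collection under the loop.
    static void completeAll(Collection<CompletableFuture<Void>> responses) {
        List<CompletableFuture<Void>> snapshot = new ArrayList<>(responses);
        for (CompletableFuture<Void> response : snapshot)
            response.complete(null); // no-op if another thread already completed it
    }
}
```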
Avoid an unnecessary lock acquisition when KafkaConsumer commits offsets.
Reviewers: Viktor Somogyi-Vass <viktorsomogyi@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
As part of KIP-320, we should add the expected leader epoch to Fetch and ListOffsets requests. This will allow us ultimately to detect log truncation.
Reviewers: Jason Gustafson <jason@confluent.io>
- Include more detail in the client log message if the disconnection happens
during authentication.
- Include exception message in the Selector info entry when authentication
fails and unwrap `DelayedResponseAuthenticationException`.
- Remove duplicate debug log on authentication failure in the Selector.
An empty username or password would result in the "expected 3 tokens"
error instead of "username not specified" or "password not specified".
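A sketch of the parsing fix for SASL/PLAIN, whose client message is authzid NUL authcid NUL password (simplified; the real code throws SASL-specific authentication exceptions rather than IllegalArgumentException):

```java
final class PlainSaslTokenSketch {
    // Splitting with limit -1 keeps trailing empty tokens, so an empty username
    // or password yields a precise error instead of the generic token-count one.
    static String[] parse(String message) {
        String[] tokens = message.split("\u0000", -1);
        if (tokens.length != 3)
            throw new IllegalArgumentException(
                    "Invalid SASL/PLAIN response: expected 3 tokens, got " + tokens.length);
        if (tokens[1].isEmpty())
            throw new IllegalArgumentException("Authentication failed: username not specified");
        if (tokens[2].isEmpty())
            throw new IllegalArgumentException("Authentication failed: password not specified");
        return tokens;
    }
}
```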
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
JUnit 4.13 fixes the issue where `Category` and `Parameterized` annotations
could not be used together. It also deprecates `ExpectedException` and
`assertThat`. Given this, we:
- Replace `ExpectedException` with the newly introduced `assertThrows`.
- Replace `Assert.assertThat` with `MatcherAssert.assertThat`.
- Annotate `AbstractLogCleanerIntegrationTest` with `IntegrationTest` category.
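For example, a test that previously declared an ExpectedException rule now expresses the expectation inline:

```java
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertThrows;

import org.junit.Test;

public class AssertThrowsExampleTest {
    @Test
    public void replacesExpectedExceptionRule() {
        // Before (deprecated in 4.13): an ExpectedException @Rule declared at the
        // top of the class, far from the call that throws. After: the expectation
        // sits on the offending call itself and returns the thrown exception.
        NumberFormatException e = assertThrows(NumberFormatException.class,
                () -> Integer.parseInt("not-a-number"));
        assertEquals("For input string: \"not-a-number\"", e.getMessage());
    }
}
```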
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>, David Arthur <mumrah@gmail.com>
Don't return error messages from `SaslException` to clients. Error messages to be returned to clients to aid debugging must be thrown as AuthenticationExceptions. This is a fix for a regression from KAFKA-7352.
Reviewers: Ron Dagostino <rndgstn@gmail.com>, Ismael Juma <ismael@juma.me.uk>
This helps narrow down which broker they came from when debugging
ACL propagation issues.
Reviewers: Vahid Hashemian <vahid.hashemian@gmail.com>, Ismael Juma <ismael@juma.me.uk>
When an older message format is in use, we should disable the leader epoch cache so that we resort to truncation by high watermark. Previously we updated the cache for all versions when a broker became leader for a partition. This can cause large and unnecessary truncations after leader changes because we relied on the presence of _any_ cached epoch in order to tell whether to use the improved truncation logic possible with the OffsetsForLeaderEpoch API.
Reviewers: Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Viktor Somogyi-Vass <viktorsomogyi@gmail.com>, Jun Rao <junrao@gmail.com>
Replaces the original loginContext if login fails in the refresh thread, to ensure that the refresh thread is left in a clean state when there are exceptions while connecting to an OAuth server. Also makes the client callback handler more robust by using the token with the longest remaining time until expiry, instead of throwing an exception when multiple tokens are found.
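The token-selection change, sketched with a hypothetical Token type:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class TokenSelectionSketch {
    interface Token {
        long expirationMs();
    }

    // Rather than failing when multiple tokens are found, pick the one with
    // the longest remaining lifetime.
    static Optional<Token> choose(List<Token> tokens) {
        return tokens.stream().max(Comparator.comparingLong(Token::expirationMs));
    }
}
```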
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
This patch introduces a new config, "group.max.size", which caps the maximum size any group can reach. It has a default value of Int.MAX_VALUE. Once a group is at its maximum size, subsequent JoinGroup requests receive a GROUP_MAX_SIZE_REACHED error.
In the case where the config is changed and a Coordinator broker with the new config loads an old group that is over the threshold, members are kicked out of the group and a rebalance is forced.
Reviewers: Vahid Hashemian <vahid.hashemian@gmail.com>, Boyang Chen <bchen11@outlook.com>, Gwen Shapira <cshapi@gmail.com>, Jason Gustafson <jason@confluent.io>