Fix the direct cause of the observed issue on the client side: when heartbeat getting errors and resetting generation, we only need to set it to UNJOINED when it was not already in REBALANCING; otherwise, the join-group handler would throw the retriable UnjoinedGroupException to force the consumer to re-send join group unnecessarily.
Fix the root cause of the issue on the broker side: we should still trigger rebalance when static member joins in CompletingRebalance phase; otherwise the member.ids would be changed when the assignment is received from the leader, hence causing the new member.id's assignment to be empty.
Reviewers: Boyang Chen <boyang@confluent.io>, Jason Gustafson <jason@confluent.io>
Invoke `waitForQuotaUpdate` after the quotas are removed. It also changes
the default request quota to `Long.MaxValue`.
Reviewers: Anna Povzner <anna@confluent.io>, Ismael Juma <ismael@juma.me.uk>
90bbeedf52 introduced a regression resulting in passing an action per resource
name to the `Authorizer` instead of passing one per unique resource name. Refactor
the signatures of both `filterAuthorized` and `authorize` to make them easier to test
and add a test for each.
Reviewers: Ismael Juma <ismael@juma.me.uk>
The upper limit offset is displayed incorrectly in the log cleaner summary message. For example:
```
Log cleaner thread 0 cleaned log __consumer_offsets-47 (dirty section = [358800359, 358800359])
```
We should be using the next dirty offset as the upper limit.
Reviewers: David Arthur <mumrah@gmail.com>
As title suggested, consumers would first do an OffsetFetch before starting the normal processing. It makes sense to add it to the concurrent test suite to verify whether there would be a blocking behavior.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
If the high-watermark is updated in the middle of a read with the `read_committed` isolation level, it is possible to return data above the LSO. In the worst case, this can lead to the read of an aborted transaction. The root cause is that the logic depends on reading the high-watermark twice. We fix the problem by reading it once and caching the value.
Reviewers: David Arthur <mumrah@gmail.com>, Guozhang Wang <wangguoz@gmail.com>, Ismael Juma <ismael@juma.me.uk>
There is a race on receiving a LeaderAndIsr request for a replica with an active log dir reassignment. If the reassignment completes just before the LeaderAndIsr handler updates epoch information, it can lead to an illegal state error since no future log dir exists. This patch fixes the problem by ensuring that the future log dir exists when the fetcher is started. Removal cannot happen concurrently because it requires access the same partition state lock.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, David Arthur <mumrah@gmail.com>
Co-authored-by: Chia-Ping Tsai <chia7712@gmail.com>
The runtime type of Metric.metricValue() needn't always be a Double,
for example, if it's a gauge from IntGaugeSuite.
Since it's impossible to format non-double values with 3 point precision
IllegalFormatConversionException resulted.
Author: Tom Bentley <tbentley@redhat.com>
Author: Tom Bentley <tombentley@users.noreply.github.com>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#8373 from tombentley/KAFKA-9775-IllegalFormatConversionException
Revert the decision for the sendOffsetsToTransaction(groupMetadata) API to fail with old version of brokers for the sake of making the application easier to adapt between versions. This PR silently downgrade the TxnOffsetCommit API when the build version is small than 3.
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Rename the test suite to later add unit tests that don't depend on
ZK or the AdminClient TopiCommand types.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Once Scala 2.13.2 is officially released, I will submit a follow up PR
that enables `-Xfatal-warnings` with the necessary warning
exclusions. Compiler warning exclusions were only introduced in 2.13.2
and hence why we have to wait for that. I used a snapshot build to
test it in the meantime.
Changes:
* Remove Deprecated annotation from internal request classes
* Class.newInstance is deprecated in favor of
Class.getConstructor().newInstance
* Replace deprecated JavaConversions with CollectionConverters
* Remove unused kafka.cluster.Cluster
* Don't use Map and Set methods deprecated in 2.13:
- collection.Map +, ++, -, --, mapValues, filterKeys, retain
- collection.Set +, ++, -, --
* Add scala-collection-compat dependency to streams-scala and
update version to 2.1.4.
* Replace usages of deprecated Either.get and Either.right
* Replace usage of deprecated Integer(String) constructor
* `import scala.language.implicitConversions` is not needed in Scala 2.13
* Replace usage of deprecated `toIterator`, `Traversable`, `seq`,
`reverseMap`, `hasDefiniteSize`
* Replace usage of deprecated alterConfigs with incrementalAlterConfigs
where possible
* Fix implicit widening conversions from Long/Int to Double/Float
* Avoid implicit conversions to String
* Eliminate usage of deprecated procedure syntax
* Remove `println`in `LogValidatorTest` instead of fixing the compiler
warning since tests should not `println`.
* Eliminate implicit conversion from Array to Seq
* Remove unnecessary usage of 3 argument assertEquals
* Replace `toStream` with `iterator`
* Do not use deprecated SaslConfigs.DEFAULT_SASL_ENABLED_MECHANISMS
* Replace StringBuilder.newBuilder with new StringBuilder
* Rename AclBuffers to AclSeqs and remove usage of `filterKeys`
* More consistent usage of Set/Map in Controller classes: this also fixes
deprecated warnings with Scala 2.13
* Add spotBugs exclusion for inliner artifact in KafkaApis with Scala 2.12.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
We find that brokers may send empty assignment for some members unexpectedly, and would need more logs investigating this issue.
Reviewers: John Roesler <vvcephei@apache.org>
This patch addresses a locking issue with DelayTxnMarker completion. Because of the reliance on the shared read lock in TransactionStateManager and the deadlock avoidance algorithm in `DelayedOperation`, we cannot guarantee that a call to checkAndComplete will offer an opportunity to complete the job. This patch removes the reliance on this lock in two ways:
1. We replace the transaction marker purgatory with a map of transaction with pending markers. We were not using purgatory expiration anyway, so this avoids the locking issue and simplifies usage.
2. We were also relying on the read lock for the `DelayedProduce` completion when calling `ReplicaManager.appendRecords`. As far as I can tell, this was not necessary. The lock order is always 1) state read/write lock, 2) txn metadata locks. Since we only call `appendRecords` while holding the read lock, a deadlock does not seem possible.
Reviewers: Jun Rao <junrao@gmail.com>
This PR enhances the epoch checking logic for endTransaction call in TransactionCoordinator. Previously it relaxes the checking by allowing a producer epoch bump, which is error-prone since there is no reason to see a producer epoch bump from client.
Reviewers: Jason Gustafson <jason@confluent.io>
1. Defaults state-change log level to INFO.
2. INFO level state-change log includes (a) request level logging with just partition counts; (b) the leader/isr changes per partition in the controller and in the broker (reduced to mostly just 1 logging per partition).
Reviewers: Jun Rao <junrao@gmail.com>
A write lock is currently taken out whenever an ACL update is triggered. This update requires a round trip to ZK to get the ACLs for the resource (https://github.com/apache/kafka/pull/7882/files#diff-852b9cb2ceb2b85ec25b422f72c42620R489). This round trip to ZK can block any ACL lookups, which will block any requests and that require authorization, and their corresponding handler threads. This PR attempts to avoid these read locks by snapshotting the aclCache which is a threadsafe scala immutable.TreeMap.
Author: Lucas Bradstreet <lucas@confluent.io>
Author: Manikumar Reddy <manikumar.reddy@gmail.com>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Ismael Juma <ismael@juma.me.uk
Closes#7882 from lbradstreet/acl-read-lock
KAFKA-7283 enabled lazy mmap on index files by initializing indices
on-demand rather than performing costly disk/memory operations when
creating all indices on broker startup. This helped reducing the startup
time of brokers. However, segment indices are still created on closing
segments, regardless of whether they need to be closed or not.
This is a cleaned up version of #7900, which was submitted by @efeg. It
eliminates unnecessary disk accesses and memory map operations while
deleting, renaming or closing offset and time indexes.
In a cluster with 31 brokers, where each broker has 13K to 20K segments,
@efeg and team observed up to 2 orders of magnitude faster LogManager
shutdown times - i.e. dropping the LogManager shutdown time of each
broker from 10s of seconds to 100s of milliseconds.
To avoid confusion between `renameTo` and `setFile`, I replaced the
latter with the more restricted updateParentDir` (it turns out that's
all we need).
Reviewers: Jun Rao <junrao@gmail.com>, Andrew Choi <a24choi@edu.uwaterloo.ca>
Co-authored-by: Adem Efe Gencer <agencer@linkedin.com>
Co-authored-by: Ismael Juma <ismael@juma.me.uk>
This PR works to improve high watermark checkpointing performance.
`ReplicaManager.checkpointHighWatermarks()` was found to be a major contributor to GC pressure, especially on Kafka clusters with high partition counts and low throughput.
Added a JMH benchmark for `checkpointHighWatermarks` which establishes a
performance baseline. The parameterized benchmark was run with 100, 1000 and
2000 topics.
Modified `ReplicaManager.checkpointHighWatermarks()` to avoid extra copies and cached
the Log parent directory Sting to avoid frequent allocations when calculating
`File.getParent()`.
A few clean-ups:
* Changed all usages of Log.dir.getParent to Log.parentDir and Log.dir.getParentFile to
Log.parentDirFile.
* Only expose public accessor for `Log.dir` (consistent with `Log.parentDir`)
* Removed unused parameters in `Partition.makeLeader`, `Partition.makeFollower` and `Partition.createLogIfNotExists`.
Benchmark results:
| Topic Count | Ops/ms | MB/sec allocated |
|-------------|---------|------------------|
| 100 | + 51% | - 91% |
| 1000 | + 143% | - 49% |
| 2000 | + 149% | - 50% |
Reviewers: Lucas Bradstreet <lucas@confluent.io>. Ismael Juma <ismael@juma.me.uk>
Co-authored-by: Gardner Vickers <gardner@vickers.me>
Co-authored-by: Ismael Juma <ismael@juma.me.uk>
Older versions of the JoinGroup rely on a new member timeout to keep the group from growing indefinitely in the case of client disconnects and retrying. The logic for resetting the heartbeat expiration task following completion of the rebalance failed to account for an implicit expectation that shouldKeepAlive would return false the first time it is invoked when a heartbeat expiration is scheduled. This patch fixes the issue by making heartbeat satisfaction logic explicit.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Guozhang Wang <wangguoz@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
When handling a WriteTxnResponse, the TransactionMarkerRequestCompletionHandler throws an IllegalStateException when the remote broker responds with a KAFKA_STORAGE_ERROR and does not retry the request. This leaves the transaction state stuck in PendingAbort or PendingCommit, with no way to change that state other than restarting the broker, because both EndTxnRequest and InitProducerIdRequest return CONCURRENT_TRANSACTIONS if the state is PendingAbort or PendingCommit. This patch changes the error handling behavior in TransactionMarkerRequestCompletionHandler to retry KAFKA_STORAGE_ERRORs. This matches the existing client behavior, and makes sense because KAFKA_STORAGE_ERROR causes the host broker to shut down, meaning that the partition being written to will move its leadership to a new, healthy broker.
Reviewers: Boyang Chen <boyang@confluent.io>, Jason Gustafson <jason@confluent.io>
This patch modifies the loading time metric to account for time spent waiting for the loading time task to be scheduled.
Reviewers: Jason Gustafson <jason@confluent.io>
Rewrite ReassignPartitionsCommand to use the KIP-455 API when possible, rather
than direct communication with ZooKeeper. Direct ZK access is still supported,
but deprecated, as described in KIP-455.
As specified in KIP-455, the tool has several new flags. --cancel stops
an assignment which is in progress. --preserve-throttle causes the
--verify and --cancel commands to leave the throttles alone.
--additional allows users to execute another partition assignment even
if there is already one in progress. Finally, --show displays all of
the current partition reassignments.
Reorganize the reassignment code and tests somewhat to rely more on unit
testing using the MockAdminClient and less on integration testing. Each
integration test where we bring up a cluster seems to take about 5 seconds, so
it's good when we can get similar coverage from unit tests. To enable this,
MockAdminClient now supports incrementalAlterConfigs, alterReplicaLogDirs,
describeReplicaLogDirs, and some other APIs. MockAdminClient is also now
thread-safe, to match the real AdminClient implementation.
In DeleteTopicTest, use the KIP-455 API rather than invoking the reassignment
command.
Currently when there is a leader change with a log dir reassignment in progress, we do not update the leader epoch in the partition state maintained by `ReplicaAlterLogDirsThread`. This can lead to a FENCED_LEADER_EPOCH error, which results in the partition being marked as failed, which is a permanent failure until the broker is restarted. This patch fixes the problem by updating the epoch in `ReplicaAlterLogDirsThread` after receiving a new LeaderAndIsr request from the controller.
Reviewers: Jun Rao <junrao@gmail.com>, Jason Gustafson <jason@confluent.io>
The `TxnOffsetCommit` API suffers from a bug affecting older client versions which treat `COORDINATOR_LOAD_IN_PROGRESS` errors as fatal. This PR changes the handling on the broker to instead return `COORDINATOR_NOT_AVAILABLE` in this case so that clients won't crash upon doing txn commit.
Reviewers: Jason Gustafson <jason@confluent.io>
* Broker throttles were incorrectly marked as sensitive configurations. Fix this, so that their values can be returned via DescribeConfigs as expected.
* Previously, changes to broker configs that consisted only of deletions were ignored by the brokers because of faulty delta calculation logic that didn't consider deletions as changes, only alterations as changes. Fix this and add a regression test.
Reviewers: Colin P. McCabe <cmccabe@apache.org>
In prepareAddPartitions the txnStartTimestamp could be updated as updateTimestamp, which is assumed to be always larger then the original startTimestamp. However, due to ntp time shift the timer may go backwards and hence the newStartTimestamp be smaller than the original one. Then later in completeTransitionTo the time check would fail with an IllegalStateException, and the txn would not transit to Ongoing.
An indirect result of this, is that this txn would NEVER be expired anymore because only Ongoing ones would be checked for expiration.
We should do the same as in #3286 to remove this check.
Also added test coverage for both KAFKA-5415 and KAFKA-8803.
Reviewers: Jason Gustafson<jason@confluent.io>
This PR removes intermediate conversions between `MetadataResponse.TopicMetadata` => `MetadataResponseTopic` and `MetadataResponse.PartitionMetadata` => `MetadataResponsePartition` objects.
There is 15-20% reduction in object allocations and 5-10% improvement in metadata request performance.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson<jason@confluent.io>
When we changed quota communication with KIP-219, fetch requests get throttled by returning empty response with the delay in throttle_time_ms and Kafka consumer retries again after the delay. With default configs, the maximum fetch size could be as big as 50MB (or 10MB per partition). The default broker config (1-second window, 10 full windows of tracked bandwidth/thread utilization usage) means that < 5MB/s consumer quota (per broker) may block consumers from being able to fetch any data.
This PR ensures that consumers cannot get blocked by quota by capping fetchMaxBytes in KafkaApis.handleFetchRequest() to quota window * consume bandwidth quota. In the example of default configs (10-second quota window) and 1MB/s consumer bandwidth quota, fetchMaxBytes would be capped to 10MB.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
To dismiss the usage of operation ++ against Set which is slow when Set has many entries. This pr introduces a new class 'AclSets' which takes multiple Sets as parameters and do 'find' against them one by one. For more details about perf and benchmark, refer to [KAFKA-9685](https://issues.apache.org/jira/browse/KAFKA-9685)
Author: jiao <jiao.zhang@linecorp.com>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Closes#8261 from jiao-zhangS/jira-9685
This PR tries to add an internal flag to throw if we hit an unexpected protocol version for offset fetch. It could be used together with EOS_BETA flag so that if server side downgrades unexpectedly, we shall fail the application ASAP.
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Problem
----
The `incrementalAlterConfigs` API supports OpType.APPEND and OpType.SUBTRACT for configuration properties of LIST type. If an APPEND or SUBTRACT OpType is submitted for a config property which currently has no value, then the operation fails with a NullPointerException on the broker side (conveyed as an "unknown server error" to the client).
This is because the alter code does a `getProperty` of the existing configuration value
with no concern as to whether or not the property actually exists.
This change handles the case of existing null properties.
Testing
-----
This change includes 2 test cases in the unit test that demonstrate the issue for OpType.SUBTRACT and OpType.APPEND.
Author: Steve Rodrigues <srodrigues@confluent.io>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Bob Barrett <bob.barrett@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
Closes#8216 from steverod/steverod.kafka-9644
Throw InvalidRequestException if null configs are specified for CreateTopics, AlterConfigs or IncrementalAlterConfigs.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Adding (add-config) default user, user, or <user, client-id> quota and then removing it via delete-config does not update quota bound in ClientQuotaManager.Metrics for existing users or <user,client-id>. This causes brokers to continue to throttle with the previously set quotas until brokers restart (or <user,client> stops sending traffic for sometime and sensor expires). This happens only when removing the user or user,client-id where there are no more quotas to fall back to. Common example where the issue happens: Initial no quota state --> add default user quota --> remove default user quota.
The cause of the issue was `DefaultQuotaCallback.quotaLimit` was returning `null` when no default user quota set, which caused `ClientQuotaManager.updateQuotaMetricConfigs` to skip updating the appropriate sensor, which left it unchanged with the previous quota. Since `null` is an acceptable return value for `ClientQuotaCallback.quotaLimit`, which is already treated as unlimited quota in other parts of the code, this PR ensures that `ClientQuotaManager.updateQuotaMetricConfigs` updates the quotas for which `ClientQuotaCallback.quotaLimit` returns `null` to unlimited quota.
Reviewers: Jason Gustafson <jason@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>