Rename the test suite so that we can later add unit tests that don't depend on
ZK or the AdminClient TopicCommand types.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Co-authored-by: Andrew Olson <aolson1@cerner.com>
Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>, Guozhang Wang <wangguoz@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
Once Scala 2.13.2 is officially released, I will submit a follow-up PR
that enables `-Xfatal-warnings` with the necessary warning
exclusions. Compiler warning exclusions were only introduced in 2.13.2,
which is why we have to wait for that release. I used a snapshot build to
test it in the meantime.
Changes:
* Remove Deprecated annotation from internal request classes
* Class.newInstance is deprecated in favor of
Class.getConstructor().newInstance() (see the sketch after this list)
* Replace deprecated JavaConversions with CollectionConverters
* Remove unused kafka.cluster.Cluster
* Don't use Map and Set methods deprecated in 2.13:
- collection.Map +, ++, -, --, mapValues, filterKeys, retain
- collection.Set +, ++, -, --
* Add scala-collection-compat dependency to streams-scala and
update version to 2.1.4.
* Replace usages of deprecated Either.get and Either.right
* Replace usage of deprecated Integer(String) constructor
* `import scala.language.implicitConversions` is not needed in Scala 2.13
* Replace usage of deprecated `toIterator`, `Traversable`, `seq`,
`reverseMap`, `hasDefiniteSize`
* Replace usage of deprecated alterConfigs with incrementalAlterConfigs
where possible
* Fix implicit widening conversions from Long/Int to Double/Float
* Avoid implicit conversions to String
* Eliminate usage of deprecated procedure syntax
* Remove `println` in `LogValidatorTest` instead of fixing the compiler
warning since tests should not `println`.
* Eliminate implicit conversion from Array to Seq
* Remove unnecessary usage of 3 argument assertEquals
* Replace `toStream` with `iterator`
* Do not use deprecated SaslConfigs.DEFAULT_SASL_ENABLED_MECHANISMS
* Replace StringBuilder.newBuilder with new StringBuilder
* Rename AclBuffers to AclSeqs and remove usage of `filterKeys`
* More consistent usage of Set/Map in Controller classes: this also fixes
deprecated warnings with Scala 2.13
* Add spotBugs exclusion for inliner artifact in KafkaApis with Scala 2.12.
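A minimal sketch of a few of the replacements above (illustrative snippets under the 2.13 `scala.jdk.CollectionConverters` package, not the actual diffs):
```scala
import scala.jdk.CollectionConverters._

object DeprecationExamples {
  // Deprecated implicit JavaConversions replaced with explicit CollectionConverters
  val javaList: java.util.List[String] = List("a", "b").asJava
  val scalaList: List[String] = javaList.asScala.toList

  // Class.newInstance is deprecated; use getConstructor().newInstance() instead
  val rng: java.util.Random = classOf[java.util.Random].getConstructor().newInstance()

  // StringBuilder.newBuilder is deprecated; use `new StringBuilder`
  val sb = new StringBuilder

  // Procedure syntax `def close() { ... }` is deprecated; declare the Unit return type
  def close(): Unit = {
    sb.clear()
  }
}
```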
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
We have found that brokers may unexpectedly send an empty assignment for some members, and we need more logs to investigate this issue.
Reviewers: John Roesler <vvcephei@apache.org>
Measure the ratio of time the stream thread spent on processing each task among all assigned active tasks (KIP-444). Also add unit tests to cover the metrics added in this PR and in the previous #8358, and try to fix the flaky test reported in KAFKA-5842.
Co-authored-by: John Roesler <vvcephei@apache.org>
Reviewers: Bruno Cadonna <bruno@confluent.io>, John Roesler <vvcephei@apache.org>
This patch addresses a locking issue with DelayedTxnMarker completion. Because of the reliance on the shared read lock in TransactionStateManager and the deadlock avoidance algorithm in `DelayedOperation`, we cannot guarantee that a call to checkAndComplete will offer an opportunity to complete the operation. This patch removes the reliance on this lock in two ways:
1. We replace the transaction marker purgatory with a map of transactions with pending markers (see the sketch below). We were not using purgatory expiration anyway, so this avoids the locking issue and simplifies usage.
2. We were also relying on the read lock for the `DelayedProduce` completion when calling `ReplicaManager.appendRecords`. As far as I can tell, this was not necessary. The lock order is always 1) state read/write lock, 2) txn metadata locks. Since we only call `appendRecords` while holding the read lock, a deadlock does not seem possible.
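A rough sketch of the first change (hypothetical names; the real bookkeeping lives alongside the transaction marker channel logic):
```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical sketch: track transactions with outstanding markers in a plain
// concurrent map instead of a purgatory, since marker expiration was never used.
final case class PendingCompleteTxn(transactionalId: String, coordinatorEpoch: Int)

class PendingMarkers {
  private val pending = new ConcurrentHashMap[String, PendingCompleteTxn]()

  def addMarkers(txnId: String, entry: PendingCompleteTxn): Unit =
    pending.put(txnId, entry)

  // Completing is simply removing the entry; no purgatory locking is involved.
  def completeSendMarkers(txnId: String): Option[PendingCompleteTxn] =
    Option(pending.remove(txnId))
}
```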
Reviewers: Jun Rao <junrao@gmail.com>
* Fixed DataException thrown when handling tombstone events with null value
* Passes through the original record when it encounters a null key while configured for keys, or a null value while configured for values.
* Added unit tests for schema and schemaless data
The `unitTest` and `integrationTest` tasks used to run tests don't exclude
`**/*Suite`, so the tests included by a Suite class are executed twice.
For example:
```
11:42:25 org.apache.kafka.streams.integration.StoreQuerySuite > org.apache.kafka.streams.integration.QueryableStateIntegrationTest.shouldBeAbleToQueryMapValuesState STARTED
11:42:26
11:42:26 org.apache.kafka.streams.integration.GlobalKTableIntegrationTest > shouldKStreamGlobalKTableJoin PASSED
11:42:30
11:42:30 org.apache.kafka.streams.integration.StoreQuerySuite > org.apache.kafka.streams.integration.QueryableStateIntegrationTest.shouldBeAbleToQueryMapValuesState PASSED
...
11:48:42 org.apache.kafka.streams.integration.QueryableStateIntegrationTest > shouldBeAbleToQueryMapValuesState STARTED
11:48:46
11:48:46 org.apache.kafka.streams.integration.QueryableStateIntegrationTest > shouldBeAbleToQueryMapValuesState PASSED
```
For consistency, move the existing exclusion for the `test` task from the
`streams` project to `subprojects`.
Reviewers: Ismael Juma <ismael@juma.me.uk>
In case of an error while flattening a record with schema, the Flatten transformation was reporting an error about a record without schema, as follows:
```
org.apache.kafka.connect.errors.DataException: Flatten transformation does not support ARRAY for record without schemas (for field ...)
```
The expected behaviour would be an error message specifying "with schemas".
This looks like a simple copy/paste typo from the equivalent schemaless methods in the same file.
Reviewers: Ewen Cheslack-Postava <me@ewencp.org>, Konstantine Karantasis <konstantine@confluent.io>
Simple doc fix in a code snippet in connect.html
Co-authored-by: Scott Ferguson <smferguson@gmail.com>
Reviewers: Ewen Cheslack-Postava <me@ewencp.org>, Konstantine Karantasis <konstantine@confluent.io>
Add missing @Override annotations and use lambdas when declaring threads to suppress warnings in IDEs and improve readability.
Reviewers: Ron Dagostino <rdagostino@confluent.io>, Konstantine Karantasis <konstantine@confluent.io>
This PR enhances the epoch checking logic for the endTransaction call in TransactionCoordinator. Previously it relaxed the check by allowing a producer epoch bump, which is error-prone since there is no reason to see a producer epoch bump from the client.
Reviewers: Jason Gustafson <jason@confluent.io>
When a caching state store is closed it calls its flush() method.
If flush() throws an exception the underlying state store is not closed.
This commit ensures that the state stores underlying a wrapped state store
are closed even when preceding operations in the close method throw.
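A minimal sketch of the pattern (illustrative only, not the actual Streams code): attempt the underlying close even if flush throws, and rethrow the first failure.
```scala
// Hypothetical wrapper sketch: close the underlying store even if flush() fails.
trait StateStore {
  def flush(): Unit
  def close(): Unit
}

class CachingStore(underlying: StateStore) extends StateStore {
  override def flush(): Unit = underlying.flush()

  override def close(): Unit = {
    var firstError: Throwable = null
    try flush()
    catch { case t: Throwable => firstError = t }
    try underlying.close()
    catch { case t: Throwable => if (firstError == null) firstError = t }
    if (firstError != null) throw firstError
  }
}
```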
Co-authored-by: John Roesler <vvcephei@apache.org>
Reviewers: John Roesler <vvcephei@apache.org>, Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <matthias@confluent.io>
For reasons outlined in https://issues.apache.org/jira/browse/KAFKA-9771
we can't upgrade to a version of Jetty with the bug fixed, or downgrade to one prior to the introduction of the bug. Luckily, the actual fix is pretty straightforward and can be ported over to Connect for use until it's possible to upgrade to a version of Jetty with that bug fixed: https://github.com/eclipse/jetty.project/pull/4404/files#diff-58640db0f8f2cd84b7e653d1c1540913R2188-R2193
The changes here have been verified locally; a test with multiple certificates/multiple hostnames will be submitted in a follow up.
Reviewers: Jeff Huang <47870461+jeffhuang26@users.noreply.github.com>, Konstantine Karantasis <konstantine@confluent.io>
* delete topics before tearing down multi-node clusters to avoid leader elections during shutdown
* tear down all nodes concurrently instead of sequentially
Reviewers: Matthias J. Sax <matthias@confluent.io>
1. Within a single while loop, process the tasks in AAABBBCCC order instead of ABCABCABC (see the sketch below). This also helps the follow-up PR that times the per-task processing ratio, since it needs fewer time measurements and hence adds less overhead.
2. Add thread-level process / punctuate / poll / commit ratio metrics.
3. Fix a few issues discovered along the way (commented inline).
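A rough sketch of the changed iteration order (hypothetical method names): drain up to a cap of records from one task before moving on to the next, rather than interleaving one record per task.
```scala
// Hypothetical sketch of per-task draining (AAABBBCCC) instead of
// round-robin interleaving (ABCABCABC).
trait Task {
  def isProcessable: Boolean
  def process(): Boolean // returns true if a record was processed
}

object TaskProcessingSketch {
  def processTasks(activeTasks: Iterable[Task], maxRecordsPerTask: Int): Int = {
    var totalProcessed = 0
    for (task <- activeTasks) {
      var processed = 0
      // Stay on the same task until it runs dry or hits the cap, so per-task
      // timing only needs to be measured once per batch rather than per record.
      while (processed < maxRecordsPerTask && task.isProcessable && task.process()) {
        processed += 1
      }
      totalProcessed += processed
    }
    totalProcessed
  }
}
```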
Reviewers: John Roesler <vvcephei@apache.org>
1. Defaults state-change log level to INFO.
2. INFO level state-change log includes (a) request-level logging with just partition counts; (b) the leader/isr changes per partition in the controller and in the broker (reduced to mostly just one log entry per partition).
Reviewers: Jun Rao <junrao@gmail.com>
* KAFKA-9707: Fix InsertField.Key not applying to tombstone events
* Fix typo that hardcoded .value() instead of abstract operatingValue
* Add test for Key transform that was previously not tested
Signed-off-by: Greg Harris <gregh@confluent.io>
* Add null value assertion to tombstone test
* Remove misnamed function and add test for passing through a null-keyed record.
Signed-off-by: Greg Harris <gregh@confluent.io>
* Simplify unchanged record assertion
Signed-off-by: Greg Harris <gregh@confluent.io>
* Replace assertEquals with assertSame
Signed-off-by: Greg Harris <gregh@confluent.io>
* Fix checkstyleTest indent issue
Signed-off-by: Greg Harris <gregh@confluent.io>
A write lock is currently taken out whenever an ACL update is triggered. This update requires a round trip to ZK to get the ACLs for the resource (https://github.com/apache/kafka/pull/7882/files#diff-852b9cb2ceb2b85ec25b422f72c42620R489). This round trip to ZK can block any ACL lookups, which will block any requests that require authorization, along with their corresponding handler threads. This PR attempts to avoid these read locks by snapshotting the aclCache, which is a thread-safe scala immutable.TreeMap.
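A rough sketch of the approach (hypothetical field and method names): updates still take a lock and swap in a new immutable map, while lookups read the current snapshot without locking.
```scala
import scala.collection.immutable

// Hypothetical sketch: reads work on an immutable snapshot, so they never
// block on the lock taken during ACL updates.
class AclCacheSketch {
  private val lock = new Object
  @volatile private var aclCache: immutable.TreeMap[String, Set[String]] =
    immutable.TreeMap.empty

  // Update path: the ZK round trip happens under the lock, then the new
  // snapshot is published with a single volatile write.
  def updateAcls(resource: String, acls: Set[String]): Unit = lock.synchronized {
    aclCache = aclCache.updated(resource, acls)
  }

  // Lookup path: no lock, just read the current snapshot.
  def getAcls(resource: String): Set[String] =
    aclCache.getOrElse(resource, Set.empty)
}
```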
Author: Lucas Bradstreet <lucas@confluent.io>
Author: Manikumar Reddy <manikumar.reddy@gmail.com>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes #7882 from lbradstreet/acl-read-lock
KAFKA-7283 enabled lazy mmap on index files by initializing indices
on-demand rather than performing costly disk/memory operations when
creating all indices on broker startup. This helped reduce the startup
time of brokers. However, segment indices are still created on closing
segments, regardless of whether they were ever needed.
This is a cleaned up version of #7900, which was submitted by @efeg. It
eliminates unnecessary disk accesses and memory map operations while
deleting, renaming or closing offset and time indexes.
In a cluster with 31 brokers, where each broker has 13K to 20K segments,
@efeg and team observed up to 2 orders of magnitude faster LogManager
shutdown times - i.e. dropping the LogManager shutdown time of each
broker from 10s of seconds to 100s of milliseconds.
To avoid confusion between `renameTo` and `setFile`, I replaced the
latter with the more restricted `updateParentDir` (it turns out that's
all we need).
Reviewers: Jun Rao <junrao@gmail.com>, Andrew Choi <a24choi@edu.uwaterloo.ca>
Co-authored-by: Adem Efe Gencer <agencer@linkedin.com>
Co-authored-by: Ismael Juma <ismael@juma.me.uk>
This PR works to improve high watermark checkpointing performance.
`ReplicaManager.checkpointHighWatermarks()` was found to be a major contributor to GC pressure, especially on Kafka clusters with high partition counts and low throughput.
Added a JMH benchmark for `checkpointHighWatermarks` which establishes a
performance baseline. The parameterized benchmark was run with 100, 1000 and
2000 topics.
Modified `ReplicaManager.checkpointHighWatermarks()` to avoid extra copies and cached
the Log parent directory String to avoid frequent allocations when calculating
`File.getParent()` (a minimal sketch of the caching idea follows).
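A minimal, simplified sketch of the caching idea (hypothetical class; the real change is in `Log`): compute the parent directory string once and refresh it only on rename.
```scala
import java.io.File

// Hypothetical sketch: cache the parent directory string once instead of
// allocating it via dir.getParent on every high watermark checkpoint.
class LogSketch(initialDir: File) {
  @volatile private var dir: File = initialDir
  @volatile private var _parentDir: String = dir.getParent

  def parentDir: String = _parentDir
  def parentDirFile: File = new File(_parentDir)

  // Renames only need to refresh the cached parent directory.
  def updateParentDir(newParentDir: File): Unit = {
    dir = new File(newParentDir, dir.getName)
    _parentDir = newParentDir.getPath
  }
}
```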
A few clean-ups:
* Changed all usages of Log.dir.getParent to Log.parentDir and Log.dir.getParentFile to
Log.parentDirFile.
* Only expose a public accessor for `Log.dir` (consistent with `Log.parentDir`)
* Removed unused parameters in `Partition.makeLeader`, `Partition.makeFollower` and `Partition.createLogIfNotExists`.
Benchmark results:
| Topic Count | Ops/ms | MB/sec allocated |
|-------------|---------|------------------|
| 100 | + 51% | - 91% |
| 1000 | + 143% | - 49% |
| 2000 | + 149% | - 50% |
Reviewers: Lucas Bradstreet <lucas@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Co-authored-by: Gardner Vickers <gardner@vickers.me>
Co-authored-by: Ismael Juma <ismael@juma.me.uk>
Startup scripts for earlier versions of Kafka contain JVM options that have since been removed, such as `-XX:+PrintGCDateStamps` or `-XX:+UseParNewGC`.
When system tests run on a JVM that doesn't support these options, we should set up
environment variables with the correct options.
Reviewers: Guozhang Wang <guozhang@confluent.io>, Ron Dagostino <rdagostino@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Older versions of JoinGroup rely on a new-member timeout to keep the group from growing indefinitely when clients disconnect and retry. The logic for resetting the heartbeat expiration task following completion of the rebalance failed to account for an implicit expectation that shouldKeepAlive would return false the first time it is invoked when a heartbeat expiration is scheduled. This patch fixes the issue by making the heartbeat satisfaction logic explicit.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Guozhang Wang <wangguoz@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
There are cases where `currentEstimation` is less than
`COMPRESSION_RATIO_IMPROVING_STEP`, causing
`estimatedCompressionRatio` to become negative. This, in turn,
may result in `MESSAGE_TOO_LARGE`.
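A minimal sketch of the fix idea (hypothetical constants and method; the real code lives in the producer's compression ratio estimator): clamp the updated estimate at a small positive lower bound so it can never go negative.
```scala
// Hypothetical sketch: never let the estimated compression ratio drop below
// a small positive lower bound when the observed ratio improves.
object CompressionRatioSketch {
  val CompressionRatioImprovingStep: Float = 0.005f
  val CompressionRatioEstimationLowerBound: Float = 0.005f

  def updateOnImprovement(currentEstimation: Float): Float =
    math.max(currentEstimation - CompressionRatioImprovingStep,
             CompressionRatioEstimationLowerBound)
}
```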
Reviewers: Ismael Juma <ismael@juma.me.uk>
SSLEngine#beginHandshake can throw authentication failures (for example, when there are no suitable cipher suites), so we ought to catch SSLException and convert it to SslAuthenticationException in order to process authentication failures correctly.
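A minimal sketch of the handling (illustrative; the real change is in the SSL transport layer): wrap beginHandshake and rethrow SSLException as SslAuthenticationException so callers treat it as an authentication failure.
```scala
import javax.net.ssl.{SSLEngine, SSLException}
import org.apache.kafka.common.errors.SslAuthenticationException

object HandshakeSketch {
  // Illustrative sketch: convert SSLException from beginHandshake into an
  // SslAuthenticationException so it is surfaced as an authentication failure.
  def startHandshake(engine: SSLEngine): Unit = {
    try engine.beginHandshake()
    catch {
      case e: SSLException =>
        throw new SslAuthenticationException("Failed to begin SSL handshake", e)
    }
  }
}
```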
Reviewers: Jun Rao <junrao@gmail.com>
* The Herder interface is extended with a default method that allows choosing whether to log all the connector configurations during connector validation or not.
* The `PUT /connector-plugins/{connector-type}/config/validate` endpoint is modified to stop logging the connector's configuration when a validation request hits this endpoint. Validations during creation or reconfiguration of a connector still log all the connector configurations at the INFO level, which is useful in general and during troubleshooting in particular.
Co-authored-by: Konstantine Karantasis <konstantine@confluent.io>
Reviewers: Konstantine Karantasis <konstantine@confluent.io>, Ismael Juma <github@juma.me.uk>
This algorithm assigns tasks to clients and tries to
- balance the distribution of the partitions of the
same input topic over stream threads and clients,
i.e., data parallel workload balance
- balance the distribution of work over stream threads.
The algorithm does not take into account potentially existing states
on the client.
The assignment is considered balanced when the difference in
assigned tasks between the stream thread with the most tasks and
the stream thread with the least tasks does not exceed a given
balance factor.
The algorithm prioritizes balance across stream threads
over balance across clients.
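A rough sketch of the balance criterion and a naive assignment loop (hypothetical types and names; the real algorithm also balances partitions of the same input topic across clients):
```scala
// Hypothetical sketch: assign each task to the stream thread that currently
// has the fewest tasks, then check the balance factor across all threads.
final case class StreamThread(client: String, name: String)

object AssignmentSketch {
  def assign(tasks: Seq[String], threads: Seq[StreamThread]): Map[StreamThread, Vector[String]] = {
    require(threads.nonEmpty, "at least one stream thread is required")
    val assignment = scala.collection.mutable.Map(threads.map(_ -> Vector.empty[String]): _*)
    for (task <- tasks) {
      val leastLoaded = threads.minBy(t => assignment(t).size)
      assignment(leastLoaded) = assignment(leastLoaded) :+ task
    }
    assignment.toMap
  }

  // Balanced when the busiest and least busy threads differ by at most balanceFactor tasks.
  def isBalanced(assignment: Map[StreamThread, Vector[String]], balanceFactor: Int): Boolean = {
    val sizes = assignment.values.map(_.size)
    sizes.isEmpty || (sizes.max - sizes.min) <= balanceFactor
  }
}
```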
Reviewers: John Roesler <vvcephei@apache.org>
When handling a WriteTxnResponse, the TransactionMarkerRequestCompletionHandler throws an IllegalStateException when the remote broker responds with a KAFKA_STORAGE_ERROR and does not retry the request. This leaves the transaction state stuck in PendingAbort or PendingCommit, with no way to change that state other than restarting the broker, because both EndTxnRequest and InitProducerIdRequest return CONCURRENT_TRANSACTIONS if the state is PendingAbort or PendingCommit. This patch changes the error handling behavior in TransactionMarkerRequestCompletionHandler to retry KAFKA_STORAGE_ERRORs. This matches the existing client behavior, and makes sense because KAFKA_STORAGE_ERROR causes the host broker to shut down, meaning that the partition being written to will move its leadership to a new, healthy broker.
Reviewers: Boyang Chen <boyang@confluent.io>, Jason Gustafson <jason@confluent.io>
Hence restore / global consumers should never expect FencedInstanceIdException.
When such an exception is thrown, it means another instance with the same instance.id has taken over, so we should treat it as fatal and let this instance close out instead of handling it as task migration.
Reviewers: Boyang Chen <boyang@confluent.io>, Matthias J. Sax <matthias@confluent.io>
Simple fix so that poll in SchemaSourceTask returns an immutable list in all cases.
Co-authored-by: David Mollitor <dmollitor@apache.org>
Reviewers: Ron Dagostino <rdagostino@confluent.io>, Ismael Juma <github@juma.me.uk>, Konstantine Karantasis <konstantine@confluent.io>