Previously the InMemoryKeyValue store would throw a ConcurrentModificationException if the store was modified beneath an open iterator. The TreeMap implementation was swapped with a ConcurrentSkipListMap for similar performance while supporting concurrent access.
Added one test to AbstractKeyValueStoreTest, no existing tests caught this.
Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
This patch adds several new log messages to provide more information about errors during log dir movement and to make it clear when each partition movement is finished.
Reviewers: Jason Gustafson <jason@confluent.io>
* KAFKA-7962: StickyAssignor: throws NullPointerException during assignments if topic is deleted
https://issues.apache.org/jira/browse/KAFKA-7962
Consumer using StickyAssignor throws NullPointerException if a subscribed topic was removed.
* addressed vahidhashemian's comments
* lower NPath Complexity
* added a unit test
Second PR in series to inline the generic parameters of the following bytes stores
Reviewers: Matthias J. Sax <mjsax@apache.org>, Bill Bejeck <bbejeck@gmail.com>
In some test cases it's desirable to instantiate a subclass of `ShutdownableThread` without starting it. Since most subclasses of `ShutdownableThread` put cleanup logic in `ShutdownableThread.shutdown()`, being able to call `shutdown()` on the non-running thread would be useful.
This change allows us to avoid blocking in `ShutdownableThread.shutdown()` if the thread's `run()` method has not been called. We also add a check that `initiateShutdown()` was called before `awaitShutdown()`, to protect against the case where a user calls `awaitShutdown()` before the thread has been started, and unexpectedly is not blocked on the thread shutting down.
Reviewers : Dhruvil Shah <dhruvil@confluent.io>, Jun Rao <junrao@gmail.com>
In order to debug problems with log directory reassignments, it is helpful to know when the fetcher thread begins moving a particular partition. This patch refactors the fetch logic so that we stick to a selected partition as long as it is available and log a message when a different partition is selected.
Reviewers: Viktor Somogyi-Vass <viktorsomogyi@gmail.com>, Dong Lin <lindong28@gmail.com>, Jun Rao <junrao@gmail.com>
I found this defect while inspecting [KAFKA-7293: Merge followed by groupByKey/join might violate co-partitioning](https://issues.apache.org/jira/browse/KAFKA-7293); This flag is never used now. Instead, `KStreamImpl#repartitionRequired` is now covering its functionality.
Reviewers: Matthias J. Sax <mjsax@apache.org>, Bill Bejeck <bbejeck@gmail.com>
MINOR: Increase produce timeout for EmbeddedKafkaCluster to 120 seconds
Previous value was 500ms. This change gives more room to pass tests on systems with low resources running many parallel tests.
Reviewers: Randall Hauch <randall@confluent.io>
First PR in series to inline the generic parameters of the following bytes stores:
[x] InMemoryKeyValueStore
[ ] RocksDBWindowStore
[ ] RocksDBSessionStore
[ ] MemoryLRUCache
[ ] MemoryNavigableLRUCache
[ ] (awaiting merge) InMemoryWindowStore
A number of tests took advantage of the generic InMemoryKeyValueStore and had to be reworked somewhat -- this PR covers everything related to the in-memory key-value store.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <mjsax@apache.org>, John Roesler <john@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
This fix is aiming for #2 issue pointed out within https://issues.apache.org/jira/browse/KAFKA-7672
In the current setup, we do offset checkpoint file write when EOS is turned on during #suspend, which introduces the potential race condition during StateManager #closeSuspend call. To mitigate the problem, we attempt to always write checkpoint file in #suspend call.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <mjsax@apache.org>, John Roesler <john@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
Whenever the consumer coordinator sends a response that doesn't match the client consumer subscription, we should check the subscription to see if it has changed. If it has, we can ignore the assignment and request a rebalance. Otherwise, we can throw an exception as before.
Testing strategy: create a mocked client that first sends an assignment response that doesn't match the client subscription followed by an assignment response that does match the client subscription.
Reviewers: Jason Gustafson <jason@confluent.io>
* In activeTasks.suspend, we should also close all restoring tasks as well. Closing restoring tasks would not require `task.close` as in `closeNonRunningTasks `, since the topology is not initialized yet, instead only state stores are initialized. So we only need to call `task.closeStateManager`.
* Also add @linyli001 's fix.
* Unit tests updated accordingly.
Reviewers: Matthias J. Sax <mjsax@apache.org>, John Roesler <john@confluent.io>
This is an update to the existing javadocs for KGroupedStream class.
Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <mjsax@apache.org>, John Roesler <john@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
Currently, commitTransaction and abortTransaction wait indefinitely for the respective operation to be completed. This patch uses the producer's max block time to limit the time that we will wait. If the timeout elapses, we raise a TimeoutException, which allows the user to either close the producer or retry the operation.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>
Use of `MetadataRequest.isAllTopics` is not consistently defined for all versions of the api. For v0, it evaluates to false. This patch makes the behavior consistent for all versions.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Since we are logging offset resets and such at info level, it makes sense to use the same level for subscriptions and assignments.
Reviewers: Ismael Juma <ismael@juma.me.uk>
Removed quotes from LogDir variable generation as there are additional quotes in Line 127.
This caused problems when those batch files are invoked from a path that contains space characters.
Per the KIP-263 discussion, we think we can improve broker restart time by avoiding performing costly disk operations when sanity checking index files for segments below recovery point on broker startup.
This PR includes the following changes:
1. Mmap the index file and populate fields of the index file on-demand rather than performing costly disk operations when creating the index object on broker startup.
2. Skip sanity checks on the time index and offset index of segments.
1. For segment with offset below the flushed point (recovery point), these segments are safely flushed so we don't need to sanity check the index files. if there are indeed data corruption on disk, given that we don't sanity check the segment file, sanity checking only the indexes adds little benefit.
2. For segment with offset above the flushed point (recovery point), we will recover these segments in `recoveryLog()` (Log.scala) in any case so sanity checking the index files for these segments is redundant.
We did experiments on a cluster with 15 brokers, each of which has ~3k segments (and there are 31.8k partitions with RF=3 which are evenly distributed across brokers; total bytes-in-rate is around 400 MBps). The results show that rolling bounce time reduces from 135 minutes to 55 minutes.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Implemented an in-memory window store allowing for range queries. A finite retention period defines how long records will be kept, ie the window of time for fetching, and the grace period defines the window within which late-arriving data may still be written to the store.
Unit tests were written to test the functionality of the window store, including its insert/update/delete and fetch operations. Single-record, all records, and range fetch were tested, for both time ranges and key ranges. The logging and metrics for late-arriving (dropped)records were tested as well as the ability to restore from a changelog.
Reviewers: John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
This PR fixes the issue found in the soak testing cluster regarding using RocksDBTimestampedStore when a regular RocksDB store should have been used.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <mjsax@apache.org>, John Roesler <john@confluent.io>
The StreamsBrokerBounceTest.test_broker_type_bounce experienced what looked like a transient failure. After looking over this test and failure, it seems like it is vulnerable to timing error that streams will start before the kafka service creates all topics.
Reviewers: Matthias J. Sax <mjsax@apache.org>, John Roesler <john@confluent.io>
Fail produce requests using zstd until the inter.broker.protocol.version is large enough that replicas are ensured to support it. Otherwise, followers receive the `UNSUPPORTED_COMPRESSION_TYPE` when fetching zstd data and ISRs shrink.
Reviewers: Jason Gustafson <jason@confluent.io>
Fix the following situations, where pending members (one that has a member-id, but hasn't joined the group) can cause rebalance operations to fail:
- In AbstractCoordinator, a pending consumer should be allowed to leave.
- A rebalance operation must successfully complete if a pending member either joins or times out.
- During a rebalance operation, a pending member must be able to leave a group.
Reviewers: Boyang Chen <bchen11@outlook.com>, Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
- Compare last offset of first batch (instead of first offset) with index offset
- Early exit from loop due to zero entries must happen before checking for mismatch
- {TimeIndex,OffsetIndex}.entry should return absolute offset like other methods.
These methods are only used by DumpLogSegments.
- DumpLogSegments now calls `closeHandlers` on OffsetIndex, TimeIndex
and FileRecords.
- Add OffsetIndex, TimeIndex and DumpLogSegments tests
- Remove unnecessary casts by using covariant returns in OffsetIndex and TimeIndex
- Minor clean-ups
- Fix `checkArgs` so that it does what it says only.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Sriharsha Chintalapani <sriharsha@apache.org>
Replaced `forall` with `exists`. Added a unit test to `KafkaApisTest` that failed before the change.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
We identified that we spend a lot of time in the creation of Logger instances
when creating OffsetIndex/TimeIndex due to the Logging mixin.
When the broker is bootstrapping it's just doing this in a tight loop, so the
time adds up.
This patch moves the logger to the companion objects of OffsetIndex,
TimeIndex and AbstractIndex resolving this issue.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Co-authored-by: Kyle Ambroff <kyle@ambroff.com>
Co-authored-by: Ismael Juma <ismael@juma.me.uk>