Third (and final) PR in series to inline the generic parameters of the following bytes stores:
[Pt. I] InMemoryKeyValueStore
[Pt. II] RocksDBWindowStore
[Pt. II] RocksDBSessionStore
[Pt. II] MemoryLRUCache
[Pt. II] MemoryNavigableLRUCache
[x] InMemoryWindowStore
Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
This patch fixes a regression in the replica fetcher which occurs when the replica fetcher manager simultaneously calls `removeFetcherForPartitions`, removing the corresponding partitionStates, while a replica fetcher thread attempts to truncate the same partition(s) in `truncateToHighWatermark`. The result is an NPE that crashes the fetcher.
This change simply checks that the `partitionState` is not null first. Note that a similar guard exists in `truncateToEpochEndOffsets`.
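A minimal Java sketch of the guard (the actual fetcher code is Scala; `partitionStates`, `PartitionFetchState`, and `truncate` are simplified stand-ins for the fetcher thread's internals):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TruncationGuardSketch {
    static class PartitionFetchState {
        long highWatermark;
    }

    // Shared state that removeFetcherForPartitions may mutate concurrently.
    private final Map<String, PartitionFetchState> partitionStates = new ConcurrentHashMap<>();

    void truncateToHighWatermark(Iterable<String> partitions) {
        for (String tp : partitions) {
            PartitionFetchState state = partitionStates.get(tp);
            if (state == null) {
                continue; // partition was removed concurrently; skip instead of hitting an NPE
            }
            truncate(tp, state.highWatermark);
        }
    }

    private void truncate(String tp, long offset) {
        // truncate the partition's log to the given offset
    }
}
```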
Reviewers: Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Jason Gustafson <jason@confluent.io>
In RegexSourceIntegrationTest#testRegexMatchesTopicsAWhenCreated() and RegexSourceIntegrationTest#testRegexMatchesTopicsAWhenDeleted(), a race condition exists: the ConsumerRebalanceListener in the test modifies the list of subscribed topics while the test's success condition compares that same list instance against the expected values.
This PR fixes the race condition by using a CopyOnWriteArrayList, which guarantees safe traversal of the list even while concurrent modifications are happening.
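A minimal sketch of the pattern, with hypothetical names (`assignedTopics`, `onAssignment`) standing in for the test's fields:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class SubscriptionRaceSketch {
    // Mutated by the consumer's rebalance callback, read by the test thread.
    // CopyOnWriteArrayList copies the backing array on every mutation, so
    // readers always traverse a consistent snapshot and never see a
    // ConcurrentModificationException.
    private final List<String> assignedTopics = new CopyOnWriteArrayList<>();

    // Called from the ConsumerRebalanceListener (consumer thread).
    public void onAssignment(List<String> topics) {
        assignedTopics.clear();
        assignedTopics.addAll(topics);
    }

    // Called from the test's wait-until condition (main thread).
    public boolean matches(List<String> expected) {
        return assignedTopics.equals(expected);
    }
}
```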
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Previously, the InMemoryKeyValueStore would throw a ConcurrentModificationException if the store was modified while an open iterator was traversing it. The TreeMap implementation was swapped for a ConcurrentSkipListMap, which offers similar performance while supporting concurrent access.
Added one test to AbstractKeyValueStoreTest; no existing test caught this.
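A minimal sketch of the behavioral difference, using hypothetical key/value types (the real store keys on `Bytes`):

```java
import java.util.Iterator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class SnapshotIterationSketch {
    // ConcurrentSkipListMap offers the same sorted-map operations as TreeMap
    // (O(log n) lookups, range views) but its iterators are weakly consistent:
    // they never throw ConcurrentModificationException on concurrent writes.
    private final NavigableMap<String, byte[]> map = new ConcurrentSkipListMap<>();

    public void demo() {
        map.put("a", new byte[0]);
        Iterator<Map.Entry<String, byte[]>> it = map.entrySet().iterator();
        map.put("b", new byte[0]); // concurrent write while iterating: safe here,
                                   // would throw with a TreeMap iterator
        while (it.hasNext()) {
            it.next();
        }
    }
}
```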
Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
This patch adds several new log messages to provide more information about errors during log dir movement and to make it clear when each partition movement is finished.
Reviewers: Jason Gustafson <jason@confluent.io>
* KAFKA-7962: StickyAssignor: throws NullPointerException during assignments if topic is deleted
https://issues.apache.org/jira/browse/KAFKA-7962
A consumer using the StickyAssignor throws a NullPointerException if a subscribed topic was removed; see the sketch after this list.
* Addressed vahidhashemian's comments
* Lowered NPath complexity
* Added a unit test
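A hedged sketch of the guard; `partitionsForSubscribedTopics` and its parameters are hypothetical simplifications of the assignor's internals (the real code works with `TopicPartition`s):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class DeletedTopicGuardSketch {
    // When a subscribed topic is deleted, the metadata no longer contains it,
    // so the partition-count lookup returns null; guard instead of dereferencing.
    static List<String> partitionsForSubscribedTopics(Map<String, Integer> partitionsPerTopic,
                                                      List<String> subscribedTopics) {
        List<String> partitions = new ArrayList<>();
        for (String topic : subscribedTopics) {
            Integer numPartitions = partitionsPerTopic.get(topic);
            if (numPartitions == null) {
                continue; // topic was deleted; skip rather than NPE
            }
            for (int i = 0; i < numPartitions; i++) {
                partitions.add(topic + "-" + i);
            }
        }
        return partitions;
    }
}
```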
Second PR in series to inline the generic parameters of the following bytes stores: RocksDBWindowStore, RocksDBSessionStore, MemoryLRUCache, and MemoryNavigableLRUCache.
Reviewers: Matthias J. Sax <mjsax@apache.org>, Bill Bejeck <bbejeck@gmail.com>
In some test cases it's desirable to instantiate a subclass of `ShutdownableThread` without starting it. Since most subclasses of `ShutdownableThread` put cleanup logic in `ShutdownableThread.shutdown()`, being able to call `shutdown()` on the non-running thread would be useful.
This change allows us to avoid blocking in `ShutdownableThread.shutdown()` if the thread's `run()` method has not been called. We also add a check that `initiateShutdown()` was called before `awaitShutdown()`, to protect against the case where a user calls `awaitShutdown()` before the thread has been started and is unexpectedly not blocked on the thread shutting down.
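An illustrative Java sketch of the resulting pattern (the real class is Scala, and names and details here are simplified):

```java
import java.util.concurrent.CountDownLatch;

public abstract class ShutdownableThreadSketch extends Thread {
    private final CountDownLatch shutdownInitiated = new CountDownLatch(1);
    private final CountDownLatch shutdownComplete = new CountDownLatch(1);
    private volatile boolean started = false;

    public void shutdown() throws InterruptedException {
        initiateShutdown();
        awaitShutdown();
    }

    public void initiateShutdown() {
        shutdownInitiated.countDown();
    }

    public void awaitShutdown() throws InterruptedException {
        if (shutdownInitiated.getCount() != 0) {
            throw new IllegalStateException("initiateShutdown() was not called before awaitShutdown()");
        }
        if (started) {
            shutdownComplete.await(); // only block if run() actually began
        }
        // Otherwise return immediately: the thread never ran, so cleanup in
        // shutdown() proceeds without blocking forever.
    }

    @Override
    public void run() {
        started = true;
        try {
            while (shutdownInitiated.getCount() != 0) {
                doWork();
            }
        } finally {
            shutdownComplete.countDown();
        }
    }

    protected abstract void doWork();
}
```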
Reviewers: Dhruvil Shah <dhruvil@confluent.io>, Jun Rao <junrao@gmail.com>
In order to debug problems with log directory reassignments, it is helpful to know when the fetcher thread begins moving a particular partition. This patch refactors the fetch logic so that we stick to a selected partition as long as it is available and log a message when a different partition is selected.
Reviewers: Viktor Somogyi-Vass <viktorsomogyi@gmail.com>, Dong Lin <lindong28@gmail.com>, Jun Rao <junrao@gmail.com>
I found this defect while inspecting [KAFKA-7293: Merge followed by groupByKey/join might violate co-partitioning](https://issues.apache.org/jira/browse/KAFKA-7293); this flag is never used now. Instead, `KStreamImpl#repartitionRequired` now covers its functionality.
Reviewers: Matthias J. Sax <mjsax@apache.org>, Bill Bejeck <bbejeck@gmail.com>
MINOR: Increase produce timeout for EmbeddedKafkaCluster to 120 seconds
The previous value was 500 ms. This change gives more room for tests to pass on low-resource systems running many parallel tests.
Reviewers: Randall Hauch <randall@confluent.io>
First PR in series to inline the generic parameters of the following bytes stores:
[x] InMemoryKeyValueStore
[ ] RocksDBWindowStore
[ ] RocksDBSessionStore
[ ] MemoryLRUCache
[ ] MemoryNavigableLRUCache
[ ] (awaiting merge) InMemoryWindowStore
A number of tests took advantage of the generic InMemoryKeyValueStore and had to be reworked somewhat; this PR covers everything related to the in-memory key-value store.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <mjsax@apache.org>, John Roesler <john@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
This fix addresses the second issue pointed out in https://issues.apache.org/jira/browse/KAFKA-7672.
In the current setup, we write the offset checkpoint file during #suspend when EOS is turned on, which introduces a potential race condition during the StateManager #closeSuspend call. To mitigate the problem, we now attempt to always write the checkpoint file in the #suspend call.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <mjsax@apache.org>, John Roesler <john@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
Whenever the consumer coordinator sends a response that doesn't match the client consumer subscription, we should check the subscription to see if it has changed. If it has, we can ignore the assignment and request a rebalance. Otherwise, we can throw an exception as before.
Testing strategy: create a mocked client that first sends an assignment response that doesn't match the client subscription followed by an assignment response that does match the client subscription.
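A hedged sketch of the decision logic; `shouldRequestRebalance` and its parameters are hypothetical simplifications of the consumer coordinator's internals:

```java
import java.util.List;
import java.util.Set;

public class AssignmentCheckSketch {
    // On receiving an assignment that does not match the current subscription,
    // distinguish a stale assignment (subscription changed since the join
    // request was sent) from a genuinely invalid one.
    static boolean shouldRequestRebalance(Set<String> subscriptionAtJoin,
                                          Set<String> currentSubscription,
                                          List<String> assignedTopics) {
        if (currentSubscription.containsAll(assignedTopics)) {
            return false; // assignment is consistent; proceed normally
        }
        if (!currentSubscription.equals(subscriptionAtJoin)) {
            return true;  // subscription changed mid-rebalance: ignore assignment, rejoin
        }
        // Subscription unchanged, yet the assignment doesn't match: fail as before.
        throw new IllegalStateException("Assigned topics " + assignedTopics
                + " do not match the subscription " + currentSubscription);
    }
}
```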
Reviewers: Jason Gustafson <jason@confluent.io>
* In activeTasks.suspend, we should close all restoring tasks as well. Closing restoring tasks does not require `task.close` as in `closeNonRunningTasks`, since the topology is not initialized yet; only the state stores are initialized. So we only need to call `task.closeStateManager`.
* Also added @linyli001's fix.
* Updated unit tests accordingly.
Reviewers: Matthias J. Sax <mjsax@apache.org>, John Roesler <john@confluent.io>
This is an update to the existing javadocs for the KGroupedStream class.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <mjsax@apache.org>, John Roesler <john@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
Currently, commitTransaction and abortTransaction wait indefinitely for the respective operation to be completed. This patch uses the producer's max block time to limit the time that we will wait. If the timeout elapses, we raise a TimeoutException, which allows the user to either close the producer or retry the operation.
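A minimal sketch of the bounded wait, assuming a latch stands in for the transactional result the sender thread completes, and using the JDK's `TimeoutException` in place of Kafka's `org.apache.kafka.common.errors.TimeoutException`:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TransactionalWaitSketch {
    private final CountDownLatch transactionResult = new CountDownLatch(1); // completed by the sender thread
    private final long maxBlockTimeMs = 60_000L; // the producer's max.block.ms

    // Previously the wait was unbounded; bounding it by max.block.ms lets the
    // caller close the producer or retry the commit/abort on timeout.
    public void commitTransaction() throws InterruptedException, TimeoutException {
        if (!transactionResult.await(maxBlockTimeMs, TimeUnit.MILLISECONDS)) {
            throw new TimeoutException("Transaction was not completed within " + maxBlockTimeMs + " ms");
        }
    }
}
```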
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>
Use of `MetadataRequest.isAllTopics` is not consistently defined across all versions of the API: for v0, it evaluates to false. This patch makes the behavior consistent for all versions.
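A hedged sketch of a version-aware check, reflecting the v0 wire encoding (empty topic list means all topics) versus v1+ (null list means all topics, empty list means none); `isAllTopics` here is a simplified stand-in:

```java
import java.util.List;

public class IsAllTopicsSketch {
    // A version-unaware check against null alone reports false for v0
    // all-topics requests, which is the inconsistency being fixed.
    static boolean isAllTopics(List<String> topics, short version) {
        if (version == 0) {
            return topics == null || topics.isEmpty();
        }
        return topics == null;
    }
}
```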
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Since we are logging offset resets and such at info level, it makes sense to use the same level for subscriptions and assignments.
Reviewers: Ismael Juma <ismael@juma.me.uk>
Removed quotes from the LogDir variable generation, as there are additional quotes on line 127.
The extra quotes caused problems when the batch files were invoked from a path containing space characters.
Per the KIP-263 discussion, we think we can improve broker restart time by avoiding costly disk operations when sanity checking index files for segments below the recovery point on broker startup.
This PR includes the following changes:
1. Mmap the index file and populate the index fields on demand, rather than performing costly disk operations when creating the index object on broker startup (a sketch of this pattern follows below).
2. Skip sanity checks on the time index and offset index of segments.
1. For segments with offsets below the flushed point (recovery point): these segments have been safely flushed, so we don't need to sanity check their index files. If there is indeed data corruption on disk, then given that we don't sanity check the segment file itself, sanity checking only the indexes adds little benefit.
2. For segments with offsets above the flushed point (recovery point): we recover these segments in `recoverLog()` (Log.scala) in any case, so sanity checking their index files is redundant.
We ran experiments on a cluster with 15 brokers, each of which has ~3k segments (there are 31.8k partitions with RF=3, evenly distributed across brokers; the total bytes-in rate is around 400 MBps). The results show that the rolling bounce time is reduced from 135 minutes to 55 minutes.
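A Java sketch of the on-demand mmap idea from change 1 (the actual index classes are Scala; this is a generic double-checked lazy-initialization pattern, not the real implementation):

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class LazyIndexSketch {
    private final File file;
    private volatile MappedByteBuffer mmap; // populated on first access, not at construction

    public LazyIndexSketch(File file) {
        this.file = file;
    }

    // Double-checked lazy init: constructing the index object at broker
    // startup no longer touches the disk; the mmap happens on first real use.
    private MappedByteBuffer mmap() throws IOException {
        MappedByteBuffer m = mmap;
        if (m == null) {
            synchronized (this) {
                m = mmap;
                if (m == null) {
                    try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
                        m = raf.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, file.length());
                    }
                    mmap = m;
                }
            }
        }
        return m;
    }

    public long readLastOffset() throws IOException {
        MappedByteBuffer m = mmap(); // first call pays the mapping cost
        return m.capacity() >= 8 ? m.getLong(0) : -1L; // illustrative read
    }
}
```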
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Implemented an in-memory window store allowing for range queries. A finite retention period defines how long records will be kept, i.e., the window of time available for fetching, and the grace period defines the window within which late-arriving data may still be written to the store.
Unit tests were written to cover the functionality of the window store, including its insert/update/delete and fetch operations. Single-record, all-records, and range fetches were tested, for both time ranges and key ranges. The logging and metrics for late-arriving (dropped) records were tested, as well as the ability to restore from a changelog.
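A rough sketch of how such a store can be laid out; it conflates the retention and grace periods for brevity, and `String` keys stand in for the real `Bytes` keys:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class InMemoryWindowStoreSketch {
    // Windows keyed by start timestamp, then by record key.
    private final ConcurrentNavigableMap<Long, ConcurrentNavigableMap<String, byte[]>> segments =
            new ConcurrentSkipListMap<>();
    private final long retentionMs;
    private long observedStreamTime = Long.MIN_VALUE;

    public InMemoryWindowStoreSketch(long retentionMs) {
        this.retentionMs = retentionMs;
    }

    public void put(String key, byte[] value, long windowStart) {
        observedStreamTime = Math.max(observedStreamTime, windowStart);
        long expiredBefore = observedStreamTime - retentionMs;
        if (windowStart <= expiredBefore) {
            return; // late record outside retention: dropped (logging/metrics would go here)
        }
        segments.computeIfAbsent(windowStart, t -> new ConcurrentSkipListMap<>()).put(key, value);
        // Drop whole windows that have fallen out of the retention period.
        segments.headMap(expiredBefore, true).clear();
    }

    // Range fetch: all values for `key` whose window start lies in [from, to].
    public List<byte[]> fetch(String key, long from, long to) {
        List<byte[]> result = new ArrayList<>();
        for (Map.Entry<Long, ConcurrentNavigableMap<String, byte[]>> entry :
                segments.subMap(from, true, to, true).entrySet()) {
            byte[] value = entry.getValue().get(key);
            if (value != null) {
                result.add(value);
            }
        }
        return result;
    }
}
```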
Reviewers: John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
This PR fixes the issue found in the soak testing cluster regarding using RocksDBTimestampedStore when a regular RocksDB store should have been used.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <mjsax@apache.org>, John Roesler <john@confluent.io>
The StreamsBrokerBounceTest.test_broker_type_bounce experienced what looked like a transient failure. After looking over this test and the failure, it seems vulnerable to a timing issue where Streams starts before the Kafka service has created all the topics.
Reviewers: Matthias J. Sax <mjsax@apache.org>, John Roesler <john@confluent.io>
Fail produce requests using zstd until the inter.broker.protocol.version is high enough to ensure that all replicas support it. Otherwise, followers receive an `UNSUPPORTED_COMPRESSION_TYPE` error when fetching zstd data, and ISRs shrink.
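A hedged sketch of the broker-side gate; the numeric IBP encoding and the exception type here are hypothetical simplifications (the real broker compares version objects in Scala and returns the error code to the producer):

```java
public class ZstdGateSketch {
    enum CompressionCodec { NONE, GZIP, SNAPPY, LZ4, ZSTD }

    // zstd (KIP-110) is only readable by replicas on new-enough fetch versions,
    // so produce requests using it are rejected until the IBP guarantees that.
    static void validateProduceCompression(CompressionCodec codec, double interBrokerProtocolVersion) {
        if (codec == CompressionCodec.ZSTD && interBrokerProtocolVersion < 2.1) {
            // Stand-in for returning an UNSUPPORTED_COMPRESSION_TYPE error.
            throw new UnsupportedOperationException(
                    "zstd requires inter.broker.protocol.version >= 2.1 so all replicas can fetch it");
        }
    }
}
```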
Reviewers: Jason Gustafson <jason@confluent.io>