The goal of this task is to implement an integration test for the kafka stream metrics.
We have to check 2 things:
1. After streams application are started, all metrics from different levels (thread, task, processor, store, cache) are correctly created and displaying recorded values.
2. When streams application are shutdown, all metrics are correctly de-registered and removed.
Reviewers: John Roesler <john@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Sub-task required to allow to define custom processor names with KStreams DSL(KIP-307) :
- add new public interface NamedOperation
- deprecate methods Joined.as() and Joined.name()
- update Suppredded interface to extend NamedOperation
Reviewers: Matthias J. Sax <mjsax@apache.org>, John Roesler <john@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
Use concurrent data structure for the underlying cache in NamedCache, and iterate over it with subMap instead of many calls to get()
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bbejeck@gmail.com>
In a previous commit #6091, we've fixed a couple of edge cases and hence do not need to remove state listener anymore (before that we removed the state listener intentionally to avoid some race conditions, which has been gone for now).
Reviewers: Matthias J. Sax <mjsax@apache.org>, Bill Bejeck <bbejeck@gmail.com>
TopicDescription and ConsumerGroupDescription in org.apache.kafka.clients.admin. are part of the public API, so we should retain the existing public constructor. Changed the new constructor with authorized operations to be package-private to avoid maintaining more public constructors since we only expect admin client to use this.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
The Java API can pass a Properties object to StreamsBuilder#build, to allow, e.g., topology optimization, while the Scala API does not yet. The latter only delegates the work to the underlying Java implementation.
Reviewers: John Roesler <john@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
This is a draft cleanup for KAFKA-7502. Here is the details:
* Make KTableKTableJoinNode abstract, and define its child classes ([NonMaterialized,Materialized]KTableKTableJoinNode) instead: now, all materialization-related routines are separated into the other classes.
* KTableKTableJoinNodeBuilder#build now instantiates [NonMaterialized,Materialized]KTableKTableJoinNode classes instead of KTableKTableJoinNode.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bbejeck@gmail.com>
Just a minor cleanup to use Java 8 lambdas vs anonymous classes in this test.
I ran all tests in the streams test suite
Reviewers: Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>
* add a normal windowed suppress with short windows and a short grace
period
* improve the smoke test so that it actually verifies the intended
conditions
See https://issues.apache.org/jira/browse/KAFKA-7944
Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <guozhang@confluent.io>
If deprecated interface methods are inherited, the @Deprication tag should be used (instead on suppressing the deprecation warning).
Reviewers: Guozhang Wang <wangguoz@gmail.com>, John Roesler <john@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
Add TimestampedWindowStore builder/runtime classes
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <mjsax@apache.org>, John Roesler <john@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
As of 2.0, Producer.initTransactions may throw a TimeoutException, which is retriable. Streams should retry instead of crashing when we encounter this exception
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <mjsax@apache.org>, Bill Bejeck <bbejeck@gmail.com>
Metadata may be updated from the background thread, so we need to protect access to SubscriptionState. This patch restructures the metadata handling so that we only check pattern subscriptions in the foreground. Additionally, it improves the following:
1. SubscriptionState is now the source of truth for the topics that will be fetched. We had a lot of messy logic previously to try and keep the the topic set in Metadata consistent with the subscription, so this simplifies the logic.
2. The metadata needs for the producer and consumer are quite different, so it made sense to separate the custom logic into separate extensions of Metadata. For example, only the producer requires topic expiration.
3. We've always had an edge case in which a metadata change with an inflight request may cause us to effectively miss an expected update. This patch implements a separate version inside Metadata which is bumped when the needed topics changes.
4. This patch removes the MetadataListener, which was the cause of https://issues.apache.org/jira/browse/KAFKA-7764.
Reviewers: David Arthur <mumrah@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
Resolves the compiler warnings when building streams-scala.
Reviewers: A. Sophie Blee-Goldman <ableegoldman@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Guozhang Wang <wangguoz@gmail.com>
Improve JavaDocs about global state stores.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Sophie Blee-Goldman <sophie@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
Rewrote the InMemoryWindowStore implementation by moving the work of a fetch to the iterator, and cleaned up the iterators as well.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bbejeck@gmail.com>
Moved hard-coded 'expired-window-record-drop' and 'late-record-drop' to static Strings in StreamsMetricsImpl
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <mjsax@apache.org>, Bill Bejeck <bbejeck@gmail.com>
Third (and final) PR in series to inline the generic parameters of the following bytes stores:
[Pt. I] InMemoryKeyValueStore
[Pt. II] RocksDBWindowStore
[Pt. II] RocksDBSessionStore
[Pt. II] MemoryLRUCache
[Pt. II] MemoryNavigableLRUCache
[x] InMemoryWindowStore
Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
In the RegexSourceIntegrationTest#testRegexMatchesTopicsAWhenCreated() and RegexSourceIntegrationTest#testRegexMatchesTopicsAWhenDeleted() a race condition exists where the ConsumerRebalanceListener in the test modifies the list of subscribed topics when the condition for the test success is comparing the same array instance against expected values.
This PR should fix this race condition by using a CopyOnWriteArrayList which guarantees safe traversal of the list even when a concurrent modification is happening.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Previously the InMemoryKeyValue store would throw a ConcurrentModificationException if the store was modified beneath an open iterator. The TreeMap implementation was swapped with a ConcurrentSkipListMap for similar performance while supporting concurrent access.
Added one test to AbstractKeyValueStoreTest, no existing tests caught this.
Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Second PR in series to inline the generic parameters of the following bytes stores
Reviewers: Matthias J. Sax <mjsax@apache.org>, Bill Bejeck <bbejeck@gmail.com>
I found this defect while inspecting [KAFKA-7293: Merge followed by groupByKey/join might violate co-partitioning](https://issues.apache.org/jira/browse/KAFKA-7293); This flag is never used now. Instead, `KStreamImpl#repartitionRequired` is now covering its functionality.
Reviewers: Matthias J. Sax <mjsax@apache.org>, Bill Bejeck <bbejeck@gmail.com>
First PR in series to inline the generic parameters of the following bytes stores:
[x] InMemoryKeyValueStore
[ ] RocksDBWindowStore
[ ] RocksDBSessionStore
[ ] MemoryLRUCache
[ ] MemoryNavigableLRUCache
[ ] (awaiting merge) InMemoryWindowStore
A number of tests took advantage of the generic InMemoryKeyValueStore and had to be reworked somewhat -- this PR covers everything related to the in-memory key-value store.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <mjsax@apache.org>, John Roesler <john@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
This fix is aiming for #2 issue pointed out within https://issues.apache.org/jira/browse/KAFKA-7672
In the current setup, we do offset checkpoint file write when EOS is turned on during #suspend, which introduces the potential race condition during StateManager #closeSuspend call. To mitigate the problem, we attempt to always write checkpoint file in #suspend call.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <mjsax@apache.org>, John Roesler <john@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
* In activeTasks.suspend, we should also close all restoring tasks as well. Closing restoring tasks would not require `task.close` as in `closeNonRunningTasks `, since the topology is not initialized yet, instead only state stores are initialized. So we only need to call `task.closeStateManager`.
* Also add @linyli001 's fix.
* Unit tests updated accordingly.
Reviewers: Matthias J. Sax <mjsax@apache.org>, John Roesler <john@confluent.io>
This is an update to the existing javadocs for KGroupedStream class.
Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <mjsax@apache.org>, John Roesler <john@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
Implemented an in-memory window store allowing for range queries. A finite retention period defines how long records will be kept, ie the window of time for fetching, and the grace period defines the window within which late-arriving data may still be written to the store.
Unit tests were written to test the functionality of the window store, including its insert/update/delete and fetch operations. Single-record, all records, and range fetch were tested, for both time ranges and key ranges. The logging and metrics for late-arriving (dropped)records were tested as well as the ability to restore from a changelog.
Reviewers: John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
This PR fixes the issue found in the soak testing cluster regarding using RocksDBTimestampedStore when a regular RocksDB store should have been used.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <mjsax@apache.org>, John Roesler <john@confluent.io>