Add `SessionWindowedKStream` and implementation. Deprecate existing `SessionWindow` `aggregate` methods on `KGroupedStream`
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3902 from dguy/kafka-5922
In `AssignedTasks` log at debug all task ids that are yet to be initialized.
In `StreamsTask` log at trace when the task is initialized.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#3905 from dguy/minor-task-init-logging
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3903 from dguy/deprectate-to-through
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Damian Guy <damian.guy@gmail.com>, Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3884 from mjsax/minor-fixed-discoverd-via-exception-handling-investigation
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3892 from dguy/cleanup-state-stores
Add `Materialized` overloads to `WindowedKStream`. Deprecate existing methods on `KGroupedStream`
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#3889 from dguy/kafka-5921
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bill@confluent.io>
Closes#3893 from mjsax/kafka-5893-reset-integration-test
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Ted Yu <yuzhihong@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
Closes#3841 from mjsax/kafka-5833-interrupts
Add overloads for `table` and `globalTable` that use `Materialized`
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3837 from dguy/kafka-5873
The parameter is already called `mapper` in the KStreamImpl class. I think it was probably named `processor` here because it was copy/pasted from some other signature. This sees trivial enough to not require a jira as per the contribution guidelines.
Author: Andy Chambers <andy.chambers@fundingcircle.com>
Reviewers: Damian Guy <damian.guy@gmail.com>
Closes#3888 from cddr/fix-kstream-flatMapValues-signature
Remove date formatting from `Segments` and use the `segementId` instead.
Add tests to make sure can load old segments.
Rename old segment dirs to new formatting at load time.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: tedyu <yuzhihong@gmail.com>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3783 from dguy/kafka-5515
I'm doing this in my spare time, so don't let reviewing this PR take away actual work time. This is just me going over the code with the Intellij analyzer and implementing the most easily implementable fixes.
This PR is focused only on seemingly erronous log statements.
1: A log statement that has 4 arguments supplied but only 3 `{}` statements
2: A log statement that checks is debug is enabled, but then logs on `info` level.
Author: coscale_kdegroot <koen.degroote@coscale.com>
Reviewers: Damian Guy <damian.guy@gmail.com>
Closes#3886 from KoenDG/loggingErrors
Add overloads of `count`, `reduce`, and `aggregate` that are `Materialized` to `KGroupedStream`.
Refactor common parts between `KGroupedStream` and `WindowedKStream`
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3827 from dguy/kafka-5654
This PR utilizes `org.apache.kafka.common.utils.LogContext` for logging in `KafkaStreams`. hachikuji, ijuma please review this and let me know your thoughts.
Author: umesh chaudhary <umesh9794@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Damian Guy <damian.guy@gmail.com>
Closes#3727 from umesh9794/KAFKA-5754
The `NextIteratorFunction` in `CompositeReadOnlyWindowStore` was incorrectly using the `timeFrom` as the `timeTo`
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#3868 from dguy/window-store-range-scan
The contribution is my original work and I license the work to the project under the project's open source licence.
Author: lperry <lperry@simplemachines.com.au>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Damian Guy <damian.guy@gmail.com>
Closes#3843 from leigh-perry/trunk
I have decided to use the following approach to fixing this bug:
1) Since the Window Size in WindowedDeserializer was originally unknown, I have initialized
a field _windowSize_ and created a constructor to allow it to be instantiated
2) The default size for __windowSize__ is _Long.MAX_VALUE_. If that is the case, then the
deserialize method will return an Unlimited Window, or else will return Timed one.
3) Temperature Demo was modified to demonstrate how to use this new constructor, given
that the window size is known.
Author: Richard Yu <richardyu@Richards-Air.attlocal.net>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3745 from ConcurrencyPractitioner/trunk
Add overloads of `count`, `aggregate`, `reduce` using `Materialized` to `KGroupedTable`
deprecate other overloads
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3829 from dguy/kafka-5655
Add `join`, `leftJoin`, `outerJoin` overloads that use `Materialized` to `KTable`
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3826 from dguy/kafka-5653
Author: Bill Bejeck <bill@confluent.io>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>, Damian Guy <damian.guy@gmail.com>
Closes#3828 from bbejeck/MINOR_update_processor_topology_test_driver
Add `SerializedInternal` class and remove getters from `Serialized`
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#3825 from dguy/kafka-5817-follow-up
Create `ProducedInternal` and remove getters from `Produced`
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3810 from dguy/kafka-5816-follow-up
1. Now instead of just generic `Exception` methods declare more concrete
exceptions throwing or don't declare any throwing at all, if not needed.
2. `SimpleBenchmark.run()` throws `RuntimeException`
3. `SimpleBenchmark.produce()` throws `IllegalArgumentException`
4. Expect `ProcessorStateException` in
`StandbyTaskTest.testUpdateNonPersistentStore()`
/cc enothereska
Author: Evgeny Veretennikov <evg.veretennikov@gmail.com>
Reviewers: Damian Guy <damian.guy@gmail.com>
Closes#3485 from evis/5531-throw-concrete-exceptions
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3816 from dguy/consumed-ctor
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3817 from dguy/printed-ctor-protected
Add overloads of `filter`, `filterNot`, `mapValues` that take `Materialized` as a param to `KTable`. Deprecate overloads using `storeName` and `storeSupplier`
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3807 from dguy/ktable-filter-map
Part of KIP-182
- Add `Printed` class and `KStream#print(Printed)`
- deprecate all other `print` and `writeAsText` methods
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3768 from dguy/kafka-5652-printed
Add the `WindowedKStream` interface and implementation of methods that don't require `Materialized`
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3809 from dguy/kgrouped-stream-windowed-by
I removed synchronized keyword from 3 methods.
I ran the change thru streams module where test suite passed.
Author: tedyu <yuzhihong@gmail.com>
Reviewers: Viktor Somogyi <viktorsomogyi@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
Closes#3777 from tedyu/trunk
1. Sort processor nodes within a sub-topology by its sub-tree size: nodes with largest sizes are source nodes and hence printed earlier.
2. Sort sub-topologies by ids; sort global stores by the source topic names.
3. Open for discussion: start newlines for predecessor and successor.
4. Minor: space between processor nodes and stores / topics; maintain `[]` for the topic names.
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Damian Guy <damian.guy@gmail.com>, Matthias J. Sax <matthias@confluent.io>, Ted Yu <yuzhihong@gmail.com>
Closes#3618 from guozhangwang/K5698-topology-description-sorting
add `KTable#groupBy(KeyValueMapper, Serialized)` and deprecate the overload with `Serde` params
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bill@confluent.io>
Closes#3802 from dguy/kip-182-ktable-groupby
Part of KIP-182
- Add `StateStoreBuilder` interface and `WindowStateStoreBuilder`, `KeyValueStateStoreBuilder`, and `SessionStateStoreBuilder` implementations
- Add `StoreSupplier`, `WindowBytesStoreSupplier`, `KeyValueBytesStoreSupplier`, `SessionBytesStoreSupplier` interfaces and implementations
- Add new methods to `Stores` to create the newly added `StoreSupplier` and `StateStoreBuilder` implementations
- Update `Topology` and `InternalTopology` to use the interfaces
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3767 from dguy/kafka-5650
Add the `Produced` class and `KStream` overloads that use it:
`KStream#to(String, Produced)`
`KStream#through(String, Produced)`
Deprecate all other to and through methods accept the single param methods that take a topic param
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3770 from dguy/kafka-5652-produced
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Damian Guy <damian.guy@gmail.com>
Closes#3796 from guozhangwang/kip-138-minor-renames
Some other minor changes:
1. Do not throw the exception form callback as it would only be swallowed by consumer coordinator; remembering it and re-throw in the next loop is good enough.
2. Change Creating to Defining in Stores to avoid confusions that the stores have already been successfully created at that time.
3. Do not need unAssignChangeLogPartitions as the restore consumer will be unassigned already inside changelog reader.
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Damian Guy <damian.guy@gmail.com>
Closes#3769 from guozhangwang/KMinor-logging-before-throwing
Add the `Joined` class and the overloads to `KStream` that use it.
Deprecate existing methods that have `Serde` params
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3776 from dguy/kip-182-stream-join
Part of KIP-182
- Add the `Serialized` class
- implement overloads of `KStream#groupByKey` and KStream#groupBy`
- deprecate existing methods that have more than default arguments
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3772 from dguy/kafka-5817
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#3719 from mjsax/kafka-5603-dont-abort-tx-for-zombie-tasks-2
Simple implementation of the feature : [KAFKA-4819](https://issues.apache.org/jira/browse/KAFKA-4819)
KAFKA-4819
This PR adds a new method `threadStates` to public API of `KafkaStreams` which returns all currently states of running threads and active tasks.
Below is a example for a simple topology consuming from topics; test-p2 and test-p4.
[{"name":"StreamThread-1","state":"RUNNING","activeTasks":[{"id":"0_0", "assignments":["test-p4-0","test-p2-0"], "consumedOffsetsByPartition":[{"topicPartition":"test-p2-0","offset":"test-p2-0"}]}, {"id":"0_2", "assignments":["test-p4-2"], "consumedOffsetsByPartition":[]}]}, {"name":"StreamThread-2","state":"RUNNING","activeTasks":[{"id":"0_1", "assignments":["test-p4-1","test-p2-1"], "consumedOffsetsByPartition":[{"topicPartition":"test-p2-1","offset":"test-p2-1"}]}, {"id":"0_3", "assignments":["test-p4-3"], "consumedOffsetsByPartition":[]}]}]
Author: Florian Hussonnois <florian.hussonnois@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#2612 from fhussonnois/KAFKA-4819
- need to check that state is CRATED at startup
- some minor test cleanup
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Damian Guy <damian.guy@gmail.com>
Closes#3775 from mjsax/kafka-5818-kafkaStreams-state-transition
All current implementations process records using the same timestamp. This makes it difficult to test operations that require time windows, like `KStream-KStream joins`.
This change would allow tests to simulate records created at different times, thus making it possible to test operations like the above mentioned joins.
Author: Sebastian Gavril <sgavril@wehkamp.nl>
Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3753 from sebigavril/allow-timestamps-in-test-driver
1. Core concepts (added the stream time definition), upgrade guide and developer guide.
2. Related Java docs changes.
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Damian Guy <damian.guy@gmail.com>
Closes#3732 from guozhangwang/KMinor-kip138-docs
1. Remove timeout-based validatePartitionExists from StoreChangelogReader; instead only try to refresh metadata once after all tasks have been created and their topology initialized (hence all stores have been registered).
2. Add the logic to refresh partition metadata at the end of initialization if some restorers needing initialization cannot find their changelogs, hoping that in the next run loop these stores can find their changelogs.
As a result, after `initialize` is called we may not be able to start initializing all the `needsInitializing` ones.
As an optimization, we would not call `consumer#partitionsFor` any more, but only `consumer#listTopics` fetching all the topic metadata; so the only blocking calls left are `listTopics` and `endOffsets`, and we always capture timeout exceptions around these two calls, and delay to retry in the next run loop after refreshing the metadata. By doing this we can also reduce the number of request round trips between consumer and brokers.
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Damian Guy <damian.guy@gmail.com>, Matthias J. Sax <matthias@confluent.io>
Closes#3748 from guozhangwang/K5797-handle-metadata-available
`ChangeLoggingWindowBytesStore` needs to have the same `retainDuplicates` functionality as `RocksDBWindowStore` else data could be lost upon failover/restoration.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3754 from dguy/hotfix-changelog-window-store