Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3816 from dguy/consumed-ctor
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3817 from dguy/printed-ctor-protected
Add overloads of `filter`, `filterNot`, `mapValues` that take `Materialized` as a param to `KTable`. Deprecate overloads using `storeName` and `storeSupplier`
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3807 from dguy/ktable-filter-map
Part of KIP-182
- Add `Printed` class and `KStream#print(Printed)`
- deprecate all other `print` and `writeAsText` methods
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3768 from dguy/kafka-5652-printed
Add the `WindowedKStream` interface and implementation of methods that don't require `Materialized`
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3809 from dguy/kgrouped-stream-windowed-by
I removed synchronized keyword from 3 methods.
I ran the change thru streams module where test suite passed.
Author: tedyu <yuzhihong@gmail.com>
Reviewers: Viktor Somogyi <viktorsomogyi@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
Closes#3777 from tedyu/trunk
1. Sort processor nodes within a sub-topology by its sub-tree size: nodes with largest sizes are source nodes and hence printed earlier.
2. Sort sub-topologies by ids; sort global stores by the source topic names.
3. Open for discussion: start newlines for predecessor and successor.
4. Minor: space between processor nodes and stores / topics; maintain `[]` for the topic names.
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Damian Guy <damian.guy@gmail.com>, Matthias J. Sax <matthias@confluent.io>, Ted Yu <yuzhihong@gmail.com>
Closes#3618 from guozhangwang/K5698-topology-description-sorting
add `KTable#groupBy(KeyValueMapper, Serialized)` and deprecate the overload with `Serde` params
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bill@confluent.io>
Closes#3802 from dguy/kip-182-ktable-groupby
Part of KIP-182
- Add `StateStoreBuilder` interface and `WindowStateStoreBuilder`, `KeyValueStateStoreBuilder`, and `SessionStateStoreBuilder` implementations
- Add `StoreSupplier`, `WindowBytesStoreSupplier`, `KeyValueBytesStoreSupplier`, `SessionBytesStoreSupplier` interfaces and implementations
- Add new methods to `Stores` to create the newly added `StoreSupplier` and `StateStoreBuilder` implementations
- Update `Topology` and `InternalTopology` to use the interfaces
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3767 from dguy/kafka-5650
Add the `Produced` class and `KStream` overloads that use it:
`KStream#to(String, Produced)`
`KStream#through(String, Produced)`
Deprecate all other to and through methods accept the single param methods that take a topic param
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3770 from dguy/kafka-5652-produced
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Damian Guy <damian.guy@gmail.com>
Closes#3796 from guozhangwang/kip-138-minor-renames
Some other minor changes:
1. Do not throw the exception form callback as it would only be swallowed by consumer coordinator; remembering it and re-throw in the next loop is good enough.
2. Change Creating to Defining in Stores to avoid confusions that the stores have already been successfully created at that time.
3. Do not need unAssignChangeLogPartitions as the restore consumer will be unassigned already inside changelog reader.
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Damian Guy <damian.guy@gmail.com>
Closes#3769 from guozhangwang/KMinor-logging-before-throwing
Add the `Joined` class and the overloads to `KStream` that use it.
Deprecate existing methods that have `Serde` params
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3776 from dguy/kip-182-stream-join
Part of KIP-182
- Add the `Serialized` class
- implement overloads of `KStream#groupByKey` and KStream#groupBy`
- deprecate existing methods that have more than default arguments
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3772 from dguy/kafka-5817
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#3719 from mjsax/kafka-5603-dont-abort-tx-for-zombie-tasks-2
Simple implementation of the feature : [KAFKA-4819](https://issues.apache.org/jira/browse/KAFKA-4819)
KAFKA-4819
This PR adds a new method `threadStates` to public API of `KafkaStreams` which returns all currently states of running threads and active tasks.
Below is a example for a simple topology consuming from topics; test-p2 and test-p4.
[{"name":"StreamThread-1","state":"RUNNING","activeTasks":[{"id":"0_0", "assignments":["test-p4-0","test-p2-0"], "consumedOffsetsByPartition":[{"topicPartition":"test-p2-0","offset":"test-p2-0"}]}, {"id":"0_2", "assignments":["test-p4-2"], "consumedOffsetsByPartition":[]}]}, {"name":"StreamThread-2","state":"RUNNING","activeTasks":[{"id":"0_1", "assignments":["test-p4-1","test-p2-1"], "consumedOffsetsByPartition":[{"topicPartition":"test-p2-1","offset":"test-p2-1"}]}, {"id":"0_3", "assignments":["test-p4-3"], "consumedOffsetsByPartition":[]}]}]
Author: Florian Hussonnois <florian.hussonnois@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#2612 from fhussonnois/KAFKA-4819
- need to check that state is CRATED at startup
- some minor test cleanup
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Damian Guy <damian.guy@gmail.com>
Closes#3775 from mjsax/kafka-5818-kafkaStreams-state-transition
All current implementations process records using the same timestamp. This makes it difficult to test operations that require time windows, like `KStream-KStream joins`.
This change would allow tests to simulate records created at different times, thus making it possible to test operations like the above mentioned joins.
Author: Sebastian Gavril <sgavril@wehkamp.nl>
Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3753 from sebigavril/allow-timestamps-in-test-driver
1. Core concepts (added the stream time definition), upgrade guide and developer guide.
2. Related Java docs changes.
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Damian Guy <damian.guy@gmail.com>
Closes#3732 from guozhangwang/KMinor-kip138-docs
1. Remove timeout-based validatePartitionExists from StoreChangelogReader; instead only try to refresh metadata once after all tasks have been created and their topology initialized (hence all stores have been registered).
2. Add the logic to refresh partition metadata at the end of initialization if some restorers needing initialization cannot find their changelogs, hoping that in the next run loop these stores can find their changelogs.
As a result, after `initialize` is called we may not be able to start initializing all the `needsInitializing` ones.
As an optimization, we would not call `consumer#partitionsFor` any more, but only `consumer#listTopics` fetching all the topic metadata; so the only blocking calls left are `listTopics` and `endOffsets`, and we always capture timeout exceptions around these two calls, and delay to retry in the next run loop after refreshing the metadata. By doing this we can also reduce the number of request round trips between consumer and brokers.
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Damian Guy <damian.guy@gmail.com>, Matthias J. Sax <matthias@confluent.io>
Closes#3748 from guozhangwang/K5797-handle-metadata-available
`ChangeLoggingWindowBytesStore` needs to have the same `retainDuplicates` functionality as `RocksDBWindowStore` else data could be lost upon failover/restoration.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3754 from dguy/hotfix-changelog-window-store
Author: Tommy Becker <tobecker@tivo.com>
Author: Tommy Becker <twbecker@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3455 from twbecker/kafka-5379
1. StreamThread: prevent `PARTITIONS_REVOKED` to transit to itself in `setState` by returning false. And only execute the task suspension logic when `setState(PARTITIONS_REVOKED)` returns true in `onPartitionsRevoked`.
2. StreamThread: minor, renaming `shutdown` to `completeShutdown`, and `close` to `shutdown`, `stillRunning` to `isRunning`, `isInitialized` to `isRunningAndNotRebalancing`.
3. GlobalStreamThread: tighten the transition a bit in `setState`. Force transiting to `PENDING_SHUTDOWN` and `DEAD` when initialization failed.
4. GlobalStreamThread: minor, add logPrefix to StateConsumer. Also removing its state change listener when closing the thread.
5. KafkaStreams: because of 1) above we can now prevent its `REBALANCING` to `REBALANCING`.
6. KafkaStreams: prevent `CREATED` to ever go to `REBALANCING` first to force it transit to `RUNNING` when starting. Also prevent `CREATED` to go to `ERROR`.
7. KafkaStreams: collapse `validateStartOnce` and `checkFirstTimeClosing ` into `setState`.
8. KafkaStreams: in `close` and `start`, only execute the logic when `setState` succeeds.
Author: Guozhang Wang <wangguoz@gmail.com>
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Damian Guy <damian.guy@gmail.com>, Matthias J. Sax <matthias@confluent.io>
Closes#3713 from guozhangwang/KMinor-set-state
If a task fails during initialization due to a LockException, its changelog partitions are not immediately added to the StoreChangelogReader as the thread doesn't hold the lock. However StoreChangelogReader#restore will be called and it sets the initialized flag. On a subsequent successfull call to initialize the new tasks the partitions are added to the StoreChangelogReader, however as it is already initialized these new partitions will never be restored. So the task would remain in a non-running state forever.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3736 from dguy/kafka-5787
Make MeteredSessionStore the outermost store.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#3729 from dguy/kafka-5749
In onPartitionsAssigned:
release all locks for non-assigned suspended tasks.
resume any suspended tasks.
Create new tasks, but don't attempt to take the state lock.
Pause partitions for any new tasks.
set the state to PARTITIONS_ASSIGNED
In StreamThread#runLoop
poll
if state is PARTITIONS_ASSIGNED
2.1 attempt to initialize any new tasks, i.e, take out the state locks and init state stores
2.2 restore some data for changelogs, i.e., poll once on the restore consumer and return the partitions that have been fully restored
2.3 update tasks with restored partitions and move any that have completed restoration to running
2.4 resume consumption for any tasks where all partitions have been restored.
2.5 if all active tasks are running, transition to RUNNING and assign standby partitions to the restoreConsumer.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#3675 from dguy/kafka-5152
Cherrypicking additional changes made as part of this PR https://github.com/apache/kafka/pull/3622, back to trunk.
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Damian Guy <damian.guy@gmail.com>
Closes#3691 from enothereska/hotfix-deadlock-state
Add MeteredWindowStore and ChangeLoggingWindowBytesStore.
Refactor Store hierarchy such that Metered is always the outermost store
Do serialization in MeteredWindowStore
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#3692 from dguy/kafka-5689
Author: Kamal C <kamal.chandraprakash@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3695 from Kamal15/typo_error
Fix range queries in `CompositeReadOnlyWindowStore` and `CompositeReadOnlySessionStore` to fetch across all stores (was previously just looking in the first store)
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#3685 from dguy/kafka-5668
This is to complete Bill's PR #3664 on KAFKA-5733, incorporating the suggestion in https://github.com/facebook/rocksdb/issues/2734.
Some minor changes: move `open = true` in `openDB`.
Author: Bill Bejeck <bill@confluent.io>
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Damian Guy <damian.guy@gmail.com>
Closes#3681 from guozhangwang/K5733-rocksdb-bulk-load
The commit brings improved test coverage for StreamsKafkaClientTest.java
Author: Andrey Dyachkov <andrey.dyachkov@zalando.de>
Reviewers: Damian Guy <damian.guy@gmail.com>
Closes#3663 from adyach/kafka-4643
refactor StateStoreSuppliers such that a `MeteredKeyValueStore` is the outermost store.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Eno Thereska <eno.thereska@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
Closes#3592 from dguy/key-value-store-refactor
0. Minor fixes on the existing examples to merge all on a single input topic; also do not use `common.utils.Exit` as it is for internal usage only.
1. Add the archetype project for the quickstart. Steps to try it out:
a. `mvn install` on the quickstart directory.
b. `mvn archetype:generate \
-DarchetypeGroupId=org.apache.kafka \
-DarchetypeArtifactId=streams-quickstart-java \
-DarchetypeVersion=1.0.0-SNAPSHOT \
-DgroupId=streams-quickstart \
-DartifactId=streams-quickstart \
-Dversion=0.1 \
-Dpackage=StreamsQuickstart \
-DinteractiveMode=false` at any directory to create the project.
c. build the streams jar with version `1.0.0-SNAPSHOT` to local maven repository with `./gradlew installAll`; `cd streams-quickstart; mvn clean package`
d. create the input / output topics, start the console producer and consumer.
e. start the program: `mvn exec:java -Dexec.mainClass=StreamsQuickstart.Pipe/LineSplit/WordCount`.
f. type data on console producer and observe data on console consumer.
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Damian Guy <damian.guy@gmail.com>, Bill Bejeck <bbejeck@gmail.com>, Ewen Cheslack-Postava <me@ewencp.org>, Eno Thereska <eno.thereska@gmail.com>
Closes#3630 from guozhangwang/KMinor-streams-quickstart-tutorial
Extracted `TaskManager` to handle all task related activities.
Make `StandbyTaskCreator`, `TaskCreator`, and `RebalanceListener` static classes so they must define their dependencies and can be testing independently of `StreamThread`
Added interfaces between `StreamPartitionAssignor` & `StreamThread` to reduce coupling.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>, Eno Thereska <eno.thereska@gmail.com>
Closes#3624 from dguy/stream-thread-refactor
Author: Bill Bejeck <bill@confluent.io>
Reviewers: Damian Guy <damian.guy@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
Closes#3644 from bbejeck/KAFKA-5711_bulk_restore_should_handle_deletes
Fixed a bug in the InMemoryKeyValueStore restoration where a key with a `null` value is written in to the map rather than being deleted.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Bill Bejeck <bbejeck@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
Closes#3650 from dguy/kafka-5717
1. Remove separate thread from test failing periodically due to race condition.
2. Remove anonymous `AbstractNotifyingBatchingRestoreCallback` declare as concrete inner class `RocksDBBatchingRestoreCallback` and set as package private variable. Class is static so it has to initialize it's dependency on `RocksDBStore`
Author: Bill Bejeck <bill@confluent.io>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#3640 from bbejeck/KAFKA-5701_fix_flaky_unit_test
Author: Bill Bejeck <bill@confluent.io>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#3625 from bbejeck/HOTFIX_need_to_correct_stanby_task_restoration_to_use_new_restore_api
With current tests, the deserialization inside the KStreamPrint node processor which happens when key and/or values are byte[] isn't tested. This PR fixes that.
Author: Paolo Patierno <ppatierno@live.com>
Reviewers: Damian Guy <damian.guy@gmail.com>, Bill Bejeck <bbejeck@gmail.com>
Closes#3611 from ppatierno/minor-kstream-print-test
A couple of fixes to metric names to match the KIP
- Removed extra strings in the metric names that are already in the tags
- add a separate metric for "all"
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#3491 from enothereska/hotfix-metric-names
1. Remove rest deprecation warnings in streams:jar.
2. Consolidate all unit test classes' reflections to access internal topology builder from packages other than `o.a.k.streams`. We need to refactor the hierarchies of StreamTask, StreamThread and KafkaStreams to remove these hacky reflections.
3. Minor fixes such as reference path, etc.
4. Minor edits on web docs for the describe function under developer-guide.
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Ismael Juma <ismael@juma.me.uk>, Damian Guy <damian.guy@gmail.com>
Closes#3603 from guozhangwang/K5671-followup-comments
1. Only log an ERROR on the first encountered exception from the callback.
2. Wrap the exception message with the first thrown message information, and throw the exception whenever `checkException` is called.
Therefore, for the `store.put` call, it will throw a `KafkaException` with the error message a bit more intuitive.
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Xavier Léauté <xavier@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Damian Guy <damian.guy@gmail.com>
Closes#3534 from guozhangwang/K5006-exception-record-collector
Kafka Streams does not allow users to modify some consumer configurations.
Currently, it does not allow modifying the value of 'enable.auto.commit'.
If the user modifies this property, currently an exception is thrown.
The following changes were made in this patch:
- Defined a new array 'NON_CONFIGURABLE_CONSUMER_CONFIGS' to hold the names
of the configuration parameters that is not allowed to be modified
- Defined a new method 'checkIfUnexpectedUserSpecifiedConsumerConfig' to
check if user overwrote the values of any of the non configurable configuration
parameters. If so, then log a warning message and reset the default values
- Updated the javadoc to include the configuration parameters that cannot be
modified by users.
- Updated the corresponding tests in StreamsConfigTest.java to reflect the changes
made in StreamsConfig.java
Author: Mariam John <mariamj@us.ibm.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Eno Thereska <eno.thereska@gmail.com>, Guozhang Wang <wangguoz@gmail.com>, Damian Guy <damian.guy@gmail.com>
Closes#2990 from johnma14/bug/kafka-5096
In the streams project, there are a number of unit tests that has duplicate
code with respect to the tearDown() method, in which it tries to close the
KStreamTestDriver connection. The goal of this changeset is to eliminate
this duplication by converting the KStreamTestDriver class to an ExternalResource
class which is the base class of JUnit Rule.
In every unit tests that calls KStreamTestDriver, we annotate the KStreamTestDriver
using Rule annotation. In the KStreamTestDriver class, we override the after()
method. This after() method in turn calls the close() method which was previously
called in the tearDown() method in the unit tests. By annotating the KStreamTestDriver
as a Rule, the after() method will be called automatically after every testcase.
Author: johnma14 <mariamj@us.ibm.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3589 from johnma14/bug/KAFKA-3623
1. Capture `CommitFailedException` in `StreamThread#suspendTasksAndState`.
2. Remove `Cache` from AbstractTask as it is not needed any more; remove not used cleanup related variables from StreamThread (cc dguy to double check).
3. Also fix log4j outputs for error and warn, such that for WARN we do not print stack trace, and for ERROR we remove the dangling colon since the exception stack trace will start in newline.
4. Update one log4j entry to always print as WARN for errors closing a zombie task (cc mjsax ).
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Damian Guy <damian.guy@gmail.com>
Closes#3574 from guozhangwang/KHotfix-handle-commit-failed-exception-in-suspend