add `KTable#groupBy(KeyValueMapper, Serialized)` and deprecate the overload with `Serde` params
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bill@confluent.io>
Closes#3802 from dguy/kip-182-ktable-groupby
Part of KIP-182
- Add `StateStoreBuilder` interface and `WindowStateStoreBuilder`, `KeyValueStateStoreBuilder`, and `SessionStateStoreBuilder` implementations
- Add `StoreSupplier`, `WindowBytesStoreSupplier`, `KeyValueBytesStoreSupplier`, `SessionBytesStoreSupplier` interfaces and implementations
- Add new methods to `Stores` to create the newly added `StoreSupplier` and `StateStoreBuilder` implementations
- Update `Topology` and `InternalTopology` to use the interfaces
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3767 from dguy/kafka-5650
Add the `Produced` class and `KStream` overloads that use it:
`KStream#to(String, Produced)`
`KStream#through(String, Produced)`
Deprecate all other to and through methods accept the single param methods that take a topic param
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3770 from dguy/kafka-5652-produced
The iterator interface usage has some examples missing explicit close operation after usage. We should remind the user to do so because un-closed iterator will leave the underlying file descriptor open, thus eating up memory.
guozhangwang Ishiihara
Author: Boyang Chen <bychen@pinterest.com>
Author: Boyang Chen <bchen11@outlook.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#3714 from abbccdda/add_iterator_close_on_doc
minor fixes
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Damian Guy <damian.guy@gmail.com>
Closes#3796 from guozhangwang/kip-138-minor-renames
Some other minor changes:
1. Do not throw the exception form callback as it would only be swallowed by consumer coordinator; remembering it and re-throw in the next loop is good enough.
2. Change Creating to Defining in Stores to avoid confusions that the stores have already been successfully created at that time.
3. Do not need unAssignChangeLogPartitions as the restore consumer will be unassigned already inside changelog reader.
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Damian Guy <damian.guy@gmail.com>
Closes#3769 from guozhangwang/KMinor-logging-before-throwing
- changed the interface & implementations
- updated tests to use the new method where applicable
Author: Attila Kreiner <attila@kreiner.hu>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Closes#3669 from attilakreiner/KAFKA-5726
Add the `Joined` class and the overloads to `KStream` that use it.
Deprecate existing methods that have `Serde` params
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3776 from dguy/kip-182-stream-join
Part of KIP-182
- Add the `Serialized` class
- implement overloads of `KStream#groupByKey` and KStream#groupBy`
- deprecate existing methods that have more than default arguments
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3772 from dguy/kafka-5817
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Sharon Liu <sharonliu.cup@gmail.com>, Damian Guy <damian.guy@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
Closes#3787 from mjsax/kafka-5823-kip120-docs
When we install kafka on path with spaces, batch files were failing, this PR is trying to fix this issue.
Author: Vladimír Kleštinec <klestinec@gmail.com>
Reviewers: Vahid Hashemian <vahidhashemian@us.ibm.com>, Jason Gustafson <jason@confluent.io>
Closes#2649 from klesta490/trunk
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#3719 from mjsax/kafka-5603-dont-abort-tx-for-zombie-tasks-2
Simple implementation of the feature : [KAFKA-4819](https://issues.apache.org/jira/browse/KAFKA-4819)
KAFKA-4819
This PR adds a new method `threadStates` to public API of `KafkaStreams` which returns all currently states of running threads and active tasks.
Below is a example for a simple topology consuming from topics; test-p2 and test-p4.
[{"name":"StreamThread-1","state":"RUNNING","activeTasks":[{"id":"0_0", "assignments":["test-p4-0","test-p2-0"], "consumedOffsetsByPartition":[{"topicPartition":"test-p2-0","offset":"test-p2-0"}]}, {"id":"0_2", "assignments":["test-p4-2"], "consumedOffsetsByPartition":[]}]}, {"name":"StreamThread-2","state":"RUNNING","activeTasks":[{"id":"0_1", "assignments":["test-p4-1","test-p2-1"], "consumedOffsetsByPartition":[{"topicPartition":"test-p2-1","offset":"test-p2-1"}]}, {"id":"0_3", "assignments":["test-p4-3"], "consumedOffsetsByPartition":[]}]}]
Author: Florian Hussonnois <florian.hussonnois@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#2612 from fhussonnois/KAFKA-4819
Added a simple table of contents for the developer section.
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Damian Guy <damian.guy@gmail.com>
Closes#3760 from enothereska/minor-docs-toc
Author: Dong Lin <lindong28@gmail.com>
Reviewers: Jun Rao <junrao@gmail.com>, Jiangjie Qin <becket.qin@gmail.com>, Colin P. Mccabe <cmccabe@confluent.io>
Closes#3621 from lindong28/KAFKA-5694
If a request for a broker configuration failed due to a timeout or
the broker not being available, we would fail the futures
associated with the non-broker request instead (and never fail the
broker future, which would be left uncompleted forever).
We would also do an unnecessary request if only broker configs were
requested.
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#3585 from cmccabe/KAFKA-5659
- need to check that state is CRATED at startup
- some minor test cleanup
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Damian Guy <damian.guy@gmail.com>
Closes#3775 from mjsax/kafka-5818-kafkaStreams-state-transition
Author: Dong Lin <lindong28@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jiangjie Qin<becket.qin@gmail.com>
Closes#3709 from lindong28/KAFKA-5759
All current implementations process records using the same timestamp. This makes it difficult to test operations that require time windows, like `KStream-KStream joins`.
This change would allow tests to simulate records created at different times, thus making it possible to test operations like the above mentioned joins.
Author: Sebastian Gavril <sgavril@wehkamp.nl>
Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3753 from sebigavril/allow-timestamps-in-test-driver
The previous timeout was 10 seconds, but system test failures have occurred when Zookeeper has started after about 11 seconds. Increasing the timeout to 30 seconds, since most of the time this extra time will not be required, and when it is it will prevent a failed system test.
In addition to merging to `trunk`, please backport to the `0.11.x` and `0.10.2.x` branches.
Author: Randall Hauch <rhauch@gmail.com>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes#3774 from rhauch/MINOR-Increase-timeout-of-zookeeper-service-in-system-tests
In the coordinator, we should check that 'shutdown' is not true before going to sleep waiting for the condition.
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Apurva Mehta <apurva@confluent.io>, Jason Gustafson <jason@confluent.io>
Closes#3755 from cmccabe/KAFKA-5806
1. Core concepts (added the stream time definition), upgrade guide and developer guide.
2. Related Java docs changes.
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Damian Guy <damian.guy@gmail.com>
Closes#3732 from guozhangwang/KMinor-kip138-docs
1. Remove timeout-based validatePartitionExists from StoreChangelogReader; instead only try to refresh metadata once after all tasks have been created and their topology initialized (hence all stores have been registered).
2. Add the logic to refresh partition metadata at the end of initialization if some restorers needing initialization cannot find their changelogs, hoping that in the next run loop these stores can find their changelogs.
As a result, after `initialize` is called we may not be able to start initializing all the `needsInitializing` ones.
As an optimization, we would not call `consumer#partitionsFor` any more, but only `consumer#listTopics` fetching all the topic metadata; so the only blocking calls left are `listTopics` and `endOffsets`, and we always capture timeout exceptions around these two calls, and delay to retry in the next run loop after refreshing the metadata. By doing this we can also reduce the number of request round trips between consumer and brokers.
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Damian Guy <damian.guy@gmail.com>, Matthias J. Sax <matthias@confluent.io>
Closes#3748 from guozhangwang/K5797-handle-metadata-available
`ChangeLoggingWindowBytesStore` needs to have the same `retainDuplicates` functionality as `RocksDBWindowStore` else data could be lost upon failover/restoration.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3754 from dguy/hotfix-changelog-window-store
- client id is part of the log context, so removed ad-hoc usages
- Fixed an issue where the response was not printed correctly,
use `toString(version)` instead of `toString()`
- Capitalized all log statements for consistency
- Fixed a number of double spaces after period
Author: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#3741 from Kamal15/kafka-5762
Author: Tommy Becker <tobecker@tivo.com>
Author: Tommy Becker <twbecker@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3455 from twbecker/kafka-5379
1. StreamThread: prevent `PARTITIONS_REVOKED` to transit to itself in `setState` by returning false. And only execute the task suspension logic when `setState(PARTITIONS_REVOKED)` returns true in `onPartitionsRevoked`.
2. StreamThread: minor, renaming `shutdown` to `completeShutdown`, and `close` to `shutdown`, `stillRunning` to `isRunning`, `isInitialized` to `isRunningAndNotRebalancing`.
3. GlobalStreamThread: tighten the transition a bit in `setState`. Force transiting to `PENDING_SHUTDOWN` and `DEAD` when initialization failed.
4. GlobalStreamThread: minor, add logPrefix to StateConsumer. Also removing its state change listener when closing the thread.
5. KafkaStreams: because of 1) above we can now prevent its `REBALANCING` to `REBALANCING`.
6. KafkaStreams: prevent `CREATED` to ever go to `REBALANCING` first to force it transit to `RUNNING` when starting. Also prevent `CREATED` to go to `ERROR`.
7. KafkaStreams: collapse `validateStartOnce` and `checkFirstTimeClosing ` into `setState`.
8. KafkaStreams: in `close` and `start`, only execute the logic when `setState` succeeds.
Author: Guozhang Wang <wangguoz@gmail.com>
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Damian Guy <damian.guy@gmail.com>, Matthias J. Sax <matthias@confluent.io>
Closes#3713 from guozhangwang/KMinor-set-state
* When a call is aborted, that should count as a "try" in the failure log message.
* FailureInjectingTimeoutProcessorFactory should fail the first request it is asked about.
* testCallTimeouts should expect the first request it makes to fail because of the timeout we injected.
* FailureInjectingTimeoutProcessorFactory should track how many failures it has injected, and the test should verify that one has been injected.
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#3731 from cmccabe/KAFKA-5720
Remove the previous TODO to remove the clean shutdown file with some of the discussion from https://github.com/apache/kafka/pull/2104.
Author: Holden Karau <holden@us.ibm.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#3742 from holdenk/KAFKA-4380-document-clean-shutdown-file
…h-benchmarks/jmh.sh
Author: bbejeck <bbejeck@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Guozhang Wang <wangguoz@gmail.com>
Closes#2654 from bbejeck/KAFKA-3989_follow_up
Expose the ClassCastException as the cause for the SerializationException.
Author: Jeremy Custenborder <jcustenborder@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#3556 from jcustenborder/KAFKA-5620
If a task fails during initialization due to a LockException, its changelog partitions are not immediately added to the StoreChangelogReader as the thread doesn't hold the lock. However StoreChangelogReader#restore will be called and it sets the initialized flag. On a subsequent successfull call to initialize the new tasks the partitions are added to the StoreChangelogReader, however as it is already initialized these new partitions will never be restored. So the task would remain in a non-running state forever.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3736 from dguy/kafka-5787
Make MeteredSessionStore the outermost store.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#3729 from dguy/kafka-5749
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Rajini Sivaram <rajinisivaram@googlemail.com>
Closes#3699 from cmccabe/trogdor-review