Previously, we depicted creating a Jackson serde for every pojo class, which becomes a burden in practice. There are many ways to avoid this and just have a single serde, so we've decided to model this design choice instead.
Reviewers: Viktor Somogyi <viktorsomogyi@gmail.com>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Updated the upgrade doc as well since we do not have an overloaded function without the deprecated parameter before. Also renamed the 1.2 release version to 2.0.
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
While working on this, I also refactored the MockProcessor out of the MockProcessorSupplier to cleanup the unit test paths.
Reviewers: John Roesler <john@confluent.io>, Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
Now that we have augmented WindowSerde with non-arg parameters, extract it out as part of the public APIs so that users who want to I/O windowed streams can use it. This is originally introduced by @vitaly-pushkar
This PR grows out to be a much larger one, as I found a few tech debts and bugs while working on it. Here is a summary of the PR:
Public API changes (I will propose a KIP after a first round of reviews):
Add TimeWindowedSerializer, TimeWindowedDeserializer, SessionWindowedSerializer, SessionWindowedDeserializer into o.a.k.streams.kstream. The serializers would implemented an internal WindowedSerializer interface for the serializeBaseKey function used in 3) below.
Add WindowedSerdes into o.a.k.streams.kstream. The reason to now add them into o.a.k.clients's Serdes is that it then needs dependency of streams.
Add "default.windowed.key.serde.inner" and "default.windowed.value.serde.inner" into StreamsConfig, used when "default.key.serde" is specified to use time or session windowed serde. Note this requires the serde class, not the type class.
Consolidated serde format from multiple classes, including SessionKeySerde.java for session, and WindowStoreUtils for time window, into SessionKeySchema and WindowKeySchema.
Bug fix: WindowedStreamPartitioner needs to consider both time window and session window serdes.
Removed RocksDBWindowBytesStore etc optimization since after KIP-182 all the serde know happens on metered store, hence this optimization is not worth.
Bug fix: for time window, the serdes used for store and the serdes used for piping (source and sink node) are different: the former needs to append sequence number but not for the later.
Other minor cleanups: remove unnecessary throws, etc.
Authors: Guozhang Wang <wangguoz@gmail.com>, Vitaly Pushkar <vitaly.pushkar@gmail.com>
Reviewers: Matthias J. Sax <mjsax@apache.org>, Bill Bejeck <bill@confluent.io>, Xi Hu
- fixes examples with regard to new API
- fixes `Topology#addGlobalStore` parameters
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Guozhang Wang <wangguoz@gmail.com>, Damian Guy <damian.guy@gmail.com>
Closes#4003 from mjsax/minor-deprecated
Add overloads for `table` and `globalTable` that use `Materialized`
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3837 from dguy/kafka-5873
I have decided to use the following approach to fixing this bug:
1) Since the Window Size in WindowedDeserializer was originally unknown, I have initialized
a field _windowSize_ and created a constructor to allow it to be instantiated
2) The default size for __windowSize__ is _Long.MAX_VALUE_. If that is the case, then the
deserialize method will return an Unlimited Window, or else will return Timed one.
3) Temperature Demo was modified to demonstrate how to use this new constructor, given
that the window size is known.
Author: Richard Yu <richardyu@Richards-Air.attlocal.net>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3745 from ConcurrencyPractitioner/trunk
Part of KIP-182
- Add `StateStoreBuilder` interface and `WindowStateStoreBuilder`, `KeyValueStateStoreBuilder`, and `SessionStateStoreBuilder` implementations
- Add `StoreSupplier`, `WindowBytesStoreSupplier`, `KeyValueBytesStoreSupplier`, `SessionBytesStoreSupplier` interfaces and implementations
- Add new methods to `Stores` to create the newly added `StoreSupplier` and `StateStoreBuilder` implementations
- Update `Topology` and `InternalTopology` to use the interfaces
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3767 from dguy/kafka-5650
Part of KIP-182
- Add the `Serialized` class
- implement overloads of `KStream#groupByKey` and KStream#groupBy`
- deprecate existing methods that have more than default arguments
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3772 from dguy/kafka-5817
0. Minor fixes on the existing examples to merge all on a single input topic; also do not use `common.utils.Exit` as it is for internal usage only.
1. Add the archetype project for the quickstart. Steps to try it out:
a. `mvn install` on the quickstart directory.
b. `mvn archetype:generate \
-DarchetypeGroupId=org.apache.kafka \
-DarchetypeArtifactId=streams-quickstart-java \
-DarchetypeVersion=1.0.0-SNAPSHOT \
-DgroupId=streams-quickstart \
-DartifactId=streams-quickstart \
-Dversion=0.1 \
-Dpackage=StreamsQuickstart \
-DinteractiveMode=false` at any directory to create the project.
c. build the streams jar with version `1.0.0-SNAPSHOT` to local maven repository with `./gradlew installAll`; `cd streams-quickstart; mvn clean package`
d. create the input / output topics, start the console producer and consumer.
e. start the program: `mvn exec:java -Dexec.mainClass=StreamsQuickstart.Pipe/LineSplit/WordCount`.
f. type data on console producer and observe data on console consumer.
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Damian Guy <damian.guy@gmail.com>, Bill Bejeck <bbejeck@gmail.com>, Ewen Cheslack-Postava <me@ewencp.org>, Eno Thereska <eno.thereska@gmail.com>
Closes#3630 from guozhangwang/KMinor-streams-quickstart-tutorial
Added a Kafka Streams example (IoT oriented) using "tumbling" window
Author: Paolo Patierno <ppatierno@live.com>
Author: ppatierno <ppatierno@live.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Michael G. Noll <michael@confluent.io>
Closes#3352 from ppatierno/stream-temperature-example
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Damian Guy <damian.guy@gmail.com>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3602 from mjsax/kafka-5671-add-streamsbuilder
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Damian Guy <damian.guy@gmail.com>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3590 from mjsax/kafka-3856-replace-topology-builder-by-topology
1. Make the WordCountDemo application to not stop automatically but via "ctrl-C".
2. Update the quickstart html file to let users type input messages one-by-one, and observe added output in an interactive manner.
3. Some minor fixes on the parent documentation page pointing to streams sub-pages, added a new recommended Scala version number.
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Michael G. Noll <michael@confluent.io>, Damian Guy <damian.guy@gmail.com>
Closes#3515 from guozhangwang/KMinor-interactive-quickstart
Implementation for KIP-138: Change punctuate semantics
Author: Michal Borowiecki <michal.borowiecki@openbet.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bbejeck@gmail.com>, Eno Thereska <eno.thereska@gmail.com>, Damian Guy <damian.guy@gmail.com>
Closes#3055 from mihbor/KIP-138
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Matthias J Sax <matthias@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
Closes#3403 from guozhangwang/KMinor-turn-off-caching-in-demo
Author: Jeyhun Karimov <je.karimov@gmail.com>
Reviewers: Damian Guy, Eno Thereska, Matthias J. Sax, Guozhang Wang
Closes#2466 from jeyhunkarimov/KAFKA-4144
Author: Michael G. Noll <michael@confluent.io>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Eno Thereska <eno@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2812 from miguno/trunk-streams-examples-docs
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Jun Rao <junrao@gmail.com>, Apurva Mehta <apurva@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2691 from cmccabe/KAFKA-4902
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Michael G. Noll <michael@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2727 from cmccabe/KAFKA-4944
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Michael G. Noll, Eno Thereska, Matthias J. Sax, Elias Levy, Guozhang Wang
Closes#2725 from dguy/minor-processor-java-doc
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#2303 from mjsax/licenseHeader
`bin-kafka-console-producer.sh` should be `bin/kafka-console-producer.sh`.
Author: Will Marshall <wcm214@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#2410 from wmarshall484/typo-fix
dguy guozhangwang This is a new PR for KAFKA-4060.
Author: Hojjat Jafarpour <hojjat@Hojjat-Jafarpours-MBP.local>
Author: Hojjat Jafarpour <hojjat@HojjatJpoursMBP.attlocal.net>
Reviewers: Damian Guy, Matthias J. Sax, Isamel Juma, Guozhang Wang
Closes#1884 from hjafarpour/KAFKA-4060-Remove-ZkClient-dependency-in-Kafka-Streams-new
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Michael G. Noll, Eno Thereska, Damian Guy, Guozhang Wang
Closes#2117 from mjsax/kafka-4393-improveInvalidTsHandling
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Damian Guy, Matthias J. Sax, Michael G. Noll, Guozhang Wang
Closes#1526 from enothereska/expose-names-dsl
guozhangwang enothereska mjsax miguno
If you get a chance can you please take a look at this. I've done the repartitioning in the join, but it results in 2 internal topics for each join. This seems like overkill as sometimes we wouldn't need to repartition at all, others just 1 topic, and then sometimes both, but I'm not sure how we can know that.
I'd also need to implement something similar for leftJoin, but again, i'd like to see if i'm heading down the right path or if anyone has any other bright ideas.
For reference - https://github.com/apache/kafka/pull/1453 - the previous PR
Thanks for taking the time and looking forward to getting some welcome advice :-)
Author: Damian Guy <damian.guy@gmail.com>
Author: Damian Guy <damian@continuum.local>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#1472 from dguy/KAFKA-3561
This PR includes the same code as https://github.com/apache/kafka/pull/1261 but is rebased on latest trunk.
Author: Michael G. Noll <michael@confluent.io>
Reviewers: Matthias J. Sax, Guozhang Wang
Closes#1277 from miguno/KAFKA-3613-v2
For enums and other constant strings, use locale independent case conversions to enable comparisons to work regardless of the default locale.
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Manikumar Reddy, Ismael Juma, Guozhang Wang, Gwen Shapira
Closes#1220 from rajinisivaram/KAFKA-3548
Also include:
1) remove streams specific configs before passing to producer and consumer to avoid warning message;
2) add `ConsumerRecord` timestamp extractor and set as the default extractor.
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Michael G. Noll, Ewen Cheslack-Postava
Closes#1093 from guozhangwang/KConfigWarn
guozhangwang ymatsuda : please review.
Author: Michael G. Noll <michael@confluent.io>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#1081 from miguno/KAFKA-3411
Also remove some unused imports.
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#992 from guozhangwang/KSExamples
and remove TOTAL_RECORDS_TO_PROCESS
guozhangwang
Author: Yasuhiro Matsuda <yasuhiro@confluent.io>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#985 from ymatsuda/config_params
Changes include:
1) Move logging logic from MeteredXXXStore to internal stores, and leave WindowedStore API clean by removed all internalPut/Get functions.
2) Wrap common logging behavior of InMemory and LRUCache stores into one class.
3) Fix a bug for StoreChangeLogger where byte arrays are not comparable in HashSet by using a specified RawStoreChangeLogger.
4) Add a caching layer on top of RocksDBStore with object caching, it relies on the object's equals and hashCode function to be consistent with serdes.
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Yasuhiro Matsuda <yasuhiro@confluent.io>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#826 from guozhangwang/K3060