For session-windows, the result record should have the window-end timestamp as record timestamp.
Rebased to resolve merge conflicts. Removed unused classes TupleForwarder and ForwardingCacheFlushListener (replace with TimestampedTupleForwarder, SessionTupleForwarder, TimestampedCacheFlushListerner, and SessionCacheFlushListener)
Reviewers: John Roesler <john@confluent.io>, Bruno Cadonna <bruno@confluent.io>, Boyang Chen <boyang@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Detailed description
* Adds KTable.suppress to the Scala API.
* Fixed count in KGroupedStream, SessionWindowedKStream, and TimeWindowedKStream so that the value serde gets passed down to the KTable returned by the internal mapValues method.
* Suppress API support for Java 1.8 + Scala 2.11
Testing strategy
I added unit tests covering:
* Windowed KTable.count.suppress w/ Suppressed.untilTimeLimit
* Windowed KTable.count.suppress w/ Suppressed.untilWindowCloses
* Non-windowed KTable.count.suppress w/ Suppressed.untilTimeLimit
* Session-windowed KTable.count.suppress w/ Suppressed.untilWindowCloses
Reviewers: John Roesler <john@confluent.io>, Guozhang Wang <guozhang@confluent.io>
The Java API can pass a Properties object to StreamsBuilder#build, to allow, e.g., topology optimization, while the Scala API does not yet. The latter only delegates the work to the underlying Java implementation.
Reviewers: John Roesler <john@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
Resolves the compiler warnings when building streams-scala.
Reviewers: A. Sophie Blee-Goldman <ableegoldman@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Guozhang Wang <wangguoz@gmail.com>
Improve JavaDocs about global state stores.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Sophie Blee-Goldman <sophie@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
- Use Xlint:all with 3 exclusions (filed KAFKA-7613 to remove the exclusions)
- Use the same javac options when compiling tests (seems accidental that
we didn't do this before)
- Replaced several deprecated method calls with non-deprecated ones:
- `KafkaConsumer.poll(long)` and `KafkaConsumer.close(long)`
- `Class.newInstance` and `new Integer/Long` (deprecated since Java 9)
- `scala.Console` (deprecated in Scala 2.11)
- `PartitionData` taking a timestamp (one of them seemingly a bug)
- `JsonMappingException` single parameter constructor
- Fix unnecessary usage of raw types in several places.
- Add @SuppressWarnings for deprecations, unchecked and switch fallthrough in
several places.
- Scala clean-ups (var -> val, ETA expansion warnings, avoid reflective calls)
- Use lambdas to simplify code in a few places
- Add @SafeVarargs, fix varargs usage and remove unnecessary `Utils.mkList` method
Reviewers: Matthias J. Sax <mjsax@apache.org>, Manikumar Reddy <manikumar.reddy@gmail.com>, Randall Hauch <rhauch@gmail.com>, Bill Bejeck <bill@confluent.io>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>
While working on the documentation updates I realized the Streams Scala API needs
to get updated for the addition of Grouped
Added a test for Grouped.scala ran all streams-scala tests and streams tests
Reviewers: Matthias J. Sax <matthias@confluent.io>, John Roesler <john@confluent.io>, Guozhang Wang <guozhang@confluent.io>
Stop using current system time by default, as it introduces non-determinism.
Reviewers: Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <guozhang@confluent.io>
Reviewers: Johne Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <guozhang@confluent.io>
Currently, scala.Serdes.String, for example, invokes Serdes.String() once and caches the result.
However, the implementation of the String serde has a non-empty configure method that is variant in whether it's used as a key or value serde. So we won't get correct execution if we create one serde and use it for both keys and values.
Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
This PR fixes the previously recursive call of Streams Scala peek
Reviewers: Joan Goyeau <joan@goyeau.com>, Guozhang Wang <guozhang@confluent.io>, John Roesler <john@confluent.io>
Due to lack of conversion to kstream Predicate, existing filter method in KTable.scala would result in StackOverflowError.
This PR fixes the bug and adds testing for it.
Reviewers: Guozhang Wang <guozhang@confluent.io>, John Roesler <john@confluent.io>
Join in the Scala streams API is currently unusable in 2.0.0 as reported by @mowczare:
#5019 (comment)
This due to an overload of it with the same signature in the first curried parameter.
See compiler issue that didn't catch it: https://issues.scala-lang.org/browse/SI-2628
Reviewers: Debasish Ghosh <dghosh@acm.org>, Guozhang Wang <guozhang@confluent.io>, John Roesler <john@confluent.io>
#5468 introduced a breaking API change that was actually avoidable. This PR re-introduces the old API as deprecated and alters the API introduced by #5468 to be consistent with the other methods
also, fixed misc syntax problems
1. In each iteration, decide if a task is processable if all of its partitions contains data, so it can decide which record to process next.
1.a Add one exception that, if the task indeed have data on some but not all of its partitions, we only consider as not processable for some finite round of iterations.
1.b Add a task-level metric to record whenever we are forced to process a task that is only "partially data available", since it may leads to non-determinism.
2. Break the main loop on put-raw-data and process-them. Since now not all data put into the queue would be processed completely within a single iteration.
3. NOTE that within an iteration, if a task has exhausted one of its queue it will still be processed, since we only update processable list once in each iteration, I'm improving on this on the follow-up part III PR.
4. Found and fixed a bug in metrics recording: the taskName and sensorName parameters were exchanged.
5. Optimized task stream time computation again since our current partition stream time reasoning has been simplified.
6. Added unit tests.
Reviewers: Matthias J. Sax <matthias@confluent.io>, John Roesler <vvcephei@users.noreply.github.com>, Bill Bejeck <bbejeck@gmail.com>
1. At the beginning of assign, we first check that all the non-repartition source topics are included in the metadata. If not, we log an error at the leader and set an error in the Assignment userData bytes, indicating that leader cannot complete assignment and the error code would indicate the root cause of it.
2. Upon receiving the assignment, if the error is not NONE the streams will shutdown itself with a log entry re-stating the root cause interpreted from the error code.
Author: tedyu <yuzhihong@gmail.com>
Reviewers: Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>
Closes#5322 from tedyu/trunk
In #4919 we propagate the SerDes for each of these aggregation operators.
As @guozhangwang mentioned in that PR:
```
reduce: inherit the key and value serdes from the parent XXImpl class.
count: inherit the key serdes, enforce setting the Serdes.Long() for value serdes.
aggregate: inherit the key serdes, do not set for value serdes internally.
```
Although it's all good for reduce and count, it is quiet unsafe to have aggregate without Materialized given. In fact I don't see why we would not give a Materialized for the aggregate since the result type will always be different (otherwise use reduce) and also the value Serde is simply not propagated.
This has been discussed previously in a broader PR before but I believe for aggregate we could pass implicitly a Materialized the same way we pass a Joined, just to avoid the stupid case. Then if the user wants to specialize, he can give his own Materialized.
Reviewers: Debasish Ghosh <dghosh@acm.org>, Guozhang Wang <guozhang@confluent.io>
* Removed Scala producers, request classes, kafka.tools.ProducerPerformance, encoders,
tests.
* Updated ConsoleProducer to remove Scala producer support (removed `BaseProducer`
and several options that are not used by the Java producer).
* Updated a few Scala consumer tests to use the new producer (including a minor
refactor of `produceMessages` methods in `TestUtils`).
* Updated `ClientUtils.fetchTopicMetadata` to use `SimpleConsumer` instead of
`SyncProducer`.
* Removed `TestKafkaAppender` as it looks useless and it defined an `Encoder`.
* Minor import clean-ups
No new tests added since behaviour should remain the same after these changes.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Manikumar Reddy O <manikumar.reddy@gmail.com>, Dong Lin <lindong28@gmail.com>
Closes#5045 from ijuma/kafka-6921-remove-old-producer
The type inference doesn't currently work for the join functions in Scala as it doesn't know yet the types of the given KStream[K, V] or KTable[K, V].
The fix here is to curry the joiner function. I personally prefer this notation but this also means it differs more from the Java API.
I believe the diff with the Java API is worth in this case as it's not only solving the type inference but also fits better the Scala way of coding (ex: fold).
Moreover any Scala dev will bug and spend little time on these functions trying to understand why the type inference is not working and then get frustrated to be obliged to be explicit here where it's not harmful to be inferred.
Reviewers: Debasish Ghosh <dghosh@acm.org>, Guozhang Wang <guozhang@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Serdes are confusing in the Scala wrapper:
* We have wrappers around Serializer, Deserializer and Serde which are not very useful.
* We have Serdes in 2 places org.apache.kafka.common.serialization.Serde and in DefaultSerdes, instead we should be having only one place where to find all the Serdes.
I wanted to do this PR before the release as this is a breaking change.
This shouldn't add more so the current tests should be enough.
Reviewers: Debasish Ghosh <dghosh@acm.org>, Guozhang Wang <guozhang@confluent.io>
Reviewer: Matthias J. Sax <matthias@confluent.io>, Debasish Ghosh <dghosh@acm.org>, Guozhang Wang <guozhang@confluent.io>, Bill Bejeck <bill@confluent.io>
Several build and documentation updates were required after the merge of KAFKA-6670: Implement a Scala wrapper library for Kafka Streams.
Encode Scala major version into streams-scala artifacts.
To differentiate versions of the kafka-streams-scala artifact across Scala major versions it's required to encode the version into the artifact name before its published to a maven repository. This is accomplished by following a similar release process as kafka core, which encodes the Scala major version and then runs the build for each major version of Scala supported. This is considered standard practice when releasing Scala libraries, but is not handled for us automatically with the basic Scala for Gradle support.
After this change you can generate and install the kafka-streams-scala artifact into the local maven repository:
$ ./gradlew -PscalaVersion=2.11 install
$ ./gradlew -PscalaVersion=2.12 install
Reviewers: Ismael Juma <ismael@juma.me.uk>, Guozhang Wang <wangguoz@gmail.com>
While working on this, I also refactored the MockProcessor out of the MockProcessorSupplier to cleanup the unit test paths.
Reviewers: John Roesler <john@confluent.io>, Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
This PR implements a Scala wrapper library for Kafka Streams. The library is implemented as a project under streams, namely `:streams:streams-scala`. The PR contains the following:
* the library implementation of the wrapper abstractions
* the test suite
* the changes in `build.gradle` to build the library jar
The library has been tested running the tests as follows:
```
$ ./gradlew -Dtest.single=StreamToTableJoinScalaIntegrationTestImplicitSerdes streams:streams-scala:test
$ ./gradlew -Dtest.single=StreamToTableJoinScalaIntegrationTestImplicitSerdesWithAvro streams:streams-scala:test
$ ./gradlew -Dtest.single=WordCountTest streams:streams-scala:test
```
Author: Debasish Ghosh <ghosh.debasish@gmail.com>
Author: Sean Glover <seglo@randonom.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Ismael Juma <ismael@juma.me.uk>, John Roesler <john@confluent.io>, Damian Guy <damian@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#4756 from debasishg/scala-streams