Also made a pass over the streams unit tests, with the following changes:
1. Removed three integration tests as they are already covered by other integration tests.
2. Merged `KGroupedTableImplTest` into `KTableAggregateTest`.
3. Used mocks wherever possible to reduce code duplication.
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Damian Guy <damian.guy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes #1604 from guozhangwang/Kminor-unit-tests-consolidation
It was previously only deleting files/folders whose path started with /tmp. Changed it to check against the value of the system property `java.io.tmpdir` instead. Also changed the tests that were creating state dirs under /tmp to just use `TestUtils.tempDirectory(..)`.
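A minimal sketch of such a guard, assuming a standalone helper (the names `TmpDirCleaner`, `isUnderTmpDir` and `deleteRecursively` are hypothetical, not the actual Kafka utility): it resolves `java.io.tmpdir` at call time and refuses to delete anything that is not strictly inside it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper, not Kafka's actual Utils/TestUtils code.
class TmpDirCleaner {

    // True if the path lies strictly inside the directory named by java.io.tmpdir.
    // Path.startsWith compares whole name elements, so /tmpfoo does not match /tmp.
    static boolean isUnderTmpDir(Path path) {
        Path tmp = Paths.get(System.getProperty("java.io.tmpdir")).toAbsolutePath().normalize();
        Path normalized = path.toAbsolutePath().normalize();
        return normalized.startsWith(tmp) && !normalized.equals(tmp);
    }

    // Deletes a file or directory tree, but only if it lives under java.io.tmpdir.
    static void deleteRecursively(Path root) throws IOException {
        if (!isUnderTmpDir(root)) {
            throw new IllegalArgumentException("Refusing to delete outside java.io.tmpdir: " + root);
        }
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order visits children before their parent directories.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }
}
```

Reading the property on every call, rather than hard-coding `/tmp`, keeps the guard correct on platforms where the temp directory lives elsewhere.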
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes #1600 from dguy/kafka-3942
Author: Nafer Sanabria <nafr.snabr@gmail.com>
Reviewers: Grant Henke <granthenke@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes #1595 from naferx/minor-typo
Also handle null values in SmokeTestUtil.
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Damian Guy <damian.guy@gmail.com>
Closes #1597 from guozhangwang/KHotfix-check-null
Minor changes to check null changes.
Author: Jeyhun Karimov <je.karimov@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes #1591 from jeyhunkarimov/KAFKA-3836
…int to the console.
Author: bbejeck <bbejeck@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes #1577 from bbejeck/KAFKA-3794-add-prefix-to-print-functions
`Windows.segments(...)` and `Windows.until(...)` currently return a raw `Windows` instead of `Windows<W>`, dropping the type param `W`. This causes the generic type to be lost, so methods using them can't infer the correct return types.
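A simplified sketch of the fix, using stand-in classes (`SimpleWindow`, `SimpleWindows` and friends are hypothetical, not the real `org.apache.kafka.streams.kstream` types): declaring the builder methods to return `SimpleWindows<W>` rather than the raw type lets a chained call keep its concrete window type.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical stand-ins for Window / TimeWindow.
abstract class SimpleWindow {
    final long start;
    final long end;
    SimpleWindow(long start, long end) { this.start = start; this.end = end; }
}

class SimpleTimeWindow extends SimpleWindow {
    SimpleTimeWindow(long start, long end) { super(start, end); }
}

// Hypothetical stand-in for Windows<W extends Window>.
abstract class SimpleWindows<W extends SimpleWindow> {
    long maintainMs = 24 * 60 * 60 * 1000L;
    int segments = 3;

    // Returning SimpleWindows<W> (not the raw SimpleWindows) keeps W for callers.
    SimpleWindows<W> until(long durationMs) { this.maintainMs = durationMs; return this; }
    SimpleWindows<W> segments(int segments) { this.segments = segments; return this; }

    abstract Map<Long, W> windowsFor(long timestamp);
}

class SimpleTimeWindows extends SimpleWindows<SimpleTimeWindow> {
    final long sizeMs;
    SimpleTimeWindows(long sizeMs) { this.sizeMs = sizeMs; }

    @Override
    Map<Long, SimpleTimeWindow> windowsFor(long timestamp) {
        long start = (timestamp / sizeMs) * sizeMs; // tumbling window containing the timestamp
        return Collections.singletonMap(start, new SimpleTimeWindow(start, start + sizeMs));
    }
}
```

With the raw return type, `new SimpleTimeWindows(10).until(100).windowsFor(25)` would yield a raw `Map`, and each value would need an unchecked cast back to `SimpleTimeWindow`.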
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Matthias J. Sax, Guozhang Wang
Closes #1587 from dguy/windows-generics
The `KStream.groupBy(..)` calls don't change the value, only the key, so they don't need the type param `V1` as the new stream will always be of type `KStream<K1, V>`.
The `Serde` in the overloaded `groupBy` should have a type param of `V` to match the returned `KStream`.
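The typing argument can be illustrated with a toy stream type (`SimpleStream` is hypothetical and returns a plain `Map` rather than a real Kafka Streams type): because the key selector only produces a new key `K1`, the grouped values keep the original type `V`, and no separate `V1` parameter is needed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical toy stream of key/value records; not the KStream API.
class SimpleStream<K, V> {
    final List<Map.Entry<K, V>> records;

    SimpleStream(List<Map.Entry<K, V>> records) { this.records = records; }

    // Only the key type changes (K -> K1); the value type V is carried through unchanged.
    <K1> Map<K1, List<V>> groupBy(BiFunction<? super K, ? super V, ? extends K1> keySelector) {
        Map<K1, List<V>> grouped = new LinkedHashMap<>();
        for (Map.Entry<K, V> record : records) {
            K1 newKey = keySelector.apply(record.getKey(), record.getValue());
            grouped.computeIfAbsent(newKey, k -> new ArrayList<>()).add(record.getValue());
        }
        return grouped;
    }
}
```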
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Matthias J. Sax, Guozhang Wang
Closes #1584 from dguy/kstream-generics
Follow-up to the auto-through feature:
- add sourceNode to transform()
- enable auto-repartitioning in merge()
- null check not required anymore (always join-able due to auto-through)
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes #1580 from mjsax/hotfix
This contribution is my original work and I license the work to the project under the project's open source license.
Contributors: Guozhang Wang, Phil Derome
guozhangwang
Added `checkEmpty` to validate that the processor does nothing, and added an inhibit check for filter to fix the issue.
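The inhibit check can be sketched as follows, assuming it means that a KTable filter suppresses a change when both the filtered new value and the filtered old value are null (nothing changes downstream in that case). `KTableFilterSketch` is hypothetical and records forwarded changes in a list instead of calling a real `ProcessorContext`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of a KTable filter with an inhibit check; not the actual KTableFilter code.
class KTableFilterSketch<V> {
    // Forwarded changes as {newValue, oldValue} pairs.
    final List<Object[]> forwarded = new ArrayList<>();
    private final Predicate<V> predicate;

    KTableFilterSketch(Predicate<V> predicate) { this.predicate = predicate; }

    // A value that fails the predicate is treated as absent (null).
    private V filter(V value) {
        return value != null && predicate.test(value) ? value : null;
    }

    void process(V newValue, V oldValue) {
        V newFiltered = filter(newValue);
        V oldFiltered = filter(oldValue);
        // Inhibit: both sides filtered out means the downstream view does not change.
        if (newFiltered == null && oldFiltered == null) {
            return;
        }
        forwarded.add(new Object[] {newFiltered, oldFiltered});
    }
}
```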
Author: Philippe Derome <phderome@gmail.com>
Author: Phil Derome <phderome@gmail.com>
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes #1556 from phderome/DEROME-3902
This is part I of the work to add the StreamsConfig to ProcessorContext.
We need to access StreamsConfig in the ProcessorContext so that other components (e.g. RocksDBWindowStore or LRUCache) can retrieve config parameters from the application.
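As an illustration of how a store might consume such configs during `init`, here is a sketch against a hypothetical `ConfigAwareContext` interface (not the real `ProcessorContext` API); the config key `hypothetical.cache.max.entries` is likewise made up.

```java
import java.util.Map;

// Hypothetical context exposing the application's configs; not the real ProcessorContext.
interface ConfigAwareContext {
    Map<String, Object> appConfigs();
}

// Hypothetical cache-backed store that reads its size limit from the application config.
class CachingStoreSketch {
    static final String CACHE_SIZE_CONFIG = "hypothetical.cache.max.entries";
    int maxEntries = 100; // default used when the application does not set the key

    void init(ConfigAwareContext context) {
        Object value = context.appConfigs().get(CACHE_SIZE_CONFIG);
        if (value != null) {
            maxEntries = Integer.parseInt(value.toString());
        }
    }
}
```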
Author: Henry Cai <hcai@pinterest.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes #1553 from HenryCaiHaiying/config
The current task assignment in TaskAssignor is not deterministic.
During a cluster restart or rolling restart, we have the same set of participating worker nodes. But the current TaskAssignor is not able to maintain a deterministic mapping, so about 20% of partitions are reassigned, which causes state repopulation.
When the topology of worker nodes (the number of worker nodes and the TaskIds they carry) has not changed, we really just want to keep the old task assignment.
Added code to check whether the node topology is changing or not:
- when the prevAssignedTasks from the old clientStates is the same as the new task list
- when there is no new node joining (its prevAssignedTasks would be either empty or conflict with some other nodes)
- when there is no node dropping out (the total of prevAssignedTasks from other nodes would not be equal to the new task list)
When the topology is not changing, we would just use the old mapping.
I also added code to check whether the previous assignment is balanced (whether each node's task count is within [1/2 average, 2 * average]); if it's not balanced, we still start a new task assignment.
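The checks above can be sketched as follows, assuming the previous assignment is available as a map from client id to its set of task ids (plain strings here). `StickyAssignmentCheck` is hypothetical and not the actual TaskAssignor code.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the "reuse previous assignment" check; not the actual TaskAssignor.
class StickyAssignmentCheck {

    // Returns true if the previous client -> tasks mapping can be reused as-is.
    static boolean canReusePreviousAssignment(Map<String, Set<String>> prevAssigned, Set<String> newTasks) {
        if (prevAssigned.isEmpty()) {
            return false;
        }
        Set<String> union = new HashSet<>();
        for (Set<String> tasks : prevAssigned.values()) {
            if (tasks.isEmpty()) {
                return false; // a new node joined: it has no previous tasks
            }
            for (String task : tasks) {
                if (!union.add(task)) {
                    return false; // the same task was previously on two nodes
                }
            }
        }
        if (!union.equals(newTasks)) {
            return false; // a node dropped out or the task list changed
        }
        return isBalanced(prevAssigned, newTasks.size());
    }

    // Each node's task count must lie within [average / 2, 2 * average].
    static boolean isBalanced(Map<String, Set<String>> prevAssigned, int totalTasks) {
        double average = (double) totalTasks / prevAssigned.size();
        for (Set<String> tasks : prevAssigned.values()) {
            int n = tasks.size();
            if (n < average / 2 || n > average * 2) {
                return false;
            }
        }
        return true;
    }
}
```

When this returns false, the assignor would fall back to computing a fresh assignment.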
Author: Henry Cai <hcai@pinterest.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes #1543 from HenryCaiHaiying/upstream
ijuma I checked the cases where this test has failed, and it always seems to be on the verification of the left join. I've run this test plenty of times and I can't get it to fail. However, in the interest of having stable builds, I've removed just the part of the test that is failing (which happens to be the last verification).
Thanks,
Damian
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Ismael Juma, Guozhang Wang
Closes #1549 from dguy/kafka-3896
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>
Closes #1550 from guozhangwang/Kminor-grouppartitioner-javadoc
…stUtils, added method for pausing tests to TestUtils
Changes made:
1. Added utility method for creating consumer configs.
2. Added methods for creating producer, consumer configs with default values for de/serializers.
3. Pulled out method for waiting for test state to TestUtils (not using Thread.sleep).
4. Added utility class for creating streams configs and methods providing default de/serializers.
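A condition-polling wait of the kind described in item 3 might look like the sketch below; `WaitUtil.waitForCondition` and its signature are hypothetical. It still sleeps, but only in short polling intervals, and returns as soon as the condition holds instead of always waiting a fixed amount of time.

```java
import java.util.function.BooleanSupplier;

// Hypothetical test helper; the name and signature are assumptions.
class WaitUtil {

    // Polls the condition until it holds, failing the test once maxWaitMs has elapsed.
    static void waitForCondition(BooleanSupplier condition, long maxWaitMs, String message)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + maxWaitMs;
        while (!condition.getAsBoolean()) {
            if (System.currentTimeMillis() >= deadline) {
                throw new AssertionError("Condition not met within " + maxWaitMs + " ms: " + message);
            }
            Thread.sleep(Math.min(100L, maxWaitMs)); // short poll interval, not a fixed long sleep
        }
    }
}
```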
Author: bbejeck <bbejeck@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes #1532 from bbejeck/KAFKA_3842_add_helper_functions_test_utils
The method `RocksDB.open` assumes an absolute file path. If a relative path is configured, it leads to an exception like the following:
```
org.apache.kafka.streams.errors.ProcessorStateException: Error opening store CustomerIdToUserIdLookup at location ./tmp/rocksdb/CustomerIdToUserIdLookup
at org.rocksdb.RocksDB.open(Native Method)
at org.rocksdb.RocksDB.open(RocksDB.java:183)
at org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:214)
at org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:165)
at org.apache.kafka.streams.state.internals.RocksDBStore.init(RocksDBStore.java:170)
at org.apache.kafka.streams.state.internals.MeteredKeyValueStore.init(MeteredKeyValueStore.java:85)
at org.apache.kafka.test.KStreamTestDriver.<init>(KStreamTestDriver.java:64)
at org.apache.kafka.test.KStreamTestDriver.<init>(KStreamTestDriver.java:50)
at com.simple.estuary.transform.streaming.CartesianTransactionEnrichmentJobTest.testBuilder(CartesianTransactionEnrichmentJobTest.java:41)
```
Is there any risk to always fetching the absolute path as proposed here?
Let me know if you think this requires a JIRA issue or a unit test. I started working on a unit test, but don't know of a great solution for writing out a file to a relative directory.
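Fetching the absolute path amounts to resolving the configured state directory against the current working directory before the path is handed to `RocksDB.open`. A sketch with a hypothetical `StoreDirResolver` helper (not the actual `RocksDBStore` code):

```java
import java.io.File;

// Hypothetical helper illustrating the proposed change; not the actual RocksDBStore code.
class StoreDirResolver {

    // Resolves the store directory against the current working directory, so a relative
    // state dir such as "./tmp/rocksdb" becomes an absolute path before it reaches RocksDB.
    static String absoluteStorePath(String stateDir, String storeName) {
        File dbDir = new File(stateDir, storeName);
        return dbDir.getAbsolutePath();
    }
}
```

`getAbsolutePath()` does not normalize `.` segments or resolve symlinks; `getCanonicalPath()` would, at the cost of a checked `IOException`.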
This contribution is my original work and I license the work to the project under the project's open source license.
Author: Jeff Klukas <jeff@klukas.net>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes #1481 from jklukas/rocksdb-abspath
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Michael G. Noll, Damian Guy, Eno Thereska, Guozhang Wang
Closes #1529 from mjsax/kafka-3880-join-windows
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes #1533 from enothereska/KAFKA-3872-oom-integration-tests
... by
a) merging some for startup/shutdown efficiency.
b) using independent state dirs.
c) removing some tests that are covered elsewhere.
guozhangwang ewencp - tests are running much quicker now, i.e., down to about 1 minute on my laptop (from about 2-3 minutes). There were some issues with state dirs in some of the integration tests that were causing the shutdown of the streams apps to take a long time.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Ismael Juma, Guozhang Wang
Closes #1525 from dguy/integration-tests
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Damian Guy, Matthias J. Sax
Closes #1520 from guozhangwang/KHotfix-iter-hasNext-window-value-getter
- Check if DB is null before flushing or closing. In some cases, a state store is closed twice. This happens in `StreamTask.close()`, where both `node.close()` and `super.close()` (in `ProcessorManager`) are called in sequence. If the user's processor defines a `close` that closes the underlying state store, then the second close is redundant.
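The null check makes `close()` idempotent and turns a late `flush()` into a no-op. A sketch using hypothetical `FakeDb` and `NullSafeStore` classes in place of the real RocksDB handle and store:

```java
// Stand-in for a native database handle such as a RocksDB instance (hypothetical).
class FakeDb {
    int flushCalls = 0;
    int closeCalls = 0;
    void flush() { flushCalls++; }
    void close() { closeCalls++; }
}

// Hypothetical store showing the null guard; not the actual RocksDBStore code.
class NullSafeStore {
    private FakeDb db;

    NullSafeStore(FakeDb db) { this.db = db; }

    void flush() {
        if (db == null) {
            return; // already closed: nothing to flush
        }
        db.flush();
    }

    void close() {
        if (db == null) {
            return; // second close (e.g. task close after the processor closed the store) is a no-op
        }
        db.close();
        db = null;
    }
}
```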
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Andrés Gómez, Ismael Juma, Guozhang Wang
Closes #1485 from enothereska/KAFKA-3805-locks
guozhangwang enothereska mjsax miguno
If you get a chance, can you please take a look at this? I've done the repartitioning in the join, but it results in 2 internal topics for each join. This seems like overkill, as sometimes we wouldn't need to repartition at all, other times just 1 topic, and sometimes both, but I'm not sure how we can know that.
I'd also need to implement something similar for leftJoin, but again, I'd like to see if I'm heading down the right path or if anyone has any other bright ideas.
For reference - https://github.com/apache/kafka/pull/1453 - the previous PR
Thanks for taking the time and looking forward to getting some welcome advice :-)
Author: Damian Guy <damian.guy@gmail.com>
Author: Damian Guy <damian@continuum.local>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes #1472 from dguy/KAFKA-3561
This PR is the follow on to the closed PR #1410.
Author: bbejeck <bbejeck@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes #1477 from bbejeck/KAFKA-3443_streams_support_for_regex_sources
See https://issues.apache.org/jira/browse/KAFKA-3753
This contribution is my original work and I license the work to the project under the project's open source license.
cc guozhangwang kichristensen ijuma
Author: Jeff Klukas <jeff@klukas.net>
Reviewers: Ismael Juma, Guozhang Wang
Closes #1486 from jklukas/kvstore-size
guozhangwang mjsax enothereska
Currently, Kafka Streams does not have a util to access the sequence number appended to the key of window state store changelogs. I'm interested in exposing it so that the contents of a changelog topic can be 1) inspected for debugging purposes and 2) saved to and loaded from a text file.
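A sketch of reading the sequence number back out of a binary changelog key, assuming the layout is the serialized record key followed by an 8-byte timestamp and a 4-byte sequence number. Both the layout and the helper names in `WindowKeyLayout` are assumptions for illustration.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helpers; assumed layout: [record key][8-byte timestamp][4-byte seqnum].
class WindowKeyLayout {
    static final int TIMESTAMP_SIZE = 8;
    static final int SEQNUM_SIZE = 4;

    static byte[] toBinaryKey(byte[] key, long timestamp, int seqnum) {
        ByteBuffer buf = ByteBuffer.allocate(key.length + TIMESTAMP_SIZE + SEQNUM_SIZE);
        buf.put(key).putLong(timestamp).putInt(seqnum);
        return buf.array();
    }

    // The sequence number occupies the last 4 bytes.
    static int sequenceNumber(byte[] binaryKey) {
        return ByteBuffer.wrap(binaryKey).getInt(binaryKey.length - SEQNUM_SIZE);
    }

    // The timestamp occupies the 8 bytes before the sequence number.
    static long timestamp(byte[] binaryKey) {
        return ByteBuffer.wrap(binaryKey).getLong(binaryKey.length - SEQNUM_SIZE - TIMESTAMP_SIZE);
    }

    // Everything before the timestamp is the original serialized key.
    static byte[] originalKey(byte[] binaryKey) {
        return Arrays.copyOfRange(binaryKey, 0, binaryKey.length - TIMESTAMP_SIZE - SEQNUM_SIZE);
    }
}
```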
Author: Roger Hoover <roger.hoover@gmail.com>
Reviewers: Eno Thereska <eno.thereska@gmail.com>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes #1501 from theduderog/expose-seq-num
- Fixed the logic calculating the windows that are affected by a new …event in the case of hopping windows and a small overlap.
- Added a unit test that tests for the issue
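The calculation can be sketched as follows, assuming hopping windows are aligned to multiples of the advance interval and cover `[start, start + size)`. `HoppingWindows.windowStartsFor` is a hypothetical helper, not the actual `TimeWindows#windowsFor` code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of computing the hopping windows that contain a timestamp.
class HoppingWindows {

    // Returns the start times of all windows [start, start + sizeMs) that contain timestamp.
    static List<Long> windowStartsFor(long timestamp, long sizeMs, long advanceMs) {
        // Smallest advance-aligned start that is strictly greater than timestamp - sizeMs.
        long windowStart = (Math.max(0, timestamp - sizeMs + advanceMs) / advanceMs) * advanceMs;
        List<Long> starts = new ArrayList<>();
        while (windowStart <= timestamp) {
            starts.add(windowStart);
            windowStart += advanceMs;
        }
        return starts;
    }
}
```

With a size of 10 and an advance of 9 (a small overlap), timestamp 10 falls only into the window starting at 9, because the window starting at 0 ends at 10, exclusive.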
Author: Tom Rybak <trybak@gmail.com>
Reviewers: Michael G. Noll, Matthias J. Sax, Guozhang Wang
Closes #1462 from trybak/bugfix/KAFKA-3784-TimeWindows#windowsFor-false-positives
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes #1422 from enothereska/minor-integration-timeout2
Initially proposed by ijuma in https://github.com/apache/kafka/pull/1362#issuecomment-218293662
mjsax commented:
> StreamThread.close() should be extended to call metrics.close() (the class need a private member to reference the Metrics object, too)
The `Metrics` instance is created in the `KafkaStreams` constructor and shared between all threads, so closing it within the threads doesn't seem like the right approach. This PR calls `Metrics.close()` in `KafkaStreams.close()` instead.
cc guozhangwang
Author: Jeff Klukas <jeff@klukas.net>
Reviewers: Ismael Juma, Guozhang Wang
Closes #1379 from jklukas/close-streams-metrics
I removed the hamcrest matcher to unbreak the build, but we probably want to tweak the `import-control.xml` as it currently only allows it for `<subpackage name="integration">`, which is weird.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes #1380 from ijuma/fix-streams-config-test-checkstyle
This is an improved version of https://github.com/apache/kafka/pull/1374, where we include a unit test.
/cc ijuma and guozhangwang
Author: Guozhang Wang <wangguoz@gmail.com>
Author: Michael G. Noll <michael@confluent.io>
Reviewers: Michael G. Noll <michael@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes #1377 from miguno/streamsconfig-multiple-bootstrap-servers
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Michael G. Noll <michael@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes #1311 from guozhangwang/K3639
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Ismael Juma, Michael G. Noll, Guozhang Wang
Closes #1285 from enothereska/more-integration-tests