Also made a pass over the streams unit tests, with the following changes:
1. Removed three integration tests as they are already covered by other integration tests.
2. Merged `KGroupedTableImplTest` into `KTableAggregateTest`.
3. Use mocks whenever possible to reduce code duplicates.
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Damian Guy <damian.guy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#1604 from guozhangwang/Kminor-unit-tests-consolidation
Fix timing window in producer by holding onto cluster object while processing send requests so that changes to cluster during metadata refresh don't cause NPE if a topic is deleted.
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Sriharsha Chintalapani <harsha@hortonworks.com>, Ewen Cheslack-Postava <ewen@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#1478 from rajinisivaram/KAFKA-3562
It was previously only deleting files/folders where the path started with /tmp. Changed it to delete from the value of the System Property `java.io.tmpdir`. Also changed the tests that were creating State dirs under /tmp to just use `TestUtils.tempDirectory(..)`
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#1600 from dguy/kafka-3942
I've updated the ops documentation with information on using the XFS filesystem, based on LinkedIn's testing (and subsequent switch from EXT4).
I've also added some information to clarify the potential risk to the suggested EXT4 options (again, based on my experience with a multiple broker failure situation).
Author: Todd Palino <tpalino@linkedin.com>
Reviewers: Sriharsha Chintalapani <harsha@hortonworks.com>, Dana Powers <dana.powers@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#1605 from toddpalino/trunk
Author: Ashish Singh <asingh@cloudera.com>
Reviewers: Jason Gustafson <jason@confluent.io>, Grant Henke <granthenke@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#1515 from SinghAsDev/KAFKA-3849
Author: Nafer Sanabria <nafr.snabr@gmail.com>
Reviewers: Grant Henke <granthenke@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#1595 from naferx/minor-typo
Since 0.9.0.1 Configuration parameter log.cleaner.enable is now true by default.
Author: Nihed MBAREK <nihedmm@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#1592 from nihed/patch-1
Interval lengths for ConsumerPerformance could sometime be calculated as zero. In such cases, when the bytes read or messages read are also zero a NaN output is returned for mbRead per second or for nMsg per second, whereas zero would be a more appropriate output.
In cases where interval length is zero but there have been data and messages to read, an output of Infinity is returned, as expected.
Author: Vahid Hashemian <vahidhashemian@us.ibm.com>
Reviewers: Dong Lin <lindong28@gmail.com>, Jun Rao <junrao@gmail.com>
Closes#788 from vahidhashemian/KAFKA-3111
Also handle Null value in SmokeTestUtil.
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Damian Guy <damian.guy@gmail.com>
Closes#1597 from guozhangwang/KHotfix-check-null
Minor changes to check null changes.
Author: Jeyhun Karimov <je.karimov@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#1591 from jeyhunkarimov/KAFKA-3836
…int to the console.
Author: bbejeck <bbejeck@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#1577 from bbejeck/KAFKA-3794-add-prefix-to-print-functions
There seems to be a bug in the JDK that on some versions the mtime of
the file is modified on FileChannel.truncate() even if the javadoc states
`If the given size is greater than or equal to the file's current size then
the file is not modified.`.
This causes problems with log retention, as all the files then look like
they contain recent data to Kafka. Therefore this is only done if the channel size is different to the target size.
Author: Moritz Siuts <m.siuts@emetriq.com>
Reviewers: Jun Rao <junrao@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#1497 from msiuts/KAFKA-3802-log_mtimes_reset_on_broker_shutdown
`Windows.segments(...)` and `Windows.until(...)` currently aren't returning the `Window` with its type param `W`. This causes the generic type to be lost and therefore methods using this can't infer the correct return types.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Matthias J. Sax, Guozhang Wang
Closes#1587 from dguy/windows-generics
The `KStream.groupBy(..)` calls don't change the value, only the key, so they don't need the type param `V1` as the new stream will always be of type `KStream<K1, V>`.
The `Serde` in the overloaded `groupBy` should have a type param of `V` to match the returned `KStream`
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Matthias J. Sax, Guozhang Wang
Closes#1584 from dguy/kstream-generics
follow-up to auto-through feature:
- add sourceNode to transform()
- enable auto-repartitioning in merge()
- null check not required anymore (always join-able due to auto-through)
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#1580 from mjsax/hotfix
This patch fixes two issues:
1. Subsequent regex subscriptions fail with the new consumer.
2. Subsequent regex subscriptions would not immediately refresh metadata to change the subscription of the new consumer and trigger a rebalance.
The final note on the JIRA stating that a later created topic that matches a consumer's subscription pattern would not be assigned to the consumer upon creation seems to be as designed. A repeat
`subscribe()` to the same pattern or some wait time until the next automatic metadata refresh would handle that case.
An integration test was also added to verify these issues are fixed with this PR.
Author: Vahid Hashemian <vahidhashemian@us.ibm.com>
Reviewers: Jason Gustafson <jason@confluent.io>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#1572 from vahidhashemian/KAFKA-3854
The contribution is my original work and that I license the work to the project under the project's open source license.
Contributors: Guozhang Wang, Phil Derome
guozhangwang
Added checkEmpty to validate processor does nothing and added a inhibit check for filter to fix issue.
Author: Philippe Derome <phderome@gmail.com>
Author: Phil Derome <phderome@gmail.com>
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#1556 from phderome/DEROME-3902
This is the part I of the work to add the StreamsConfig to ProcessorContext.
We need to access StreamsConfig in the ProcessorContext so other components (e.g. RocksDBWindowStore or LRUCache can retrieve config parameter from application)
Author: Henry Cai <hcai@pinterest.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#1553 from HenryCaiHaiying/config
Current task assignment in TaskAssignor is not deterministic.
During cluster restart or rolling restart, we have the same set of participating worker nodes. But the current TaskAssignor is not able to maintain a deterministic mapping, so about 20% partitions will be reassigned which would cause state repopulation.
When the topology of work nodes (# of worker nodes, the TaskIds they are carrying with) is not changed, we really just want to keep the old task assignment.
Add the code to check whether the node topology is changing or not:
- when the prevAssignedTasks from the old clientStates is the same as the new task list
- when there is no new node joining (its prevAssignTasks would be either empty or conflict with some other nodes)
- when there is no node dropping out (the total of prevAssignedTasks from other nodes would not be equal to the new task list)
When the topology is not changing, we would just use the old mapping.
I also added the code to check whether the previous assignment is balanced (whether each node's task list is within [1/2 average -- 2 * average]), if it's not balanced, we will still start the a new task assignment.
Author: Henry Cai <hcai@pinterest.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#1543 from HenryCaiHaiying/upstream
The reasons to remove it are:
1. It's currently broken. The purpose of the [JIRA](https://issues.apache.org/jira/browse/KAFKA-3761) was to report that the RunningAsController state gets overwritten back to "RunningAsBroker".
2. It's not a useful state.
a. If clients want to use this metric to know whether a broker is ready to receive requests or not, they do not care whether or not the broker is the controller
b. there is already a separate boolean property, KafkaController.isActive which contains this information.
Author: Roger Hoover <roger.hoover@gmail.com>
Reviewers: Grant Henke <granthenke@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#1437 from theduderog/KAFKA-3761-broker-state
ijuma i checked the cases where this test has failed and it seems to always be on the verification of the left join. I've ran this test plenty of times and i can't get it to fail. However in the interest of having stable builds, i've removed just the part of the test that is failing (which happens to be the last verification).
Thanks,
Damian
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Ismael Juma, Guozhang Wang
Closes#1549 from dguy/kafka-3896
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>
Closes#1550 from guozhangwang/Kminor-grouppartitioner-javadoc
Was just reading kafka source code, my favourite Friday afternoon activity, when I found these small grammatical errors in some `DataException` messages.
Could someone please review? ewencp dguy
Author: Laurier Mantel <laurier.mantel@shopify.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#1551 from LaurierMantel/maps-typos
…stUtils, added method for pausing tests to TestUtils
Changes made:
1. Added utility method for creating consumer configs.
2. Added methods for creating producer, consumer configs with default values for de/serializers.
3. Pulled out method for waiting for test state to TestUtils (not using Thread.sleep).
4. Added utility class for creating streams configs and methods providing default de/serializers.
Author: bbejeck <bbejeck@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#1532 from bbejeck/KAFKA_3842_add_helper_functions_test_utils
The failure could manifest itself if the default metrics registry had some entries from other tests:
`java.lang.AssertionError: Unexpected meter count expected:<0> but was:<3>`
I also removed an unused variable and improved the error message to include the metric name.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Gwen Shapira
Closes#1544 from ijuma/fix-transient-session-expire-listener-metrics-failure
The method `RocksDB.open` assumes an absolute file path. If a relative path is configured, it leads to an exception like the following:
```
org.apache.kafka.streams.errors.ProcessorStateException: Error opening store CustomerIdToUserIdLookup at location ./tmp/rocksdb/CustomerIdToUserIdLookup
at org.rocksdb.RocksDB.open(Native Method)
at org.rocksdb.RocksDB.open(RocksDB.java:183)
at org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:214)
at org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:165)
at org.apache.kafka.streams.state.internals.RocksDBStore.init(RocksDBStore.java:170)
at org.apache.kafka.streams.state.internals.MeteredKeyValueStore.init(MeteredKeyValueStore.java:85)
at org.apache.kafka.test.KStreamTestDriver.<init>(KStreamTestDriver.java:64)
at org.apache.kafka.test.KStreamTestDriver.<init>(KStreamTestDriver.java:50)
at com.simple.estuary.transform.streaming.CartesianTransactionEnrichmentJobTest.testBuilder(CartesianTransactionEnrichmentJobTest.java:41)
```
Is there any risk to always fetching the absolute path as proposed here?
Let me know if you think this requires a JIRA issue or a unit test. I started working on a unit test, but don't know of a great solution for writing out a file to a relative directory.
This contribution is my original work and I license the work to the project under the project's open source license.
Author: Jeff Klukas <jeff@klukas.net>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#1481 from jklukas/rocksdb-abspath
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Michael G. Noll, Damian Guy, Eno Thereska, Guozhang Wang
Closes#1529 from mjsax/kafka-3880-join-windows
Co-authored with ijuma.
Author: Vahid Hashemian <vahidhashemian@us.ibm.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#1536 from vahidhashemian/minor/KAFKA-3176-Followup
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#1533 from enothereska/KAFKA-3872-oom-integration-tests
With this pull request the new console consumer can be provided with optional --partition and --offset arguments so only messages from a particular partition and starting from a particular offset are consumed.
The following rules are also implemented to avoid invalid combinations of arguments:
- If --partition or --offset is provided --new-consumer has to be provided too.
- If --partition is provided --topic has to be provided too.
- If --offset is provided --partition has to be provided too.
- --offset and --from-beginning cannot be used at the same time.
This patch is co-authored with rajinisivaram.
Author: Vahid Hashemian <vahidhashemian@us.ibm.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#922 from vahidhashemian/KAFKA-3176