Enable producer per task if exactly-once config is enabled.
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Eno Thereska <eno@confluent.io>, Damian Guy <damian.guy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#2773 from mjsax/exactly-once-streams-producer-per-task
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Damian Guy <damian.guy@gmail.com>, Matthias J. Sax <matthias@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2848 from enothereska/KAFKA-5038-trunk
Set the internal consumer config internal.leave.group.on.close in
`StreamsConfig`. This is to reduce the number of rebalances we get
during bounces.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#2750 from dguy/kafka-4965
Author: Michael G. Noll <michael@confluent.io>
Reviewers: Damian Guy <damian.guy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#2751 from miguno/trunk-streams-window-iterator-doc-fixes
Author: Eno Thereska <eno@confluent.io>
Reviewers: Damian Guy <damian.guy@gmail.com>, Matthias J. Sax <matthias@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2819 from enothereska/minor-increase-retries
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Eno Thereska <eno@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2780 from cmccabe/KAFKA-4995
Highlight that the range in `fetch` is inclusive of both `timeFrom` and `timeTo`
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Michael G. Noll <michael@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2811 from dguy/minor-window-fetch-java-doc
Author: Michael G. Noll <michael@confluent.io>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Eno Thereska <eno@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2812 from miguno/trunk-streams-examples-docs
We should catch `InvalidTopicException` and not just
`NoOffsetForPartitionException`. Also, we need to step through
all partitions that might be affected and reset those.
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Bill Bejeck <bbejeck@gmail.com>, Eno Thereska <eno@confluent.io>, Damian Guy <damian.guy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#2747 from mjsax/minor-fix-reset
There should only be a single `KafkaStreams.StreamStateListener` to
ensure synchronization of operations on
`KafkaStreams.StreamStateListener#threadState`.
Author: Armin Braun <me@obrown.io>
Reviewers: Damian Guy <damian.guy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#2801 from original-brownbear/fix-stream-state-listener
This fixes:
```
java.lang.AssertionError: expected:<2> but was:<3>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:834)
at org.junit.Assert.assertEquals(Assert.java:645)
at org.junit.Assert.assertEquals(Assert.java:631)
at org.apache.kafka.streams.processor.internals.StateDirectoryTest.shouldCleanUpTaskStateDirectoriesThatAreNotCurrentlyLocked(StateDirectoryTest.java:145)
```
While running test in infinite loop, hit other problems:
- fixed file management (release all locks and close everything)
- increased sleep time for `shouldCleanupStateDirectoriesWhenLastModifiedIsLessThanNowMinusCleanupDelay` too (was flaky as well)
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Eno Thereska <eno@confluent.io>, Damian Guy <damian.guy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#2781 from mjsax/minor-fix-stateDirectoryTest
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Eno Thereska <eno@confluent.io>, Damian Guy <damian.guy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#2777 from mjsax/hotfix-window-serdes-trunk
Several fixes for handling broker failures:
- default replication value for internal topics is now 3 in test itself (not in streams code, that will require a KIP.
- streams producer waits for acks from all replicas in test itself (not in streams code, that will require a KIP.
- backoff time for streams client to try again after a failure to contact controller.
- fix bug related to state store locks (this helps in multi-threaded scenarios)
- fix related to catching exceptions property for network errors.
- system test for all the above
Author: Eno Thereska <eno@confluent.io>
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Damian Guy <damian.guy@gmail.com>, Guozhang Wang <wangguoz@gmail.com>, Dan Norwood <norwood@confluent.io>, Ismael Juma <ismael@juma.me.uk>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#2719 from enothereska/KAFKA-4916-broker-bounce-test
fixes:
```
java.nio.file.NoSuchFileException: /tmp/test7863510415433793941/topic2-Canonized/topic2-Canonized-197001010000/000015.sst
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
at sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
at sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:97)
at java.nio.file.Files.readAttributes(Files.java:1686)
at java.nio.file.FileTreeWalker.walk(FileTreeWalker.java:105)
at java.nio.file.FileTreeWalker.walk(FileTreeWalker.java:199)
at java.nio.file.FileTreeWalker.walk(FileTreeWalker.java:199)
at java.nio.file.FileTreeWalker.walk(FileTreeWalker.java:199)
at java.nio.file.FileTreeWalker.walk(FileTreeWalker.java:69)
at java.nio.file.Files.walkFileTree(Files.java:2602)
at java.nio.file.Files.walkFileTree(Files.java:2635)
at org.apache.kafka.common.utils.Utils.delete(Utils.java:555)
at org.apache.kafka.streams.kstream.internals.KStreamWindowAggregateTest.testJoin(KStreamWindowAggregateTest.java:320)
```
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Eno Thereska <eno@confluent.io>, Damian Guy <damian.guy@gmail.com>, Jun Rao <junrao@gmail.com>
Closes#2778 from mjsax/minor-fix-kstreamWindowAggregateTest
This may be a reason why we see Jenkins jobs time out at times.
I can reproduce it locally.
With current trunk there is a possibility to run into this:
```sh
"kafka-streams-close-thread" #585 daemon prio=5 os_prio=0 tid=0x00007f66d052d800 nid=0x7e02 waiting for monitor entry [0x00007f66ae2e5000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.kafka.streams.processor.internals.StreamThread.close(StreamThread.java:345)
- waiting to lock <0x000000077d33c538> (a org.apache.kafka.streams.processor.internals.StreamThread)
at org.apache.kafka.streams.KafkaStreams$1.run(KafkaStreams.java:474)
at java.lang.Thread.run(Thread.java:745)
"appId-bd262a91-5155-4a35-bc46-c6432552c2c5-StreamThread-97" #583 prio=5 os_prio=0 tid=0x00007f66d052f000 nid=0x7e01 waiting for monitor entry [0x00007f66ae4e6000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.kafka.streams.KafkaStreams.setState(KafkaStreams.java:219)
- waiting to lock <0x000000077d335760> (a org.apache.kafka.streams.KafkaStreams)
at org.apache.kafka.streams.KafkaStreams.access$100(KafkaStreams.java:117)
at org.apache.kafka.streams.KafkaStreams$StreamStateListener.onChange(KafkaStreams.java:259)
- locked <0x000000077d42f138> (a org.apache.kafka.streams.KafkaStreams$StreamStateListener)
at org.apache.kafka.streams.processor.internals.StreamThread.setState(StreamThread.java:168)
- locked <0x000000077d33c538> (a org.apache.kafka.streams.processor.internals.StreamThread)
at org.apache.kafka.streams.processor.internals.StreamThread.setStateWhenNotInPendingShutdown(StreamThread.java:176)
- locked <0x000000077d33c538> (a org.apache.kafka.streams.processor.internals.StreamThread)
at org.apache.kafka.streams.processor.internals.StreamThread.access$1600(StreamThread.java:70)
at org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener.onPartitionsRevoked(StreamThread.java:1321)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinPrepare(ConsumerCoordinator.java:406)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:349)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:310)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:296)
at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1037)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1002)
at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:531)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:669)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:326)
```
In a nutshell: `KafkaStreams` and `StreamThread` are both
waiting for each other since another intermittent `close`
(eg. from a test) comes along also trying to lock on
`KafkaStreams` :
```sh
"main" #1 prio=5 os_prio=0 tid=0x00007f66d000c800 nid=0x78bb in Object.wait() [0x00007f66d7a15000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1249)
- locked <0x000000077d45a590> (a java.lang.Thread)
at org.apache.kafka.streams.KafkaStreams.close(KafkaStreams.java:503)
- locked <0x000000077d335760> (a org.apache.kafka.streams.KafkaStreams)
at org.apache.kafka.streams.KafkaStreams.close(KafkaStreams.java:447)
at org.apache.kafka.streams.KafkaStreamsTest.testCannotStartOnceClosed(KafkaStreamsTest.java:115)
```
=> causing a deadlock.
Fixed this by softer locking on the state change, that guarantees
atomic changes to the state but does not lock on the whole object
(I at least could not find another method that would require more
than atomicly-locked access except for `setState`).
Also qualified the state listeners with their outer-class to make
the whole code-flow around this more readable (having two
interfaces with the same naming for interface and method and then
using them between their two outer classes is crazy hard to read
imo :)).
Easy to reproduced yourself by running
`org.apache.kafka.streams.KafkaStreamsTest` in a loop for a bit
(save yourself some time by running 2-4 in parallel :)). Eventually
it will lock on one of the tests (for me this takes less than 1 min
with 4 parallel runs).
Author: Armin Braun <me@obrown.io>
Author: Armin <me@obrown.io>
Reviewers: Eno Thereska <eno@confluent.io>, Damian Guy <damian.guy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#2791 from original-brownbear/fix-streams-deadlock
Fix for adding state stores with regex defined sources
Author: bbejeck <bbejeck@gmail.com>
Reviewers: Matthias J. Sax, Damian Guy, Guozhang Wang
Closes#2618 from bbejeck/KAFKA-4791_unable_to_add_statestore_regex_topics
We got test error `org.apache.kafka.common.errors.TopicExistsException: Topic 'inputTopic' already exists.` in some builds. Can reproduce reliably at local machine. Root cause it async "topic delete" that might not be finished before topic gets re-created.
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Ismael Juma, Damian Guy, Guozhang Wang
Closes#2757 from mjsax/minor-fix-resetintegrationtest
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Jun Rao <junrao@gmail.com>, Apurva Mehta <apurva@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2691 from cmccabe/KAFKA-4902
This is a minor change but it helps to improve the log readability.
Author: Kamal C <kamal.chandraprakash@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#2709 from Kamal15/util
- Improves streams efficiency by more than 200K requests/second (small 100 byte requests)
- Gets streams efficiency very close to pure consumer (see results in https://jenkins.confluent.io/job/system-test-kafka-branch-builder/746/console)
- Maintains same fairness across tasks
- Schedules all records in the queue in-between poll() calls, not just one per task.
Author: Eno Thereska <eno@confluent.io>
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Damian Guy, Matthias J. Sax, Guozhang Wang
Closes#2643 from enothereska/minor-schedule-round-robin
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Michael G. Noll <michael@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2727 from cmccabe/KAFKA-4944
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Jun Rao <junrao@gmail.com>, Apurva Mehta <apurva@confluent.io>, Guozhang Wang <wangguoz@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#2614 from hachikuji/exactly-once-message-format
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Michael G. Noll, Eno Thereska, Matthias J. Sax, Elias Levy, Guozhang Wang
Closes#2725 from dguy/minor-processor-java-doc
1. add thread id as prefix in state directory classes; also added logs for lock activities.
2. add logging for task creation / suspension.
3. add more information in rebalance listener logging.
4. add restored number of records into changlog reader.
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Eno Thereska, Damian Guy, Ewen Cheslack-Postava
Closes#2702 from guozhangwang/KMinor-streams-task-creation-log4j-improvements
Sometimes `ResetIntegrationTest` hangs and thus the build times out. We suspect, that this happens if no data is written into the input topics. Right now, input data is written once and reused for both test cases. If for some reason, the broker gets recreated (between both test cases), no data will be available for the second test method and thus the test hangs.
This change ensures, that input data is written for each test case individually.
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Ismael Juma, Eno Thereska, Guozhang Wang
Closes#2630 from mjsax/minor-reset-integration-test
Debug loggin of the start and end offsets used during state store restoration
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#2718 from dguy/log-restore-offsets
remove unused log field from KStreamTransformValuesProcessor
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#2717 from dguy/remove-unused-log-para
iterate over all keys returned from the rocksdb iterator so we don't miss any results
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Xavier Léauté, Guozhang Wang
Closes#2713 from dguy/window-iter
This uses JUnit Categories to identify integration tests. Adds 2 new build targets:
`integrationTest` and `unitTest`.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Eno Thereska <eno@confluent.io>, Ewen Cheslack-Postava <ewen@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2695 from dguy/junit-categories
Fixes related to handling of MAX_POLL_INTERVAL_MS_CONFIG during deadlock and CommitFailedException on partition revoked.
Author: Sachin Mittal <sjmittal@gmail.com>
Reviewers: Matthias J. Sax, Damian Guy, Guozhang Wang
Closes#2642 from sjmittal/trunk
I considered catching errors to add further information about naming internal state stores. However, Topic.validate() will throw an error that prints the offending name, so I decided not to add too much complexity.
Author: Nikki Thean <nthean@etsy.com>
Reviewers: Matthias J. Sax, Guozhang Wang, Eno Thereska, Damian Guy, Ismael Juma
Closes#2331 from nixsticks/internal-topics
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Armin Braun, Damian Guy, Jason Gustafson
Closes#2682 from guozhangwang/K4859-cache-commit-interval
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Eno Thereska, Damian Guy, Jason Gustafson
Closes#2693 from guozhangwang/K4885-system-test-unexpected-exception-handler
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Eno Thereska <eno@confluent.io>, Damian Guy <damian.guy@gmail.com>, Jason Gustafson <jason@confluent.io>
Closes#2685 from guozhangwang/KMinor-improve-log4j
Make sure that the iterator returned from `WindowStore.fetch(..)` only returns matching keys, rather than all keys that are a prefix match.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Eno Thereska, Guozhang Wang
Closes#2662 from dguy/kafka-4863
This commmit brings improved test coverage for window store fetch method
and WindowStoreIterator
Author: Andrey Dyachkov <andrey.dyachkov@zalando.de>
Reviewers: Damian Guy, Guozhang Wang
Closes#2672 from adyach/trunk
Author: Eno Thereska <eno@confluent.io>
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Damian Guy, Guozhang Wang
Closes#2663 from enothereska/minor-rocksdb-parallel
Remove generic type of class ClientState and generic T inTaskAssignor.
Author: sharad.develop <sharad.develop@gmail.com>
Reviewers: Matthias J. Sax, Guozhang Wang
Closes#2616 from sharad-develop/KAFKA-4738
Add application.id to StreamThread name
Author: sharad.develop <sharad.develop@gmail.com>
Reviewers: Matthias J. Sax, Guozhang Wang
Closes#2617 from sharad-develop/KAFKA-4722
restrict the locating of segments in `Segments#segments(..)` to only the segments that are currently available, i.e., rather than searching the hashmap for many segments that don't exist.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#2645 from dguy/session-windows-testing