Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Sharon Liu <sharonliu.cup@gmail.com>, Damian Guy <damian.guy@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
Closes#3787 from mjsax/kafka-5823-kip120-docs
When we install kafka on path with spaces, batch files were failing, this PR is trying to fix this issue.
Author: Vladimír Kleštinec <klestinec@gmail.com>
Reviewers: Vahid Hashemian <vahidhashemian@us.ibm.com>, Jason Gustafson <jason@confluent.io>
Closes#2649 from klesta490/trunk
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#3719 from mjsax/kafka-5603-dont-abort-tx-for-zombie-tasks-2
Simple implementation of the feature : [KAFKA-4819](https://issues.apache.org/jira/browse/KAFKA-4819)
KAFKA-4819
This PR adds a new method `threadStates` to public API of `KafkaStreams` which returns all currently states of running threads and active tasks.
Below is a example for a simple topology consuming from topics; test-p2 and test-p4.
[{"name":"StreamThread-1","state":"RUNNING","activeTasks":[{"id":"0_0", "assignments":["test-p4-0","test-p2-0"], "consumedOffsetsByPartition":[{"topicPartition":"test-p2-0","offset":"test-p2-0"}]}, {"id":"0_2", "assignments":["test-p4-2"], "consumedOffsetsByPartition":[]}]}, {"name":"StreamThread-2","state":"RUNNING","activeTasks":[{"id":"0_1", "assignments":["test-p4-1","test-p2-1"], "consumedOffsetsByPartition":[{"topicPartition":"test-p2-1","offset":"test-p2-1"}]}, {"id":"0_3", "assignments":["test-p4-3"], "consumedOffsetsByPartition":[]}]}]
Author: Florian Hussonnois <florian.hussonnois@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#2612 from fhussonnois/KAFKA-4819
Added a simple table of contents for the developer section.
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Damian Guy <damian.guy@gmail.com>
Closes#3760 from enothereska/minor-docs-toc
Author: Dong Lin <lindong28@gmail.com>
Reviewers: Jun Rao <junrao@gmail.com>, Jiangjie Qin <becket.qin@gmail.com>, Colin P. Mccabe <cmccabe@confluent.io>
Closes#3621 from lindong28/KAFKA-5694
If a request for a broker configuration failed due to a timeout or
the broker not being available, we would fail the futures
associated with the non-broker request instead (and never fail the
broker future, which would be left uncompleted forever).
We would also do an unnecessary request if only broker configs were
requested.
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#3585 from cmccabe/KAFKA-5659
- need to check that state is CRATED at startup
- some minor test cleanup
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Damian Guy <damian.guy@gmail.com>
Closes#3775 from mjsax/kafka-5818-kafkaStreams-state-transition
Author: Dong Lin <lindong28@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jiangjie Qin<becket.qin@gmail.com>
Closes#3709 from lindong28/KAFKA-5759
All current implementations process records using the same timestamp. This makes it difficult to test operations that require time windows, like `KStream-KStream joins`.
This change would allow tests to simulate records created at different times, thus making it possible to test operations like the above mentioned joins.
Author: Sebastian Gavril <sgavril@wehkamp.nl>
Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3753 from sebigavril/allow-timestamps-in-test-driver
The previous timeout was 10 seconds, but system test failures have occurred when Zookeeper has started after about 11 seconds. Increasing the timeout to 30 seconds, since most of the time this extra time will not be required, and when it is it will prevent a failed system test.
In addition to merging to `trunk`, please backport to the `0.11.x` and `0.10.2.x` branches.
Author: Randall Hauch <rhauch@gmail.com>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes#3774 from rhauch/MINOR-Increase-timeout-of-zookeeper-service-in-system-tests
In the coordinator, we should check that 'shutdown' is not true before going to sleep waiting for the condition.
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Apurva Mehta <apurva@confluent.io>, Jason Gustafson <jason@confluent.io>
Closes#3755 from cmccabe/KAFKA-5806
1. Core concepts (added the stream time definition), upgrade guide and developer guide.
2. Related Java docs changes.
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Damian Guy <damian.guy@gmail.com>
Closes#3732 from guozhangwang/KMinor-kip138-docs
1. Remove timeout-based validatePartitionExists from StoreChangelogReader; instead only try to refresh metadata once after all tasks have been created and their topology initialized (hence all stores have been registered).
2. Add the logic to refresh partition metadata at the end of initialization if some restorers needing initialization cannot find their changelogs, hoping that in the next run loop these stores can find their changelogs.
As a result, after `initialize` is called we may not be able to start initializing all the `needsInitializing` ones.
As an optimization, we would not call `consumer#partitionsFor` any more, but only `consumer#listTopics` fetching all the topic metadata; so the only blocking calls left are `listTopics` and `endOffsets`, and we always capture timeout exceptions around these two calls, and delay to retry in the next run loop after refreshing the metadata. By doing this we can also reduce the number of request round trips between consumer and brokers.
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Damian Guy <damian.guy@gmail.com>, Matthias J. Sax <matthias@confluent.io>
Closes#3748 from guozhangwang/K5797-handle-metadata-available
`ChangeLoggingWindowBytesStore` needs to have the same `retainDuplicates` functionality as `RocksDBWindowStore` else data could be lost upon failover/restoration.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3754 from dguy/hotfix-changelog-window-store
- client id is part of the log context, so removed ad-hoc usages
- Fixed an issue where the response was not printed correctly,
use `toString(version)` instead of `toString()`
- Capitalized all log statements for consistency
- Fixed a number of double spaces after period
Author: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#3741 from Kamal15/kafka-5762
Author: Tommy Becker <tobecker@tivo.com>
Author: Tommy Becker <twbecker@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3455 from twbecker/kafka-5379
1. StreamThread: prevent `PARTITIONS_REVOKED` to transit to itself in `setState` by returning false. And only execute the task suspension logic when `setState(PARTITIONS_REVOKED)` returns true in `onPartitionsRevoked`.
2. StreamThread: minor, renaming `shutdown` to `completeShutdown`, and `close` to `shutdown`, `stillRunning` to `isRunning`, `isInitialized` to `isRunningAndNotRebalancing`.
3. GlobalStreamThread: tighten the transition a bit in `setState`. Force transiting to `PENDING_SHUTDOWN` and `DEAD` when initialization failed.
4. GlobalStreamThread: minor, add logPrefix to StateConsumer. Also removing its state change listener when closing the thread.
5. KafkaStreams: because of 1) above we can now prevent its `REBALANCING` to `REBALANCING`.
6. KafkaStreams: prevent `CREATED` to ever go to `REBALANCING` first to force it transit to `RUNNING` when starting. Also prevent `CREATED` to go to `ERROR`.
7. KafkaStreams: collapse `validateStartOnce` and `checkFirstTimeClosing ` into `setState`.
8. KafkaStreams: in `close` and `start`, only execute the logic when `setState` succeeds.
Author: Guozhang Wang <wangguoz@gmail.com>
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Damian Guy <damian.guy@gmail.com>, Matthias J. Sax <matthias@confluent.io>
Closes#3713 from guozhangwang/KMinor-set-state
* When a call is aborted, that should count as a "try" in the failure log message.
* FailureInjectingTimeoutProcessorFactory should fail the first request it is asked about.
* testCallTimeouts should expect the first request it makes to fail because of the timeout we injected.
* FailureInjectingTimeoutProcessorFactory should track how many failures it has injected, and the test should verify that one has been injected.
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#3731 from cmccabe/KAFKA-5720
Remove the previous TODO to remove the clean shutdown file with some of the discussion from https://github.com/apache/kafka/pull/2104.
Author: Holden Karau <holden@us.ibm.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#3742 from holdenk/KAFKA-4380-document-clean-shutdown-file
…h-benchmarks/jmh.sh
Author: bbejeck <bbejeck@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Guozhang Wang <wangguoz@gmail.com>
Closes#2654 from bbejeck/KAFKA-3989_follow_up
Expose the ClassCastException as the cause for the SerializationException.
Author: Jeremy Custenborder <jcustenborder@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#3556 from jcustenborder/KAFKA-5620
If a task fails during initialization due to a LockException, its changelog partitions are not immediately added to the StoreChangelogReader as the thread doesn't hold the lock. However StoreChangelogReader#restore will be called and it sets the initialized flag. On a subsequent successfull call to initialize the new tasks the partitions are added to the StoreChangelogReader, however as it is already initialized these new partitions will never be restored. So the task would remain in a non-running state forever.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3736 from dguy/kafka-5787
Make MeteredSessionStore the outermost store.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#3729 from dguy/kafka-5749
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Rajini Sivaram <rajinisivaram@googlemail.com>
Closes#3699 from cmccabe/trogdor-review
With LogContext, each producer log item is automatically prefixed with client id and transactional id.
Author: huxihx <huxi_2b@hotmail.com>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#3703 from huxihx/KAFKA-5755
This patch contains a few small improvements to make request/response handling more consistent. Primarily it consolidates request/response serialization logic so that `SaslServerAuthenticator` and `KafkaApis` follow the same path. It also reduces the amount of custom logic needed to handle unsupported versions of the ApiVersions requests.
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#3673 from hachikuji/consolidate-response-handling
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Apurva Mehta <apurva@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#3715 from cmccabe/kafka_service_print_node_hostname_on_failure
Two requests sent together may not always trigger a staged receive since the requests may not be received in a single poll and the channel is muted when receives are complete. Hence attempt to stage multiple times until a receive is staged to make the test more stable.
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Apurva Mehta <apurva@confluent.io>
Closes#3712 from rajinisivaram/MINOR-connectionidreuse-test
This patch improves documentation on the handling of errors for the idempotent/transactional producer. It also fixes a couple minor inconsistencies and improves test coverage. In particular:
- UnsupportedForMessageFormat should be a fatal error for TxnOffsetCommit responses
- UnsupportedVersion should be fatal for Produce responses and should be returned instead of InvalidRequest
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Apurva Mehta <apurva@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#3716 from hachikuji/KAFKA-5342
1. Add upgrade section for 1.0.0, including Streams API changes section.
2. Add metrics name changes section.
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Eno Thereska <eno.thereska@gmail.com>, Damian Guy <damian.guy@gmail.com>
Closes#3687 from guozhangwang/KMinor-metrics-upgrade-guide
Needs to come after https://github.com/apache/kafka/pull/3701
Originally reviewed as part of #3490.
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Damian Guy <damian.guy@gmail.com>
Closes#3711 from enothereska/minor-docs-stateless
We are occasionally hitting some timeouts due to processing not finishing. So rather than failing the build for these reasons it would be better to reduce the runtime.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#3725 from dguy/fix-system-test
If prepare throws an exception in the same poll when the connection
is established, the channel id should be in `disconnected`, but
not in `connected`.
Author: dongeforever <dongeforever@apache.org>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#3282 from dongeforever/KAFKA-5417
In onPartitionsAssigned:
release all locks for non-assigned suspended tasks.
resume any suspended tasks.
Create new tasks, but don't attempt to take the state lock.
Pause partitions for any new tasks.
set the state to PARTITIONS_ASSIGNED
In StreamThread#runLoop
poll
if state is PARTITIONS_ASSIGNED
2.1 attempt to initialize any new tasks, i.e, take out the state locks and init state stores
2.2 restore some data for changelogs, i.e., poll once on the restore consumer and return the partitions that have been fully restored
2.3 update tasks with restored partitions and move any that have completed restoration to running
2.4 resume consumption for any tasks where all partitions have been restored.
2.5 if all active tasks are running, transition to RUNNING and assign standby partitions to the restoreConsumer.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#3675 from dguy/kafka-5152
Cherrypicking additional changes made as part of this PR https://github.com/apache/kafka/pull/3622, back to trunk.
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Damian Guy <damian.guy@gmail.com>
Closes#3691 from enothereska/hotfix-deadlock-state