The goal of this ticket is to improve controller maintainability by simplifying the controller's concurrency semantics. The controller code has a lot of shared state between several threads using several concurrency primitives. This makes the code hard to reason about.
This ticket proposes we convert the controller to a single-threaded event queue model. We add a new controller thread which processes events held in an event queue. Note that this does not mean we get rid of all threads used by the controller. We merely delegate all work that interacts with controller local state to this single thread. With only a single thread accessing and modifying the controller local state, we no longer need to worry about concurrent access, which means we can get rid of the various concurrency primitives used throughout the controller.
Performance is expected to match existing behavior since the bulk of the existing controller work today already happens sequentially in the ZkClient’s single ZkEventThread.
Author: Onur Karaman <okaraman@linkedin.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Closes#2816 from onurkaraman/KAFKA-5028
Author: Damian Guy <damian.guy@gmail.com>
Author: Guozhang Wang <wangguoz@gmail.com>
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Guozhang Wang, Jason Gustafson, Apurva Mehta, Jun Rao
Closes#2849 from dguy/exactly-once-tc
Fixes `org.apache.kafka.streams.integration.utils.IntegrationTestUtils#readKeyValues` potentially starting to `poll` for stream output after the stream finished sending the test data and hence missing it when working with `latest` offsets.
Author: Armin Braun <me@obrown.io>
Reviewers: Eno Thereska, Matthias J. Sax, Guozhang Wang
Closes#2921 from original-brownbear/KAFKA-5124
Refactors Task with proper interface methods `init()`, `resume()`, `commit()`, `suspend()`, and `close()`. All other methods for task handling are internal now. This allows to simplify `StreamThread` code, avoid code duplication and allows for easier reasoning of control flow.
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Ismael Juma, Damian Guy, Eno Thereska, Guozhang Wang
Closes#2895 from mjsax/kafka-5111-cleanup-task-code
Author: Sean McCauliff <smccauliff@linkedin.com>
Reviewers: Dong Lin <lindong28@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Jiangjie Qin <becket.qin@gmail.com>
Closes#2659 from smccauliff/kafka-4840
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Apurva Mehta <apurva@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2745 from hachikuji/add-replication-testcase-for-compression
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Eno Thereska <eno@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2897 from mjsax/KAFKA-4564-follow-up
The `common` package is public and this class is
internal.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#2759 from ijuma/move-os-to-utils
- call close() on Metrics to join created threads
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Eno Thereska, Damian Guy, Guozhang Wang
Closes#2788 from mjsax/minor-improve-test-metric-cleanup
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Ismael Juma, Eno Thereska, Matthias J. Sax, Guozhang Wang
Closes#2837 from mjsax/kafka-4564-fail-fast-test-stream-compatibility
If `partition==null` and `partitioner!=null` we should not fall back to default partitioner (as we do before the patch if `producer.partitionsFor(...)` returns empty list. Falling back to default partitioner might corrupt hash partitioning.
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Eno Thereska, Damian Guy, Guozhang Wang
Closes#2868 from mjsax/minor-fix-RecordCollector
- typo error corrected (spelling)
Author: Kamal C <kamal.chandraprakash@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#2885 from Kamal15/log
fix some spelling errors
Author: xinlihua <xin.lihua1@zte.com.cn>
Reviewers: Matthias J. Sax, Guozhang Wang
Closes#2871 from auroraxlh/fix_spellingerror
Skip null keys when initializing GlobalKTables. This is inline with what happens during normal processing.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Michael G. Noll, Eno Thereska, Matthias J. Sax, Guozhang Wang
Closes#2834 from dguy/kafka-5047
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Damian Guy, Eno Thereska, Guozhang Wang
Closes#2789 from mjsax/minor-improve-integration-test
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#2813 from ijuma/kafka-5014-least-loaded-node-should-check-if-channel-is-ready
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Magnus Edenhill, Eno Thereska, Damian Guy, Guozhang Wang
Closes#2836 from mjsax/minor-broker-comp-test
change `consumer.position` so that it always updates any partitions that need an update. Keep track of partitions that `seekToBeginning` in `StoreChangeLogReader` and do the `consumer.position` call after all `seekToBeginning` calls.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Guozhang Wang, Jason Gustafson, Ismael Juma
Closes#2769 from dguy/kafka-4937
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Closes#2866 from hachikuji/improve-snapshot-management
Worth special mention:
1. Update Scala to 2.11.11 and 2.12.2
2. Update Gradle to 3.5
3. Update ZooKeeper to 3.4.10
4. Update reflections to 0.9.11, which:
* Switches to jsr305 annotations with a provided scope
* Updates Guava from 18 to 20
* Updates javaassist from 3.18 to 3.21
There’s a separate PR for updating RocksDb, so
I didn’t include that here.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#2872 from ijuma/update-deps-for-0.11
junrao added a config `--print.metrics` to control whether ProducerPerformance prints out metrics at the end of the test. If its okay, will add the code counterpart for consumer.
Author: huxi <huxi@zhenrongbao.com>
Reviewers: Jun Rao <junrao@gmail.com>
Closes#2860 from amethystic/kafka-5068_print_metrics_in_perf_tests
Test the various controller protocols by observing zookeeper and broker state.
Author: Onur Karaman <okaraman@linkedin.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Closes#2853 from onurkaraman/KAFKA-5069
This PR covers point (2) and point (5) from KAFKA-5036:
**Commit 1:**
2. Currently, we update the leader epoch in epochCache after log append in the follower but before log append in the leader. It would be more consistent to always do this after log append. This also avoids issues related to failure in log append.
5. The constructor of LeaderEpochFileCache has the following:
lock synchronized { ListBuffer(checkpoint.read(): _*) }
But everywhere else uses a read or write lock. We should use consistent locking.
This is a refactor to the way epochs are cached, replacing the code to cache the latest epoch in the LeaderEpochFileCache by reusing the cached value in Partition. There is no functional change.
**Commit 2:**
Adds an assert(epoch >=0) as epochs are written. Refactors tests so they never hit this assert.
Author: Ben Stopford <benstopford@gmail.com>
Reviewers: Jun Rao <junrao@gmail.com>
Closes#2831 from benstopford/KAFKA-5036-part2-second-try
Author: Dong Lin <lindong28@gmail.com>
Author: Dong Lin <lindong28@users.noreply.github.com>
Reviewers: Jiangjie Qin <becket.qin@gmail.com>
Closes#2859 from lindong28/KAFKA-5075
Enable producer per task if exactly-once config is enabled.
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Eno Thereska <eno@confluent.io>, Damian Guy <damian.guy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#2773 from mjsax/exactly-once-streams-producer-per-task
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Damian Guy <damian.guy@gmail.com>, Matthias J. Sax <matthias@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2848 from enothereska/KAFKA-5038-trunk
Set the internal consumer config internal.leave.group.on.close in
`StreamsConfig`. This is to reduce the number of rebalances we get
during bounces.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#2750 from dguy/kafka-4965
Introduced in PR #2824. Already fixed in the
website github.
Author: Gwen Shapira <cshapi@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#2844 from gwenshap/docs-hotfix
(cherry picked from commit 5f728532ac)
Signed-off-by: Ismael Juma <ismael@juma.me.uk>
Also include a few code readability improvements.
Author: jozi-k <jozef.koval@protonmail.ch>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#2731 from jozi-k/immutable_LeaderAndIsr
This property is mentioned in the quickstart.
Author: huxi <huxi@zhenrongbao.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#2661 from amethystic/kafka4866_consoleconsumer_ignore_printvalue
Author: Michael G. Noll <michael@confluent.io>
Reviewers: Damian Guy <damian.guy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#2751 from miguno/trunk-streams-window-iterator-doc-fixes
Also:
1. FindCoordinator is more general and takes a coordinator_type
so that it can be used for the group and transaction coordinators.
2. Include an error message in FindCoordinatorResponse to make the
errors at the client side more informative. We have just added the
field to the protocol in this PR, a subsequent PR will update the
code to use it.
3. Rename `Errors` names for FindCoordinator to be more generic. This
is a compatible change as the ids remain the same.
4. Since the exception classes for the error codes are in a public
package, we introduce new ones and deprecate the old ones.
The classes were not thrown back to the user (KAFKA-5052 aside),
so this is a compatible change.
5. Update InitPidRequest for transactions. Since this protocol API
was introduced recently and is not used by default, we did not bump
its version.
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#2825 from apurvam/exactly-once-rpc-stubs