Write txn markers and complete the commit/abort for transactions in PrepareXX
state during partition immigration.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Apurva Mehta <apurva@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2926 from dguy/kafka-5059
KAFKA-5028 moves the controller to a single-threaded model, so we would no longer have work interleaved between preferred replica leader election, meaning we don't need to keep its state.
This patch additionally addresses a bug from KAFKA-5028 where it made onPreferredReplicaElection keep the line calling topicDeletionManager.markTopicIneligibleForDeletion but removes the line calling topicDeletionManager.resumeDeletionForTopics
Author: Onur Karaman <okaraman@linkedin.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Closes#2927 from onurkaraman/KAFKA-5107
Moving the coordinatorEpoch from WriteTxnMarkerRequest to TxnMarkerEntry will generate fewer broker send requests
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Ismael Juma, Guozhang Wang
Closes#2925 from dguy/tc-write-txn-request-follow-up
remove broken locking. fix handleAddPartitions after complete commit/abort
respond with CONCURRENT_TRANSACTIONS in initPid
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#2934 from dguy/follow-up-tc-work
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Dan Norwood <norwood@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2472 from cmccabe/KAFKA-3265
ReassignPartitionsCommand should protect against empty replica list assignment.
Author: amethystic <huxi_2b@hotmail.com>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#2904 from amethystic/kafka-5901_ReassignPartitionsCommand_protect_against_empty_replica_list
As per KIP-82
Adding record headers api to ProducerRecord, ConsumerRecord
Support to convert from protocol to api added Kafka Producer, Kafka Fetcher (Consumer)
Updated MirrorMaker, ConsoleConsumer and scala BaseConsumer
Add RecordHeaders and RecordHeader implementation of the interfaces Headers and Header
Some bits using are reverted to being Java 7 compatible, for the moment until KIP-118 is implemented.
Author: Michael Andre Pearce <Michael.Andre.Pearce@me.com>
Reviewers: Radai Rosenblatt <radai.rosenblatt@gmail.com>, Jiangjie Qin <becket.qin@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Closes#2772 from michaelandrepearce/KIP-82
Depending on the test execution order, the global registry would
contain some metrics causing the test to fail.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Ewen Cheslack-Postava <me@ewencp.org>
Closes#2915 from ijuma/socket-server-test-metric-collection-after-shutdown
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Closes#2840 from apurvam/exactly-once-transactional-clients
The goal of this ticket is to improve controller maintainability by simplifying the controller's concurrency semantics. The controller code has a lot of shared state between several threads using several concurrency primitives. This makes the code hard to reason about.
This ticket proposes we convert the controller to a single-threaded event queue model. We add a new controller thread which processes events held in an event queue. Note that this does not mean we get rid of all threads used by the controller. We merely delegate all work that interacts with controller local state to this single thread. With only a single thread accessing and modifying the controller local state, we no longer need to worry about concurrent access, which means we can get rid of the various concurrency primitives used throughout the controller.
Performance is expected to match existing behavior since the bulk of the existing controller work today already happens sequentially in the ZkClient’s single ZkEventThread.
Author: Onur Karaman <okaraman@linkedin.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Closes#2816 from onurkaraman/KAFKA-5028
Author: Damian Guy <damian.guy@gmail.com>
Author: Guozhang Wang <wangguoz@gmail.com>
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Guozhang Wang, Jason Gustafson, Apurva Mehta, Jun Rao
Closes#2849 from dguy/exactly-once-tc
The `common` package is public and this class is
internal.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#2759 from ijuma/move-os-to-utils
fix some spelling errors
Author: xinlihua <xin.lihua1@zte.com.cn>
Reviewers: Matthias J. Sax, Guozhang Wang
Closes#2871 from auroraxlh/fix_spellingerror
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Closes#2866 from hachikuji/improve-snapshot-management
junrao added a config `--print.metrics` to control whether ProducerPerformance prints out metrics at the end of the test. If its okay, will add the code counterpart for consumer.
Author: huxi <huxi@zhenrongbao.com>
Reviewers: Jun Rao <junrao@gmail.com>
Closes#2860 from amethystic/kafka-5068_print_metrics_in_perf_tests
Test the various controller protocols by observing zookeeper and broker state.
Author: Onur Karaman <okaraman@linkedin.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Closes#2853 from onurkaraman/KAFKA-5069
This PR covers point (2) and point (5) from KAFKA-5036:
**Commit 1:**
2. Currently, we update the leader epoch in epochCache after log append in the follower but before log append in the leader. It would be more consistent to always do this after log append. This also avoids issues related to failure in log append.
5. The constructor of LeaderEpochFileCache has the following:
lock synchronized { ListBuffer(checkpoint.read(): _*) }
But everywhere else uses a read or write lock. We should use consistent locking.
This is a refactor to the way epochs are cached, replacing the code to cache the latest epoch in the LeaderEpochFileCache by reusing the cached value in Partition. There is no functional change.
**Commit 2:**
Adds an assert(epoch >=0) as epochs are written. Refactors tests so they never hit this assert.
Author: Ben Stopford <benstopford@gmail.com>
Reviewers: Jun Rao <junrao@gmail.com>
Closes#2831 from benstopford/KAFKA-5036-part2-second-try
Author: Dong Lin <lindong28@gmail.com>
Author: Dong Lin <lindong28@users.noreply.github.com>
Reviewers: Jiangjie Qin <becket.qin@gmail.com>
Closes#2859 from lindong28/KAFKA-5075
Also include a few code readability improvements.
Author: jozi-k <jozef.koval@protonmail.ch>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#2731 from jozi-k/immutable_LeaderAndIsr
This property is mentioned in the quickstart.
Author: huxi <huxi@zhenrongbao.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#2661 from amethystic/kafka4866_consoleconsumer_ignore_printvalue
Also:
1. FindCoordinator is more general and takes a coordinator_type
so that it can be used for the group and transaction coordinators.
2. Include an error message in FindCoordinatorResponse to make the
errors at the client side more informative. We have just added the
field to the protocol in this PR, a subsequent PR will update the
code to use it.
3. Rename `Errors` names for FindCoordinator to be more generic. This
is a compatible change as the ids remain the same.
4. Since the exception classes for the error codes are in a public
package, we introduce new ones and deprecate the old ones.
The classes were not thrown back to the user (KAFKA-5052 aside),
so this is a compatible change.
5. Update InitPidRequest for transactions. Since this protocol API
was introduced recently and is not used by default, we did not bump
its version.
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#2825 from apurvam/exactly-once-rpc-stubs
In 67fc2a91a6, we are using an empty collection and comparing via
value equality, so if a user passes an empty collection, they will
get the default ACLs instead of no ACLs. We fix that issue here.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Rajini Sivaram
Closes#2829 from ijuma/zk-utils-default-acls-improvement and squashes the following commits:
0846172 [Ismael Juma] Add missing import
2dc84f3 [Ismael Juma] Simplify logic in `sensitivePath`
8122f27 [Ismael Juma] Use a true sentinel instead of an empty collection for `UseDefaultAcls`
Also add test and refactor things a little to make testing easier.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Ben Stopford <benstopford@gmail.com>, Jun Rao <junrao@gmail.com>
Closes#2822 from ijuma/hotfix-checkpoint-file
- Consistent validation across different code paths in LogValidator
- Validate baseOffset for message format V2
- Flesh out LogValidatorTest to check producerId, baseSequence, producerEpoch and partitionLeaderEpoch.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jun Rao <junrao@gmail.com>
Closes#2802 from ijuma/validate-base-offset
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Jozef Koval <jozef.koval@protonmail.ch>, Ismael Juma <ismael@juma.me.uk>
Closes#2687 from cmccabe/KAFKA-4899
Author: Matthias J. Sax <matthias@confluent.io>
Author: Guozhang Wang <wangguoz@gmail.com>
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Apurva Mehta <apurva@confluent.io>, Jun Rao <junrao@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#2799 from mjsax/kafka-4990-add-api-stub-config-parameters-request-types
This PR replaces https://github.com/apache/kafka/pull/2743 (just raising from Confluent repo)
This PR describes the addition of Partition Level Leader Epochs to messages in Kafka as a mechanism for fixing some known issues in the replication protocol. Full details can be found here:
[KIP-101 Reference](https://cwiki.apache.org/confluence/display/KAFKA/KIP-101+-+Alter+Replication+Protocol+to+use+Leader+Epoch+rather+than+High+Watermark+for+Truncation)
*The key elements are*:
- Epochs are stamped on messages as they enter the leader.
- Epochs are tracked in both leader and follower in a new checkpoint file.
- A new API allows followers to retrieve the leader's latest offset for a particular epoch.
- The logic for truncating the log, when a replica becomes a follower, has been moved from Partition into the ReplicaFetcherThread
- When partitions are added to the ReplicaFetcherThread they are added in an initialising state. Initialising partitions request leader epochs and then truncate their logs appropriately.
This test provides a good overview of the workflow `EpochDrivenReplicationProtocolAcceptanceTest.shouldFollowLeaderEpochBasicWorkflow()`
The corrupted log use case is covered by the test
`EpochDrivenReplicationProtocolAcceptanceTest.offsetsShouldNotGoBackwards()`
Remaining work: There is a do list here: https://docs.google.com/document/d/1edmMo70MfHEZH9x38OQfTWsHr7UGTvg-NOxeFhOeRew/edit?usp=sharing
Author: Ben Stopford <benstopford@gmail.com>
Author: Jun Rao <junrao@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Closes#2808 from benstopford/kip-101-v2
Though MirrorMaker uses the `producer.type` value of the
producer properties, ProducerConfig show the warning:
`The configuration 'producer.type' was supplied but
isn't a known config.`
Author: Shun Takebayashi <shun@takebayashi.asia>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#2676 from takebayashi/suppress-mirrormaker-warning
Of particular importance are compression buffers (64 KB for LZ4, for example).
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#2796 from apurvam/idempotent-producer-close-data-stream
This is from the KIP-98 proposal.
The main points of discussion surround the correctness logic, particularly the Log class where incoming entries are validated and duplicates are dropped, and also the producer error handling to ensure that the semantics are sound from the users point of view.
There is some subtlety in the idempotent producer semantics. This patch only guarantees idempotent production upto the point where an error has to be returned to the user. Once we hit a such a non-recoverable error, we can no longer guarantee message ordering nor idempotence without additional logic at the application level.
In particular, if an application wants guaranteed message order without duplicates, then it needs to do the following in the error callback:
1. Close the producer so that no queued batches are sent. This is important for guaranteeing ordering.
2. Read the tail of the log to inspect the last message committed. This is important for avoiding duplicates.
Author: Apurva Mehta <apurva@confluent.io>
Author: hachikuji <jason@confluent.io>
Author: Apurva Mehta <apurva.1618@gmail.com>
Author: Guozhang Wang <wangguoz@gmail.com>
Author: fpj <fpj@apache.org>
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Closes#2735 from apurvam/exactly-once-idempotent-producer
Author: Dong Lin <lindong28@gmail.com>
Reviewers: Jiangjie Qin <becket.qin@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#2760 from lindong28/KAFKA-4973
This brought down a cluster by causing continuous controller moves.
ZkClient's ZkEventThread and a RequestSendThread can concurrently use objects that aren't thread-safe:
* Selector
* NetworkClient
* SSLEngine (this was the big one for us. We turn on SSL for interbroker communication).
As per the "Concurrency Notes" section from https://docs.oracle.com/javase/7/docs/api/javax/net/ssl/SSLEngine.html:
> two threads must not attempt to call the same method (either wrap() or unwrap()) concurrently
SSLEngine.wrap gets called in:
* SslTransportLayer.write
* SslTransportLayer.handshake
* SslTransportLayer.close
It turns out that the ZkEventThread and RequestSendThread can concurrently call SSLEngine.wrap:
* ZkEventThread calls SslTransportLayer.close from ControllerChannelManager.removeExistingBroker
* RequestSendThread can call SslTransportLayer.write or SslTransportLayer.handshake from NetworkClient.poll
Suppose the controller moves for whatever reason. The former controller could have had a RequestSendThread who
was in the middle of sending out messages to the cluster while the ZkEventThread began executing
KafkaController.onControllerResignation, which calls ControllerChannelManager.shutdown, which sequentially cleans
up the controller-to-broker queue and connection for every broker in the cluster. This cleanup includes the call
to ControllerChannelManager.removeExistingBroker as mentioned earlier, causing the concurrent call to SSLEngine.wrap.
This concurrent call throws a BufferOverflowException which ControllerChannelManager.removeExistingBroker catches so
the ControllerChannelManager.shutdown moves onto cleaning up the next controller-to-broker queue and connection,
skipping the cleanup steps such as clearing the queue, stopping the RequestSendThread, and removing the entry from its
brokerStateInfo map.
By failing out of the Selector.close, the sensors corresponding to the broker connection has not been cleaned up. Any
later attempt at initializing an identical Selector will result in a sensor collision and therefore cause Selector
initialization to throw an exception. In other words, any later attempts by this broker to become controller again
will fail on initialization. When controller initialization fails, the controller deletes the /controller znode and
lets another broker take over.
Now suppose the controller moves enough times such that every broker hits the BufferOverflowException concurrency
issue. We're now guaranteed to fail controller initialization due to the sensor collision on every controller
transition, so the controller will move across brokers continuously.
This patch avoids the concurrent use of non-threadsafe classes in ControllerChannelManager.removeExistingBroker
by shutting down the RequestSendThread before closing the NetworkClient.
Author: Onur Karaman <okaraman@linkedin.com>
Reviewers: Joel Koshy <jjkoshy.w@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#2746 from onurkaraman/KAFKA-4959