This patch adds support for the `TxnOffsetCommitRequest` added in KIP-98. Desired handling for this request is [described here](https://docs.google.com/document/d/11Jqy_GjUGtdXJK94XGsEIK7CP1SnQGdp2eF0wSw9ra8/edit#bookmark=id.55yzhvkppi6m) .
The functionality includes handling the stable state of receiving `TxnOffsetCommitRequests` and materializing results only when the commit marker for the transaction is received. It also handles partition emigration and immigration and rebuilds the required data structures on these events.
Tests are included for all the functionality.
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#2970 from apurvam/KAFKA-5160-broker-side-support-for-txnoffsetcommitrequest
1. Collapsed the `ownedPartitions`, `pendingTxnMap` and the `transactionMetadataCache` into a single in-memory structure, which is a two-layered map: first keyed by the transactionTxnLog, and then valued with the current coordinatorEpoch of that map plus another map keyed by the transactional id.
2. Use `transactionalId` across the modules in transactional coordinator, attach this id with the transactional marker entries.
3. Use two keys: `transactionalId` and `txnLogPartitionId` in the writeMarkerPurgatory as well as passing it along with the TxnMarkerEntry, so that `TransactionMarkerRequestCompletionHandler` can use it to access the two-layered map upon getting responses.
4. Use one queue per `broker-id` and `txnLogPartitionId`. Also when there is a possible update on the end point associated with the `broker-id`, update the Node without clearing the queue but relying on the requests to retry in the next round.
5. Centralize the error handling callback for appending-metadata-to-log and sending-markers-to-brokers in `TransactionStateManager#appendTransactionToLog`, and `TransactionMarkerChannelManager#addTxnMarkersToSend`.
6. Always update the in-memory transaction metadata AFTER the txn log has been appended and replicated, and then double check on the cache to make sure nothing has changed since log appending. The only exception is when initializing the pid for the first time, in which we will put a dummy into the cache but set its pendingState as `Empty` (so it will be valid to transit from `Empty` to `Empty`) so that it can be updated after the log append has completed.
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Ismael Juma, Damian Guy, Jason Gustafson, Jun Rao
Closes#2964 from guozhangwang/K5130-refactor-tc-inmemory-cache
Also add tests and a few clean-ups.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Eno Thereska <eno.thereska@gmail.com>, Jason Gustafson <jason@confluent.io>
Closes#2937 from ijuma/metrics-recording-level-producer
Suppose there are two valid records followed by one invalid records in the FetchResponse.PartitionData(). As of current implementation, PartitionRecords.fetchRecords(...) will throw exception without returning the two valid records. The next call to PartitionRecords.fetchRecords(...) will not return that two valid records either because the iterator has already moved across them.
We can fix this problem by defering exception to the next call of PartitionRecords.fetchRecords(...) if iterator has already moved across any valid record.
Author: Dong Lin <lindong28@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>, Jiangjie Qin <becket.qin@gmail.com>
Closes#2864 from lindong28/KAFKA-5078
A JoinGroupRequest V0 built with the Builder had
a rebalance timeout = -1 rather than equal to session timeout
as it would have been if coming from the wire and deserialized
from a V0 Struct
fix developed with mimaison
Author: Edoardo Comar <ecomar@uk.ibm.com>
Reviewers: Rajini Sivaram
Closes#2936 from edoardocomar/MINOR-JoinGroupRequestV0
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Closes#3015 from apurvam/KAFKA-5213-illegalstateexception-in-ensureOpenForAppend
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Damian Guy, Guozhang Wang
Closes#2951 from mjsax/kafka-5126-add-transactions-to-mock-producer
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: dan norwood <norwood@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2977 from cmccabe/KAFKA-5176
This only removes deprecated methods,
fields and constructors in a small number of classes.
Deprecated producer configs is tracked via KAFKA-3353
and the old clients and related (tools, etc.) won't
be removed in 0.11.0.0.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#2995 from ijuma/kafka-3763-remove-deprecated-0.11
These configs have been deprecated since 0.9.0.0:
block.on.buffer.full, metadata.fetch.timeout.ms and timeout.ms
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#2987 from ijuma/kafka-3353-remove-deprecated-producer-configs
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Closes#2910 from hachikuji/eos-txn-index
Add new broker config, `group.initial.rebalance.delay.ms`, with a default of 3 seconds.
When a consumer creates a new group, set the group's state to InitialRebalance and delay the rebalance until `min(group.initial.rebalance.delay.ms, rebalanceTimeout)`. As other members join the group further delay the rebalance by `min(group.initial.rebalance.delay.ms, remainingRebalanceTimeout)`. Once `rebalanceTimeout` is hit or no new members join the group within the delay, complete the rebalance.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Ewen Cheslack-Postava, Guozhang Wang
Closes#2758 from dguy/kafka-4925
If we pass in 0 futures to an AllOfAdapter, we should complete immediately
Author: dan norwood <norwood@confluent.io>
Reviewers: Colin P. Mccabe <cmccabe@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2953 from norwood/handle-all-of-0
Moving the coordinatorEpoch from WriteTxnMarkerRequest to TxnMarkerEntry will generate fewer broker send requests
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Ismael Juma, Guozhang Wang
Closes#2925 from dguy/tc-write-txn-request-follow-up
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Apurva Mehta <apurva@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2933 from hachikuji/minor-transactional-client-cleanup
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Dan Norwood <norwood@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2472 from cmccabe/KAFKA-3265
As per KIP-82
Adding record headers api to ProducerRecord, ConsumerRecord
Support to convert from protocol to api added Kafka Producer, Kafka Fetcher (Consumer)
Updated MirrorMaker, ConsoleConsumer and scala BaseConsumer
Add RecordHeaders and RecordHeader implementation of the interfaces Headers and Header
Some bits using are reverted to being Java 7 compatible, for the moment until KIP-118 is implemented.
Author: Michael Andre Pearce <Michael.Andre.Pearce@me.com>
Reviewers: Radai Rosenblatt <radai.rosenblatt@gmail.com>, Jiangjie Qin <becket.qin@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Closes#2772 from michaelandrepearce/KIP-82
I verified that the test would trigger an `IllegalStateException` if the
`position` call was added back.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Eno Thereska <eno@confluent.io>, Jason Gustafson <jason@confluent.io>
Closes#2887 from ijuma/kafka-5097-unit-test
- addressing open Github comments from #2773
- test clean-up
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Damian Guy, Guozhang Wang
Closes#2854 from mjsax/kafka-4986-producer-per-task-follow-up
Update topic expiry time after every metadata update to handle max.block.ms greater than 5 minutes
Author: Dong Lin <lindong28@gmail.com>
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>, Jiangjie Qin <becket.qin@gmail.com>
Closes#2869 from lindong28/KAFKA-5086
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Closes#2840 from apurvam/exactly-once-transactional-clients
Author: Damian Guy <damian.guy@gmail.com>
Author: Guozhang Wang <wangguoz@gmail.com>
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Guozhang Wang, Jason Gustafson, Apurva Mehta, Jun Rao
Closes#2849 from dguy/exactly-once-tc
Author: Sean McCauliff <smccauliff@linkedin.com>
Reviewers: Dong Lin <lindong28@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Jiangjie Qin <becket.qin@gmail.com>
Closes#2659 from smccauliff/kafka-4840
The `common` package is public and this class is
internal.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#2759 from ijuma/move-os-to-utils
- typo error corrected (spelling)
Author: Kamal C <kamal.chandraprakash@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#2885 from Kamal15/log
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#2813 from ijuma/kafka-5014-least-loaded-node-should-check-if-channel-is-ready
change `consumer.position` so that it always updates any partitions that need an update. Keep track of partitions that `seekToBeginning` in `StoreChangeLogReader` and do the `consumer.position` call after all `seekToBeginning` calls.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Guozhang Wang, Jason Gustafson, Ismael Juma
Closes#2769 from dguy/kafka-4937
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Closes#2866 from hachikuji/improve-snapshot-management
This PR covers point (2) and point (5) from KAFKA-5036:
**Commit 1:**
2. Currently, we update the leader epoch in epochCache after log append in the follower but before log append in the leader. It would be more consistent to always do this after log append. This also avoids issues related to failure in log append.
5. The constructor of LeaderEpochFileCache has the following:
lock synchronized { ListBuffer(checkpoint.read(): _*) }
But everywhere else uses a read or write lock. We should use consistent locking.
This is a refactor to the way epochs are cached, replacing the code to cache the latest epoch in the LeaderEpochFileCache by reusing the cached value in Partition. There is no functional change.
**Commit 2:**
Adds an assert(epoch >=0) as epochs are written. Refactors tests so they never hit this assert.
Author: Ben Stopford <benstopford@gmail.com>
Reviewers: Jun Rao <junrao@gmail.com>
Closes#2831 from benstopford/KAFKA-5036-part2-second-try
Author: Dong Lin <lindong28@gmail.com>
Author: Dong Lin <lindong28@users.noreply.github.com>
Reviewers: Jiangjie Qin <becket.qin@gmail.com>
Closes#2859 from lindong28/KAFKA-5075
Also:
1. FindCoordinator is more general and takes a coordinator_type
so that it can be used for the group and transaction coordinators.
2. Include an error message in FindCoordinatorResponse to make the
errors at the client side more informative. We have just added the
field to the protocol in this PR, a subsequent PR will update the
code to use it.
3. Rename `Errors` names for FindCoordinator to be more generic. This
is a compatible change as the ids remain the same.
4. Since the exception classes for the error codes are in a public
package, we introduce new ones and deprecate the old ones.
The classes were not thrown back to the user (KAFKA-5052 aside),
so this is a compatible change.
5. Update InitPidRequest for transactions. Since this protocol API
was introduced recently and is not used by default, we did not bump
its version.
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#2825 from apurvam/exactly-once-rpc-stubs
- Consistent validation across different code paths in LogValidator
- Validate baseOffset for message format V2
- Flesh out LogValidatorTest to check producerId, baseSequence, producerEpoch and partitionLeaderEpoch.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jun Rao <junrao@gmail.com>
Closes#2802 from ijuma/validate-base-offset
Author: Matthias J. Sax <matthias@confluent.io>
Author: Guozhang Wang <wangguoz@gmail.com>
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Apurva Mehta <apurva@confluent.io>, Jun Rao <junrao@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#2799 from mjsax/kafka-4990-add-api-stub-config-parameters-request-types
This PR replaces https://github.com/apache/kafka/pull/2743 (just raising from Confluent repo)
This PR describes the addition of Partition Level Leader Epochs to messages in Kafka as a mechanism for fixing some known issues in the replication protocol. Full details can be found here:
[KIP-101 Reference](https://cwiki.apache.org/confluence/display/KAFKA/KIP-101+-+Alter+Replication+Protocol+to+use+Leader+Epoch+rather+than+High+Watermark+for+Truncation)
*The key elements are*:
- Epochs are stamped on messages as they enter the leader.
- Epochs are tracked in both leader and follower in a new checkpoint file.
- A new API allows followers to retrieve the leader's latest offset for a particular epoch.
- The logic for truncating the log, when a replica becomes a follower, has been moved from Partition into the ReplicaFetcherThread
- When partitions are added to the ReplicaFetcherThread they are added in an initialising state. Initialising partitions request leader epochs and then truncate their logs appropriately.
This test provides a good overview of the workflow `EpochDrivenReplicationProtocolAcceptanceTest.shouldFollowLeaderEpochBasicWorkflow()`
The corrupted log use case is covered by the test
`EpochDrivenReplicationProtocolAcceptanceTest.offsetsShouldNotGoBackwards()`
Remaining work: There is a do list here: https://docs.google.com/document/d/1edmMo70MfHEZH9x38OQfTWsHr7UGTvg-NOxeFhOeRew/edit?usp=sharing
Author: Ben Stopford <benstopford@gmail.com>
Author: Jun Rao <junrao@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Closes#2808 from benstopford/kip-101-v2
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2779 from cmccabe/KAFKA-4993
Author: Ewen Cheslack-Postava <me@ewencp.org>
Reviewers: Damian Guy <damian.guy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#2660 from ewencp/minor-make-configdef-safer
Of particular importance are compression buffers (64 KB for LZ4, for example).
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#2796 from apurvam/idempotent-producer-close-data-stream
The bug meant that the base offset was the same as the batch size instead of 0 so the broker would always recompress batches.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jun Rao <junrao@gmail.com>
Closes#2794 from ijuma/fix-records-builder-construction