- reuse decompression buffers in consumer Fetcher
- switch lz4 input stream to operate directly on ByteBuffers
- avoids performance impact of catching exceptions when reaching the end of legacy record batches
- more tests with both compressible / incompressible data, multiple
blocks, and various other combinations to increase code coverage
- fixes a bug that would throw an exception instead of reporting an invalid block size for invalid incompressible blocks
- fixes a bug when the incompressible flag is set on the end frame block size
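As an illustration of the ByteBuffer-oriented approach, here is a minimal sketch of an InputStream view over a ByteBuffer, so compressed data can be consumed in place rather than copied through an intermediate byte[] per block; the class name is illustrative, not the exact Kafka implementation:

```java
import java.io.InputStream;
import java.nio.ByteBuffer;

// Minimal sketch: an InputStream view over a ByteBuffer, letting the LZ4
// frame reader consume compressed data where it sits.
public class ByteBufferBackedInputStream extends InputStream {
    private final ByteBuffer buffer;

    public ByteBufferBackedInputStream(ByteBuffer buffer) {
        this.buffer = buffer;
    }

    @Override
    public int read() {
        return buffer.hasRemaining() ? buffer.get() & 0xFF : -1;
    }

    @Override
    public int read(byte[] dst, int off, int len) {
        if (!buffer.hasRemaining())
            return -1;
        int n = Math.min(len, buffer.remaining());
        buffer.get(dst, off, n); // one bulk copy instead of per-byte reads
        return n;
    }
}
```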
Overall this improves LZ4 decompression performance by up to 40x for small batches.
Most improvements are seen for batches of size 1 with messages on the order of ~100B.
We see at least 2x improvements for batch sizes of fewer than 10 messages, containing messages smaller than 10 kB.
This patch also yields 2-4x improvements for other compression types on small, single-message v1 batches.
Full benchmark results can be found here:
https://gist.github.com/xvrl/05132e0643513df4adf842288be86efd
Author: Xavier Léauté <xavier@confluent.io>
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes #2967 from xvrl/kafka-5150
ByteBufferOutputStream improvements:
* Document pitfalls
* Improve efficiency when dealing with direct byte buffers
* Improve handling of buffer expansion
* Be consistent about using `limit` instead of `capacity`
* Add constructors that allocate the internal buffer
Other minor changes:
* Fix log warning to specify correct Kafka version
* Clean-ups
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes #3166 from ijuma/minor-kafka-5316-follow-ups
Keep track of when a transaction has begun by setting a flag, `transactionStarted`, when a successful `AddPartitionsToTxnResponse` or `AddOffsetsToTxnResponse` has been received. If an `AbortTxnRequest` is about to be sent and `transactionStarted` is false, don't send the request and instead transition the state to `READY`.
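A condensed sketch of this logic with illustrative names, not the actual `TransactionManager` code:

```java
// Hypothetical condensation of the described behavior; names are
// illustrative, not the real TransactionManager API.
public class TxnStateSketch {
    enum State { IN_TRANSACTION, READY }

    private boolean transactionStarted = false;
    private State state = State.IN_TRANSACTION;

    // Called when a successful AddPartitionsToTxnResponse or
    // AddOffsetsToTxnResponse is received.
    synchronized void onTxnRequestSuccess() {
        transactionStarted = true;
    }

    // Called when an abort is requested: only send the abort request if the
    // transaction actually began on the broker side.
    synchronized boolean shouldSendAbort() {
        if (!transactionStarted) {
            state = State.READY; // nothing to abort; go straight to READY
            return false;
        }
        return true;
    }
}
```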
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Apurva Mehta <apurva@confluent.io>, Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>
Closes #3126 from dguy/kafka-5260
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Closes #3142 from hachikuji/KAFKA-5316
Autogenerate docs for the Consumer Fetcher's metrics. This is a smaller subset of the original PR https://github.com/apache/kafka/pull/1202.
CC ijuma benstopford hachikuji
Author: James Cheng <jylcheng@yahoo.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Guozhang Wang <wangguoz@gmail.com>
Closes #2993 from wushujames/fetcher_metrics_docs
Add a check in `KafkaApis` that the inter-broker protocol version is at least `KAFKA_0_11_0_IV0`, i.e., a version that supports transactions.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Closes #3103 from dguy/kafka-5128
The previous code did not handle this correctly if a batch was
compacted more than once.
Also add test case for duplicate check after log cleaning and
improve various comments.
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes #3145 from hachikuji/minor-improve-base-sequence-docs
The basic idea is that exactly three collections, i.e. `pendingRequests`, `newPartitionsToBeAddedToTransaction`, and `partitionsInTransaction`, are accessed from the context of application threads. The first two are modified from the application threads, and the last is read from those threads.
So to make the `TransactionManager` truly thread safe, we have to ensure that all accesses to these three members are done in a synchronized block. I inspected the code, and I believe this patch puts the synchronization in all the correct places.
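As a minimal model of the pattern (not the actual class), every access to the three collections goes through a synchronized method:

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;
import org.apache.kafka.common.TopicPartition;

// Simplified model: application threads and the sender thread both go
// through synchronized methods, so the three collections are never
// observed mid-update.
public class TransactionManagerSketch {
    private final Queue<Runnable> pendingRequests = new ArrayDeque<>();
    private final Set<TopicPartition> newPartitionsToBeAddedToTransaction = new HashSet<>();
    private final Set<TopicPartition> partitionsInTransaction = new HashSet<>();

    public synchronized void maybeAddPartitionToTransaction(TopicPartition tp) {
        if (!partitionsInTransaction.contains(tp))
            newPartitionsToBeAddedToTransaction.add(tp);
    }

    public synchronized boolean isPartitionAdded(TopicPartition tp) {
        return partitionsInTransaction.contains(tp);
    }

    public synchronized void enqueueRequest(Runnable request) {
        pendingRequests.add(request);
    }
}
```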
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes #3132 from apurvam/KAFKA-5147-transaction-manager-synchronization-fixes
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Jun Rao <junrao@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes #3137 from rajinisivaram/KAFKA-5320
For consumers with manual partition assignment, await metadata when there are no ready nodes to avoid busy polling.
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes #3124 from rajinisivaram/KAFKA-5263
This call to `isCompletedExceptionally` introduced a race condition
because the future might not have been completed. `assertFutureError`
checks that the exception is present and of the correct type in any
case, so the call was not necessary.
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes #3139 from cmccabe/fix-test-deleteacls
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Apurva Mehta <apurva@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes #3123 from hachikuji/KAFKA-4935
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Apurva Mehta <apurva@confluent.io>, Jun Rao <junrao@gmail.com>
Closes #3075 from hachikuji/KAFKA-5259-FIXED
Before this patch the consumer would return the cached offsets for partitions in its current assignment. This worked when all the offset commits went through the consumer.
With KIP-98, offsets can be committed transactionally through the producer. This means that relying on cached positions in the consumer returns incorrect information: since commits go through the producer, the cache is never updated.
Hence we need to update the `KafkaConsumer.committed` method to always look up the last committed offset on the server, ensuring it gets the correct information every time.
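For example, after this change the following lookup reflects offsets committed transactionally through a producer, because `committed` now fetches from the server instead of consulting the local cache (bootstrap server, topic, partition, and group are illustrative):

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class CommittedLookup {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "my-group"); // illustrative group
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Always fetched from the server now, so offsets committed via
            // producer.sendOffsetsToTransaction(...) are visible here.
            OffsetAndMetadata committed =
                    consumer.committed(new TopicPartition("my-topic", 0));
            System.out.println(committed);
        }
    }
}
```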
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Jason Gustafson, Guozhang Wang
Closes #3119 from apurvam/KAFKA-5273-kafkaconsumer-committed-should-always-hit-server
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes #3118 from hachikuji/disallow-transactional-idempotent-downconversion
... plus some minor cleanup
Author: Vahid Hashemian <vahidhashemian@us.ibm.com>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes #3092 from vahidhashemian/KAFKA-5277
If `props` is null, use the `POLICY_NOPLAINTEXT` default value: false.
Author: Mickael Maison <mickael.maison@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Rajini Sivaram <rajinisivaram@googlemail.com>
Closes #3105 from mimaison/KAFKA-5294
Author: Jiangjie Qin <becket.qin@gmail.com>
Reviewers: Joel Koshy <jjkoshy.w@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes #2638 from becketqin/KAFKA-3995
We should retry `AddPartitionsToTxnRequest` and `TxnOffsetCommitRequest` when receiving an `UNKNOWN_TOPIC_OR_PARTITION` error.
As described in the JIRA: it turns out that `UNKNOWN_TOPIC_OR_PARTITION` is returned from the request handler in `KafkaApis` for the `AddPartitionsToTxnRequest` and the `TxnOffsetCommitRequest` when the broker's metadata doesn't contain one or more partitions in the request. This can happen, for instance, when the broker is bounced and has not yet received the cluster metadata.
We should retry in these cases, as this is the model followed by the consumer when committing offsets, and by the producer with a ProduceRequest.
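Schematically, the change treats this error as retriable rather than fatal; `reenqueue` and `fatal` below are hypothetical stand-ins for the transaction manager's real retry and failure paths:

```java
import org.apache.kafka.common.protocol.Errors;

// Sketch only, not the actual client code.
public class TxnErrorHandlingSketch {
    void handle(Errors error) {
        if (error == Errors.NONE) {
            // success: nothing to do
        } else if (error == Errors.UNKNOWN_TOPIC_OR_PARTITION) {
            reenqueue(); // broker metadata may simply be stale; retry
        } else {
            fatal(error.exception());
        }
    }

    void reenqueue() { /* back off, then resend the request */ }
    void fatal(RuntimeException e) { throw e; }
}
```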
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>
Closes #3094 from apurvam/KAFKA-5269-handle-unknown-topic-partition-in-transaction-manager
Summary:
- add `reconnect.backoff.max.ms` common client configuration parameter
- if `reconnect.backoff.max.ms` > `reconnect.backoff.ms`, apply an exponential backoff policy
- apply +/- 20% random jitter to smooth cluster reconnects
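A sketch of the described policy, assuming the exponent grows with the number of consecutive connection failures (the client's exact formula may differ):

```java
import java.util.concurrent.ThreadLocalRandom;

public class BackoffSketch {
    public static void main(String[] args) {
        long backoffMs = 50;        // reconnect.backoff.ms
        long backoffMaxMs = 1000;   // reconnect.backoff.max.ms

        for (int failures = 0; failures < 6; failures++) {
            // exponential growth capped at the configured maximum...
            double base = Math.min(backoffMaxMs, backoffMs * Math.pow(2, failures));
            // ...with +/- 20% random jitter to avoid reconnect storms
            double jitter = ThreadLocalRandom.current().nextDouble(0.8, 1.2);
            System.out.printf("attempt %d -> ~%.0f ms%n", failures, base * jitter);
        }
    }
}
```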
Author: Dana Powers <dana.powers@gmail.com>
Reviewers: Ewen Cheslack-Postava <me@ewencp.org>, Roger Hoover <roger.hoover@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes #1523 from dpkp/exp_backoff
Implements range scan for keys in windowed and session stores
Modifies caching session and windowed stores to use segmented cache keys.
Cache keys are internally prefixed with their segment id to ensure key ordering in the cache matches the ordering in the underlying store for keys spread across multiple segments.
This should also result in fewer cache keys getting scanned for queries spanning only some segments.
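A toy illustration of the key layout (not the actual Streams internals), assuming non-negative segment ids:

```java
import java.nio.ByteBuffer;

// Prefixing the key bytes with the segment id makes keys from earlier
// segments sort before keys from later ones, so a cache range scan visits
// entries in the same order as the underlying segmented store.
public class SegmentedCacheKeySketch {
    static byte[] segmentedKey(long segmentId, byte[] key) {
        return ByteBuffer.allocate(8 + key.length)
                .putLong(segmentId) // big-endian prefix preserves ordering
                .put(key)
                .array();
    }

    public static void main(String[] args) {
        byte[] a = segmentedKey(1, "k".getBytes());
        byte[] b = segmentedKey(2, "a".getBytes());
        // a sorts before b even though "k" > "a", because segment 1 < 2
        System.out.println(a.length + " " + b.length);
    }
}
```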
Author: Xavier Léauté <xavier@confluent.io>
Reviewers: Damian Guy, Guozhang Wang
Closes #3027 from xvrl/windowstore-range-scan
The backing store for offsets, status, and configs now attempts to use the new AdminClient to look up the internal topics and create them if they don’t yet exist. If the necessary APIs are not available in the connected broker, the stores fall back to the old behavior of relying upon auto-created topics. Kafka Connect requires a minimum of Apache Kafka 0.10.0.1-cp1, and the AdminClient can work with all versions since 0.10.0.0.
All three of Connect’s internal topics are created as compacted topics, and new distributed worker configuration properties control the replication factor for all three topics and the number of partitions for the offsets and status topics; the config topic requires a single partition and does not allow it to be set via configuration. All of these new configuration properties have sensible defaults, meaning users can upgrade without having to change any of the existing configurations. In most situations, existing Connect deployments will have already created the storage topics prior to upgrading.
The replication factor defaults to 3, so anyone running a Kafka cluster with fewer than 3 brokers will receive an error unless they explicitly set the replication factor for the three internal topics. This is desired behavior, since it signals to users that they are not using sufficient replication for production use.
The integration tests use a cluster with a single broker, so they were changed to explicitly specify a replication factor of 1 and a single partition.
The `KafkaAdminClientTest` was refactored to extract a utility for setting up a `KafkaAdminClient` with a `MockClient` for unit tests.
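For reference, creating one compacted internal topic with the new AdminClient looks roughly like this; the topic name, partition count, and replication factor below are illustrative, not Connect's actual values:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CreateCompactedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        Map<String, String> configs = new HashMap<>();
        configs.put(TopicConfig.CLEANUP_POLICY_CONFIG,
                TopicConfig.CLEANUP_POLICY_COMPACT);

        // Illustrative values; Connect derives these from worker config.
        NewTopic offsets = new NewTopic("connect-offsets", 25, (short) 3)
                .configs(configs);

        try (AdminClient admin = AdminClient.create(props)) {
            admin.createTopics(Collections.singleton(offsets)).all().get();
        }
    }
}
```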
Author: Randall Hauch <rhauch@gmail.com>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes #2984 from rhauch/kafka-4667
Use IBM Kerberos module for SASL tests if running on IBM JDK
Developed with edoardocomar
Based on https://github.com/apache/kafka/pull/738 by rajinisivaram
Author: Mickael Maison <mickael.maison@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Rajini Sivaram <rajinisivaram@googlemail.com>, Edoardo Comar <ecomar@uk.ibm.com>
Closes #2878 from mimaison/KAFKA-3070
This PR implements a new partition assignment strategy called "sticky", and its purpose is to balance partitions across consumers in a way that minimizes moving partitions around or, in other words, preserves existing partition assignments as much as possible.
This patch is co-authored with rajinisivaram and edoardocomar.
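Enabling the strategy is a one-line consumer configuration change; the bootstrap server and group id below are illustrative:

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

public class StickyAssignorConfig {
    public static Properties consumerProps() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");
        // Opt in to the new sticky assignment strategy
        props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
                "org.apache.kafka.clients.consumer.StickyAssignor");
        return props;
    }
}
```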
Author: Vahid Hashemian <vahidhashemian@us.ibm.com>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes #1020 from vahidhashemian/KAFKA-2273
Includes server-side code, protocol and AdminClient.
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes #2941 from cmccabe/KAFKA-3266
Author: Apurva Mehta <apurva@confluent.io>
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes #2994 from apurvam/KAFKA-5188-exactly-once-integration-tests
Add a public create API that takes a Properties instance.
Make the constructors for TopicDescription, TopicListing
and TopicPartitionInfo public to enable AdminClient
users to write better tests.
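For example, a minimal use of the new overload:

```java
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;

public class AdminClientFromProperties {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        // The new public create API taking a Properties instance
        try (AdminClient admin = AdminClient.create(props)) {
            System.out.println(admin.listTopics());
        }
    }
}
```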
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes #3070 from cmccabe/publicapi
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Apurva Mehta, Ismael Juma, Damian Guy, Eno Thereska, Guozhang Wang
Closes #2945 from mjsax/kafka-4923-add-eos-to-streams
Add ACL checks for Transactional APIs
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Apurva Mehta <apurva@confluent.io>, Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Closes #2979 from dguy/kafka-5129
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Roger Hoover <roger.hoover@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes #3013 from cmccabe/KAFKA-5215
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes #3063 from guozhangwang/KMinor-more-logging-in-fetcher
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Apurva Mehta <apurva@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes #3058 from hachikuji/KAFKA-5248
Producer id is used instead.
Also refactored TransactionLog schema code to follow
our naming convention and to have better structure.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>
Closes #3041 from ijuma/eliminate-pid-terminology
The parse method was incorrectly referring to `ApiKeys.ADD_PARTITIONS_TO_TXN`.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Closes #3056 from dguy/hotfix-add-offsets-to-txn-response
Without the histogram cleanup, the percentiles are calculated
incorrectly after purging of one or more samples: event counts
go out of sync with counts in histogram buckets, and a bucket
with a lower value gets chosen for the given quantile.
This change adds the necessary histogram cleanup.
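A simplified model of the invariant this fix restores (not Kafka's metrics code): the event count and the histogram buckets must be purged together, or quantile lookups drift toward lower buckets:

```java
// Simplified model: if purge() reset only `count` but left `buckets`
// populated, quantile(q) would target too few events and return a bucket
// that is too low.
public class HistogramSketch {
    private final long[] buckets = new long[100]; // bucket i covers value i
    private long count = 0;

    void record(int value) {
        buckets[Math.min(value, buckets.length - 1)]++;
        count++;
    }

    // The fix, conceptually: purge both together, never just the count.
    void purge() {
        java.util.Arrays.fill(buckets, 0L);
        count = 0;
    }

    double quantile(double q) {
        long target = (long) Math.ceil(q * count);
        long seen = 0;
        for (int i = 0; i < buckets.length; i++) {
            seen += buckets[i];
            if (seen >= target)
                return i;
        }
        return buckets.length - 1;
    }
}
```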
Author: Ivan A. Melnikov <iv@altlinux.org>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Closes #3002 from iv-m/kafka-5203-percentiles-fix
During the 0.11.0.0 cycle, a Java version of the class
was introduced so that Streams could use it. Given that
it includes the bulk of the functionality of the Scala
version of the class, it makes sense to consolidate them.
While doing this, I noticed that one of the tests for
the Java class (`shouldThrowOnInvalidTopicNames`) was
broken as it only checked if the first topic name in
the list was invalid.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes #3046 from ijuma/consolidate-topic-classes
This patch adds support for the `TxnOffsetCommitRequest` added in KIP-98. Desired handling for this request is [described here](https://docs.google.com/document/d/11Jqy_GjUGtdXJK94XGsEIK7CP1SnQGdp2eF0wSw9ra8/edit#bookmark=id.55yzhvkppi6m).
The functionality includes handling the stable state of receiving `TxnOffsetCommitRequests` and materializing results only when the commit marker for the transaction is received. It also handles partition emigration and immigration and rebuilds the required data structures on these events.
Tests are included for all the functionality.
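On the client side, these requests originate from the producer's transactional offset-commit API; the topic, offset, and group below are illustrative:

```java
import java.util.Collections;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.TopicPartition;

public class TxnOffsetCommitExample {
    // `producer` must be configured with transactional.id and initialized
    // via initTransactions(); values here are illustrative.
    static void commitWithinTransaction(KafkaProducer<String, String> producer) {
        producer.beginTransaction();
        producer.sendOffsetsToTransaction(
                Collections.singletonMap(
                        new TopicPartition("input-topic", 0),
                        new OffsetAndMetadata(42L)),
                "my-consumer-group");
        producer.commitTransaction(); // broker materializes offsets on commit marker
    }
}
```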
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes #2970 from apurvam/KAFKA-5160-broker-side-support-for-txnoffsetcommitrequest
1. Collapsed the `ownedPartitions`, `pendingTxnMap` and the `transactionMetadataCache` into a single in-memory structure, which is a two-layered map: first keyed by the transaction log partition id, and then valued with the current coordinatorEpoch of that partition plus another map keyed by the transactional id (see the sketch after this list).
2. Use `transactionalId` across the modules of the transaction coordinator, and attach this id to the transactional marker entries.
3. Use two keys, `transactionalId` and `txnLogPartitionId`, in the `writeMarkerPurgatory`, as well as passing them along with the `TxnMarkerEntry`, so that `TransactionMarkerRequestCompletionHandler` can use them to access the two-layered map upon getting responses.
4. Use one queue per `broker-id` and `txnLogPartitionId`. Also when there is a possible update on the end point associated with the `broker-id`, update the Node without clearing the queue but relying on the requests to retry in the next round.
5. Centralize the error handling callback for appending-metadata-to-log and sending-markers-to-brokers in `TransactionStateManager#appendTransactionToLog`, and `TransactionMarkerChannelManager#addTxnMarkersToSend`.
6. Always update the in-memory transaction metadata AFTER the txn log has been appended and replicated, and then double check the cache to make sure nothing has changed since the log append. The only exception is when initializing the pid for the first time, in which case we put a dummy entry into the cache but set its pendingState to `Empty` (so it is valid to transition from `Empty` to `Empty`), allowing it to be updated after the log append has completed.
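A hedged sketch of the structure from item 1, with illustrative names (the coordinator itself is implemented in Scala; Java is used here for consistency with the other examples):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only, not the coordinator's actual code.
public class TxnMetadataCacheSketch {
    static class TransactionMetadata { /* txn state, producer id, epoch, ... */ }

    static class TxnPartitionEntry {
        final int coordinatorEpoch;
        final Map<String, TransactionMetadata> metadataByTransactionalId =
                new HashMap<>();

        TxnPartitionEntry(int coordinatorEpoch) {
            this.coordinatorEpoch = coordinatorEpoch;
        }
    }

    // Outer key: transaction log partition id; inner key: transactional id.
    private final Map<Integer, TxnPartitionEntry> cache = new HashMap<>();
}
```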
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Ismael Juma, Damian Guy, Jason Gustafson, Jun Rao
Closes #2964 from guozhangwang/K5130-refactor-tc-inmemory-cache