Publish Javadoc for common.annotation package, which contains
InterfaceStability.
Finally, mark AdminClient classes with `Evolving` instead of `Unstable`.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Colin Mccabe, Gwen Shapira
Closes#3316 from ijuma/kafka-5274-admin-client-javadoc
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Damian Guy <damian.guy@gmail.com>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3201 from mjsax/kafka-5362-add-eos-system-tests-for-streams-api
I included a JMH benchmark and the results follow. The
implementation in this PR takes no more than 1/10th
of the time when compared to trunk. I also included
results for an alternative implementation that is a little
slower than the one in the PR.
Trunk:
```text
TopicBenchmark.testValidate topic avgt 15 134.107 ± 3.956 ns/op
TopicBenchmark.testValidate longer-topic-name avgt 15 316.241 ± 13.379 ns/op
TopicBenchmark.testValidate very-long-topic-name_with_more_text avgt 15 636.026 ± 30.272 ns/op
```
Implementation in the PR:
```text
TopicBenchmark.testValidate topic avgt 15 13.153 ± 0.383 ns/op
TopicBenchmark.testValidate longer-topic-name avgt 15 26.139 ± 0.896 ns/op
TopicBenchmark.testValidate very-long-topic-name.with_more_text avgt 15 44.829 ± 1.390 ns/op
```
Alternative implementation where boolean validChar = Character.isLetterOrDigit(c) || c == '.' || c == '_' || c == '-';
```text
TopicBenchmark.testValidate topic avgt 15 18.883 ± 1.044 ns/op
TopicBenchmark.testValidate longer-topic-name avgt 15 36.696 ± 1.220 ns/op
TopicBenchmark.testValidate very-long-topic-name_with_more_text avgt 15 65.956 ± 0.669 ns/op
```
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#3234 from ijuma/optimise-topic-is-valid
The JMH benchmark included shows that the redundant
volatile write causes the constructor of `ProducerRecord`
to take more than 50% longer:
ProducerRecordBenchmark.constructorBenchmark avgt 15 24.136 ± 1.458 ns/op (before)
ProducerRecordBenchmark.constructorBenchmark avgt 15 14.904 ± 0.231 ns/op (after)
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#3233 from ijuma/remove-volatile-write-in-records-header-constructor
It avoids the need to handle protocol downgrades and it's safe (i.e. it will never cause
the auto creation of topics).
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#3220 from ijuma/kafka-5374-admin-client-metadata
Author: Mario Molina <mmolimar@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Damian Guy <damian.guy@gmail.com>, Michael G. Noll <michael@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3017 from mmolimar/KAFKA-5218
Also introduce TopicConfig.
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#3120 from cmccabe/KAFKA-5265
- reuse decompression buffers in consumer Fetcher
- switch lz4 input stream to operate directly on ByteBuffers
- avoids performance impact of catching exceptions when reaching the end of legacy record batches
- more tests with both compressible / incompressible data, multiple
blocks, and various other combinations to increase code coverage
- fixes bug that would cause exception instead of invalid block size
for invalid incompressible blocks
- fixes bug if incompressible flag is set on end frame block size
Overall this improves LZ4 decompression performance by up to 40x for small batches.
Most improvements are seen for batches of size 1 with messages on the order of ~100B.
We see at least 2x improvements for for batch sizes of < 10 messages, containing messages < 10kB
This patch also yields 2-4x improvements on v1 small single message batches for other compression types.
Full benchmark results can be found here
https://gist.github.com/xvrl/05132e0643513df4adf842288be86efd
Author: Xavier Léauté <xavier@confluent.io>
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2967 from xvrl/kafka-5150
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Closes#3142 from hachikuji/KAFKA-5316
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Jun Rao <junrao@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#3137 from rajinisivaram/KAFKA-5320
- add broker compatibility system tests
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Damian Guy, Eno Thereska, Guozhang Wang
Closes#2974 from mjsax/kafka-4923-add-eos-to-streams-add-broker-check-and-system-test
Author: Jiangjie Qin <becket.qin@gmail.com>
Reviewers: Joel Koshy <jjkoshy.w@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#2638 from becketqin/KAFKA-3995
Implements range scan for keys in windowed and session stores
Modifies caching session and windowed stores to use segmented cache keys.
Cache keys are internally prefixed with their segment id to ensure key ordering in the cache matches the ordering in the underlying store for keys spread across multiple segments.
This should also result in fewer cache keys getting scanned for queries spanning only some segments.
Author: Xavier Léauté <xavier@confluent.io>
Reviewers: Damian Guy, Guozhang Wang
Closes#3027 from xvrl/windowstore-range-scan
Includes server-side code, protocol and AdminClient.
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#2941 from cmccabe/KAFKA-3266
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Dan Norwood <norwood@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2472 from cmccabe/KAFKA-3265
As per KIP-82
Adding record headers api to ProducerRecord, ConsumerRecord
Support to convert from protocol to api added Kafka Producer, Kafka Fetcher (Consumer)
Updated MirrorMaker, ConsoleConsumer and scala BaseConsumer
Add RecordHeaders and RecordHeader implementation of the interfaces Headers and Header
Some bits using are reverted to being Java 7 compatible, for the moment until KIP-118 is implemented.
Author: Michael Andre Pearce <Michael.Andre.Pearce@me.com>
Reviewers: Radai Rosenblatt <radai.rosenblatt@gmail.com>, Jiangjie Qin <becket.qin@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Closes#2772 from michaelandrepearce/KIP-82
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Closes#2840 from apurvam/exactly-once-transactional-clients
Author: Damian Guy <damian.guy@gmail.com>
Author: Guozhang Wang <wangguoz@gmail.com>
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Guozhang Wang, Jason Gustafson, Apurva Mehta, Jun Rao
Closes#2849 from dguy/exactly-once-tc
Enable producer per task if exactly-once config is enabled.
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Eno Thereska <eno@confluent.io>, Damian Guy <damian.guy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#2773 from mjsax/exactly-once-streams-producer-per-task
Author: Matthias J. Sax <matthias@confluent.io>
Author: Guozhang Wang <wangguoz@gmail.com>
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Apurva Mehta <apurva@confluent.io>, Jun Rao <junrao@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#2799 from mjsax/kafka-4990-add-api-stub-config-parameters-request-types
This PR replaces https://github.com/apache/kafka/pull/2743 (just raising from Confluent repo)
This PR describes the addition of Partition Level Leader Epochs to messages in Kafka as a mechanism for fixing some known issues in the replication protocol. Full details can be found here:
[KIP-101 Reference](https://cwiki.apache.org/confluence/display/KAFKA/KIP-101+-+Alter+Replication+Protocol+to+use+Leader+Epoch+rather+than+High+Watermark+for+Truncation)
*The key elements are*:
- Epochs are stamped on messages as they enter the leader.
- Epochs are tracked in both leader and follower in a new checkpoint file.
- A new API allows followers to retrieve the leader's latest offset for a particular epoch.
- The logic for truncating the log, when a replica becomes a follower, has been moved from Partition into the ReplicaFetcherThread
- When partitions are added to the ReplicaFetcherThread they are added in an initialising state. Initialising partitions request leader epochs and then truncate their logs appropriately.
This test provides a good overview of the workflow `EpochDrivenReplicationProtocolAcceptanceTest.shouldFollowLeaderEpochBasicWorkflow()`
The corrupted log use case is covered by the test
`EpochDrivenReplicationProtocolAcceptanceTest.offsetsShouldNotGoBackwards()`
Remaining work: There is a do list here: https://docs.google.com/document/d/1edmMo70MfHEZH9x38OQfTWsHr7UGTvg-NOxeFhOeRew/edit?usp=sharing
Author: Ben Stopford <benstopford@gmail.com>
Author: Jun Rao <junrao@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Closes#2808 from benstopford/kip-101-v2
This is from the KIP-98 proposal.
The main points of discussion surround the correctness logic, particularly the Log class where incoming entries are validated and duplicates are dropped, and also the producer error handling to ensure that the semantics are sound from the users point of view.
There is some subtlety in the idempotent producer semantics. This patch only guarantees idempotent production upto the point where an error has to be returned to the user. Once we hit a such a non-recoverable error, we can no longer guarantee message ordering nor idempotence without additional logic at the application level.
In particular, if an application wants guaranteed message order without duplicates, then it needs to do the following in the error callback:
1. Close the producer so that no queued batches are sent. This is important for guaranteeing ordering.
2. Read the tail of the log to inspect the last message committed. This is important for avoiding duplicates.
Author: Apurva Mehta <apurva@confluent.io>
Author: hachikuji <jason@confluent.io>
Author: Apurva Mehta <apurva.1618@gmail.com>
Author: Guozhang Wang <wangguoz@gmail.com>
Author: fpj <fpj@apache.org>
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Closes#2735 from apurvam/exactly-once-idempotent-producer
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Jun Rao <junrao@gmail.com>, Apurva Mehta <apurva@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2691 from cmccabe/KAFKA-4902
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Jun Rao <junrao@gmail.com>, Apurva Mehta <apurva@confluent.io>, Guozhang Wang <wangguoz@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#2614 from hachikuji/exactly-once-message-format
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Ewen Cheslack-Postava <me@ewencp.org>, Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2594 from dguy/checkstyle
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#2303 from mjsax/licenseHeader
Author: Maysam Yabandeh <myabandeh@dropbox.com>
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jun Rao <junrao@gmail.com>
Closes#2474 from ijuma/kafka-4039-deadlock-during-shutdown
After debugging this i can see the times that it fails there is a race between when the topic is actually created/ready on the broker and when the assignment happens. When it fails `StreamPartitionAssignor.assign(..)` gets called with a `Cluster` with no topics. Hence the test hangs as no tasks get assigned. To fix this I added a `waitForTopics` method to `EmbeddedKafkaCluster`. This will wait until the topics have been created.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Matthias J. Sax, Guozhang Wang
Closes#2371 from dguy/integration-test-fix
Besides API and runtime changes, this PR also includes 2 data transformations (`InsertField`, `HoistToStruct`) and 1 routing transformation (`TimestampRouter`).
There is some gnarliness in `ConnectorConfig` / `ConfigDef` around creating, parsing and validating a dynamic `ConfigDef`.
Author: Shikhar Bhushan <shikhar@confluent.io>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes#2299 from shikhar/smt-2017
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Guozhang Wang <wangguoz@gmail.com>, Jun Rao <junrao@gmail.com>
Closes#2140 from hachikuji/KAFKA4390
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Michael G. Noll, Eno Thereska, Damian Guy, Guozhang Wang
Closes#2117 from mjsax/kafka-4393-improveInvalidTsHandling
This PR implements KIP-78:Cluster Identifiers [(link)](https://cwiki.apache.org/confluence/display/KAFKA/KIP-78%3A+Cluster+Id#KIP-78:ClusterId-Overview) and includes the following changes:
1. Changes to broker code
- generate cluster id and store it in Zookeeper
- update protocol to add cluster id to metadata request and response
- add ClusterResourceListener interface, ClusterResource class and ClusterMetadataListeners utility class
- send ClusterResource events to the metric reporters
2. Changes to client code
- update Cluster and Metadata code to support cluster id
- update clients for sending ClusterResource events to interceptors, (de)serializers and metric reporters
3. Integration tests for interceptors, (de)serializers and metric reporters for clients and for protocol changes and metric reporters for broker.
4. System tests for upgrading from previous versions.
Author: Sumit Arrawatia <sumit.arrawatia@gmail.com>
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jun Rao <junrao@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#1830 from arrawatia/kip-78
moved streams application reset tool from tools to core
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Damian Guy <damian.guy@gmail.com>, Guozhang Wang <wangguoz@gmail.com>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#1685 from mjsax/moveResetTool
(cherry picked from commit f2405a73ea)
Signed-off-by: Ewen Cheslack-Postava <me@ewencp.org>
guozhangwang enothereska mjsax miguno
If you get a chance can you please take a look at this. I've done the repartitioning in the join, but it results in 2 internal topics for each join. This seems like overkill as sometimes we wouldn't need to repartition at all, others just 1 topic, and then sometimes both, but I'm not sure how we can know that.
I'd also need to implement something similar for leftJoin, but again, i'd like to see if i'm heading down the right path or if anyone has any other bright ideas.
For reference - https://github.com/apache/kafka/pull/1453 - the previous PR
Thanks for taking the time and looking forward to getting some welcome advice :-)
Author: Damian Guy <damian.guy@gmail.com>
Author: Damian Guy <damian@continuum.local>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#1472 from dguy/KAFKA-3561
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Ismael Juma, Michael G. Noll, Guozhang Wang
Closes#1285 from enothereska/more-integration-tests
ewencp granders Can you take a look? Thanks!
Author: Liquan Pei <liquanpei@gmail.com>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes#1259 from Ishiihara/fix-warning
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Ismael Juma, Damian Guy, Michael G. Noll, Guozhang Wang
Closes#1260 from enothereska/KAFKA-3612-integration-tests