This PR implements the KIP-559: https://cwiki.apache.org/confluence/display/KAFKA/KIP-559%3A+Make+the+Kafka+Protocol+Friendlier+with+L7+Proxies
- it adds the Protocol Type and the Protocol Name fields in JoinGroup and SyncGroup API;
- it validates that the fields are provided by the client when the new version of the API is used and ensure that they are consistent. it errors out otherwise;
- it validates that the fields are consistent in the client and errors out otherwise;
- it adds many tests related to the API changes but also extends the testing coverage of the requests/responses themselves.
- it standardises the naming in the coordinator. now, `ProtocolType` and `ProtocolName` are used across the board in the coordinator instead of having a mix of protocol type, protocol name, subprotocol, protocol, etc.
Reviewers: Jason Gustafson <jason@confluent.io>
Rather than maintain hand coded protocol serialization code, Streams could use the same code-generation framework as Clients/Core.
There isn't a perfect match, since the code generation framework includes an assumption that you're generating "protocol messages", rather than just arbitrary blobs, but I think it's close enough to justify using it, and improving it over time.
Using the code generation allows us to drop a lot of detail-oriented, brittle, and hard-to-maintain serialization logic in favor of a schema spec.
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Boyang Chen <boyang@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
The scalac optimizer is able to inline methods to avoid lambda allocations, eliminating
the runtime cost of higher order functions in many cases. The compilation parameters
we are using here were introduced in 2.12.x, so we don't enable them for Scala 2.11.
Also, we enable a more aggressive inlining policy for the `core` project since it's
not meant to be used as a library.
See https://www.lightbend.com/blog/scala-inliner-optimizer for more information about
the optimizer.
I verified that the lambda allocation in the code below (from LogCleaner.scala) went away
after this change with Scala 2.12 and 2.13.
```scala
private def consumeAbortedTxnsUpTo(offset: Long): Unit = {
while (abortedTransactions.headOption.exists(_.firstOffset <= offset)) {
val abortedTxn = abortedTransactions.dequeue()
ongoingAbortedTxns.getOrElseUpdate(abortedTxn.producerId, new AbortedTransactionMetadata(abortedTxn))
}
}
```
The relevant part of the bytecode when compiled with Scala 2.13 looks like:
```text
private void consumeAbortedTxnsUpTo(long);
Code:
0: aload_0
1: invokespecial #54 // Method abortedTransactions:()Lscala/collection/mutable/PriorityQueue;
4: invokevirtual #175 // Method scala/collection/mutable/PriorityQueue.headOption:()Lscala/Option;
7: dup
8: ifnonnull 13
11: aconst_null
12: athrow
13: astore 4
15: aload 4
17: invokevirtual #145 // Method scala/Option.isEmpty:()Z
20: ifne 48
23: aload 4
25: invokevirtual #148 // Method scala/Option.get:()Ljava/lang/Object;
28: checkcast #177 // class kafka/log/AbortedTxn
```
The increased inlining causes some spurious spotBugs warnings, I added a few suppressions
and fixed one warning by avoiding unnecessary boxing.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Part of supporting KIP-213 ( https://cwiki.apache.org/confluence/display/KAFKA/KIP-213+Support+non-key+joining+in+KTable ). Murmur3 hash is used as a hashing mechanism in KIP-213 for the large range of uniqueness. The Murmur3 class and tests are ported directly from Apache Hive, with no alterations to the code or dependencies.
Author: Adam Bellemare <adam.bellemare@wishabi.com>
Reviewers: John Roesler <vvcephei@users.noreply.github.com>, Ismael Juma <ismael@juma.me.uk>, Guozhang Wang <wangguoz@gmail.com>
Closes#7271 from bellemare/murmur3hash
This PR makes two changes to code in the ReplicaManager.updateFollowerFetchState path, which is in the hot path for follower fetches. Although calling ReplicaManager.updateFollowerFetch state is inexpensive on its own, it is called once for each partition every time a follower fetch occurs.
1. updateFollowerFetchState no longer calls maybeExpandIsr when the follower is already in the ISR. This avoid repeated expansion checks.
2. Partition.maybeIncrementLeaderHW is also in the hot path for ReplicaManager.updateFollowerFetchState. Partition.maybeIncrementLeaderHW calls Partition.remoteReplicas four times each iteration, and it performs a toSet conversion. maybeIncrementLeaderHW now avoids generating any intermediate collections when updating the HWM.
**Benchmark results for Partition.updateFollowerFetchState on a r5.xlarge:**
Old:
```
1288.633 ±(99.9%) 1.170 ns/op [Average]
(min, avg, max) = (1287.343, 1288.633, 1290.398), stdev = 1.037
CI (99.9%): [1287.463, 1289.802] (assumes normal distribution)
```
New (when follower fetch offset is updated):
```
261.727 ±(99.9%) 0.122 ns/op [Average]
(min, avg, max) = (261.565, 261.727, 261.937), stdev = 0.114
CI (99.9%): [261.605, 261.848] (assumes normal distribution)
```
New (when follower fetch offset is the same):
```
68.484 ±(99.9%) 0.025 ns/op [Average]
(min, avg, max) = (68.446, 68.484, 68.520), stdev = 0.023
CI (99.9%): [68.460, 68.509] (assumes normal distribution)
```
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Scala 2.13 support was added to build via #5454. This PR adjusts the code so that
it compiles with 2.11, 2.12 and 2.13.
Changes:
* Add `scala-collection-compat` dependency.
* Import `scala.collection.Seq` in a number of places for consistent behavior between
Scala 2.11, 2.12 and 2.13.
* Remove wildcard imports that were causing the Java classes to have priority over the
Scala ones, related Scala issue: https://github.com/scala/scala/pull/6589.
* Replace parallel collection usage with `Future`. The former is no longer included by
default in the standard library.
* Replace val _: Unit workaround with one that is more concise and works with Scala 2.13
* Replace `filterKeys` with `filter` when we expect a `Map`. `filterKeys` returns a view
that doesn't implement the `Map` trait in Scala 2.13.
* Replace `mapValues` with `map` or add a `toMap` as an additional transformation
when we expect a `Map`. `mapValues` returns a view that doesn't implement the
`Map` trait in Scala 2.13.
* Replace `breakOut` with `iterator` and `to`, `breakOut` was removed in Scala
2.13.
* Replace to() with toMap, toIndexedSeq and toSet
* Replace `mutable.Buffer.--` with `filterNot`.
* ControlException is an abstract class in Scala 2.13.
* Variable arguments can only receive arrays or immutable.Seq in Scala 2.13.
* Use `Factory` instead of `CanBuildFrom` in DecodeJson. `CanBuildFrom` behaves
a bit differently in Scala 2.13 and it's been deprecated. `Factory` has the behavior
we need and it's available via the compat library.
* Fix failing tests due to behavior change in Scala 2.13,
"Map.values.map is not strict in Scala 2.13" (https://github.com/scala/bug/issues/11589).
* Use Java collections instead of Scala ones in StreamResetter (a Java class).
* Adjust CheckpointFile.write to take an `Iterable` instead of `Seq` to avoid
unnecessary collection copies.
* Fix DelayedElectLeader to use a Map instead of Set and avoid `to` call that
doesn't work in Scala 2.13.
* Use unordered map for mapping in SimpleAclAuthorizer, mapping of ordered
maps require an `Ordering` in Scala 2.13 for safety reasons.
* Adapt `ConsumerGroupCommand` to compile with Scala 2.13.
* CoreUtils.min takes an `Iterable` instead of `TraversableOnce`, the latter does
not exist in Scala 2.13.
* Replace `Unit` with `()` in a couple places. Scala 2.13 is stricter when it expects
a value instead of a type.
* Fix bug in CustomQuotaCallbackTest where we did not necessarily set `partitionRatio`
correctly, `forall` can terminate early.
* Add a couple of spotbugs exclusions that are needed by code generated by Scala 2.13
* Remove unused variables, simplify some code and remove procedure syntax in a few
places.
* Remove unused `CoreUtils.JSONEscapeString`.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, José Armando García Sancio <jsancio@users.noreply.github.com>
Since the originals map passed to AbstractConfig constructor may be immutable, avoid updating this map while resolving indirect config variables. Instead a new ResolvingMap instance is now used to store resolved configs.
Reviewers: Randall Hauch <rhauch@gmail.com>, Boyang Chen <bchen11@outlook.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
This patch adds a framework to automatically generate the request/response classes for Kafka's protocol. The code will be updated to use the generated classes in follow-up patches. Below is a brief summary of the included components:
**buildSrc/src**
The message generator code is here. This code is automatically re-run by gradle when one of the schema files changes. The entire directory is processed at once to minimize the number of times we have to start a new JVM. We use Jackson to translate the JSON files into Java objects.
**clients/src/main/java/org/apache/kafka/common/protocol/Message.java**
This is the interface implemented by all automatically generated messages.
**clients/src/main/java/org/apache/kafka/common/protocol/MessageUtil.java**
Some utility functions used by the generated message code.
**clients/src/main/java/org/apache/kafka/common/protocol/Readable.java, Writable.java, ByteBufferAccessor.java**
The generated message code uses these classes for writing to a buffer.
**clients/src/main/message/README.md**
This README file explains how the JSON schemas work.
**clients/src/main/message/\*.json**
The JSON files in this directory implement every supported version of every Kafka API. The unit tests automatically validate that the generated schemas match the hand-written schemas in our code. Additionally, there are some things like request and response headers that have schemas here.
**clients/src/main/java/org/apache/kafka/common/utils/ImplicitLinkedHashSet.java**
I added an optimization here for empty sets. This is useful here because I want all messages to start with empty sets by default prior to being loaded with data. This is similar to the "empty list" optimizations in the `java.util.ArrayList` class.
Reviewers: Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Ismael Juma <ismael@juma.me.uk>, Bob Barrett <bob.barrett@outlook.com>, Jason Gustafson <jason@confluent.io>
See https://github.com/spotbugs/spotbugs/issues/756 for details on
the false positives affecting try with resources. An example is:
> RCN | Nullcheck of fc at line 629 of value previously dereferenced in
> org.apache.kafka.common.utils.Utils.readFileAsString(String, Charset)
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
findBugs is abandoned, it doesn't work with Java 9 and the Gradle plugin will be deprecated in
Gradle 5.0: https://github.com/gradle/gradle/pull/6664
spotBugs is actively maintained and it supports Java 8, 9 and 10. Java 11 is not supported yet,
but it's likely to happen soon.
Also fixed a file leak in Connect identified by spotbugs.
Manually tested spotBugsMain, jarAll and importing kafka in IntelliJ and running
a build in the IDE.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Colin Patrick McCabe <colin@cmccabe.xyz>, Dong Lin <lindong28@gmail.com>
Closes#5625 from ijuma/kafka-5887-spotbugs
Previously, we depicted creating a Jackson serde for every pojo class, which becomes a burden in practice. There are many ways to avoid this and just have a single serde, so we've decided to model this design choice instead.
Reviewers: Viktor Somogyi <viktorsomogyi@gmail.com>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
- Removed Scala consumers (`SimpleConsumer` and `ZooKeeperConsumerConnector`)
and their tests.
- Removed Scala request/response/message classes.
- Removed any mention of new consumer or new producer in the code
with the exception of MirrorMaker where the new.consumer option was
never deprecated so we have to keep it for now. The non-code
documentation has not been updated either, that will be done
separately.
- Removed a number of tools that only made sense in the context
of the Scala consumers (see upgrade notes).
- Updated some tools that worked with both Scala and Java consumers
so that they only support the latter (see upgrade notes).
- Removed `BaseConsumer` and related classes apart from `BaseRecord`
which is used in `MirrorMakerMessageHandler`. The latter is a pluggable
interface so effectively public API.
- Removed `ZkUtils` methods that were only used by the old consumers.
- Removed `ZkUtils.registerBroker` and `ZKCheckedEphemeral` since
the broker now uses the methods in `KafkaZkClient` and no-one else
should be using that method.
- Updated system tests so that they don't use the Scala consumers except
for multi-version tests.
- Updated LogDirFailureTest so that the consumer offsets topic would
continue to be available after all the failures. This was necessary for it
to work with the Java consumer.
- Some multi-version system tests had not been updated to include
recently released Kafka versions, fixed it.
- Updated findBugs and checkstyle configs not to refer to deleted
classes and packages.
Reviewers: Dong Lin <lindong28@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
KAFKA-6921 removed deprecated scala producer. This pull request removes the now unnecessary findbugs exclusion that matched one of the affected classes.
This PR implements a Scala wrapper library for Kafka Streams. The library is implemented as a project under streams, namely `:streams:streams-scala`. The PR contains the following:
* the library implementation of the wrapper abstractions
* the test suite
* the changes in `build.gradle` to build the library jar
The library has been tested running the tests as follows:
```
$ ./gradlew -Dtest.single=StreamToTableJoinScalaIntegrationTestImplicitSerdes streams:streams-scala:test
$ ./gradlew -Dtest.single=StreamToTableJoinScalaIntegrationTestImplicitSerdesWithAvro streams:streams-scala:test
$ ./gradlew -Dtest.single=WordCountTest streams:streams-scala:test
```
Author: Debasish Ghosh <ghosh.debasish@gmail.com>
Author: Sean Glover <seglo@randonom.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Ismael Juma <ismael@juma.me.uk>, John Roesler <john@confluent.io>, Damian Guy <damian@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#4756 from debasishg/scala-streams
- Use newly added pause method in LogCleaner and ControllerChannelManager classes
- Remove LogCleaner, Cleaner exclusions from findbugs-exclude.xml
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Previously, Trogdor only handled "Faults." Now, Trogdor can handle
"Tasks" which may be either faults, or workloads to execute in the
background.
The Agent and Coordinator have been refactored from a
mutexes-and-condition-variables paradigm into a message passing
paradigm. No locks are necessary, because only one thread can access
the task state or worker state. This makes them a lot easier to reason
about.
The MockTime class can now handle mocking deferred message passing
(adding a message to an ExecutorService with a delay). I added a
MockTimeTest.
MiniTrogdorCluster now starts up Agent and Coordinator classes in
paralle in order to minimize junit test time.
RPC messages now inherit from a common Message.java class. This class
handles implementing serialization, equals, hashCode, etc.
Remove FaultSet, since it is no longer necessary.
Previously, if CoordinatorClient or AgentClient hit a networking
problem, they would throw an exception. They now retry several times
before giving up. Additionally, the REST RPCs to the Coordinator and
Agent have been changed to be idempotent. If a response is lost, and
the request is resent, no harm will be done.
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#4073 from cmccabe/KAFKA-6060
I included a JMH benchmark and the results follow. The
implementation in this PR takes no more than 1/10th
of the time when compared to trunk. I also included
results for an alternative implementation that is a little
slower than the one in the PR.
Trunk:
```text
TopicBenchmark.testValidate topic avgt 15 134.107 ± 3.956 ns/op
TopicBenchmark.testValidate longer-topic-name avgt 15 316.241 ± 13.379 ns/op
TopicBenchmark.testValidate very-long-topic-name_with_more_text avgt 15 636.026 ± 30.272 ns/op
```
Implementation in the PR:
```text
TopicBenchmark.testValidate topic avgt 15 13.153 ± 0.383 ns/op
TopicBenchmark.testValidate longer-topic-name avgt 15 26.139 ± 0.896 ns/op
TopicBenchmark.testValidate very-long-topic-name.with_more_text avgt 15 44.829 ± 1.390 ns/op
```
Alternative implementation where boolean validChar = Character.isLetterOrDigit(c) || c == '.' || c == '_' || c == '-';
```text
TopicBenchmark.testValidate topic avgt 15 18.883 ± 1.044 ns/op
TopicBenchmark.testValidate longer-topic-name avgt 15 36.696 ± 1.220 ns/op
TopicBenchmark.testValidate very-long-topic-name_with_more_text avgt 15 65.956 ± 0.669 ns/op
```
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#3234 from ijuma/optimise-topic-is-valid
The JMH benchmark included shows that the redundant
volatile write causes the constructor of `ProducerRecord`
to take more than 50% longer:
ProducerRecordBenchmark.constructorBenchmark avgt 15 24.136 ± 1.458 ns/op (before)
ProducerRecordBenchmark.constructorBenchmark avgt 15 14.904 ± 0.231 ns/op (after)
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#3233 from ijuma/remove-volatile-write-in-records-header-constructor
- reuse decompression buffers in consumer Fetcher
- switch lz4 input stream to operate directly on ByteBuffers
- avoids performance impact of catching exceptions when reaching the end of legacy record batches
- more tests with both compressible / incompressible data, multiple
blocks, and various other combinations to increase code coverage
- fixes bug that would cause exception instead of invalid block size
for invalid incompressible blocks
- fixes bug if incompressible flag is set on end frame block size
Overall this improves LZ4 decompression performance by up to 40x for small batches.
Most improvements are seen for batches of size 1 with messages on the order of ~100B.
We see at least 2x improvements for for batch sizes of < 10 messages, containing messages < 10kB
This patch also yields 2-4x improvements on v1 small single message batches for other compression types.
Full benchmark results can be found here
https://gist.github.com/xvrl/05132e0643513df4adf842288be86efd
Author: Xavier Léauté <xavier@confluent.io>
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2967 from xvrl/kafka-5150
The code was correct since the method is only called from
one thread, but the change is worthwhile anyway.
Author: Amit Daga <adaga@adobe.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Damian Guy <damian.guy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#2966 from amitdaga/findbugs-streams-multithread
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Closes#2840 from apurvam/exactly-once-transactional-clients
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Jozef Koval <jozef.koval@protonmail.ch>, Ismael Juma <ismael@juma.me.uk>
Closes#2687 from cmccabe/KAFKA-4899
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Eno Thereska <eno@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2780 from cmccabe/KAFKA-4995
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2779 from cmccabe/KAFKA-4993
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Konstantine Karantasis <konstantine@confluent.io>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#2763 from cmccabe/KAFKA-4977