src-kafka

Commit Graph

Author	SHA1	Message	Date
Lucas Bradstreet	1675115ec1	MINOR: refactor replica last sent HW updates due to performance regression (#7671 ) This change fixes a performance regression due to follower last seen highwatermark handling introduced in `23beeea`. maybeUpdateHwAndSendResponse is expensive for brokers with high partition counts, as it requires a partition and a replica lookup for every partition being fetched. This refactor moves the last seen watermark update into the follower fetch state update where we have already looked up the partition and replica. I've seen cases where maybeUpdateHwAndSendResponse is responsible for 8% of CPU usage, not including the responseCallback call that is part of it. I have benchmarked this change with `UpdateFollowerFetchStateBenchmark` and it adds 5ns of overhead to Partition.updateFollowerFetchState, which is a rounding error compared to the current overhead of maybeUpdateHwAndSendResponse. Reviewers: David Arthur <mumrah@gmail.com>, Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>	5 years ago
Guozhang Wang	59a75f4422	KAFKA-9048 Pt1: Remove Unnecessary lookup in Fetch Building (#7576 ) Get rid of partitionStates, which creates a new PartitionState for each state, since none of the callers require it to be a Seq. Modify ReplicaFetcherThread constructor to fix the broken benchmark path. This PR: Benchmark (partitionCount) Mode Cnt Score Error Units ReplicaFetcherThreadBenchmark.testFetcher 100 avgt 15 9280.953 ± 55.967 ns/op ReplicaFetcherThreadBenchmark.testFetcher 500 avgt 15 61533.546 ± 1213.559 ns/op ReplicaFetcherThreadBenchmark.testFetcher 1000 avgt 15 151306.146 ± 1820.222 ns/op ReplicaFetcherThreadBenchmark.testFetcher 5000 avgt 15 1138547.929 ± 45301.938 ns/op Trunk: Benchmark (partitionCount) Mode Cnt Score Error Units ReplicaFetcherThreadBenchmark.testFetcher 100 avgt 15 9305.588 ± 51.886 ns/op ReplicaFetcherThreadBenchmark.testFetcher 500 avgt 15 65216.933 ± 939.827 ns/op ReplicaFetcherThreadBenchmark.testFetcher 1000 avgt 15 151715.514 ± 1361.009 ns/op ReplicaFetcherThreadBenchmark.testFetcher 5000 avgt 15 1231958.103 ± 94 Reviewers: Jason Gustafson <jason@confluent.io>, Lucas Bradstreet <lucasbradstreet@gmail.com>	5 years ago
Lucas Bradstreet	8966d066bd	KAFKA-9039: Optimize ReplicaFetcher fetch path (#7443 ) Improves the performance of the replica fetcher for high partition count fetch requests, where a majority of the partitions did not update between fetch requests. All benchmarks were run on an r5.xlarge. Vanilla Benchmark (partitionCount) Mode Cnt Score Error Units ReplicaFetcherThreadBenchmark.testFetcher 100 avgt 15 26491.825 ± 438.463 ns/op ReplicaFetcherThreadBenchmark.testFetcher 500 avgt 15 153941.952 ± 4337.073 ns/op ReplicaFetcherThreadBenchmark.testFetcher 1000 avgt 15 339868.602 ± 4201.462 ns/op ReplicaFetcherThreadBenchmark.testFetcher 5000 avgt 15 2588878.448 ± 22172.482 ns/op From 100 to 5000 partitions the latency increase is 2588878.448 / 26491.825 = 97X. Avoid gettimeofday calls in steady state fetch states `8545888` Benchmark (partitionCount) Mode Cnt Score Error Units ReplicaFetcherThreadBenchmark.testFetcher 100 avgt 15 22685.381 ± 267.727 ns/op ReplicaFetcherThreadBenchmark.testFetcher 500 avgt 15 113622.521 ± 1854.254 ns/op ReplicaFetcherThreadBenchmark.testFetcher 1000 avgt 15 273698.740 ± 9269.554 ns/op ReplicaFetcherThreadBenchmark.testFetcher 5000 avgt 15 2189223.207 ± 1706.945 ns/op From 100 to 5000 partitions the latency increase is 2189223.207 / 22685.381 = 97X. Avoid copying partition states to maintain fetch offsets `29fdd60` Benchmark (partitionCount) Mode Cnt Score Error Units ReplicaFetcherThreadBenchmark.testFetcher 100 avgt 15 17039.989 ± 609.355 ns/op ReplicaFetcherThreadBenchmark.testFetcher 500 avgt 15 99371.086 ± 1833.256 ns/op ReplicaFetcherThreadBenchmark.testFetcher 1000 avgt 15 216071.333 ± 3714.147 ns/op ReplicaFetcherThreadBenchmark.testFetcher 5000 avgt 15 2035678.223 ± 5195.232 ns/op From 100 to 5000 partitions the latency increase is 2035678.223 / 17039.989 = 119X. Keep lag alongside PartitionFetchState to avoid expensive isReplicaInSync check `0e57e3e` Benchmark (partitionCount) Mode Cnt Score Error Units ReplicaFetcherThreadBenchmark.testFetcher 100 avgt 15 15131.684 ± 382.088 ns/op ReplicaFetcherThreadBenchmark.testFetcher 500 avgt 15 86813.843 ± 3346.385 ns/op ReplicaFetcherThreadBenchmark.testFetcher 1000 avgt 15 193050.381 ± 3281.833 ns/op ReplicaFetcherThreadBenchmark.testFetcher 5000 avgt 15 1801488.513 ± 2756.355 ns/op From 100 to 5000 partitions the latency increase is 1801488.513 / 15131.684 = 119X. Fetch session optimizations (mostly presizing the next hashmap, and avoiding making a copy of sessionPartitions, as a deep copy is not required for the ReplicaFetcher) `2614b24` Benchmark (partitionCount) Mode Cnt Score Error Units ReplicaFetcherThreadBenchmark.testFetcher 100 avgt 15 11386.203 ± 416.701 ns/op ReplicaFetcherThreadBenchmark.testFetcher 500 avgt 15 60820.292 ± 3163.001 ns/op ReplicaFetcherThreadBenchmark.testFetcher 1000 avgt 15 146242.158 ± 1937.254 ns/op ReplicaFetcherThreadBenchmark.testFetcher 5000 avgt 15 1366768.926 ± 3305.712 ns/op From 100 to 5000 partitions the latency increase is 1366768.926 / 11386.203 = 120X. Reviewers: Jun Rao <junrao@gmail.com>, Guozhang Wang <wangguoz@gmail.com>	5 years ago
Ismael Juma	66183f730f	KAFKA-8471: Replace control requests/responses with automated protocol (#7353 ) Replaced UpdateMetadata{Request, Response}, LeaderAndIsr{Request, Response} and StopReplica{Request, Response} with the automated protocol classes. Updated the JSON schema for the 3 request types to be more consistent and less strict (if needed to avoid duplication). The general approach is to avoid generating new collections in the request classes. Normalization happens in the constructor to make this possible. Builders still have to group by topic to maintain the external ungrouped view. Introduced new tests for LeaderAndIsrRequest and UpdateMetadataRequest to verify that the new logic is correct. A few other clean-ups/fixes in code that was touched due to these changes: * KAFKA-8956: Refactor DelayedCreatePartitions#updateWaiting to avoid modifying collection in foreach. * Avoid unnecessary allocation for state change trace logging if trace logging is not enabled * Use `toBuffer` instead of `toList`, `toIndexedSeq` or `toSeq` as it generally performs better and it matches the performance characteristics of `java.util.ArrayList`. This is particularly important when passing such instances to Java code. * Minor refactoring for clarity and readability. * Removed usage of deprecated `/:`, unused imports and unnecessary `var`s. * Include exception in `AdminClientIntegrationTest` failure message. * Move StopReplicaRequest verification in `AuthorizerIntegrationTest` to the end to match the comment. Reviewers: Colin Patrick McCabe <cmccabe@apache.org>	5 years ago
Lucas Bradstreet	f3ded39a05	KAFKA-8841; Reduce overhead of ReplicaManager.updateFollowerFetchState (#7324 ) This PR makes two changes to code in the ReplicaManager.updateFollowerFetchState path, which is in the hot path for follower fetches. Although calling ReplicaManager.updateFollowerFetchState is inexpensive on its own, it is called once for each partition every time a follower fetch occurs. 1. updateFollowerFetchState no longer calls maybeExpandIsr when the follower is already in the ISR. This avoids repeated expansion checks. 2. Partition.maybeIncrementLeaderHW is also in the hot path for ReplicaManager.updateFollowerFetchState. Partition.maybeIncrementLeaderHW calls Partition.remoteReplicas four times each iteration, and it performs a toSet conversion. maybeIncrementLeaderHW now avoids generating any intermediate collections when updating the HWM. Benchmark results for Partition.updateFollowerFetchState on an r5.xlarge: Old: ``` 1288.633 ±(99.9%) 1.170 ns/op [Average] (min, avg, max) = (1287.343, 1288.633, 1290.398), stdev = 1.037 CI (99.9%): [1287.463, 1289.802] (assumes normal distribution) ``` New (when follower fetch offset is updated): ``` 261.727 ±(99.9%) 0.122 ns/op [Average] (min, avg, max) = (261.565, 261.727, 261.937), stdev = 0.114 CI (99.9%): [261.605, 261.848] (assumes normal distribution) ``` New (when follower fetch offset is the same): ``` 68.484 ±(99.9%) 0.025 ns/op [Average] (min, avg, max) = (68.446, 68.484, 68.520), stdev = 0.023 CI (99.9%): [68.460, 68.509] (assumes normal distribution) ``` Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>	5 years ago
Lucas Bradstreet	fb381cb6c7	MINOR: Fix integer overflow in LRUCacheBenchmark (#7270 ) The jmh LRUCacheBenchmark will exhibit an int overflow when run on a fast machine: ``` java.lang.ArrayIndexOutOfBoundsException: Index -3648 out of bounds for length 10000 at org.apache.kafka.jmh.cache.LRUCacheBenchmark.testCachePerformance(LRUCacheBenchmark.java:70) at org.apache.kafka.jmh.cache.generated.LRUCacheBenchmark_testCachePerformance_jmhTest.testCachePerformance_thrpt_jmhStub(LRUCacheBenchmark_testCachePerformance_jmhTest.java:119) at org.apache.kafka.jmh.cache.generated.LRUCacheBenchmark_testCachePerformance_jmhTest.testCachePerformance_Throughput(LRUCacheBenchmark_testCachePerformance_jmhTest.java:83) ``` Reviewers: Jason Gustafson <jason@confluent.io>	5 years ago
Guozhang Wang	3e9d1c1411	KAFKA-8106: Skipping ByteBuffer allocation of key / value / headers in logValidator (#6785 ) * KAFKA-8106: Reducing the allocation and copying of ByteBuffer when logValidator do validation. * KAFKA-8106: Reducing the allocation and copying of ByteBuffer when logValidator do validation. * github comments * use batch.skipKeyValueIterator * cleanups * no need to skip kv for uncompressed iterator * checkstyle fixes * fix findbugs * adding unit tests * reuse decompression buffer; and using streaming iterator * checkstyle * add unit tests * remove reusing buffer supplier * fix unit tests * add unit tests * use streaming iterator * minor refactoring * rename * github comments * github comments * reuse buffer at DefaultRecord caller * some further optimization * major refactoring * further refactoring * update comment * github comments * minor fix * add jmh benchmarks * update jmh * github comments * minor fix * github comments	5 years ago
Lifei Chen	3322439d98	MINOR: Document improvement (#6682 ) Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>	6 years ago
Ismael Juma	7d9e93ac6d	MINOR: Use https instead of http in links (#6477 ) Verified that the https links work. I didn't update the license header in this PR since that touches so many files. Will file a separate one for that. Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>	6 years ago
Ismael Juma	12f310d50e	KAFKA-7612: Fix javac warnings and enable warnings as errors (#5900 ) - Use Xlint:all with 3 exclusions (filed KAFKA-7613 to remove the exclusions) - Use the same javac options when compiling tests (seems accidental that we didn't do this before) - Replaced several deprecated method calls with non-deprecated ones: - `KafkaConsumer.poll(long)` and `KafkaConsumer.close(long)` - `Class.newInstance` and `new Integer/Long` (deprecated since Java 9) - `scala.Console` (deprecated in Scala 2.11) - `PartitionData` taking a timestamp (one of them seemingly a bug) - `JsonMappingException` single parameter constructor - Fix unnecessary usage of raw types in several places. - Add @SuppressWarnings for deprecations, unchecked and switch fallthrough in several places. - Scala clean-ups (var -> val, ETA expansion warnings, avoid reflective calls) - Use lambdas to simplify code in a few places - Add @SafeVarargs, fix varargs usage and remove unnecessary `Utils.mkList` method Reviewers: Matthias J. Sax <mjsax@apache.org>, Manikumar Reddy <manikumar.reddy@gmail.com>, Randall Hauch <rhauch@gmail.com>, Bill Bejeck <bill@confluent.io>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>	6 years ago
Armin Braun	346d0ca538	MINOR: Fix needless GC + Result time unit in JMH Fixes two issues with the JMH benchmark example: * Trivial: The output should be in `ops/ms` for readability reasons (it's in the millions of operations per second) * Important: The benchmark is not actually measuring the LRU Cache performance as most of the time in each run is wasted on concatenating `key + counter` as well as `value + counter`. Fixed by pre-generating 10k K-V pairs (100x the cache capacity) and iterating over them. This brings the performance up by a factor of more than 5 on a standard 4 core i7 (from `~6k/ms` before to `~35k/ms` after). Author: Armin Braun <me@obrown.io> Reviewers: Bill Bejeck <bbejeck@gmail.com>, Guozhang Wang <wangguoz@gmail.com>, Ismael Juma <ismael@juma.me.uk> Closes #2903 from original-brownbear/fix-jmh-example	7 years ago
Bill Bejeck	b6adb2dc89	KAFKA-3989; MINOR: follow-up: update script to run from kafka root …h-benchmarks/jmh.sh Author: bbejeck <bbejeck@gmail.com> Reviewers: Ismael Juma <ismael@juma.me.uk>, Guozhang Wang <wangguoz@gmail.com> Closes #2654 from bbejeck/KAFKA-3989_follow_up	7 years ago
Jason Gustafson	f49697a279	KAFKA-5456; Ensure producer handles old format large compressed messages More specifically, fix the case where a compressed V0 or V1 message is larger than the producer batch size. Author: Jason Gustafson <jason@confluent.io> Reviewers: Apurva Mehta <apurva@confluent.io>, Ismael Juma <ismael@juma.me.uk> Closes #3356 from hachikuji/KAFKA-5456	8 years ago
Ismael Juma	39eb31feae	MINOR: Optimise performance of `Topic.validate()` I included a JMH benchmark and the results follow. The implementation in this PR takes no more than 1/10th of the time when compared to trunk. I also included results for an alternative implementation that is a little slower than the one in the PR. Trunk: ```text TopicBenchmark.testValidate topic avgt 15 134.107 ± 3.956 ns/op TopicBenchmark.testValidate longer-topic-name avgt 15 316.241 ± 13.379 ns/op TopicBenchmark.testValidate very-long-topic-name_with_more_text avgt 15 636.026 ± 30.272 ns/op ``` Implementation in the PR: ```text TopicBenchmark.testValidate topic avgt 15 13.153 ± 0.383 ns/op TopicBenchmark.testValidate longer-topic-name avgt 15 26.139 ± 0.896 ns/op TopicBenchmark.testValidate very-long-topic-name.with_more_text avgt 15 44.829 ± 1.390 ns/op ``` Alternative implementation where `boolean validChar = Character.isLetterOrDigit(c) || c == '.' || c == '_' || c == '-';`: ```text TopicBenchmark.testValidate topic avgt 15 18.883 ± 1.044 ns/op TopicBenchmark.testValidate longer-topic-name avgt 15 36.696 ± 1.220 ns/op TopicBenchmark.testValidate very-long-topic-name_with_more_text avgt 15 65.956 ± 0.669 ns/op ``` Author: Ismael Juma <ismael@juma.me.uk> Reviewers: Guozhang Wang <wangguoz@gmail.com> Closes #3234 from ijuma/optimise-topic-is-valid	8 years ago
Ismael Juma	c7bc8f7d8c	MINOR: Remove redundant volatile write in RecordHeaders The JMH benchmark included shows that the redundant volatile write causes the constructor of `ProducerRecord` to take more than 50% longer: ProducerRecordBenchmark.constructorBenchmark avgt 15 24.136 ± 1.458 ns/op (before) ProducerRecordBenchmark.constructorBenchmark avgt 15 14.904 ± 0.231 ns/op (after) Author: Ismael Juma <ismael@juma.me.uk> Reviewers: Jason Gustafson <jason@confluent.io> Closes #3233 from ijuma/remove-volatile-write-in-records-header-constructor	8 years ago
Xavier Léauté	c060c48285	KAFKA-5150; Reduce LZ4 decompression overhead - reuse decompression buffers in consumer Fetcher - switch lz4 input stream to operate directly on ByteBuffers - avoids performance impact of catching exceptions when reaching the end of legacy record batches - more tests with both compressible / incompressible data, multiple blocks, and various other combinations to increase code coverage - fixes bug that would cause exception instead of invalid block size for invalid incompressible blocks - fixes bug if incompressible flag is set on end frame block size Overall this improves LZ4 decompression performance by up to 40x for small batches. Most improvements are seen for batches of size 1 with messages on the order of ~100B. We see at least 2x improvements for batch sizes of < 10 messages, containing messages < 10kB. This patch also yields 2-4x improvements on v1 small single message batches for other compression types. Full benchmark results can be found here https://gist.github.com/xvrl/05132e0643513df4adf842288be86efd Author: Xavier Léauté <xavier@confluent.io> Author: Ismael Juma <ismael@juma.me.uk> Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk> Closes #2967 from xvrl/kafka-5150	8 years ago
bbejeck	79f85039d7	KAFKA-3989; Initial support for adding a JMH benchmarking module Author: bbejeck <bbejeck@gmail.com> Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>, Ismael Juma <ismael@juma.me.uk> Closes #1712 from bbejeck/KAFKA-3989_create_jmh_benchmarking_module	8 years ago

17 Commits (d2d68380175905775aac0eeaf2238a280fb77f7f)
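The two hot-path changes described in f3ded39a05 (skipping the ISR expansion check for followers already in the ISR, and computing the high watermark without intermediate collections) can be sketched in Java as below. This is a minimal illustration under stated assumptions, not Kafka's `Partition` code: `ReplicaState`, `HighWatermarkSketch` and their methods are hypothetical, and the ISR expansion rule is simplified to "the follower's log end offset has reached the leader high watermark".

```java
import java.util.List;

// Hypothetical, simplified view of a remote (follower) replica.
final class ReplicaState {
    final int brokerId;
    long logEndOffset;
    boolean inSync;

    ReplicaState(int brokerId, long logEndOffset, boolean inSync) {
        this.brokerId = brokerId;
        this.logEndOffset = logEndOffset;
        this.inSync = inSync;
    }
}

final class HighWatermarkSketch {
    // Records a follower fetch. The ISR check only runs for followers that are
    // not yet in sync; the simplified rule below is a stand-in for maybeExpandIsr.
    // Returns true if this call added the follower to the ISR.
    static boolean updateFollowerFetchState(ReplicaState follower, long fetchOffset, long leaderHw) {
        follower.logEndOffset = fetchOffset;
        if (follower.inSync) {
            return false; // already in the ISR: skip the expansion check entirely
        }
        if (fetchOffset >= leaderHw) {
            follower.inSync = true;
            return true;
        }
        return false;
    }

    // Candidate high watermark: the minimum log end offset of the leader and all
    // in-sync followers, computed in one pass with no toSet/filter/map copies.
    static long candidateHighWatermark(long leaderLogEndOffset, List<ReplicaState> remoteReplicas) {
        long min = leaderLogEndOffset;
        for (int i = 0; i < remoteReplicas.size(); i++) {
            ReplicaState replica = remoteReplicas.get(i);
            if (replica.inSync && replica.logEndOffset < min) {
                min = replica.logEndOffset;
            }
        }
        return min;
    }

    // The high watermark never moves backwards.
    static long maybeIncrementHighWatermark(long currentHw, long leaderLogEndOffset, List<ReplicaState> remoteReplicas) {
        return Math.max(currentHw, candidateHighWatermark(leaderLogEndOffset, remoteReplicas));
    }
}
```

The indexed loop assumes a random-access list such as `ArrayList`; for a linked list an enhanced `for` loop would be the better fit.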
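The stack trace in fb381cb6c7 is consistent with a plain `int` counter wrapping past `Integer.MAX_VALUE` and then being reduced with `%`, which in Java keeps the sign of the dividend. The commit message does not say how the benchmark was fixed, so the sketch below only reproduces the failure mode and shows one overflow-safe alternative, `Math.floorMod`; `OverflowIndexSketch` is a hypothetical name.

```java
final class OverflowIndexSketch {
    // Buggy pattern: once the counter wraps to a negative value,
    // counter % length is negative too and cannot be used as an array index.
    static int naiveIndex(int counter, int length) {
        return counter % length;
    }

    // Overflow-safe: for length > 0, Math.floorMod always returns a value in [0, length).
    static int safeIndex(int counter, int length) {
        return Math.floorMod(counter, length);
    }

    public static void main(String[] args) {
        int counter = Integer.MAX_VALUE;
        counter++; // wraps to Integer.MIN_VALUE
        System.out.println(naiveIndex(counter, 10_000)); // prints -3648, the index in the stack trace
        System.out.println(safeIndex(counter, 10_000));  // prints 6352
    }
}
```

A `long` counter would also push the wrap far beyond any realistic benchmark run.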
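The fix in 346d0ca538 moves string building out of the measured loop by pre-generating 10k key/value pairs, 100x the cache capacity. A minimal plain-Java sketch of that pattern follows. It omits JMH annotations so it runs without the JMH dependency, and it uses an access-ordered `LinkedHashMap` as a stand-in LRU cache, which is an assumption rather than the cache class the benchmark actually measures; `LruCache` and `PregeneratedPairsSketch` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in LRU cache: an access-ordered LinkedHashMap that evicts its eldest entry.
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the tail
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}

final class PregeneratedPairsSketch {
    static final int CACHE_CAPACITY = 100;
    static final int DISTINCT_PAIRS = 100 * CACHE_CAPACITY; // 10k pairs, 100x the capacity

    public static void main(String[] args) {
        // Setup (not measured): build every key and value exactly once.
        String[] keys = new String[DISTINCT_PAIRS];
        String[] values = new String[DISTINCT_PAIRS];
        for (int i = 0; i < DISTINCT_PAIRS; i++) {
            keys[i] = "key" + i;
            values[i] = "value" + i;
        }

        // Measured loop: array reads plus cache put/get, no string concatenation.
        LruCache<String, String> cache = new LruCache<>(CACHE_CAPACITY);
        for (int op = 0; op < 1_000_000; op++) {
            int index = op % DISTINCT_PAIRS;
            cache.put(keys[index], values[index]);
            cache.get(keys[index]);
        }
        System.out.println(cache.size()); // prints 100: the cache never exceeds its capacity
    }
}
```

In a real JMH benchmark the setup loop would typically live in a `@Setup` method on a `@State` class so it stays outside the timed region.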
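The commit message for 39eb31feae shows only the slower alternative (`Character.isLetterOrDigit` plus three punctuation checks), not the merged code. The sketch below is a hypothetical single-pass validator using explicit ASCII range checks, assuming the legal topic characters are ASCII letters, digits, '.', '_' and '-'; length and reserved-name checks are left out. One behavioral difference from the alternative: `Character.isLetterOrDigit` also accepts non-ASCII letters such as 'é'.

```java
final class TopicNameSketch {
    // Plain ASCII range comparisons; Character.isLetterOrDigit has to handle all of Unicode.
    static boolean isLegalChar(char c) {
        return (c >= 'a' && c <= 'z')
                || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9')
                || c == '.' || c == '_' || c == '-';
    }

    // One pass over the name, with no regex and no allocation.
    static boolean containsOnlyLegalChars(String name) {
        for (int i = 0; i < name.length(); i++) {
            if (!isLegalChar(name.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```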
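One of the LZ4 changes in c060c48285 is reusing decompression buffers in the consumer Fetcher instead of allocating one per batch. A minimal, single-threaded sketch of that idea follows; `ReusableBufferCache` is hypothetical and is not the class the commit introduced. It keeps released buffers keyed by capacity and hands them back out, so a steady stream of same-sized decompressions allocates only once.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical single-threaded cache of reusable heap buffers, keyed by capacity.
final class ReusableBufferCache {
    private final Map<Integer, ArrayDeque<ByteBuffer>> free = new HashMap<>();
    private int allocations = 0;

    ByteBuffer get(int capacity) {
        ArrayDeque<ByteBuffer> queue = free.get(capacity);
        if (queue != null && !queue.isEmpty()) {
            return queue.pollFirst(); // reuse a previously released buffer
        }
        allocations++;
        return ByteBuffer.allocate(capacity);
    }

    void release(ByteBuffer buffer) {
        buffer.clear(); // reset position and limit so the next user starts empty
        free.computeIfAbsent(buffer.capacity(), k -> new ArrayDeque<>()).addLast(buffer);
    }

    int allocations() {
        return allocations;
    }

    public static void main(String[] args) {
        ReusableBufferCache cache = new ReusableBufferCache();
        for (int batch = 0; batch < 1_000; batch++) {
            ByteBuffer buffer = cache.get(64 * 1024);
            buffer.put((byte) batch); // stand-in for writing decompressed bytes
            cache.release(buffer);
        }
        System.out.println(cache.allocations()); // prints 1
    }
}
```

Sharing one instance across threads would need synchronization; a per-thread instance avoids that.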