2 Commits (9da6823a3a5cef7305d72788c295ada2fd1c0c88)
Author | SHA1 | Message | Date
---|---|---|---
Lucas Bradstreet | 8966d066bd | KAFKA-9039: Optimize ReplicaFetcher fetch path (#7443) | 5 years ago
Lucas Bradstreet | f3ded39a05 | KAFKA-8841; Reduce overhead of ReplicaManager.updateFollowerFetchState (#7324) | 5 years ago
8966d066bd | KAFKA-9039: Optimize ReplicaFetcher fetch path (#7443)

Improves the performance of the replica fetcher for high partition count fetch requests, where a majority of the partitions did not update between fetch requests. All benchmarks were run on an r5.xlarge.

Vanilla:

Benchmark | (partitionCount) | Mode | Cnt | Score | Error | Units
---|---|---|---|---|---|---
ReplicaFetcherThreadBenchmark.testFetcher | 100 | avgt | 15 | 26491.825 | ± 438.463 | ns/op
ReplicaFetcherThreadBenchmark.testFetcher | 500 | avgt | 15 | 153941.952 | ± 4337.073 | ns/op
ReplicaFetcherThreadBenchmark.testFetcher | 1000 | avgt | 15 | 339868.602 | ± 4201.462 | ns/op
ReplicaFetcherThreadBenchmark.testFetcher | 5000 | avgt | 15 | 2588878.448 | ± 22172.482 | ns/op

From 100 to 5000 partitions the latency increase is 2588878.448 / 26491.825 = 97x.

Avoid gettimeofday calls in steady-state fetch states (8545888):

Benchmark | (partitionCount) | Mode | Cnt | Score | Error | Units
---|---|---|---|---|---|---
ReplicaFetcherThreadBenchmark.testFetcher | 100 | avgt | 15 | 22685.381 | ± 267.727 | ns/op
ReplicaFetcherThreadBenchmark.testFetcher | 500 | avgt | 15 | 113622.521 | ± 1854.254 | ns/op
ReplicaFetcherThreadBenchmark.testFetcher | 1000 | avgt | 15 | 273698.740 | ± 9269.554 | ns/op
ReplicaFetcherThreadBenchmark.testFetcher | 5000 | avgt | 15 | 2189223.207 | ± 1706.945 | ns/op

From 100 to 5000 partitions the latency increase is 2189223.207 / 22685.381 = 97x.

Avoid copying partition states to maintain fetch offsets (29fdd60):

Benchmark | (partitionCount) | Mode | Cnt | Score | Error | Units
---|---|---|---|---|---|---
ReplicaFetcherThreadBenchmark.testFetcher | 100 | avgt | 15 | 17039.989 | ± 609.355 | ns/op
ReplicaFetcherThreadBenchmark.testFetcher | 500 | avgt | 15 | 99371.086 | ± 1833.256 | ns/op
ReplicaFetcherThreadBenchmark.testFetcher | 1000 | avgt | 15 | 216071.333 | ± 3714.147 | ns/op
ReplicaFetcherThreadBenchmark.testFetcher | 5000 | avgt | 15 | 2035678.223 | ± 5195.232 | ns/op

From 100 to 5000 partitions the latency increase is 2035678.223 / 17039.989 = 119x.

Keep lag alongside PartitionFetchState to avoid the expensive isReplicaInSync check (0e57e3e):

Benchmark | (partitionCount) | Mode | Cnt | Score | Error | Units
---|---|---|---|---|---|---
ReplicaFetcherThreadBenchmark.testFetcher | 100 | avgt | 15 | 15131.684 | ± 382.088 | ns/op
ReplicaFetcherThreadBenchmark.testFetcher | 500 | avgt | 15 | 86813.843 | ± 3346.385 | ns/op
ReplicaFetcherThreadBenchmark.testFetcher | 1000 | avgt | 15 | 193050.381 | ± 3281.833 | ns/op
ReplicaFetcherThreadBenchmark.testFetcher | 5000 | avgt | 15 | 1801488.513 | ± 2756.355 | ns/op

From 100 to 5000 partitions the latency increase is 1801488.513 / 15131.684 = 119x.

Fetch session optimizations, mostly presizing the next hashmap and avoiding a copy of sessionPartitions, as a deep copy is not required for the ReplicaFetcher (2614b24):

Benchmark | (partitionCount) | Mode | Cnt | Score | Error | Units
---|---|---|---|---|---|---
ReplicaFetcherThreadBenchmark.testFetcher | 100 | avgt | 15 | 11386.203 | ± 416.701 | ns/op
ReplicaFetcherThreadBenchmark.testFetcher | 500 | avgt | 15 | 60820.292 | ± 3163.001 | ns/op
ReplicaFetcherThreadBenchmark.testFetcher | 1000 | avgt | 15 | 146242.158 | ± 1937.254 | ns/op
ReplicaFetcherThreadBenchmark.testFetcher | 5000 | avgt | 15 | 1366768.926 | ± 3305.712 | ns/op

From 100 to 5000 partitions the latency increase is 1366768.926 / 11386.203 = 120x.

Reviewers: Jun Rao <junrao@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
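One of the steps above, keeping the lag alongside PartitionFetchState, trades a little bookkeeping at state-update time for a constant-time check in the steady state. The Scala sketch below is a minimal illustration of that idea only, not Kafka's actual classes: the case class fields, the isCaughtUp helper, and updateFetchState are simplified stand-ins invented for the example.

```scala
import scala.collection.mutable

// Minimal sketch only: simplified stand-ins, not Kafka's real
// TopicPartition / PartitionFetchState implementations.
final case class TopicPartition(topic: String, partition: Int)

// Lag is recorded when the state is built, so steady-state checks are a
// cheap field read instead of a recomputation against the log.
final case class PartitionFetchState(fetchOffset: Long, lag: Option[Long]) {
  // A replica with a known lag of zero is treated as caught up; None means unknown.
  def isCaughtUp: Boolean = lag.contains(0L)
}

object FetchStateSketch {
  // Hypothetical update path: compute the lag once per state transition.
  def updateFetchState(states: mutable.Map[TopicPartition, PartitionFetchState],
                       tp: TopicPartition,
                       newFetchOffset: Long,
                       leaderEndOffset: Long): Unit = {
    val lag = math.max(0L, leaderEndOffset - newFetchOffset)
    states(tp) = PartitionFetchState(newFetchOffset, Some(lag))
  }

  def main(args: Array[String]): Unit = {
    val states = mutable.Map.empty[TopicPartition, PartitionFetchState]
    val tp = TopicPartition("demo", 0)
    updateFetchState(states, tp, newFetchOffset = 100L, leaderEndOffset = 100L)
    // The hot path only reads the cached lag; nothing is recomputed per fetch.
    println(states(tp).isCaughtUp) // true
  }
}
```

The design point is that lag is computed once per state transition, so the per-fetch, per-partition check no longer has to consult the log.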
f3ded39a05 | KAFKA-8841; Reduce overhead of ReplicaManager.updateFollowerFetchState (#7324)

This PR makes two changes to the ReplicaManager.updateFollowerFetchState path, which is in the hot path for follower fetches. Although a single call to ReplicaManager.updateFollowerFetchState is inexpensive, it is made once for each partition every time a follower fetch occurs.

1. updateFollowerFetchState no longer calls maybeExpandIsr when the follower is already in the ISR. This avoids repeated expansion checks.
2. Partition.maybeIncrementLeaderHW is also in the hot path for ReplicaManager.updateFollowerFetchState. It called Partition.remoteReplicas four times per iteration and performed a toSet conversion. maybeIncrementLeaderHW now avoids generating any intermediate collections when updating the HWM.

**Benchmark results for Partition.updateFollowerFetchState on an r5.xlarge:**

Old:

```
1288.633 ±(99.9%) 1.170 ns/op [Average]
(min, avg, max) = (1287.343, 1288.633, 1290.398), stdev = 1.037
CI (99.9%): [1287.463, 1289.802] (assumes normal distribution)
```

New (when follower fetch offset is updated):

```
261.727 ±(99.9%) 0.122 ns/op [Average]
(min, avg, max) = (261.565, 261.727, 261.937), stdev = 0.114
CI (99.9%): [261.605, 261.848] (assumes normal distribution)
```

New (when follower fetch offset is the same):

```
68.484 ±(99.9%) 0.025 ns/op [Average]
(min, avg, max) = (68.446, 68.484, 68.520), stdev = 0.023
CI (99.9%): [68.460, 68.509] (assumes normal distribution)
```

Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
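As a rough illustration of the second change, the sketch below contrasts a high-watermark computation that builds an intermediate collection on every call with a single-pass version that tracks a running minimum. It is a simplified stand-in, not the actual Partition.maybeIncrementLeaderHW code; ReplicaState and both helper methods are invented for the example.

```scala
// Simplified sketch, not Kafka's Partition class.
final case class ReplicaState(brokerId: Int, logEndOffset: Long, isInSync: Boolean)

object HighWatermarkSketch {

  // Allocation-heavy style: builds an intermediate Set on every call, roughly the
  // pattern the commit moved away from.
  def newHighWatermarkWithCopies(leaderLogEndOffset: Long, replicas: Seq[ReplicaState]): Long = {
    val inSyncOffsets = replicas.filter(_.isInSync).map(_.logEndOffset).toSet
    (inSyncOffsets + leaderLogEndOffset).min
  }

  // Single pass, no intermediate collections: keep a running minimum instead.
  def newHighWatermark(leaderLogEndOffset: Long, replicas: Seq[ReplicaState]): Long = {
    var min = leaderLogEndOffset
    replicas.foreach { r =>
      if (r.isInSync && r.logEndOffset < min) min = r.logEndOffset
    }
    min
  }

  def main(args: Array[String]): Unit = {
    val replicas = Seq(ReplicaState(1, 95L, isInSync = true), ReplicaState(2, 80L, isInSync = false))
    // Both versions agree on the result; the second simply avoids per-call allocations
    // in a method that runs once per partition per follower fetch.
    println(newHighWatermarkWithCopies(100L, replicas)) // 95
    println(newHighWatermark(100L, replicas))           // 95
  }
}
```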