Fixes #7717, which did not actually achieve its intended effect. The event manager failed to process the new event because we disabled the rate metric, which it expected to be present.
Reviewers: Ismael Juma <ismael@juma.me.uk>
Create a controller event for handling UpdateMetadata responses and log a message when a response contains an error.
Reviewers: Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Ismael Juma <ismael@juma.me.uk>
The latter previously hardcoded logDirCount instead of using the
method defined in the superclass, unnecessarily duplicating logic.
Also tweak IntegrationTestHarness and remove unnecessary method
override from SaslPlainPlaintextConsumerTest.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Force completion of delayed operations when receiving a StopReplica request. In the case of a partition reassignment, we cannot rely on receiving a LeaderAndIsr request in order to complete these operations because the leader may no longer be a replica. Previously when this happened, the delayed operations were left to eventually timeout.
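As a rough illustration of the behavior change, a sketch with simplified, hypothetical types (not the actual ReplicaManager or purgatory classes):

```scala
case class PartitionKey(topic: String, partition: Int)

class DelayedOperationPurgatorySketch(name: String) {
  // Stand-in for the real purgatory: completes any outstanding delayed
  // operations keyed on the given partition.
  def checkAndComplete(key: PartitionKey): Unit =
    println(s"$name: completing delayed operations for $key")
}

class StopReplicaHandlerSketch(producePurgatory: DelayedOperationPurgatorySketch,
                               fetchPurgatory: DelayedOperationPurgatorySketch) {
  def handleStopReplica(partitions: Set[PartitionKey]): Unit = {
    // ... stop fetchers and optionally delete the logs ...
    // A LeaderAndIsr request may never arrive (after a reassignment the old
    // leader may no longer be a replica), so complete delayed produce/fetch
    // operations here instead of letting them time out.
    partitions.foreach { tp =>
      producePurgatory.checkAndComplete(tp)
      fetchPurgatory.checkAndComplete(tp)
    }
  }
}
```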
Reviewers: Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Ismael Juma <ismael@juma.me.uk>
Co-Authored-By: Kun Du <kidkun@users.noreply.github.com>
While investigating KAFKA-9180, I noticed that we had no
unit test coverage for this behavior. It turns out that the behavior
was correct, so this patch simply adds the missing test coverage.
Also updated .gitignore with jmh-benchmarks/generated.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
When we roll a new segment, the log offset metadata tied to the high watermark may
need to be updated. This is needed when the high watermark is equal to the log end
offset at the time of the roll. Otherwise, we risk exposing uncommitted data early.
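A minimal sketch of the update, with illustrative field names rather than the actual kafka.log.Log internals:

```scala
case class LogOffsetMetadata(messageOffset: Long, segmentBaseOffset: Long, relativePositionInSegment: Int)

class LogSketch(var logEndOffsetMetadata: LogOffsetMetadata,
                var highWatermarkMetadata: LogOffsetMetadata) {

  def roll(): Unit = {
    // The new active segment starts at the current log end offset.
    val newSegmentBaseOffset = logEndOffsetMetadata.messageOffset
    logEndOffsetMetadata = LogOffsetMetadata(newSegmentBaseOffset, newSegmentBaseOffset, 0)
    // If the high watermark equals the log end offset, its cached metadata still
    // points into the old segment; refresh it so that stale position information
    // cannot expose uncommitted data to readers.
    if (highWatermarkMetadata.messageOffset == logEndOffsetMetadata.messageOffset)
      highWatermarkMetadata = logEndOffsetMetadata
  }
}
```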
Reviewers: Dhruvil Shah <dhruvil@confluent.io>, Ismael Juma <ismael@juma.me.uk>
This patch creates a `BaseAdminIntegrationTest` to be the root for all integration test extensions. Most of the existing tests will only be tested in `PlaintextAdminIntegrationTest`, which extends from `BaseAdminIntegrationTest`. This should cut off about 30 minutes from the overall build time.
Reviewers: David Arthur <mumrah@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Fixes `java.util.concurrent.ExecutionException: java.lang.AssertionError: Close finished too quickly 5999`.
The close test sets a close duration in milliseconds but measures the time taken in nanoseconds. The difference in resolution introduces a small error, so the close can be deemed to have taken too little time.
When I measured the start and end with nanoTime, the time taken to close was `5999641566 ns (5999.6 ms)`, which looks like a resolution error. I've run the test 50 times and have not hit the "Close finished too quickly" issue again, whereas previously I hit a failure pretty quickly.
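One way to make such an assertion robust to the unit mismatch (illustration only, not the actual test code) is to convert the millisecond bound once and compare in nanoseconds, with a small tolerance for rounding:

```scala
import java.util.concurrent.TimeUnit

def assertCloseTookAtLeast(closeDurationMs: Long)(close: => Unit): Unit = {
  val toleranceNs = TimeUnit.MILLISECONDS.toNanos(1)
  val startNs = System.nanoTime()
  close
  val elapsedNs = System.nanoTime() - startNs
  // Compare in a single unit so millisecond rounding cannot make a ~6000 ms
  // close look like it finished "too quickly".
  assert(elapsedNs >= TimeUnit.MILLISECONDS.toNanos(closeDurationMs) - toleranceNs,
    s"Close finished too quickly: $elapsedNs ns")
}
```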
Author: Lucas Bradstreet <lucas@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes #7683 from lbradstreet/flaky-consumer-bounce-test
This patch implements the broker-side changes for KIP-360. It adds two new fields to InitProducerId: lastEpoch and producerId. Passing these values allows the TransactionCoordinator to safely bump a producer's epoch after some failures (such as UNKNOWN_PRODUCER_ID and INVALID_PRODUCER_ID_MAPPING). When a producer calls InitProducerId after a failure, the coordinator first checks the producer ID from the request to make sure no other producer has been started using the same transactional ID. If it is safe to continue, the coordinator checks the epoch from the request; if it matches the existing epoch, the epoch is bumped and the producer can safely continue. If it matches the previous epoch, the current epoch is returned without bumping. Otherwise, the producer is fenced.
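The coordinator-side decision roughly follows this shape (simplified sketch with illustrative types, not the real TransactionCoordinator code):

```scala
sealed trait InitPidResult
case class BumpEpoch(newEpoch: Short) extends InitPidResult
case class ReturnCurrentEpoch(epoch: Short) extends InitPidResult
case object FenceProducer extends InitPidResult

case class TxnMetadataSketch(producerId: Long, producerEpoch: Short, lastProducerEpoch: Short)

def handleInitProducerId(current: TxnMetadataSketch,
                         requestProducerId: Long,
                         requestEpoch: Short): InitPidResult = {
  if (requestProducerId != current.producerId)
    FenceProducer // another producer has since claimed this transactional id
  else if (requestEpoch == current.producerEpoch)
    BumpEpoch((current.producerEpoch + 1).toShort) // safe to bump and continue
  else if (requestEpoch == current.lastProducerEpoch)
    ReturnCurrentEpoch(current.producerEpoch) // retry of an earlier bump; return without bumping again
  else
    FenceProducer
}
```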
Reviewers: Boyang Chen <boyang@confluent.io>, Jason Gustafson <jason@confluent.io>
This change fixes a performance regression in the follower last seen high watermark
handling introduced in 23beeea. maybeUpdateHwAndSendResponse is expensive for
brokers with high partition counts, as it requires a partition and a replica lookup for every
partition being fetched. This refactor moves the last seen high watermark update into the
follower fetch state update, where we have already looked up the partition and replica.
I've seen cases where maybeUpdateHwAndSendResponse is responsible for 8% of CPU usage, not including the responseCallback call that is part of it.
I have benchmarked this change with `UpdateFollowerFetchStateBenchmark` and it adds 5ns
of overhead to Partition.updateFollowerFetchState, which is a rounding error compared to the
current overhead of maybeUpdateHwAndSendResponse.
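The shape of the refactor, with simplified stand-ins rather than the actual Partition/Replica classes:

```scala
class ReplicaStateSketch(var logEndOffset: Long, var lastSentHighWatermark: Long)

class PartitionSketch(replicas: Map[Int, ReplicaStateSketch]) {
  // Called once per fetched partition. The replica has already been looked up
  // here, so recording the last seen high watermark in the same place avoids a
  // second partition and replica lookup in the response path
  // (the old maybeUpdateHwAndSendResponse).
  def updateFollowerFetchState(followerId: Int,
                               followerLogEndOffset: Long,
                               highWatermark: Long): Unit = {
    replicas.get(followerId).foreach { replica =>
      replica.logEndOffset = followerLogEndOffset
      replica.lastSentHighWatermark = highWatermark
    }
  }
}
```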
Reviewers: David Arthur <mumrah@gmail.com>, Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>
This was a regression in 2.3.1. In the case of a DeleteRecords call, the log start offset may be higher than the active segment base offset. The cleaner should handle this case gracefully.
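Purely as an illustration of the tolerant bound computation (not the actual LogCleanerManager code):

```scala
// After DeleteRecords, logStartOffset may legitimately exceed the active
// segment's base offset; clamp instead of treating it as an error, so the
// cleanable range simply comes out empty in that case.
def cleanableRange(logStartOffset: Long,
                   checkpointedDirtyOffset: Long,
                   activeSegmentBaseOffset: Long): (Long, Long) = {
  val firstDirtyOffset = math.max(checkpointedDirtyOffset, logStartOffset)
  val firstUncleanableOffset = math.max(activeSegmentBaseOffset, firstDirtyOffset)
  (firstDirtyOffset, firstUncleanableOffset)
}
```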
Reviewers: Jun Rao <junrao@gmail.com>
Co-Authored-By: Tim Van Laer <timvlaer@users.noreply.github.com>
Author: uttpal <kumar.uttpal@oyorooms.com>
Reviewers: Viktor Somogyi <viktorsomogyi@gmail.com>, Dong Lin <lindong28@gmail.com>
Closes #7563 from uttpal/KAFKA-9016
Within KafkaConsumer.poll, we have an optimization to try to send the next fetch request before returning the data in order to pipeline the fetch requests; however, this pollNoWakeup should NOT throw any exceptions, since at this point the fetch position has been updated. If an exception is thrown and the callers decide to catch it and continue, those records would never be returned again, causing data loss.
Also fix the flaky test itself.
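A sketch of the pattern, with hypothetical helper names rather than the actual KafkaConsumer internals:

```scala
def pollOnce[K, V](fetchRecords: () => Map[String, Seq[(K, V)]],
                   sendNextFetches: () => Unit): Map[String, Seq[(K, V)]] = {
  val records = fetchRecords() // fetch positions have already been advanced here
  if (records.nonEmpty) {
    // Pipelined pre-fetch: any failure here must not escape, otherwise a caller
    // that catches the exception and keeps polling would silently lose the
    // records we are about to return.
    try sendNextFetches()
    catch {
      case e: Exception =>
        println(s"Failed to send the pipelined fetch request: ${e.getMessage}") // log and move on
    }
  }
  records
}
```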
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Ismael Juma <ismael@juma.me.uk>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Jason Gustafson <jason@confluent.io>
Background:
Currently, whenever a batch is dropped because of an InvalidRecordException or InvalidTimestampException, only the culprit record appears in ProduceResponse.PartitionResponse.recordErrors. However, when users resend that batch excluding the rejected record, the remaining records are not guaranteed to be free of problems.
Changes:
To address this issue, I changed the signatures of validateKey, validateRecord and validateTimestamp to return a Scala Option. Specifically, this Option holds the reason the current record in the iteration failed, and it is left to the callers (convertAndAssignOffsetsNonCompressed, assignOffsetsNonCompressed, validateMessagesAndAssignOffsetsCompressed) to gather all problematic records in one place. All of these records are then returned along with the PartitionResponse object. As a result, if a batch contains more than one record error, users see exactly which records caused the failure. PartitionResponse.recordErrors is a list of RecordError objects introduced by #7167, which include batchIndex, denoting the relative position of a record in a batch, and message, indicating the reason for the failure.
Gotchas:
Things are particularly tricky when a batch has records rejected because of both InvalidRecordException and InvalidTimestampException. In this case, the InvalidTimestampException takes higher precedence. Therefore, the Error field in PartitionResponse will be encoded with INVALID_TIMESTAMP.
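A simplified sketch of the new shape (illustrative types, not the actual LogValidator code): the per-record validators return an Option describing the problem, and the caller collects every failure into recordErrors.

```scala
case class RecordError(batchIndex: Int, message: String)
case class RecordSketch(hasKey: Boolean, timestamp: Long)

def validateKey(record: RecordSketch, compacted: Boolean): Option[String] =
  if (compacted && !record.hasKey) Some("Compacted topic cannot accept a record without a key") else None

def validateTimestamp(record: RecordSketch, now: Long, maxDiffMs: Long): Option[String] =
  if (math.abs(record.timestamp - now) > maxDiffMs) Some(s"Timestamp ${record.timestamp} is out of range") else None

def collectRecordErrors(records: Seq[RecordSketch], compacted: Boolean, now: Long, maxDiffMs: Long): Seq[RecordError] =
  records.zipWithIndex.flatMap { case (record, batchIndex) =>
    val keyError = validateKey(record, compacted).map(RecordError(batchIndex, _))
    val timestampError = validateTimestamp(record, now, maxDiffMs).map(RecordError(batchIndex, _))
    keyError.toSeq ++ timestampError.toSeq
  }
// The batch-level Error is encoded as INVALID_TIMESTAMP if any timestamp error
// was collected, otherwise as an invalid-record error.
```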
Reviewers: Guozhang Wang <wangguoz@gmail.com>
In #7361, we inadvertently reverted a change to enforce leader-only fetching for old versions of the protocol. This patch fixes the problem and adds a new test case to cover fetches which hit purgatory.
Reviewers: Viktor Somogyi <viktorsomogyi@gmail.com>, David Arthur <mumrah@gmail.com>
Fix a bug where the lastUsedMs value in the FetchSessionCache was not getting correctly updated, resulting in spurious evictions.
Reviewers: Colin P. McCabe <cmccabe@apache.org>
This patch removes the NewPartitionReassignment#of() method in favor of a simple constructor. The method was confusing because it broke two conventions: it always returned a non-empty Optional, and as a result it was not really a static factory method.
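For reference, usage with the constructor looks roughly like this (topic name, replica ids and bootstrap address are made up):

```scala
import java.util.{Arrays, Collections, Optional, Properties}
import org.apache.kafka.clients.admin.{Admin, NewPartitionReassignment}
import org.apache.kafka.common.TopicPartition

object ReassignmentExample {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    val admin = Admin.create(props)
    try {
      val targetReplicas: java.util.List[Integer] =
        Arrays.asList(Integer.valueOf(1), Integer.valueOf(2), Integer.valueOf(3))
      // The plain constructor replaces the removed of() factory method.
      val reassignments = Collections.singletonMap(
        new TopicPartition("my-topic", 0),
        Optional.of(new NewPartitionReassignment(targetReplicas)))
      admin.alterPartitionReassignments(reassignments).all().get()
    } finally {
      admin.close()
    }
  }
}
```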
Reviewers: Ismael Juma <ismael@juma.me.uk>, Colin P. McCabe <cmccabe@apache.org>
1. Avoid a buffer allocation and a buffer copy per file read.
2. Ensure we flush `netWriteBuffer` successfully before reading from
disk to avoid wasted disk reads.
3. 32k reads instead of 8k reads to reduce the number of disk reads
(improves efficiency for magnetic drives and reduces the number of
system calls).
4. Update SslTransportLayer.write(ByteBuffer) to loop until the socket
buffer is full or the src buffer has no remaining bytes (see the sketch
after this list).
5. Renamed `MappedByteBuffers` to `ByteBufferUnmapper` since it's also
applicable for direct byte buffers.
6. Skip empty `RecordsSend`.
7. Some minor clean-ups for readability.
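For item 4, the looping write looks roughly like this (plain channel shown for illustration, not the real SslTransportLayer):

```scala
import java.nio.ByteBuffer
import java.nio.channels.WritableByteChannel

// Keep writing until the source buffer is drained or the socket buffer is full,
// i.e. write() makes no progress.
def writeFully(channel: WritableByteChannel, src: ByteBuffer): Int = {
  var totalWritten = 0
  var lastWritten = -1
  while (src.hasRemaining && lastWritten != 0) {
    lastWritten = channel.write(src)
    totalWritten += lastWritten
  }
  totalWritten
}
```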
I ran a simple consumer perf benchmark on a 6 partition topic (large
enough not to fit into page cache) starting from the beginning of the
log with TLS enabled on my 6 core MacBook Pro as a sanity check.
This laptop has fast SSDs so it benefits less from the larger reads
than the case where magnetic disks are used. Consumer throughput
was ~260 MB/s before the changes and ~300 MB/s after
(~15% improvement).
Credit to @junrao for pointing out that this code could be more efficient.
Reviewers: Jun Rao <junrao@confluent.io>, Colin P. McCabe <cmccabe@apache.org>
#7167 added a check for non-incremental offsets in `assignOffsetsNonCompressed`, which is not applicable for message format V0 and V1. Therefore, I added a condition to disable the check if the record version precedes V2.
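The gist of the gating (illustrative sketch; the exact check in the real validation code differs):

```scala
// The non-incremental offset check is not applicable for message formats V0
// and V1, so only enforce it for V2 and above (RecordBatch.MAGIC_VALUE_V2 == 2).
def maybeCheckOffsetsMonotonic(offsets: Seq[Long], magic: Byte): Unit = {
  if (magic >= 2) {
    offsets.sliding(2).foreach {
      case Seq(prev, next) if next <= prev =>
        throw new IllegalArgumentException(s"Offsets are not monotonically increasing: $prev -> $next")
      case _ => // ok
    }
  }
}
```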
Author: Tu Tran <tu@confluent.io>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Closes #7628 from tuvtran/KAFKA-9080
We would mistakenly increment the `OffsetCommits` metric instead of the offset expiry metric.
Author: Stanislav Kozlovski <stanislav_kozlovski@outlook.com>
Reviewers: David Jacot <djacot@confluent.io>, Manikumar Reddy <manikumar.reddy@gmail.com>
Closes #7624 from stanislavkozlovski/minor-fix-group-coordinator-offset-expiry-metric
In 18d4e57f6e (diff-394389922df5210adf43a8b7064cc4ffL61), we unintentionally renamed the metric as part of the previous reassignment changes.
Author: Stanislav Kozlovski <stanislav_kozlovski@outlook.com>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Closes #7606 from stanislavkozlovski/minor-partition-reassignment-metric
The truncateHead method was removed from ProducerStateManager by github.com/apache/kafka/commit/c49775b. This meant that snapshots were no longer removed when the log start offset increased, even though the intent of that change was to remove snapshots but preserve the in-memory mapping. This patch adds the required functionality back.
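A simplified sketch of the restored behavior (file naming and helpers here are illustrative, not the real ProducerStateManager):

```scala
import java.io.File
import java.nio.file.Files

// When the log start offset advances, delete producer state snapshot files
// older than the new start offset while leaving the in-memory mapping intact.
def removeOldSnapshots(logDir: File, newLogStartOffset: Long): Unit = {
  Option(logDir.listFiles()).getOrElse(Array.empty[File])
    .filter(_.getName.endsWith(".snapshot"))
    .filter(_.getName.stripSuffix(".snapshot").toLong < newLogStartOffset)
    .foreach(f => Files.deleteIfExists(f.toPath))
}
```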
Reviewers: Jason Gustafson <jason@confluent.io>
This PR fixes the inconsistencies in the `removeMembersFromGroup` admin API calls:
1. Fail the `all()` request when there is a sub-level error (either partition or member)
2. Change getMembers() to members()
3. Hide the actual Errors from the user
4. Do not expose the generated MemberIdentity type
5. Use more consistent naming for the Options and Result types (see the usage sketch after this list)
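A usage sketch (group id, instance ids and bootstrap address are made up), assuming the Admin#removeMembersFromConsumerGroup method and its MemberToRemove/RemoveMembersFromConsumerGroupOptions types:

```scala
import java.util.{Arrays, Properties}
import org.apache.kafka.clients.admin.{Admin, MemberToRemove, RemoveMembersFromConsumerGroupOptions}

object RemoveMembersExample {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    val admin = Admin.create(props)
    try {
      val members = Arrays.asList(new MemberToRemove("instance-1"), new MemberToRemove("instance-2"))
      val result = admin.removeMembersFromConsumerGroup(
        "my-group", new RemoveMembersFromConsumerGroupOptions(members))
      // all() now fails if any partition- or member-level error occurred.
      result.all().get()
    } finally {
      admin.close()
    }
  }
}
```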
Reviewers: Guozhang Wang <wangguoz@gmail.com>, David Jacot <djacot@confluent.io>, Jason Gustafson <jason@confluent.io>
Prior to this patch, partition creation was not allowed for any topic while a reassignment was in progress. This PR makes it a topic-level check: as long as a particular topic is not being reassigned, we allow its partition count to be increased.
Reviewers: Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Jason Gustafson <jason@confluent.io>
The purpose of this patch is to make the reassignment algorithm simpler and more resilient to unexpected errors. Specifically, it has the following improvements:
1. Remove `ReassignedPartitionContext`. We no longer need to track the previous reassignment through the context and we now use the assignment state as the single source of truth for the target replicas in a reassignment.
2. Remove the intermediate assignment state when overriding a previous reassignment. Instead, an overriding reassignment directly updates the assignment state and shuts down any unneeded replicas. Reassignments are _always_ persisted in Zookeeper before being updated in the controller context.
3. To fix race conditions with concurrent submissions, reassignment completion for a partition always checks for a zk partition reassignment to be removed. This means the controller no longer needs to track the source of the reassignment.
4. API reassignments explicitly remove reassignment state from zk prior to beginning the new reassignment (see the sketch after this list). This fixes an inconsistency in precedence: upon controller failover, zookeeper reassignments always take precedence over any active reassignment, so if we did not have the logic to remove the zk reassignment when an API reassignment is triggered, we could revert to the older zk reassignment.
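A rough ordering sketch for item 4, using hypothetical helper functions rather than the actual controller code:

```scala
def beginApiReassignment(partition: String,
                         targetReplicas: Seq[Int],
                         removeZkReassignment: String => Unit,
                         persistAssignment: (String, Seq[Int]) => Unit,
                         updateControllerContext: (String, Seq[Int]) => Unit): Unit = {
  // Clear any pending zk-based reassignment first so a controller failover
  // cannot resurrect it and override the API-triggered reassignment.
  removeZkReassignment(partition)
  // Reassignments are always persisted before the controller context is updated.
  persistAssignment(partition, targetReplicas)
  updateControllerContext(partition, targetReplicas)
}
```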
Reviewers: Viktor Somogyi <viktorsomogyi@gmail.com>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Jun Rao <junrao@gmail.com>
Re-implement NewPartitionReassignment#of. It now takes a list rather than a variable-length argument list.
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Vikas Singh <vikas@confluent.io>
Currently we only record completed sends and receives in the selector metrics. If there is a disconnect in the middle of the respective operation, then it is not counted. The metrics will be more accurate if we take into account partial sends and receives.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Do not allow an empty replica set to be passed into the reassignment API.
Reviewers: Colin P. McCabe <cmccabe@apache.org>, José Armando García Sancio <jsancio@gmail.com>
Aims to fix the flaky LogCleanerIntegrationTest#testIsThreadFailed by changing how metrics are cleaned.
Reviewers: Jason Gustafson <jason@confluent.io>
Allow routing of `AdminClient#describeTopics` to any broker in the cluster rather than just the controller, so that we don't create a hotspot for this API call. `AdminClient#describeTopics` uses the broker's metadata cache, which is asynchronously maintained, so routing to brokers other than the controller is not expected to make a significant difference in terms of metadata consistency; all metadata requests are eventually consistent.
This patch also fixes a few flaky test failures.
Reviewers: Ismael Juma <ismael@juma.me.uk>, José Armando García Sancio <jsancio@gmail.com>, Jason Gustafson <jason@confluent.io>
KIP-352 aims to add several new metrics in order to track reassignments much better. We will be able to measure bytes in/out rate and the count of partitions under active reassignment.
We also change the semantics of the UnderReplicatedPartitions metric to better cater for reassignment. Currently it reports under-replicated partitions when extra replicas are added during reassignment as part of the process; this PR changes it to always take the original replica set into account when computing the URP metric (a small sketch follows the metric list below).
The newly added metrics will be:
- kafka.server:type=ReplicaManager,name=ReassigningPartitions
- kafka.server:type=BrokerTopicMetrics,name=ReassignmentBytesOutPerSec
- kafka.server:type=BrokerTopicMetrics,name=ReassignmentBytesInPerSec
The changed URP metric:
- kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions
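A small sketch of the changed URP computation (hypothetical types): replicas added for an in-progress reassignment are excluded, so the ISR is compared against the original replica set only.

```scala
case class PartitionStateSketch(replicas: Set[Int], addingReplicas: Set[Int], isr: Set[Int])

def isUnderReplicated(p: PartitionStateSketch): Boolean = {
  // Count only the original replica set; replicas being added by a
  // reassignment no longer make the partition look under-replicated.
  val originalReplicaCount = (p.replicas -- p.addingReplicas).size
  p.isr.size < originalReplicaCount
}
```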
Reviewers: Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Jason Gustafson <jason@confluent.io>
With KIP-392, we allow consumers to fetch from followers. This capability is enabled when a replica selector has been provided in the configuration. When not in use, the intent is to preserve current behavior of fetching only from leader. The leader epoch is the mechanism that keeps us honest. When there is a leader change, the epoch gets bumped, consumer fetches fail due to the fenced epoch, and we find the new leader.
However, for old consumers, there is no similar protection. The leader epoch was not available to clients until recently. If there is a preferred leader election (for example), the old consumer will happily continue fetching from the demoted leader until a periodic metadata fetch causes it to discover the new leader. This does not cause any correctness problem (fetches are still bound by the high watermark), but it is surprising and may lead to unexpected performance characteristics.
This patch fixes this problem by enforcing leader-only fetching for older versions of the fetch request.
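The gate is roughly the following (simplified sketch; names and the version constant are shown for illustration, since KIP-392 follower fetching arrived with fetch request version 11):

```scala
sealed trait FetchValidation
case object Allowed extends FetchValidation
case object NotLeaderForPartition extends FetchValidation

def validateFetch(requestVersion: Short, isLeader: Boolean, replicaSelectorConfigured: Boolean): FetchValidation = {
  // Old clients cannot fence themselves with the leader epoch, so serve them
  // only from the leader even when a replica selector is configured.
  val followerFetchAllowed = replicaSelectorConfigured && requestVersion >= 11
  if (isLeader || followerFetchAllowed) Allowed else NotLeaderForPartition
}
```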
Reviewers: Jason Gustafson <jason@confluent.io>
AbstractRequestResponse should be an interface, since it has no concrete elements or implementation. Move AbstractRequestResponse#serialize to RequestUtils#serialize and make it package-private, since it doesn't need to be public.
Reviewers: Ismael Juma <ismael@juma.me.uk>
A partition log is initialized in the following steps:
1. Fetch the log config from ZK
2. Call LogManager.getOrCreateLog, which creates the Log object, then
3. Register the Log object
Step #3 enables the configuration update thread to deliver configuration
updates to the log, but any update that arrives between step #1 and step #3
is missed. This breaks the following use case:
1. Create a topic with the default configuration, and immediately after that
2. Update the configuration of the topic
There is a race condition here, and in some cases the update made in the
second step will get dropped.
This change fixes it by tracking updates that arrive between step #1 and
step #3. Once a Partition is done initializing its log, it checks whether it
has missed any update; if so, the configuration is read from ZK again.
Added unit tests to make sure a dirty configuration is refreshed. Tested
on a local cluster to make sure that topic configurations and updates are
handled correctly.
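A simplified illustration of the fix (hypothetical types, not the actual Partition/LogManager code): remember which topics received a config update while their log was still being created, and re-read the config from ZK once the log is registered.

```scala
import scala.collection.mutable

class LogConfigTracker(fetchConfigFromZk: String => Map[String, String]) {
  private val dirtyTopics = mutable.Set[String]()

  // Called by the configuration update thread; the log may not be registered yet.
  def markDirty(topic: String): Unit = synchronized { dirtyTopics += topic }

  // Called by Partition once the log is fully initialized and registered.
  def refreshIfDirty(topic: String)(applyConfig: Map[String, String] => Unit): Unit = synchronized {
    if (dirtyTopics.remove(topic))
      applyConfig(fetchConfigFromZk(topic)) // an update arrived between fetch and register; read it again
  }
}
```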
Reviewers: Jason Gustafson <jason@confluent.io>
KAFKA-7215 improved the log cleaner error handling to mitigate thread death but missed one case. Exceptions in grabFilthiestCompactedLog still cause the thread to die.
This patch improves handling to ensure that errors in that function still mark a partition as uncleanable and do not crash the thread.
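A sketch of the hardened selection path (simplified, hypothetical types): any exception raised while evaluating a candidate marks that partition as uncleanable instead of propagating and killing the cleaner thread.

```scala
case class TopicPartitionKey(topic: String, partition: Int)

def chooseLogToClean(candidates: Map[TopicPartitionKey, () => Double],
                     markUncleanable: TopicPartitionKey => Unit): Option[TopicPartitionKey] = {
  val cleanable = candidates.flatMap { case (tp, dirtyRatio) =>
    try Some(tp -> dirtyRatio())
    catch {
      case _: Exception =>
        markUncleanable(tp) // quarantine just this partition and keep the thread alive
        None
    }
  }
  if (cleanable.isEmpty) None else Some(cleanable.maxBy(_._2)._1)
}
```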
Reviewers: Jason Gustafson <jason@confluent.io>