src-kafka

Commit Graph

Author	SHA1	Message	Date
David Arthur	339d2556c6	KAFKA-15605: Fix topic deletion handling during ZK migration (#14545 ) This patch adds reconciliation logic to migrating ZK brokers to deal with pending topic deletions as well as missed StopReplicas. During the hybrid mode of the ZK migration, the KRaft controller is asynchronously sending UMR and LISR to the ZK brokers to propagate metadata. Since this process is essentially "best effort" it is possible for a broker to miss a StopReplicas. The new logic lets the ZK broker examine its local logs compared with the full set of replicas in a "Full" LISR. Any local logs which are not present in the set of replicas in the request are removed from ReplicaManager and marked as "stray". To avoid inadvertent data loss with this new behavior, the brokers do not delete the "stray" partitions. They will rename the directories and log warning messages during log recovery. It will be up to the operator to manually delete the stray partitions. We can possibly enhance this in the future to clean up old stray logs. This patch makes use of the previously unused Type field on LeaderAndIsrRequest. This was added as part of KIP-516 but never implemented. Since its introduction, an implicit 0 was sent in all LISR. The KRaft controller will now send a value of 2 to indicate a full LISR (as specified by the KIP). The presence of this value acts as a trigger for the ZK broker to perform the log reconciliation. Reviewers: Colin P. McCabe <cmccabe@apache.org>	1 year ago
Calvin Liu	af747fbfed	KAFKA-15581: Introduce ELR (#14312 ) This patch introduces preliminary changes for Eligible Leader Replicas (KIP-966) * New MetadataVersion 16 (3.7-IV1) * New record versions for PartitionRecord and PartitionChangeRecord * New tagged fields on PartitionRecord and PartitionChangeRecord * New static config "eligible.leader.replicas.enable" to gate the whole feature Reviewers: Artem Livshits <alivshits@confluent.io>, David Arthur <mumrah@gmail.com>, Colin P. McCabe <cmccabe@apache.org>	1 year ago
Calvin Liu	14029e2ddd	KAFKA-15582: Identify clean shutdown broker (#14465 ) The PR includes: * Added a new class of CleanShutdownFile which helps write and read from a clean shutdown file. * Updated the BrokerRegistration API. * Client side handling for the broker epoch. * Minimum work on the controller side. Reviewers: Jun Rao <junrao@gmail.com>	1 year ago
mannoopj	da314ee48c	KAFKA-15532: non active controllers return 0 for ZkWriteBehindLag (#14478 ) Since only the active controller is performing the dual-write to ZK during a migration, it should be the only controller to report the ZkWriteBehindLag metric. Currently, if the controller fails over during a migration, the previous active controller will incorrectly report its last value for ZkWriteBehindLag forever. Instead, it should report zero. Reviewers: Colin P. McCabe <cmccabe@apache.org>, David Arthur <mumrah@gmail.com>	1 year ago
Federico Valeri	aec07f76d7	KAFKA-15537: Fix metadata downgrade documentation (#14484 ) In KIP-778 we introduced the "unsafe" (lossy) downgrade in case metadata has changes in one of the versions between target and current, as defined in MetadataVersion. The documentation says it is possible: "Note that the cluster metadata version cannot be downgraded to a pre-production 3.0.x, 3.1.x, or 3.2.x version once it has been upgraded. However, it is possible to downgrade to production versions such as 3.3-IV0, 3.3-IV1, etc." The command line tool shows that this doesn't work: bin/kafka-features.sh --bootstrap-server :9092 downgrade --metadata 3.4 --unsafe Could not downgrade metadata.version to 8. Invalid metadata.version 8. Unsafe metadata downgrade is not supported in this version. 1 out of 1 operation(s) failed. In addition to unsafe downgrades, safe metadata downgrades are also not supported in practice. For example, when you upgrade to 3.5, you land on 3.5-IV2 as metadata version, which has metadata changes and won't let you downgrade. This is true for every other release at the moment. This change fixes the documentation to reflect that, and improves the error messages. Signed-off-by: Federico Valeri <fedevaleri@gmail.com> Reviewers: Luke Chen <showuon@gmail.com>, Jakub Scholz <github@scholzj.com>	1 year ago
Ritika Reddy	bcfc9543d1	MINOR: Move TopicIdPartition class to server-common (#14418 ) This patch moves the TopicIdPartition from the metadata module to the server-common module so it can be used by the group-coordinator module as well. Reviewers: Sagar Rao <sagarmeansocean@gmail.com>, David Jacot <djacot@confluent.io>	1 year ago
Ismael Juma	98febb989a	KAFKA-15485: Fix "this-escape" compiler warnings introduced by JDK 21 (1/N) (#14427 ) This is one of the steps required for Kafka to compile with Java 21. For each case, one of the following fixes was applied: 1. Suppress warning if fixing would potentially result in an incompatible change (for public classes) 2. Add final to one or more methods so that the escape is not possible 3. Replace method calls with direct field access. In addition, we also fix a couple of compiler warnings related to deprecated references in the `core` module. See the following for more details regarding the new lint warning: https://www.oracle.com/java/technologies/javase/21-relnote-issues.html#JDK-8015831 Reviewers: Divij Vaidya <diviv@amazon.com>, Satish Duggana <satishd@apache.org>, Chris Egerton <chrise@aiven.io>	1 year ago
Alyssa Huang	2d262efb00	[MINOR] QuorumController tests use testToImage (#14405 )	1 year ago
Colin Patrick McCabe	7d45d849f8	KAFKA-15458: Fully resolve endpoint information before registering controllers (#14376 ) Endpoint information provided by KafkaConfig can be incomplete in two ways. One is that endpoints using ephemeral ports will show up as using port 0. Another is that endpoints binding to 0.0.0.0 will show up with a null or blank hostname. Because we were not accounting for this in controller registration, it was leading to a null pointer dereference when we tried to register a controller using an endpoint defined as PLAINTEXT://:9092. This PR adds a ListenerInfo class which can fix both of the causes of incomplete endpoint information. It also handles serialization to and from various RPC and record formats. This allows us to remove a lot of boilerplate code and standardize the handling of listeners between BrokerServer and ControllerServer. Reviewers: David Arthur <mumrah@gmail.com>	1 year ago
David Arthur	b24ccd65b7	KAFKA-15441 Allow broker heartbeats to complete in metadata transaction (#14351 ) This patch allows broker heartbeat events to be completed while a metadata transaction is in-flight. More generally, this patch allows any RUNS_IN_PREMIGRATION event to complete while the controller is in pre-migration mode even if the migration transaction is in-flight. We had a problem with broker heartbeats timing out because they could not be completed while a large ZK migration transaction was in-flight. This resulted in the controller fencing all the ZK brokers which has many undesirable downstream effects. Reviewers: Akhilesh Chaganti <akhileshchg@users.noreply.github.com>, Colin Patrick McCabe <cmccabe@apache.org>	1 year ago
Colin Patrick McCabe	41b695b6e3	KAFKA-15369: Implement KIP-919: Allow AC to Talk Directly with Controllers (#14306 ) Implement KIP-919: Allow AdminClient to Talk Directly with the KRaft Controller Quorum and add Controller Registration. This KIP adds a new version of DescribeClusterRequest which is supported by KRaft controllers. It also teaches AdminClient how to use this new DESCRIBE_CLUSTER request to talk directly with the controller quorum. This is all gated behind a new MetadataVersion, IBP_3_7_IV0. In order to share the DESCRIBE_CLUSTER logic between broker and controller, this PR factors it out into AuthHelper.computeDescribeClusterResponse. The KIP adds three new error codes: MISMATCHED_ENDPOINT_TYPE, UNSUPPORTED_ENDPOINT_TYPE, and UNKNOWN_CONTROLLER_ID. The endpoint type errors can be returned from DescribeClusterRequest. On the controller side, the controllers now try to register themselves with the current active controller, by sending a CONTROLLER_REGISTRATION request. This, in turn, is converted into a RegisterControllerRecord by the active controller. ClusterImage, ClusterDelta, and all other associated classes have been upgraded to propagate the new metadata. In the metadata shell, the cluster directory now contains both broker and controller subdirectories. QuorumFeatures previously had a reference to the ApiVersions structure used by the controller's NetworkClient. Because this PR removes that reference, QuorumFeatures now contains only immutable data. Specifically, it contains the current node ID, the locally supported features, and the list of quorum node IDs in the cluster. Reviewers: David Arthur <mumrah@gmail.com>, Ziming Deng <dengziming1993@gmail.com>, Luke Chen <showuon@gmail.com>	1 year ago
David Arthur	65e2ecffab	KAFKA-15435 Fix counts in MigrationManifest (#14342 ) Reviewers: Liu Zeyu <zeyu.luke@gmail.com>, Colin P. McCabe <cmccabe@apache.org>	1 year ago
Dimitar Dimitrov	78c59cd2b0	KAFKA-15052 Fix the flaky testBalancePartitionLeaders - part II (#13908 ) A follow-up to https://github.com/apache/kafka/pull/13804. This follow-up adds the alternative fix approach mentioned in the PR above - bumping the session timeout used in the test with 1 second. Reproducing the flake-out locally has been much harder than on the CI runs, as neither Gradle with Java 11 or Java 14 nor IntelliJ with Java 14 could show it, but IntelliJ with Java 11 could occasionally reproduce the failure the first time immediately after a rebuild. While I was unable to see the failure with the bumped session timeout, the testing procedure definitely didn't provide sufficient reassurance for the fix as even without it often I'd see hundreds of consecutive successful test runs when the first run didn't fail. Reviewers: Luke Chen <showuon@gmail.com>, Christo Lolov <lolovc@amazon.com>	1 year ago
David Arthur	f2d499e25a	KAFKA-15389: Don't publish until we have replayed at least one record (#14282 ) When starting up a controller for the first time (i.e., with an empty log), it is possible for MetadataLoader to publish an empty MetadataImage before the activation records of the controller have been written. While this is not a bug, it could be confusing. This patch closes that gap by waiting for at least one controller record to be committed before the MetadataLoader starts publishing images. Reviewers: Colin P. McCabe <cmccabe@apache.org>	1 year ago
Phuc-Hong-Tran	8d12c1175c	KAFKA-15152: Fix incorrect format specifiers when formatting string (#14026 ) Reviewers: Divij Vaidya <diviv@amazon.com> Co-authored-by: phuchong.tran <phuchong.tran@servicenow.com>	1 year ago
Ron Dagostino	8394ddc0d2	MINOR: Move delegation token support to Metadata Version 3.6-IV2 (#14270 ) #14083 added support for delegation tokens in KRaft and attached that support to the existing MetadataVersion 3.6-IV1. This patch moves that support into a separate MetadataVersion 3.6-IV2. Reviewers: Colin P. McCabe <cmccabe@apache.org>	1 year ago
David Arthur	418b8a6e59	KAFKA-14538 Metadata transactions in MetadataLoader (#14208 ) This PR contains three main changes: - Support for transactions in MetadataLoader - Abort in-progress transaction during controller failover - Utilize transactions for ZK to KRaft migration A new MetadataBatchLoader class is added to decouple the loading of record batches from the publishing of metadata in MetadataLoader. Since a transaction can span across multiple batches (or multiple transactions could exist within one batch), some buffering of metadata updates was needed before publishing out to the MetadataPublishers. MetadataBatchLoader accumulates changes into a MetadataDelta, and uses a callback to publish to the publishers when needed. One small oddity with this approach is that since we can be "splitting" batches in some cases, the number of bytes returned in the LogDeltaManifest has new semantics. The number of bytes included in a batch is now only included in the last metadata update that is published as a result of a batch. Reviewers: Colin P. McCabe <cmccabe@apache.org>	1 year ago
Proven Provenzano	c2759df067	KAFKA-15219: KRaft support for DelegationTokens (#14083 ) Reviewers: David Arthur <mumrah@gmail.com>, Ron Dagostino <rndgstn@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>, Viktor Somogyi <viktor.somogyi@cloudera.com>	1 year ago
Colin Patrick McCabe	adc16d0f31	KAFKA-14538: Implement KRaft metadata transactions in QuorumController Implement the QuorumController side of KRaft metadata transactions. As specified in KIP-868, this PR creates a new metadata version, IBP_3_6_IV1, which contains the three new records: AbortTransactionRecord, BeginTransactionRecord, EndTransactionRecord. In order to make offset management unit-testable, this PR moves it out of QuorumController.java and into OffsetControlManager.java. The general approach here is to track the "last stable offset," which is calculated by looking at the latest committed offset and the in-progress transaction (if any). When a transaction is aborted, we revert back to this last stable offset. We also revert back to it when the controller is transitioning from active to inactive. In a follow-up PR, we will add support for the transaction records in MetadataLoader. We will also add support for automatically aborting pending transactions after a controller failover. Reviewers: David Arthur <mumrah@gmail.com>	1 year ago
Colin Patrick McCabe	9318b591d7	KAFKA-15318: Update the Authorizer via AclPublisher (#14169 ) On the controller, move publishing acls to the Authorizer into a dedicated MetadataPublisher, AclPublisher. This publisher listens for notifications from MetadataLoader, and receives only committed data. This brings the controller side in line with how the broker has always worked. It also avoids some ugly code related to publishing directly from the QuorumController. Most important of all, it clears the way to implement metadata transactions without worrying about Authorizer state (since it will be handled by the MetadataLoader, along with other metadata image state). In AclsDelta, we can remove isSnapshotDelta. We always know when the MetadataLoader is giving us a snapshot. Also bring AclsDelta in line with the other delta classes, where completeSnapshot calculates the diff between the previous image and the next one. We don't use this delta (since we just apply the image directly to the authorizer) but we should have it, for consistency. Finally, change MockAclMutator to avoid the need to subclass AclControlManager. Reviewers: David Arthur <mumrah@gmail.com>	1 year ago
David Arthur	32c39c8149	KAFKA-15263 Check KRaftMigrationDriver state in each event (#14115 ) Reviewers: Colin P. McCabe <cmccabe@apache.org>	1 year ago
Colin Patrick McCabe	10bcd4fc7f	KAFKA-15213: provide the exact offset to QuorumController.replay (#13643 ) Provide the exact record offset to QuorumController.replay() in all cases. There are several situations where this is useful, such as logging, implementing metadata transactions, or handling broker registration records. In the case where the QC is inactive, and simply replaying records, it is easy to compute the exact record offset from the batch base offset and the record index. The active QC case is more difficult. Technically, when we submit records to the Raft layer, it can choose a batch base offset later than the one we expect, if someone else is also adding records. While the QC is the only entity submitting data records, control records may be added at any time. In the current implementation, these are really only used for leadership elections. However, this could change with the addition of quorum reconfiguration or similar features. Therefore, this PR allows the QC to tell the Raft layer that a record append should fail if it would have resulted in a batch base offset other than what was expected. This in turn will trigger a controller failover. In the future, if automatically added control records become more common, we may wish to have a more sophisticated system than this simple optimistic concurrency mechanism. But for now, this will allow us to rely on the offset as correct. In order that the active QC can learn what offset to start writing at, the PR also adds a new RaftClient#endOffset function. At the Raft level, this PR adds a new exception, UnexpectedBaseOffsetException. This gets thrown when we request a base offset that doesn't match the one the Raft layer would have given us. Although this exception should cause a failover, it should not be considered a fault. This complicated the exception handling a bit and motivated splitting more of it out into the new EventHandlerExceptionInfo class. This will also let us unit test things like slf4j log messages a bit better. Reviewers: David Arthur <mumrah@gmail.com>, José Armando García Sancio <jsancio@apache.org>	1 year ago
David Arthur	a900794ace	KAFKA-15196 Additional ZK migration metrics (#14028 ) This patch adds several metrics defined in KIP-866: * MigratingZkBrokerCount: the number of zk brokers registered with KRaft * ZkWriteDeltaTimeMs: time spent writing MetadataDelta to ZK * ZkWriteSnapshotTimeMs: time spent writing MetadataImage to ZK * Adds value 4 for "ZK" to ZkMigrationState Also fixes a typo in the metric name introduced in #14009 (ZKWriteBehindLag -> ZkWriteBehindLag) Reviewers: Luke Chen <showuon@gmail.com>, Colin P. McCabe <cmccabe@apache.org>	1 year ago
David Arthur	e794bc719a	MINOR: Add a Builder for KRaftMigrationDriver (#14062 ) Reviewers: Justine Olshan <jolshan@confluent.io>	1 year ago
Colin Patrick McCabe	c7de30f38b	KAFKA-15183: Add more controller, loader, snapshot emitter metrics (#14010 ) Implement some of the metrics from KIP-938: Add more metrics for measuring KRaft performance. Add these metrics to QuorumControllerMetrics: kafka.controller:type=KafkaController,name=TimedOutBrokerHeartbeatCount kafka.controller:type=KafkaController,name=EventQueueOperationsStartedCount kafka.controller:type=KafkaController,name=EventQueueOperationsTimedOutCount kafka.controller:type=KafkaController,name=NewActiveControllersCount Create LoaderMetrics with these new metrics: kafka.server:type=MetadataLoader,name=CurrentMetadataVersion kafka.server:type=MetadataLoader,name=HandleLoadSnapshotCount Create SnapshotEmitterMetrics with these new metrics: kafka.server:type=SnapshotEmitter,name=LatestSnapshotGeneratedBytes kafka.server:type=SnapshotEmitter,name=LatestSnapshotGeneratedAgeMs Reviewers: Ron Dagostino <rndgstn@gmail.com>	1 year ago
Owen Leung	4981fa939d	KAFKA-14712: Produce correct error msg with correct metadataversion (#13773 ) Fix the confusing error message in ImageWriterOptions Reviewers: Luke Chen <showuon@gmail.com>, David Arthur <mumrah@gmail.com>	1 year ago
Hailey Ni	9e50f7cdd3	MINOR: Add ZK dual-write lag metric (#14009 ) This patch adds ZKWriteBehindLag metric to the KafkaController mbean as specified in KIP-866 Reviewers: David Arthur <mumrah@gmail.com>	1 year ago
David Arthur	d9253fed5c	MINOR Improve logging during the ZK to KRaft migration (#14008 ) * Adds an exponential backoff to 1m while the controller is waiting for brokers to show up * Increases one-time logs to INFO * Adds a summary of the migration records * Use RecordRedactor for summary of migration batches (TRACE only) Reviewers: Colin P. McCabe <cmccabe@apache.org>	1 year ago
David Arthur	c84ac00609	Fix compile test error (#14013 ) Reviewers: Colin P. McCabe <cmccabe@apache.org>	1 year ago
Colin Patrick McCabe	959f9ca4c0	MINOR: Standardize controller log4j output for replaying records (#13703 ) Standardize controller log4j output for replaying important records. The log message should include the word "replayed" to make it clear that this is a record replay. Log the replay of records for ACLs, client quotas, and producer IDs, which were previously not logged. Also fix a case where we weren't logging changes to broker registrations. AclControlManager, ClientQuotaControlManager, and ProducerIdControlManager didn't previously have a log4j logger object, so this PR adds one. It also converts them to using Builder objects. This makes junit tests more readable because we don't need to specify parameters where the test can use the default (like LogContexts). Throw an exception in replay if we get another TopicRecord for a topic which already exists. Example log messages: INFO [QuorumController id=3000] Replayed a FeatureLevelRecord setting metadata version to 3.6-IV0 DEBUG [QuorumController id=3000] Replayed a ZkMigrationStateRecord which did not alter the state from NONE. INFO [QuorumController id=3000] Replayed BrokerRegistrationChangeRecord modifying the registration for broker 0: BrokerRegistrationChangeRecord(brokerId=0, brokerEpoch=3, fenced=-1, inControlledShutdown=0) INFO [QuorumController id=3000] Replayed ClientQuotaRecord for ClientQuotaEntity(entries={user=testkit}) setting request_percentage to 0.99. Reviewers: Divij Vaidya <diviv@amazon.com>, Ron Dagostino <rndgstn@gmail.com>, David Arthur <mumrah@gmail.com>	1 year ago
Ron Dagostino	edd64fa251	MINOR: more KRaft Metadata Image tests (#13724 ) Adds additional testing for the various KRaft *Image classes. For every image that we create we already test that we can get there by applying all the records corresponding to that image written out as a list of records. This patch adds more tests to confirm that we can get to each such final image with intermediate stops at all possible intermediate images along the way. Reviewers: Colin P. McCabe <cmccabe@apache.org>, David Arthur <mumrah@gmail.com>	1 year ago
David Arthur	726d277c0a	MINOR: Move some things around in KRaftMigrationDriver (#13978 ) Reviewers: Colin P. McCabe <cmccabe@apache.org>	1 year ago
andymg3	1223b79973	KAFKA-15149: Fix handling of new partitions in dual-write mode (#13968 ) Fixes a bug where we don't send UMR and LISR requests in dual-write mode when new partitions are created. Prior to this patch, KRaftMigrationZkWriter was mutating the internal data-structures of TopicDelta which prevented MigrationPropagator from sending UMR and LISR for the changed partitions. Reviewers: David Arthur <mumrah@gmail.com>	1 year ago
David Arthur	fc7d912e8b	KAFKA-15109 Ensure the leader epoch bump occurs for older MetadataVersions (#13910 ) This fixes a regression introduced by the previous KAFKA-15109 commit (`d0457f7360` on trunk). Reviewers: Colin P. McCabe <cmccabe@apache.org>, José Armando García Sancio <jsancio@apache.org>	1 year ago
David Arthur	1bf7039999	KAFKA-15098 Allow authorizers to be configured in ZK migration (#13895 ) Reviewers: Ron Dagostino <rdagostino@confluent.io>	1 year ago
David Arthur	d0457f7360	KAFKA-15109 Don't skip leader epoch bump while in migration mode (#13890 ) While in migration mode, the KRaft controller must always bump the leader epoch when shrinking an ISR. This is required to maintain compatibility with the ZK brokers. Without the epoch bump, the ZK brokers will ignore the partition state change present in the LeaderAndIsrRequest since it would not contain a new leader epoch. Reviewers: Colin P. McCabe <cmccabe@apache.org>	1 year ago
minjian.cai	ba5e1acdfb	MINOR: fix typos for metadata (#13889 ) Reviewers: Divij Vaidya <diviv@amazon.com>, Deqi Hu <deqi.hu@shopee.com>	1 year ago
Colin P. McCabe	cd3c0ab1a3	KAFKA-15060: fix the ApiVersionManager interface This PR expands the scope of ApiVersionManager a bit to include returning the current MetadataVersion and features that are in effect. This is useful in general because that information needs to be returned in an ApiVersionsResponse. It also allows us to fix the ApiVersionManager interface so that all subclasses implement all methods of the interface. Having subclasses that don't implement some methods is dangerous because they could cause exceptions at runtime in unexpected scenarios. On the KRaft controller, we were previously performing a read operation in the QuorumController thread to get the current metadata version and features. With this PR, we now read a volatile variable maintained by a separate MetadataVersionContextPublisher object. This will improve performance and simplify the code. It should not change the guarantees we are providing; in both the old and new scenarios, we need to be robust against version skew scenarios during updates. Add a Features class which just has a 3-tuple of metadata version, features, and feature epoch. Remove MetadataCache.FinalizedFeaturesAndEpoch, since it just duplicates the Features class. (There are some additional feature-related classes that can be consolidated in a follow-on PR.) Create a java class, EndpointReadyFutures, for managing the futures associated with individual authorizer endpoints. This avoids code duplication between ControllerServer and BrokerServer and makes this code unit-testable. Reviewers: David Arthur <mumrah@gmail.com>, dengziming <dengziming1993@gmail.com>, Luke Chen <showuon@gmail.com>	1 year ago
Luke Chen	d3e0b27b24	KAFKA-15040: trigger onLeadershipChange under KRaft mode (#13807 ) When a LeaderAndIsr request is received, we notify remoteLogManager about the leadership change to trigger the subsequent workflow. But LeaderAndIsr won't be sent in KRaft mode; instead, the topicDelta will be received. This PR fixes this issue by getting the leader change and follower change from topicDelta, and triggering rlm.onLeadershipChange to notify the remote log manager. Adds tests for remote storage enabled cases. Reviewers: Satish Duggana <satishd@apache.org>	1 year ago
José Armando García Sancio	8ad0ed3e61	KAFKA-15021; Skip leader epoch bump on ISR shrink (#13765 ) When the KRaft controller removes a replica from the ISR because of the controlled shutdown there is no need for the leader epoch to be increased by the KRaft controller. This is accurate as long as the topic partition leader doesn't add the removed replica back to the ISR. This change also fixes a bug when computing the HWM. When computing the HWM, replicas that are not eligible to join the ISR but are caught up should not be included in the computation. Otherwise, the HWM will never increase for replica.lag.time.max.ms because the shutting down replica is not sending FETCH request. Without this additional fix PRODUCE requests would timeout if the request timeout is greater than replica.lag.time.max.ms. Because of the bug above the KRaft controller needs to check the MV to guarantee that all brokers support this bug fix before skipping the leader epoch bump. Reviewers: David Mao <47232755+splett2@users.noreply.github.com>, Divij Vaidya <diviv@amazon.com>, David Jacot <djacot@confluent.io>	1 year ago
andymg3	db9d845702	KAFKA-14791; Create a builder for PartitionRegistration (#13788 ) This creates a builder for PartitionRegistration. The motivation for the builder is that the constructor of PartitionRegistration has four arguments all of type int[] which makes it easy to make a mistake when using it. Reviewers: José Armando García Sancio <jsancio@apache.org>	1 year ago
Dimitar Dimitrov	0d5cf4c385	KAFKA-15052 Fix the flaky QuorumControllerTest.testBalancePartitionLeaders (#13804 ) In this test broker session timeout is configured aggressively low (to 1 second) so that fencing can happen without much waiting. Then in the final portion of the test when brokers should not be fenced heartbeats are sent roughly 2 times in a session timeout window. However the first time that's done there's other code between sending the heartbeat and taking the timestamp, and in local tests that code can take up to 0.5 seconds (1/2 of the session timeout). That then can result in all brokers being fenced again which would fail the test. This change sends a heartbeat just when a timestamp is taken, which in local tests reduces flaky failures from 4 out of 50 to 0 out of 50. Reviewers: Colin P. McCabe <cmccabe@apache.org>	1 year ago
Colin Patrick McCabe	146a6976ae	KAFKA-15048: Improve handling of unexpected quorum controller errors (#13799 ) When the active quorum controller encounters an "unexpected" error, such as a NullPointerException, it currently resigns its leadership. This PR fixes it so that in addition to doing that, it also increments the metadata error count metric. This will allow us to better track down these errors. This PR also fixes a minor bug where performing read operations on a standby controller would result in an unexpected RuntimeException. The bug happened because the standby controller does not take in-memory snapshots, and read operations were attempting to read from the epoch of the latest committed offset. The fix is for the standby controller to simply read the latest value of each data structure. This is always safe, because standby controllers don't contain uncommitted data. Also, fix a bug where listPartitionReassignments was reading the latest data, rather than data from the last committed offset. Reviewers: dengziming <dengziming1993@gmail.com>, David Arthur <mumrah@gmail.com>	1 year ago
David Arthur	f499662923	KAFKA-15003: Fix ZK sync logic for partition assignments (#13735 ) Fixed the metadata change events in the Migration component to check correctly for the diff in existing topic changes and replicate the metadata to the Zookeeper. Also, made the diff check exhaustive enough to handle the partial writes in Zookeeper when we try to replicate changes using a snapshot in the event of Controller failover. Add migration client and integration tests to verify the change. Co-authored-by: Akhilesh Chaganti <akhileshchg@users.noreply.github.com>	1 year ago
David Arthur	d27ba5bfba	KAFKA-15010 ZK migration failover support (#13758 ) This patch adds snapshot reconciliation during ZK to KRaft migration. This reconciliation happens whenever a snapshot is loaded by KRaft, or during a controller failover. Prior to this patch, it was possible to miss metadata updates coming from KRaft when dual-writing to ZK. Internally this adds a new state SYNC_KRAFT_TO_ZK to the KRaftMigrationDriver state machine. The controller passes through this state after the initial ZK migration and each time a controller becomes active. Logging during dual-write was enhanced to include a count of write operations happening. Reviewers: Colin P. McCabe <cmccabe@apache.org>	1 year ago
Ron Dagostino	e74e5e7ac5	KAFKA-15039: Reduce logging level to trace in PartitionChangeBuilder.… (#13780 ) …tryElection() A CPU profile in a large cluster showed PartitionChangeBuilder.tryElection() taking significant CPU due to logging. We adjust the logging statements in that method for clean elections from DEBUG level to TRACE to mitigate the impact of this logging under normal operations. Unclean elections are now logged at the INFO level rather than DEBUG. Reviewers: Jason Gustafson <jason@confluent.io>, Colin P. McCabe <cmccabe@apache.org>	1 year ago
Proven Provenzano	731c8c967e	KAFKA-15017 Fix snapshot load in dual write mode for ClientQuotas and SCRAM (#13757 ) This patch fixes the case where a ClientQuota or SCRAM credential was added in KRaft, but not written back to ZK. This missed write only occurred when handling a KRaft snapshot. If the changed quota was processed in a metadata delta (which is the typical case), it would be written to ZK. Reviewers: David Arthur <mumrah@gmail.com>	1 year ago
Colin Patrick McCabe	9b3db6d50a	KAFKA-15019: Improve handling of broker heartbeat timeouts (#13759 ) When the active KRaft controller is overloaded, it will not be able to process broker heartbeat requests. Instead, they will be timed out. When using the default configuration, this will happen if the time needed to process a broker heartbeat climbs above a second for a sustained period. This, in turn, could lead to brokers being improperly fenced when they are still alive. With this PR, timed out heartbeats will still update the lastContactNs and metadataOffset of the broker in the BrokerHeartbeatManager. While we don't generate any records, this should still be adequate to prevent spurious fencing. We also log a message at ERROR level so that this condition will be more obvious. Other small changes in this PR: fix grammar issue in log4j of BrokerHeartbeatManager. Add JavaDoc for ClusterControlManager#zkMigrationEnabled field. Add builder for ReplicationControlTestContext to avoid having tons of constructors. Update ClusterControlManager.DEFAULT_SESSION_TIMEOUT_NS to match the default in KafkaConfig. Reviewers: Ismael Juma <ijuma@apache.org>, Ron Dagostino <rdagostino@confluent.io>	1 year ago
David Arthur	7a679af687	KAFKA-15004: Fix configuration dual-write during migration (#13767 ) This patch fixes several small bugs with configuration dual-write during migration. * Topic configs are not written back to ZK while handling snapshot. * New broker/topic configs in KRaft that did not exist in ZK will not be written to ZK. * The sensitive configs are not encoded while writing them to Zookeeper. * Handle topic configs in ConfigMigrationClient and KRaftMigrationZkWriter#handleConfigsSnapshot Added tests to ensure we no longer have the above mentioned issues. Co-authored-by: Akhilesh Chaganti <akhileshchg@users.noreply.github.com> Reviewers: Colin P. McCabe <cmccabe@apache.org>	1 year ago
Colin Patrick McCabe	b74204fa0a	KAFKA-14996: Handle overly large user operations on the kcontroller (#13742 ) Previously, if a user tried to perform an overly large batch operation on the KRaft controller (such as creating a million topics), we would create a very large number of records in memory. Our attempt to write these records to the Raft layer would fail, because there were too many to fit in an atomic batch. This failure, in turn, would trigger a controller failover. (Note: I am assuming here that no topic creation policy was in place that would prevent the creation of a million topics. I am also assuming that the user operation must be done atomically, which is true for all current user operations, since we have not implemented KIP-868 yet.) With this PR, we fail immediately when the number of records we have generated exceeds the threshold that we can apply. This failure does not generate a controller failover. We also now fail with a PolicyViolationException rather than an UnknownServerException. In order to implement this in a simple way, this PR adds the BoundedList class, which wraps any list and adds a maximum length. Attempts to grow the list beyond this length cause an exception to be thrown. Reviewers: David Arthur <mumrah@gmail.com>, Ismael Juma <ijuma@apache.org>, Divij Vaidya <diviv@amazon.com>	1 year ago
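
The KAFKA-14996 entry above describes a BoundedList class that wraps any list, adds a maximum length, and throws when an attempt is made to grow past it. The following is only a rough sketch of that idea and not the code from the patch: the class layout, the constructor signature, and the `ListTooLongException` type are all hypothetical names chosen for illustration.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical exception type for this sketch (not taken from the Kafka patch). */
class ListTooLongException extends RuntimeException {
    ListTooLongException(String message) {
        super(message);
    }
}

/** Illustrative wrapper that refuses to grow the underlying list beyond maxLength. */
final class BoundedList<E> extends AbstractList<E> {
    private final int maxLength;
    private final List<E> underlying;

    BoundedList(int maxLength, List<E> underlying) {
        if (maxLength <= 0) {
            throw new IllegalArgumentException("maxLength must be positive");
        }
        if (underlying.size() > maxLength) {
            throw new ListTooLongException("initial list already exceeds " + maxLength);
        }
        this.maxLength = maxLength;
        this.underlying = underlying;
    }

    @Override
    public E get(int index) {
        return underlying.get(index);
    }

    @Override
    public int size() {
        return underlying.size();
    }

    @Override
    public E set(int index, E element) {
        // Replacing an element does not change the length, so no bound check is needed.
        return underlying.set(index, element);
    }

    @Override
    public E remove(int index) {
        return underlying.remove(index);
    }

    @Override
    public void add(int index, E element) {
        // AbstractList routes add(E) through this method, so single appends are covered too.
        checkRoomFor(1);
        underlying.add(index, element);
    }

    @Override
    public boolean addAll(Collection<? extends E> c) {
        // Check up front so an oversized bulk add is rejected before anything is appended.
        checkRoomFor(c.size());
        return underlying.addAll(c);
    }

    private void checkRoomFor(int extra) {
        // Use long arithmetic to avoid int overflow on very large collections.
        if ((long) underlying.size() + extra > maxLength) {
            throw new ListTooLongException(
                "cannot add " + extra + " element(s): limit is " + maxLength);
        }
    }
}

class BoundedListDemo {
    public static void main(String[] args) {
        List<String> records = new BoundedList<>(2, new ArrayList<>());
        records.add("record-1");
        records.add("record-2");
        try {
            records.add("record-3"); // third element exceeds the limit of 2
        } catch (ListTooLongException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In this sketch the caller learns about the overflow at the moment the list would grow too large, which mirrors the commit's stated goal of failing immediately rather than at write time. How the real controller maps that failure to a PolicyViolationException is not shown here.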


284 Commits (834f72b03de40fb47caaad1397ed061de57c2509)