src-kafka

Commit Graph

Author	SHA1	Message	Date
Colin P. Mccabe	e047864f30	MINOR: fix some warnings in the broker Author: Colin P. Mccabe <cmccabe@confluent.io> Reviewers: Gwen Shapira Closes #6942 from cmccabe/fix-scala-warnings	6 years ago
Boyang Chen	f396372fb8	MINOR: add group coordinator test coverage (#6926 ) Some edge cases are not currently being tested. Add more tests to cover. Reviewers: Guozhang Wang <wangguoz@gmail.com>	6 years ago
José Armando García Sancio	1cc0b5eb81	MINOR: Seal the HostedPartition enumeration (#6917 ) Makes HostedPartition a sealed trait and make all of the match cases explicit. Reviewers: Vikas Singh <soondenana@users.noreply.github.com>, Jason Gustafson <jason@confluent.io>	6 years ago
Boyang Chen	e981b82601	KAFKA-8500; Static member rejoin should always update member.id (#6899 ) This PR fixes a bug in static group membership. Previously we limited the `member.id` replacement in JoinGroup to cases where the group is in the Stable state. This is error-prone and could allow duplicate consumers to read from the same topic. For example, imagine a case where two unknown members join in the `PrepareRebalance` stage at the same time. The PR fixes the following things: 1. Replace `member.id` whenever a known static member rejoins the group with an unknown member.id. 2. Immediately fence any ongoing join/sync group callback to terminate the duplicate member early. 3. Clearly handle Dead/Empty cases as exceptional. 4. Return the old leader id when a static member leader rejoins, to avoid triggering a trivial member assignment. Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>	6 years ago
Gardner Vickers	cb2c4c3d7a	MINOR: Remove integration package prefix from ConsumerTopicCreationTest (#6916 ) Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>	6 years ago
Guozhang Wang	bebcbe3a04	KAFKA-8487: Only request re-join on REBALANCE_IN_PROGRESS in CommitOffsetResponse (#6894 ) Plus some minor cleanups on AbstractCoordinator. Reviewers: Boyang Chen <boyang@confluent.io>, Jason Gustafson <jason@confluent.io>	6 years ago
Jason Gustafson	783ab74793	KAFKA-8333; Load high watermark checkpoint lazily when initializing replicas (#6800 ) Currently we load the high watermark checkpoint separately for every replica that we load. This patch makes this loading logic lazy and caches the loaded map while a LeaderAndIsr request is being handled. Reviewers: Jun Rao <junrao@gmail.com>	6 years ago
Jason Gustafson	1e5227c230	MINOR: Fix transient failure in PreferredReplicaLeaderElectionCommandTest (#6908 ) We have seen this failing recently due to the following error: ``` java.util.NoSuchElementException: None.get at scala.None$.get(Option.scala:366) at scala.None$.get(Option.scala:364) at kafka.admin.PreferredReplicaLeaderElectionCommandTest.getLeader(PreferredReplicaLeaderElectionCommandTest.scala:101) at kafka.admin.PreferredReplicaLeaderElectionCommandTest.testNoopElection(PreferredReplicaLeaderElectionCommandTest.scala:240) ``` We need to wait for the leader to be available. Reviewers: David Arthur <mumrah@gmail.com>	6 years ago
Jason Gustafson	bb8de0b8c5	KAFKA-8003; Fix flaky testFencingOnTransactionExpiration We see this failure from time to time: ``` java.lang.AssertionError: expected:<1> but was:<0> at org.junit.Assert.fail(Assert.java:89) at org.junit.Assert.failNotEquals(Assert.java:835) at org.junit.Assert.assertEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:633) at kafka.api.TransactionsTest.testFencingOnTransactionExpiration(TransactionsTest.scala:512) ``` The cause is probably that we are using `consumeRecordsFor` which has no expectation on the number of records to fetch and a timeout of just 1s. This patch changes the code to use `consumeRecords` and the default 15s timeout. Note we have also fixed a bug in the test case itself, which was using the wrong topic for the second write, which meant it could never have failed in the anticipated way anyway. Author: Jason Gustafson <jason@confluent.io> Reviewers: Gwen Shapira Closes #6905 from hachikuji/fix-flaky-transaction-test	6 years ago
Jason Gustafson	07dd7db324	MINOR: Increase timeout for flaky ResetConsumerGroupOffsetTest (#6900 ) Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>	6 years ago
Manikumar Reddy	060701fb6f	KAFKA-8461: Wait for follower to join the ISR in testUncleanLeaderElectionDisabled Test Author: Manikumar Reddy <manikumar.reddy@gmail.com> Reviewers: Jason Gustafson <jason@confluent.io>, Boyang Chen <boyang@confluent.io> Closes #6887 from omkreddy/unclean-leader	6 years ago
José Armando García Sancio	28eccc3ca3	KAFKA-8384; Leader election command integration tests (#6880 ) This patch adds test cases for the leader election command added in KIP-460. Reviewers: Vikas Singh, David Arthur <mumrah@gmail.com>, Jason Gustafson <jason@confluent.io>	6 years ago
Boyang Chen	72c3b88ed5	MINOR: Fixed compiler warnings in LogManagerTest (#6897 ) Reviewers: Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>	6 years ago
Wennn	562424cee8	MINOR: Remove redundant semicolons in `KafkaApis` imports (#6889 ) Reviewers: Jason Gustafson <jason@confluent.io>	6 years ago
Jason Gustafson	0e95c9f3a8	KAFKA-8400; Do not update follower replica state if the log read failed (#6814 ) This patch checks for errors handling a fetch request before updating follower state. Previously we were unsafely passing the failed `LogReadResult` with most fields set to -1 into `Replica` to update follower state. Additionally, this patch attempts to improve the test coverage for ISR shrinking and expansion logic in `Partition`. Reviewers: Guozhang Wang <wangguoz@gmail.com>	6 years ago
Boyang Chen	fb1f74958d	KAFKA-8386; Use COORDINATOR_NOT_AVAILABLE error when group is Dead (#6762 ) The Dead state in the coordinator is used for groups which are either pending deletion or migration to a new coordinator. Currently requests received while in this state result in an UNKNOWN_MEMBER_ID which causes consumers to reset the memberId. This is a problem for KIP-345 since it can cause an older member to fence a newer member. This patch changes the error code returned in this state to COORDINATOR_NOT_AVAILABLE, which causes the consumer to rediscover the coordinator, but not reset the memberId. Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>	6 years ago
Almog Gavra	8e161580b8	KAFKA-8305; Support default partitions & replication factor in AdminClient#createTopic (KIP-464) (#6728 ) This commit makes four changes: - Adds a constructor for NewTopic(String, Optional<Integer>, Optional<Short>) which allows users to specify Optional.empty() for numPartitions or replicationFactor in order to use the broker default. - Changes AdminManager to accept -1 as a valid value for replication factor and numPartitions (resolving to broker defaults). - Makes --partitions and --replication-factor optional arguments when creating topics using kafka-topics.sh. - Adds a dependency on the scalaJava8Compat library to make it simpler to convert Scala Option to Java Optional. Reviewers: Ismael Juma <ismael@juma.me.uk>, Ryanne Dolan <ryannedolan@gmail.com>, Jason Gustafson <jason@confluent.io>	6 years ago
Colin P. Mccabe	e6563aab72	KAFKA-4893; Fix deletion and moving of topics with long names Author: Colin P. Mccabe <cmccabe@confluent.io> Reviewers: Gwen Shapira, David Arthur, James Cheng, Vahid Hashemian Closes #6869 from cmccabe/KAFKA-4893	6 years ago
Guozhang Wang	573152dfa8	HOTFIX: Allow multi-batches for old format and no compression (#6871 ) Reviewers: Jason Gustafson <jason@confluent.io>	6 years ago
José Armando García Sancio	45fae33937	KAFKA-8383; Integration tests for unclean `electLeaders` (#6857 ) Reviewers: Jason Gustafson <jason@confluent.io>	6 years ago
José Armando García Sancio	121308cc7a	KAFKA-8286; Generalized Leader Election Admin RPC (KIP-460) (#6686 ) Implements KIP-460: https://cwiki.apache.org/confluence/display/KAFKA/KIP-460%3A+Admin+Leader+Election+RPC. Reviewers: Jun Rao <junrao@gmail.com>, Jason Gustafson <jason@confluent.io>	6 years ago
Jason Gustafson	348e5d738f	MINOR: Fix transient failure in PartitionTest.testAddAndRemoveMetrics (#6834 ) The test was failing because other test cases are leaving some metrics in the registry. This patch cleans the registry before and after test runs. Reviewers: Ismael Juma <ismael@juma.me.uk>	6 years ago
Stanislav Kozlovski	efeefa5d18	MINOR: Check invalid bootstrap-server, alter option and config flags before checking for the required --partitions flag (#6786 ) Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>	6 years ago
Guozhang Wang	592410fc58	KAFKA-8428: Always require a single batch with un- / compressed messages (#6816 ) I think it's better just to make single-batch as a universal requirement for all versions for compressed messages, and for V2 and beyond uncompressed messages as well. Reviewers: Jason Gustafson <jason@confluent.io>	6 years ago
Jason Gustafson	4a3c92bfc7	MINOR: Remove redundant checkpoint thread started field in ReplicaManager (#6813 ) We have two fields `highWatermarkCheckPointThreadStarted` and `hwThreadInitialized` which appear to be serving the same purpose. This patch gets rid of `hwThreadInitialized`. Reviewers: Guozhang Wang <wangguoz@gmail.com>	6 years ago
Jason Gustafson	6afb0ca735	KAFKA-8351; Cleaner should handle transactions spanning multiple segments (#6722 ) When cleaning transactional data, we need to keep track of which transactions still have data associated with them so that we do not remove the markers. We had logic to do this, but the state was not being carried over when beginning cleaning for a new set of segments. This could cause the cleaner to incorrectly believe a transaction marker was no longer needed. The fix here carries the transactional state between groups of segments to be cleaned. Reviewers: Dhruvil Shah <dhruvil@confluent.io>, Viktor Somogyi <viktorsomogyi@gmail.com>, Jason Gustafson <jason@confluent.io>	6 years ago
Jason Gustafson	a1808962e5	KAFKA-8422; Client should send OffsetForLeaderEpoch only if broker supports latest version (#6806 ) In the olden days, OffsetForLeaderEpoch was exclusively an inter-broker protocol and required Cluster level permission. With KIP-320, clients can use this API as well and so we lowered the required permission to Topic Describe. The only way the client can be sure that the new permissions are in use is to require version 3 of the protocol which was bumped for 2.3. If the broker does not support this version, we skip the validation and revert to the old behavior. Additionally, this patch fixes a problem with the newly added replicaId field when parsed from older versions which did not have it. If the field was not present, then we used the consumer's sentinel value, but this would limit the range of visible offsets by the high watermark. To get around this problem, this patch adds a separate "debug" sentinel similar to APIs like Fetch and ListOffsets. Reviewers: Ismael Juma <ismael@juma.me.uk>	6 years ago
Lee Dongjin	eefb25d4ca	MINOR: Fix a few compiler warnings (#6767 ) Reviewers: Boyang Chen <bchen11@outlook.com>, Jason Gustafson <jason@confluent.io>	6 years ago
Jason Gustafson	3696b9882d	KAFKA-8371: Remove dependence on ReplicaManager from Partition (#6705 ) This patch attempts to simplify the interaction between Partition and the various components from `ReplicaManager`. This is primarily to make unit testing easier. I have also tried to eliminate the OfflinePartition sentinel which has always been unsafe. Reviewers: Boyang Chen <bchen11@outlook.com>, David Arthur <mumrah@gmail.com>	6 years ago
Jason Gustafson	5f06999cf3	MINOR: Remove ControllerEventManager metrics on close (#6788 ) Remove created metrics when shutting down `ControllerEventManager`. This fixes transient failures in `ControllerEventManagerTest.testEventQueueTime` and is generally good hygiene. Reviewers: José Armando García Sancio <jsancio@gmail.com>, Ismael Juma <ismael@juma.me.uk>	6 years ago
David Arthur	bacb45e044	MINOR: Set `replicaId` for OffsetsForLeaderEpoch from followers (#6775 ) Reviewers: Jason Gustafson <jason@confluent.io>	6 years ago
Jason Gustafson	ce5ce2d569	MINOR: A few logging improvements in the broker (#6773 ) Reviewers: Boyang Chen <bchen11@outlook.com>, Rajini Sivaram <rajinisivaram@googlemail.com>	6 years ago
Manikumar Reddy	d77bac1c93	KAFKA-3143: Controller should transition offline replicas on startup Author: Manikumar Reddy <manikumar.reddy@gmail.com> Reviewers: Jun Rao <junrao@gmail.com>, Jason Gustafson <jason@confluent.io> Closes #5041 from omkreddy/KAFKA-3143	6 years ago
Boyang Chen	e00c0d316d	MINOR: Fix typo in heartbeat request protocol definition (#6759 ) This changes the field "generationid" to "generationId" to be consistent with other uses. Reviewers: Shaobo Liu <lambda.tencent@gmail.com>, Jason Gustafson <jason@confluent.io>	6 years ago
Boyang Chen	9fa331b811	KAFKA-8225 & KIP-345 part-2: fencing static member instances with conflicting group.instance.id (#6650 ) For static members join/rejoin, we encode the current timestamp in the new member.id. The format looks like group.instance.id-timestamp. During consumer/broker interaction logic (Join, Sync, Heartbeat, Commit), we check whether the group.instance.id is known to the group. If it is, we match the member.id stored in the static membership map against the request member.id. A mismatch indicates that a conflicting consumer has used the same group.instance.id, and it will receive a fatal exception to shut down. Right now the only missing part is the system test. Will work on it offline while getting the major logic changes reviewed. Reviewers: Ryanne Dolan <ryannedolan@gmail.com>, Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>	6 years ago
David Arthur	e2847e8603	KAFKA-8365; Consumer and protocol support for follower fetching (#6731 ) This patch includes API changes for follower fetching per [KIP-392](https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica) as well as the consumer implementation. After this patch, consumers will continue to fetch only from the leader, since the broker implementation to select an alternate read replica is not included here. A new `client.rack` consumer configuration property is added which allows the consumer to indicate its rack. This is just an arbitrary string to indicate some relative location; it doesn't have to actually represent a physical rack. We are keeping the naming consistent with the broker property (`broker.rack`). FetchRequest now includes `rack_id` which can optionally be specified by the consumer. FetchResponse includes an optional `preferred_read_replica` field for each partition in the response. OffsetForLeaderEpochRequest also adds a new `replica_id` field which is similar to the same field in FetchRequest. When the consumer sees a `preferred_read_replica` in a fetch response, it will use the Node with that ID for the next fetch. Reviewers: Jason Gustafson <jason@confluent.io>	6 years ago
Jason Gustafson	d6057a9fe4	MINOR: Remove spammy log message during topic deletion Deletion of a large number of topics can cause a ton of log spam. In a test case on 2.2, deletion of 50 topics with 100 partitions each caused about 158 Mb of data in the controller log. With the improvements to batch StopReplica and the patch here, we reduce that to about 1.5 Mb. Kudos to gwenshap for spotting these spammy messages. Author: Jason Gustafson <jason@confluent.io> Reviewers: Gwen Shapira Closes #6738 from hachikuji/remove-verbose-topic-deletion-log-message	6 years ago
Aishwarya Gune	414852c701	KAFKA-8346; Improve replica fetcher behavior for handling partition failure [KIP-461] (#6716 ) The replica fetcher thread is terminated in case a partition crashes which leads to under replication. This behavior can be improved by dropping the failed partition. The thread can continue monitoring the rest of the partitions. If all partitions of a thread have failed, the thread would be shut down. This is documented in KIP-461: https://cwiki.apache.org/confluence/display/KAFKA/KIP-461+-+Improve+Replica+Fetcher+behavior+at+handling+partition+failure. Reviewers: Jun Rao <junrao@gmail.com>, Jason Gustafson <jason@confluent.io>	6 years ago
Mickael Maison	855f899bb5	KAFKA-8256; Replace Heartbeat request/response with automated protocol (#6691 ) Reviewers: Boyang Chen <bchen11@outlook.com>, Jason Gustafson <jason@confluent.io>	6 years ago
Boyang Chen	6e6dcceb93	KAFKA-8220; Avoid kicking out static group members through rebalance timeout (#6666 ) To make static consumer group members more persistent, we want to avoid kicking out unjoined members through rebalance timeout. Essentially we allow static members to participate in a rebalance using their old subscription without sending a JoinGroup. The only catch is that an unjoined static member might be the current group leader, and we may need to elect a different leader. Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Jason Gustafson <jason@confluent.io>	6 years ago
Boyang Chen	2208f9966d	KAFKA-8354; Replace Sync group request/response with automated protocol (#6729 ) Update SyncGroup API to use the generated protocol classes. Reviewers: Jason Gustafson <jason@confluent.io>	6 years ago
Colin Patrick McCabe	0494cd329f	MINOR: Refactor SslFactory (#6674 ) SslFactory: split the part of SslFactory that creates SSLEngine instances into SslEngineBuilder. When (re)configuring, we simply create a new SslEngineBuilder. This allows us to make all the builder fields immutable. It also simplifies the logic for reconfiguring. Because we sometimes need to test old SslEngine instances against new ones, being able to use both the old and the new builder at once is useful. Create an enum named SslClientAuth which encodes the possible values for ssl.client.auth. This will simplify the handling of this configuration. SslTransportLayer#maybeProcessHandshakeFailure should treat an SSLHandshakeException with a "Received fatal alert" message as a handshake error (and therefore an authentication error.) SslFactoryTest: add some line breaks for very long lines. ConfigCommand#main: when terminating the command due to an uncaught exception, log the exception using debug level in slf4j, in addition to printing it to stderr. This makes it easier to debug failing junit tests, where stderr may not be kept, or may be reordered with respect to other slf4j messages. The use of debug level is consistent with how we handle other types of exceptions in ConfigCommand#main. StateChangeLogMerger#main: spell out the full name of scala.io.Source rather than abbreviating it as io.Source. This makes it clearer that it is part of the Scala standard library. It also avoids compiler errors when other libraries whose groupId starts with "io" are used in the broker. Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>	6 years ago
Rajini Sivaram	050fdd6537	KAFKA-8336; Enable dynamic reconfiguration of broker's client-side certs (#6721 ) Enable reconfiguration of SSL keystores and truststores in client-side channel builders used by brokers for controller, transaction coordinator and replica fetchers. This enables brokers using TLS mutual authentication for inter-broker listener to use short-lived certs that may be updated before expiry without restarting brokers. Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>	6 years ago
Dhruvil Shah	16d4d8cafc	MINOR: Fix flaky ConsumerTopicCreationTest (#6727 ) `ConsumerTopicCreationTest` relied on `KafkaConsumer#poll` to send a `MetadataRequest` within 100ms to verify if a topic is auto created or not. This is brittle and does not guarantee if the request made it to the broker or was processed successfully. This PR fixes the flaky test by adding another topic; we wait until we consume a previously produced record to this topic. This ensures MetadataRequest was processed and we could then check if the topic we're interested in was created or not. Reviewers: Boyang Chen <bchen11@outlook.com>, Jason Gustafson <jason@confluent.io>	6 years ago
Xiongqi Wu	b86e8c1ea9	MINOR: add docs for KIP-354 KAFKA-7321 (#6724 ) MINOR: update documentation for the log cleaner max compaction lag feature (KIP-354) implemented in KAFKA-7321 Author: Xiongqi Wu <xiowu@linkedin.com> Reviewer: Joel Koshy <jjkoshy@gmail.com>	6 years ago
Jason Gustafson	e4007a6408	KAFKA-8294; Batch StopReplica requests when possible and improve test coverage (#6642 ) The main problem we are trying to solve here is the batching of StopReplica requests and the lack of test coverage for `ControllerChannelManager`. Addressing the first problem was straightforward, but the second problem required quite a bit of work because of the dependence on `KafkaController` for all of the events. It seemed to make sense to separate the events from the processing of events so that we could remove this dependence and improve testability. With the refactoring, I was able to add test cases covering most of the logic in `ControllerChannelManager` including the generation of requests and the expected response handling logic. Note that I have not actually changed any of the event handling logic in `KafkaController`. While refactoring this logic, I found that the event queue time metric was not being correctly computed. The problem is that many of the controller events were singleton objects which inherited the `enqueueTimeMs` field from the `ControllerEvent` trait. This would never get updated, so queue time would be skewed. Reviewers: Jun Rao <junrao@gmail.com>	6 years ago
Xiongqi Wu	1fdc853301	KAFKA-7321: Add a Maximum Log Compaction Lag (KIP-354) (#6009 ) KAFKA-7321: Add a Maximum Log Compaction Lag (KIP-354) Records become eligible for compaction after the specified time interval. Author: Xiongqi Wu <xiowu@linkedin.com> Reviewer: Joel Koshy <jjkoshy@gmail.com>	6 years ago
Jason Gustafson	d798dbf497	KAFKA-8335; Clean empty batches when sequence numbers are reused (#6715 ) The log cleaner attempts to preserve the last entry for each producerId in order to ensure that sequence/epoch state is not lost. The current validation checks only the last sequence number for each producerId in order to decide whether a batch should be retained. There are two problems with this: 1. Sequence numbers are not unique alone. It is the tuple of sequence number and epoch which is uniquely defined. 2. The group coordinator always writes batches beginning with sequence number 0, which means there could be many batches which have the same sequence number. The complete fix for the second issue would probably add proper sequence number bookkeeping in the coordinator. For now, we have left the coordinator implementation unchanged and changed the cleaner logic to use the last offset written by a producer instead of the last sequence number. Reviewers: Guozhang Wang <wangguoz@gmail.com>	6 years ago
Kengo Seki	511c8e2c9e	MINOR: Remove unnecessary OptionParser#accepts method call from PreferredReplicaLeaderElectionCommand (#6710 ) Reviewers: Jason Gustafson <jason@confluent.io>	6 years ago
Bob Barrett	a97e55b838	KAFKA-8332: Refactor ImplicitLinkedHashSet to avoid losing ordering when converting to Scala Because of how conversions between Java collections and Scala collections work, ImplicitLinkedHashMultiSet objects were being treated as unordered in some contexts where they shouldn't be. This broke JOIN_GROUP handling. This patch renames ImplicitLinkedHashMultiSet to ImplicitLinkedHashMultCollection. The order of Collection objects will be preserved when converting to scala. Adding Set and List "views" to the Collection gives us a more elegant way of accessing that functionality when needed. Reviewers: Colin P. McCabe <cmccabe@apache.org>	6 years ago


2611 Commits (c758122ce59674ec3e33618d896e4e5cdbb45e87)