Kafka should not throw a NullPointerException while loading a partition directory whose log segments have been deleted. This patch ensures that there is always at least one segment after initialization, as sketched below.
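A minimal sketch of the invariant, under stated assumptions: `LogSegmentLike`, `LogLike`, and `createEmptySegment` are illustrative stand-ins, not the actual Kafka internals. After loading whatever segment files exist on disk, an empty active segment is created if none were found, so later reads never dereference a missing segment.
```
import java.util.concurrent.ConcurrentSkipListMap

// Illustrative stand-ins for the real Log internals.
case class LogSegmentLike(baseOffset: Long)

class LogLike(val logStartOffset: Long) {
  private val segments = new ConcurrentSkipListMap[Long, LogSegmentLike]()

  private def createEmptySegment(baseOffset: Long): LogSegmentLike =
    LogSegmentLike(baseOffset)

  def loadSegments(fromDisk: Seq[LogSegmentLike]): Unit = {
    fromDisk.foreach(s => segments.put(s.baseOffset, s))
    // Invariant from the patch: a Log always has at least one segment,
    // so activeSegment can never be null after initialization.
    if (segments.isEmpty)
      segments.put(logStartOffset, createEmptySegment(logStartOffset))
  }

  def activeSegment: LogSegmentLike = segments.lastEntry.getValue
}
```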
Co-authored-by: Edoardo Comar <ecomar@uk.ibm.com>
Co-authored-by: Mickael Maison <mickael.maison@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Cancel PeriodicProducerExpirationCheck when closing a Log instance, to avoid a memory leak. Add a method to KafkaScheduler to make this possible.
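The shape of the fix, as a hedged sketch using a plain `ScheduledExecutorService` rather than the real `KafkaScheduler` API: hold on to the `ScheduledFuture` returned at schedule time so `close()` can cancel the recurring task, which otherwise keeps the Log instance reachable forever.
```
import java.util.concurrent.{Executors, ScheduledFuture, TimeUnit}

class ClosableLog {
  private val scheduler = Executors.newSingleThreadScheduledExecutor()

  // Keep the handle so the periodic task can be cancelled later.
  private val producerExpirationCheck: ScheduledFuture[_] =
    scheduler.scheduleAtFixedRate(
      new Runnable { def run(): Unit = removeExpiredProducers() },
      10, 10, TimeUnit.MINUTES)

  private def removeExpiredProducers(): Unit = {
    // expire idle producer state; details elided
  }

  def close(): Unit = {
    // Without this cancel, the scheduler retains the task (and through it
    // this instance) indefinitely -- the leak the patch fixes.
    producerExpirationCheck.cancel(false)
    scheduler.shutdown()
  }
}
```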
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Ismael Juma <ismael@juma.me.uk>
Sub-task required to allow defining custom processor names with the Kafka Streams DSL (KIP-307). This is the 4th PR for KIP-307.
Reviewers: John Roesler <john@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
This patch simplifies the controller election API. We were passing `LeaderIsrAndControllerEpoch` into the election utilities even though we only needed `LeaderAndIsr`. We also remove some unneeded collection copies in `doElectLeaderForPartitions`.
Reviewers: Ismael Juma <ismael@juma.me.uk>
This was caused by back-to-back merging of #6854 (which removed the Optional import) and #6936 (which needed the import).
Reviewers: Jason Gustafson <jason@confluent.io>
An attempt to refactor the current coordinator logic.
Reviewers: Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Konstantine Karantasis <konstantine@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
A `Partition` object contains one or more `Replica` objects. These replica
objects in turn can hold the "log" if the replica corresponds to the
local node. All the code in Partition or ReplicaManager peeks into the
replica object to fetch the log when it needs to operate on it. Since a
replica object can represent either a local or a remote replica, this
leads to a bunch of "if-else" code in the log fetch and offset update paths.
NOTE: In addition to the "log" that is in use during normal operation, if
an alter log directory command is issued, we also create a future log
object. This object catches up with the local log, and then we switch the
log directory. So, temporarily, a Partition can have two local logs. Before
this change both logs were held inside replica objects.
This change is an attempt to untangle this relationship. In particular,
it moves `Log` from `Replica` into `Partition`. A partition now contains
a local log to which all writes go, and possibly a "future log" if the
partition is being moved between directories. Additionally, it maintains
a list of remote replicas holding the offset and "caught up time" data
used by the replication protocol.
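A rough sketch of the resulting shape; field and class names here are simplified approximations, not the exact Kafka declarations:
```
class Log // stand-in for kafka.log.Log

// Remote replicas now carry only replication bookkeeping, never a log.
class RemoteReplica(val brokerId: Int) {
  @volatile var logEndOffset: Long = -1L
  @volatile var lastCaughtUpTimeMs: Long = 0L
}

class PartitionSketch {
  // The local log; all writes go here.
  @volatile var log: Option[Log] = None
  // Present only while an alter-log-dirs move is in flight; once it has
  // caught up, it replaces `log` and the partition is single-logged again.
  @volatile var futureLog: Option[Log] = None
  // Offset and caught-up-time state for followers, used by the
  // replication protocol (ISR maintenance, high watermark advancement).
  @volatile var remoteReplicas: Seq[RemoteReplica] = Seq.empty
}
```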
Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>, Jason Gustafson <jason@confluent.io>
Include the topic config `segment.bytes`.
Author: Vahid Hashemian <vahidhashemian@us.ibm.com>
Reviewers: Gwen Shapira
Closes #6945 from vahidhashemian/minor/update_streams_quickstart_doc
1. Add new fields to the subscription / assignment and bump the consumer protocol to v2.
2. Update tests to make sure the old versioned protocol can still be deserialized, and that the new versioned protocol can be deserialized by old client code; the sketch below illustrates the compatibility principle.
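A hedged illustration of that principle using a toy format, not the real consumer protocol wire layout: a reader parses only the fields its version knows about and leaves any trailing bytes (the new v2 fields) unread, so newer blobs still deserialize under old code.
```
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

// Toy format: [version: Int16][topicCount: Int32][len: Int16, utf8 bytes]*[...new fields]
def readAsV1(buf: ByteBuffer): (Short, List[String]) = {
  val version = buf.getShort()
  val topicCount = buf.getInt()
  val topics = List.fill(topicCount) {
    val len = buf.getShort()
    val bytes = new Array[Byte](len)
    buf.get(bytes)
    new String(bytes, StandardCharsets.UTF_8)
  }
  // Any remaining bytes are fields this reader does not know about;
  // ignoring them is what makes the version bump backward compatible.
  (version, topics)
}
```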
Reviewers: Boyang Chen <boyang@confluent.io>, Sophie Blee-Goldman <sophie@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
The OffsetFetch API requires Topic Describe permission. If a client does not have this, we return TOPIC_AUTHORIZATION_FAILED at the partition level. Currently the consumer does not handle this error explicitly, but raises it as a generic `KafkaException`. For consistency with other APIs, and to fix transient test failures in `PlaintextEndToEndAuthorizationTest`, we should raise `TopicAuthorizationException` instead.
Reviewers: Ismael Juma <ismael@juma.me.uk>
De-duplicate the common case in which the new value is the same as the prior value.
Reviewers: Sophie Blee-Goldman <sophie@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
- Timeout occurred due to initially slow rebalancing.
- Added code to wait until the `KafkaStreams` instance is in state RUNNING before checking registration of metrics, and in state NOT_RUNNING before checking deregistration of metrics (see the sketch after this list).
- Removed all other wait conditions, because they are not needed once the `KafkaStreams` instance is in the right state.
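A minimal sketch of such a wait, assuming a `KafkaStreams` instance named `streams`; the polling helper is hypothetical, not a Kafka test utility:
```
import org.apache.kafka.streams.KafkaStreams

def waitForState(streams: KafkaStreams,
                 target: KafkaStreams.State,
                 timeoutMs: Long): Unit = {
  val deadline = System.currentTimeMillis() + timeoutMs
  // KafkaStreams.state() is the public API for observing the instance state.
  while (streams.state() != target) {
    if (System.currentTimeMillis() > deadline)
      throw new AssertionError(s"Did not reach $target within ${timeoutMs}ms")
    Thread.sleep(100)
  }
}

// Usage in a test (illustrative):
//   waitForState(streams, KafkaStreams.State.RUNNING, 60000L)     // then check registration
//   waitForState(streams, KafkaStreams.State.NOT_RUNNING, 60000L) // then check deregistration
```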
Reviewers: Guozhang Wang <wangguoz@gmail.com>
The idempotent producer attempts to detect spurious UNKNOWN_PRODUCER_ID errors and handle them by reassigning sequence numbers to the inflight batches. The inflight batches are tracked in a PriorityQueue. The problem is that the reassignment of sequence numbers depends on the iteration order of PriorityQueue, which guarantees ordering only at the head, not during iteration (demonstrated below). This can result in sequence numbers being assigned in the wrong order. This patch fixes the problem by using a sorted set instead of a priority queue so that the iteration order preserves the sequence order. Note that resetting sequence numbers is an exceptional case.
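A self-contained demonstration of the underlying hazard: iterating a `java.util.PriorityQueue` visits elements in internal heap-array order, while a sorted set iterates in sequence order.
```
import java.util.PriorityQueue
import scala.collection.mutable

object IterationOrderDemo extends App {
  val heap = new PriorityQueue[Int]()
  val sorted = mutable.TreeSet.empty[Int]
  Seq(5, 1, 4, 2, 3).foreach { n => heap.add(n); sorted.add(n) }

  var heapOrder = List.empty[Int]
  val it = heap.iterator()
  while (it.hasNext) heapOrder :+= it.next()

  // Heap iteration reflects the internal array layout, e.g. List(1, 2, 4, 5, 3):
  println(s"PriorityQueue iteration: $heapOrder")
  // Sorted-set iteration is always in sequence order: List(1, 2, 3, 4, 5).
  println(s"TreeSet iteration:       ${sorted.toList}")
}
```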
This patch also fixes KAFKA-8484, which can cause an IllegalStateException when the producerId is reset while there are pending produce requests inflight. The solution is to ensure that sequence numbers are only reset if the producerId of a failed batch corresponds to the current producerId.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Makes HostedPartition a sealed trait and makes all of the match cases explicit.
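A hedged sketch of why sealing helps; the case payloads here are illustrative, not the actual Kafka definitions. With a sealed trait, the compiler flags any non-exhaustive match, so every call site must say explicitly what it does for each hosting state.
```
sealed trait HostedPartition
object HostedPartition {
  // Illustrative cases; the real ones wrap broker-internal state.
  final case class Online(topicPartition: String) extends HostedPartition
  case object Offline extends HostedPartition
  case object None extends HostedPartition
}

def describe(p: HostedPartition): String = p match {
  case HostedPartition.Online(tp) => s"hosted and online: $tp"
  case HostedPartition.Offline    => "hosted but on a failed log dir"
  case HostedPartition.None       => "not hosted on this broker"
  // Omitting any case above would produce a "match may not be exhaustive"
  // compiler warning, which is the safety the sealed trait buys.
}
```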
Reviewers: Vikas Singh <soondenana@users.noreply.github.com>, Jason Gustafson <jason@confluent.io>
This PR fixes a bug in static group membership. Previously we limited the `member.id` replacement in JoinGroup to cases where the group is in the Stable state. This is error-prone and could potentially allow duplicate consumers reading from the same topic. For example, imagine a case where two unknown members join in the `PrepareRebalance` stage at the same time.
The PR fixes the following things:
1. Replace the `member.id` whenever a known static member rejoins the group with an unknown member.id.
2. Immediately fence any ongoing join/sync group callback to terminate the duplicate member early.
3. Clearly handle the Dead/Empty cases as exceptional.
4. Return the old leader id when the static member leader rejoins, to avoid triggering a trivial member reassignment.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>
Messages containing the key and value were moved to the TRACE logging level; however, the exception still includes the key and value.
This commit removes the key and value from the StreamsException.
Reviewers: Bill Bejeck <bbejeck@gmail.com>
Currently we load the high watermark checkpoint separately for every replica that we load. This patch makes the loading logic lazy and caches the loaded map while a LeaderAndIsr request is being handled, as sketched below.
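A minimal sketch of the pattern using Scala's `lazy val`; the checkpoint reader is a hypothetical stand-in, not the actual ReplicaManager code. The file is read at most once per request, the first time any partition asks for its high watermark.
```
import java.io.File

class LazyHighWatermarkCheckpoint(file: File,
                                  readCheckpoint: File => Map[String, Long]) {
  // Evaluated on first access, then cached for the rest of the
  // LeaderAndIsr request instead of being re-read per replica.
  private lazy val offsets: Map[String, Long] = readCheckpoint(file)

  def highWatermark(topicPartition: String): Option[Long] =
    offsets.get(topicPartition)
}
```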
Reviewers: Jun Rao <junrao@gmail.com>
The ResetIntegrationTest has experienced several failures, and it seems the current timeout of 10 seconds may not be enough.
Reviewers: Matthias J. Sax <mjsax@apache.org>, Boyang Chen <boyang@confluent.io>
We have seen this failing recently due to the following error:
```
java.util.NoSuchElementException: None.get
at scala.None$.get(Option.scala:366)
at scala.None$.get(Option.scala:364)
at kafka.admin.PreferredReplicaLeaderElectionCommandTest.getLeader(PreferredReplicaLeaderElectionCommandTest.scala:101)
at kafka.admin.PreferredReplicaLeaderElectionCommandTest.testNoopElection(PreferredReplicaLeaderElectionCommandTest.scala:240)
```
We need to wait for the leader to be available.
Reviewers: David Arthur <mumrah@gmail.com>
We've seen `ReplicaVerificationToolTest.test_replica_lags` fail occasionally due to errors such as the following:
```
RemoteCommandError: ubuntuworker7: Command 'kill -15 2896' returned non-zero exit status 1. Remote error message: bash: line 0: kill: (2896) - No such process
```
The problem seems to be a shutdown race condition when using `max_messages` with the producer. The process may already be gone, which causes the signal to fail.
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Gwen Shapira
Closes #6906 from hachikuji/fix-failing-replicat-verification-test
We see the upgrade test failing from time to time. I looked into it and found that the root cause is basically that the test throughput can be too high for the 0.9 producer to make progress. Eventually it reaches a point where it has a huge backlog of timed out requests in the accumulator which all have to be expired. We see a long run of messages like this in the output:
```
{"exception":"class org.apache.kafka.common.errors.TimeoutException","time_ms":1559907386132,"name":"producer_send_error","topic":"test_topic","message":"Batch Expired","class":"class org.apache.kafka.tools.VerifiableProducer","value":"335160","key":null}
{"exception":"class org.apache.kafka.common.errors.TimeoutException","time_ms":1559907386132,"name":"producer_send_error","topic":"test_topic","message":"Batch Expired","class":"class org.apache.kafka.tools.VerifiableProducer","value":"335163","key":null}
{"exception":"class org.apache.kafka.common.errors.TimeoutException","time_ms":1559907386133,"name":"producer_send_error","topic":"test_topic","message":"Batch Expired","class":"class org.apache.kafka.tools.VerifiableProducer","value":"335166","key":null}
{"exception":"class org.apache.kafka.common.errors.TimeoutException","time_ms":1559907386133,"name":"producer_send_error","topic":"test_topic","message":"Batch Expired","class":"class org.apache.kafka.tools.VerifiableProducer","value":"335169","key":null}
```
This can continue for a long time (I have observed up to 1 min) and prevents the producer from successfully writing any new data. While it is busy expiring the batches, no data is getting delivered to the consumer, which causes it to eventually raise a timeout.
```
kafka.consumer.ConsumerTimeoutException
at kafka.consumer.NewShinyConsumer.receive(BaseConsumer.scala:50)
at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:109)
at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:69)
at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:47)
at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
```
The fix here is to reduce the throughput, which seems reasonable since the purpose of the test is to verify the upgrade, which does not demand heavy load. Note that I investigated several failing instances of this test going back to 1.0 and saw a similar pattern, so there does not appear to be a regression.
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Gwen Shapira
Closes #6907 from hachikuji/lower-throughput-for-upgrade-test
As the title suggests, we bring up a stream job with 3 stream instances and a one-minute session timeout, and once the group is stable, perform a couple of rolling bounces of the entire cluster. Every rejoin caused by a restart should produce no generation bump on the client side.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bbejeck@gmail.com>
We see this failure from time to time:
```
java.lang.AssertionError: expected:<1> but was:<0>
at org.junit.Assert.fail(Assert.java:89)
at org.junit.Assert.failNotEquals(Assert.java:835)
at org.junit.Assert.assertEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:633)
at kafka.api.TransactionsTest.testFencingOnTransactionExpiration(TransactionsTest.scala:512)
```
The cause is probably that we are using `consumeRecordsFor`, which has no expectation on the number of records to fetch and a timeout of just 1s. This patch changes the code to use `consumeRecords` and the default 15s timeout.
Note that we have also fixed a bug in the test case itself: it was using the wrong topic for the second write, which meant it could never have failed in the anticipated way anyway.
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Gwen Shapira
Closes #6905 from hachikuji/fix-flaky-transaction-test
This patch adds test cases for the leader election command added in KIP-460.
Reviewers: Vikas Singh, David Arthur <mumrah@gmail.com>, Jason Gustafson <jason@confluent.io>
In RocksDBTimestampedStore#openRocksDB we try to open a db with two column families. If this succeeds but the first column family is empty (`db.newIterator.seekToFirst.isValid()` returns false), we never actually close its ColumnFamilyHandle. A sketch of the fix follows.
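A hedged sketch of the fix; options, paths, and the column family name are simplified stand-ins, not the actual store code. The point is that whichever branch the empty-check takes, every `ColumnFamilyHandle` returned by `open` must eventually be closed.
```
import java.util.{ArrayList => JList}
import org.rocksdb.{ColumnFamilyDescriptor, ColumnFamilyHandle, DBOptions, RocksDB, RocksIterator}

def openWithTwoColumnFamilies(path: String): Unit = {
  RocksDB.loadLibrary()
  val descriptors = new JList[ColumnFamilyDescriptor]()
  descriptors.add(new ColumnFamilyDescriptor(RocksDB.DEFAULT_COLUMN_FAMILY))
  descriptors.add(new ColumnFamilyDescriptor("withTimestamp".getBytes("UTF-8")))
  val handles = new JList[ColumnFamilyHandle]()
  val options = new DBOptions().setCreateIfMissing(true).setCreateMissingColumnFamilies(true)
  val db = RocksDB.open(options, path, descriptors, handles)

  val oldHandle = handles.get(0)
  val iterator: RocksIterator = db.newIterator(oldHandle)
  try {
    iterator.seekToFirst()
    if (!iterator.isValid) {
      // Old column family is empty: nothing to migrate. The bug was
      // returning here without closing oldHandle.
    }
  } finally {
    iterator.close()
    oldHandle.close() // the previously leaked handle
  }
}
```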
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Quick tech debt cleanup. For some reason StreamsPartitionAssignor uses an InternalTopicMetadata class which wraps an InternalTopicConfig object along with the number of partitions. But InternalTopicConfig already has a numPartitions field, so we should just use it directly instead.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Bruno Cadonna <bruno@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
This patch checks for errors handling a fetch request before updating follower state. Previously we were unsafely passing the failed `LogReadResult` with most fields set to -1 into `Replica` to update follower state. Additionally, this patch attempts to improve the test coverage for ISR shrinking and expansion logic in `Partition`.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
The Dead state in the coordinator is used for groups which are either pending deletion or migration to a new coordinator. Currently, requests received while in this state result in an UNKNOWN_MEMBER_ID error, which causes consumers to reset the memberId. This is a problem for KIP-345 since it can cause an older member to fence a newer member. This patch changes the error code returned in this state to COORDINATOR_NOT_AVAILABLE, which causes the consumer to rediscover the coordinator but not reset the memberId.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>