Messages containing the key and value were moved to the TRACE logging level; however, the exception still includes the key and value.
This commit removes the key and value from StreamsException.
Reviewers: Bill Bejeck <bbejeck@gmail.com>
Currently we load the high watermark checkpoint separately for every replica that we load. This patch makes this loading logic lazy and caches the loaded map while a LeaderAndIsr request is being handled.
Reviewers: Jun Rao <junrao@gmail.com>
The ResetIntegrationTest has experienced several failures, and it seems the current timeout of 10 seconds may not be enough time.
Reviewers: Matthias J. Sax <mjsax@apache.org>, Boyang Chen <boyang@confluent.io>
We have seen this failing recently due to the following error:
```
java.util.NoSuchElementException: None.get
at scala.None$.get(Option.scala:366)
at scala.None$.get(Option.scala:364)
at kafka.admin.PreferredReplicaLeaderElectionCommandTest.getLeader(PreferredReplicaLeaderElectionCommandTest.scala:101)
at kafka.admin.PreferredReplicaLeaderElectionCommandTest.testNoopElection(PreferredReplicaLeaderElectionCommandTest.scala:240)
```
We need to wait for the leader to be available.
Reviewers: David Arthur <mumrah@gmail.com>
We've seen `ReplicaVerificationToolTest.test_replica_lags` fail occasionally due to errors such as the following:
```
RemoteCommandError: ubuntuworker7: Command 'kill -15 2896' returned non-zero exit status 1. Remote error message: bash: line 0: kill: (2896) - No such process
```
The problem seems to be a shutdown race condition when using `max_messages` with the producer. The process may already be gone, which causes the signal to fail.
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Gwen Shapira
Closes #6906 from hachikuji/fix-failing-replicat-verification-test
We see the upgrade test failing from time to time. I looked into it and found that the root cause is basically that the test throughput can be too high for the 0.9 producer to make progress. Eventually it reaches a point where it has a huge backlog of timed out requests in the accumulator which all have to be expired. We see a long run of messages like this in the output:
```
{"exception":"class org.apache.kafka.common.errors.TimeoutException","time_ms":1559907386132,"name":"producer_send_error","topic":"test_topic","message":"Batch Expired","class":"class org.apache.kafka.tools.VerifiableProducer","value":"335160","key":null}
{"exception":"class org.apache.kafka.common.errors.TimeoutException","time_ms":1559907386132,"name":"producer_send_error","topic":"test_topic","message":"Batch Expired","class":"class org.apache.kafka.tools.VerifiableProducer","value":"335163","key":null}
{"exception":"class org.apache.kafka.common.errors.TimeoutException","time_ms":1559907386133,"name":"producer_send_error","topic":"test_topic","message":"Batch Expired","class":"class org.apache.kafka.tools.VerifiableProducer","value":"335166","key":null}
{"exception":"class org.apache.kafka.common.errors.TimeoutException","time_ms":1559907386133,"name":"producer_send_error","topic":"test_topic","message":"Batch Expired","class":"class org.apache.kafka.tools.VerifiableProducer","value":"335169","key":null}
```
This can continue for a long time (I have observed up to 1 min) and prevents the producer from successfully writing any new data. While it is busy expiring the batches, no data is getting delivered to the consumer, which causes it to eventually raise a timeout.
```
kafka.consumer.ConsumerTimeoutException
at kafka.consumer.NewShinyConsumer.receive(BaseConsumer.scala:50)
at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:109)
at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:69)
at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:47)
at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
```
The fix here is to reduce the throughput, which seems reasonable since the purpose of the test is to verify the upgrade, which does not demand heavy load. Note that I investigated several failing instances of this test going back to 1.0 and saw a similar pattern, so there does not appear to be a regression.
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Gwen Shapira
Closes #6907 from hachikuji/lower-throughput-for-upgrade-test
As the title suggests, we bring up a streams job with 3 stream instances and a one-minute session timeout, and once the group is stable, do a couple of rolling bounces of the entire cluster. Every rejoin caused by a restart should result in no generation bump on the client side.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bbejeck@gmail.com>
We see this failure from time to time:
```
java.lang.AssertionError: expected:<1> but was:<0>
at org.junit.Assert.fail(Assert.java:89)
at org.junit.Assert.failNotEquals(Assert.java:835)
at org.junit.Assert.assertEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:633)
at kafka.api.TransactionsTest.testFencingOnTransactionExpiration(TransactionsTest.scala:512)
```
The cause is probably that we are using `consumeRecordsFor`, which has no expectation on the number of records to fetch and a timeout of just 1s. This patch changes the code to use `consumeRecords` with the default 15s timeout.
Note we have also fixed a bug in the test case itself: it was using the wrong topic for the second write, which meant it could never have failed in the anticipated way.
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Gwen Shapira
Closes #6905 from hachikuji/fix-flaky-transaction-test
This patch adds test cases for the leader election command added in KIP-460.
Reviewers: Vikas Singh, David Arthur <mumrah@gmail.com>, Jason Gustafson <jason@confluent.io>
In RocksDBTimestampedStore#openRocksDB we try to open a DB with two column families. If this succeeds but the first column family is empty (db.newIterator.seekToFirst.isValid() == false), we never actually close its ColumnFamilyHandle.
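The shape of the fix, as a minimal sketch against the RocksDB Java API (this is not the actual store code; class, method, and variable names are illustrative):
```java
import java.util.ArrayList;
import java.util.List;
import org.rocksdb.ColumnFamilyDescriptor;
import org.rocksdb.ColumnFamilyHandle;
import org.rocksdb.DBOptions;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.RocksIterator;

class ColumnFamilyCloseSketch {
    static RocksDB open(final DBOptions dbOptions,
                        final String dbPath,
                        final List<ColumnFamilyDescriptor> descriptors) throws RocksDBException {
        final List<ColumnFamilyHandle> handles = new ArrayList<>(descriptors.size());
        final RocksDB db = RocksDB.open(dbOptions, dbPath, descriptors, handles);
        final ColumnFamilyHandle firstColumnFamily = handles.get(0);
        try (final RocksIterator iter = db.newIterator(firstColumnFamily)) {
            iter.seekToFirst();
            if (!iter.isValid()) {
                // the first column family holds no data and will not be used, so its
                // handle must be closed here rather than silently dropped (the leak)
                firstColumnFamily.close();
            }
        }
        return db;
    }
}
```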
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Quick tech debt cleanup. For some reason StreamsPartitionAssignor uses an InternalTopicMetadata class which wraps an InternalTopicConfig object along with the number of partitions. But InternalTopicConfig already has a numPartitions field, so we should just use it directly instead.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Bruno Cadonna <bruno@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
This patch checks for errors handling a fetch request before updating follower state. Previously we were unsafely passing the failed `LogReadResult` with most fields set to -1 into `Replica` to update follower state. Additionally, this patch attempts to improve the test coverage for ISR shrinking and expansion logic in `Partition`.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
The Dead state in the coordinator is used for groups which are either pending deletion or migration to a new coordinator. Currently requests received while in this state result in an UNKNOWN_MEMBER_ID which causes consumers to reset the memberId. This is a problem for KIP-345 since it can cause an older member to fence a newer member. This patch changes the error code returned in this state to COORDINATOR_NOT_AVAILABLE, which causes the consumer to rediscover the coordinator, but not reset the memberId.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>
This commit makes the following changes:
- Adds a constructor for NewTopic(String, Optional<Integer>, Optional<Short>)
which allows users to specify Optional.empty() for numPartitions or
replicationFactor in order to use the broker default (see the sketch after this list).
- Changes AdminManager to accept -1 as a valid value for replication
factor and numPartitions (resolving to the broker defaults).
- Makes --partitions and --replication-factor optional arguments when creating
topics using kafka-topics.sh.
- Adds a dependency on scalaJava8Compat library to make it simpler to
convert Scala Option to Java Optional
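A minimal sketch of the new constructor (the topic name and bootstrap address are illustrative; error handling is omitted):
```java
import java.util.Collections;
import java.util.Optional;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicWithDefaults {
    public static void main(final String[] args) throws Exception {
        final Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // Optional.empty() for both arguments means "use the broker defaults"
            // (num.partitions and default.replication.factor)
            final NewTopic topic = new NewTopic("my-topic", Optional.empty(), Optional.empty());
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```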
Reviewers: Ismael Juma <ismael@juma.me.uk>, Ryanne Dolan <ryannedolan@gmail.com>, Jason Gustafson <jason@confluent.io>
The goals for this small diff are:
1. Give users guidance if they want to relax the commit timeout threshold
2. Indicate the code path where the timeout exception was caught
Reviewers: John Roesler <john@confluent.io>, Guozhang Wang <guozhang@confluent.io>
Reviewers: Bill Bejeck <bill@confluent.io>, Boyang Chen <boyang@confluent.io>, Bruno Cadonna <bruno@confluent.io>, Guozhang Wang <guozhang@confluent.io>
When Connect forwards a REST request from one worker to another, the Authorization header was not forwarded. This commit changes the Connect framework to include the authorization header when forwarding requests to other workers.
Author: Hai-Dang Dam <damquanghaidang@gmail.com>
Reviewers: Robert Yokota <rayokota@gmail.com>, Randall Hauch <rhauch@gmail.com>
* StreamsMetricsImpl wraps the Kafka Streams metrics registry and provides logic to create
and register sensors and their corresponding metrics. An example of such logic can be found in
threadLevelSensor(). Furthermore, StreamsMetricsImpl keeps track of the sensors on the
different levels of an application, i.e., thread, task, etc., and provides logic to remove sensors per
level, e.g., removeAllThreadLevelSensors(). There is one StreamsMetricsImpl object per
application instance.
* ThreadMetrics contains only static methods that specify all built-in thread-level sensors and
metrics and provide logic to register and retrieve those thread-level sensors, e.g., commitSensor().
* From anywhere inside the code base with access to StreamsMetricsImpl, thread-level sensors can be accessed by using ThreadMetrics.
* ThreadMetrics does not inherit from StreamsMetricsImpl anymore.
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, John Roesler <john@confluent.io>, Guozhang Wang <guozhang@confluent.io>
Temporarily restore the SslFactory.sslContext() function, which some connectors use. This function is not a public API and it will be removed eventually. For now, we will mark it as deprecated.
Restart task on reconfiguration under incremental cooperative rebalancing, and keep execution paths separate for config updates between eager and cooperative. Include the group generation in the log message when the worker receives its assignment.
Author: Konstantine Karantasis <konstantine@confluent.io>
Reviewer: Randall Hauch <rhauch@gmail.com>
According to KIP-297, a parameter is passed to a ConfigProvider using the syntax "config.providers.{name}.param.{param-name}". Currently AbstractConfig allows parameters of the format "config.providers.{name}.{param-name}". With this fix, AbstractConfig is consistent with the KIP-297 syntax.
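For example, with this fix a configuration along these lines passes the parameter through to the provider (a sketch; the provider name "vault" and the provider class are hypothetical):
```java
import java.util.HashMap;
import java.util.Map;

class ConfigProviderParamsSketch {
    static Map<String, String> providerProps() {
        final Map<String, String> props = new HashMap<>();
        props.put("config.providers", "vault");
        props.put("config.providers.vault.class", "com.example.VaultConfigProvider"); // hypothetical provider
        // provider parameters must use the ".param." segment per KIP-297
        props.put("config.providers.vault.param.path", "/secrets/kafka");
        return props;
    }
}
```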
Reviewers: Robert Yokota <rayokota@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
Since the originals map passed to the AbstractConfig constructor may be immutable, avoid updating this map while resolving indirect config variables. Instead, a new ResolvingMap instance is now used to store the resolved configs.
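A minimal illustration of the failure mode being avoided (not the actual AbstractConfig code): writing resolved values back into an unmodifiable originals map throws at runtime.
```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class ImmutableOriginalsSketch {
    static void resolveInPlace() {
        final Map<String, Object> caller = new HashMap<>();
        caller.put("config.providers", "file");
        final Map<String, Object> originals = Collections.unmodifiableMap(caller);
        // mutating the caller's map while resolving config variables fails here
        originals.put("resolved.key", "resolved.value"); // throws UnsupportedOperationException
    }
}
```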
Reviewers: Randall Hauch <rhauch@gmail.com>, Boyang Chen <bchen11@outlook.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
When the restored record value is null, we are in danger of an NPE during the restoration phase.
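As an illustration only (this is not the fixed code path; the callback below is hypothetical), restoration logic needs an explicit guard for tombstones, i.e. records whose value is null:
```java
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.processor.StateRestoreCallback;
import org.apache.kafka.streams.state.KeyValueStore;

class NullValueRestoreSketch {
    static StateRestoreCallback callback(final KeyValueStore<Bytes, byte[]> store) {
        return (key, value) -> {
            if (value == null) {
                // a null value is a delete marker; dereferencing it (e.g. value.length)
                // during restoration would throw a NullPointerException
                store.delete(Bytes.wrap(key));
            } else {
                store.put(Bytes.wrap(key), value);
            }
        };
    }
}
```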
Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <guozhang@confluent.io>, Matthias J. Sax <matthias@confluent.io>
Sub-task required to allow defining custom processor names with the KStreams DSL (KIP-307):
- overload methods for stateless operations to accept a Named parameter (filter, filterNot, map, mapValues, foreach, peek, branch, transform, transformValue, flatTransform); see the sketch after this list
- overload process method to accept a Named parameter
- overload join/leftJoin/outerJoin methods
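For example, with these overloads a custom name can be attached to stateless operations (a minimal sketch; the topic and processor names are illustrative):
```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Named;
import org.apache.kafka.streams.kstream.Produced;

class NamedOperationsSketch {
    static void build() {
        final StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic", Consumed.with(Serdes.String(), Serdes.String()))
               .filter((key, value) -> value != null, Named.as("drop-null-values"))
               .mapValues(value -> value.toUpperCase(), Named.as("upper-case-values"))
               .to("output-topic", Produced.with(Serdes.String(), Serdes.String()));
    }
}
```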
Reviewers: John Roesler <john@confluent.io>, Boyang Chen <boyang@confluent.io>,
Bill Bejeck <bbejeck@gmail.com>
See also #6684
KTable processors must be supplied with a KTableProcessorSupplier, which in turn requires implementing a ValueGetter, for use with joins and groupings.
For suppression, a correct view only includes the previously emitted values (not the currently buffered ones), so this change also involves pushing the Change value type into the suppression buffer's interface, so that it can get the prior value upon first buffering (which is also the previously emitted value).
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Guozhang Wang <guozhang@confluent.io>
Remove processedKeys / processedValues / processedWithTimestamps as they are already covered by processed.
Reviewers: Matthias J. Sax <mjsax@apache.org>, John Roesler <john@confluent.io>, Boyang Chen <boyang@confluent.io>
It is possible for the offset of a partition to be changed while we are in the middle of validation. If the OffsetForLeaderEpoch request is in flight and the offset changes, we need to redo the validation after it returns. We had a check for this situation previously, but it only checked whether the current leader epoch had changed. This patch fixes this and moves the validation into `SubscriptionState`, where it can be protected with a lock.
Additionally, this patch adds test cases for the SubscriptionState validation API. We also fix a small bug in handling broker downgrades: basically, we should skip validation if the latest metadata does not include leader epoch information.
Reviewers: David Arthur <mumrah@gmail.com>
Fix KAFKA-8187: State store record loss across multiple reassignments when using standby tasks.
Do not let the thread transition to RUNNING until all tasks (including standby tasks) are ready.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bbejeck@gmail.com>
Now that we can configure RocksDB to bound its total memory, we should include docs describing how, as well as touching on some options that should be considered when taking advantage of this feature.
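A sketch along the lines of what such docs could show (the sizes are illustrative, not recommendations): a single shared block cache and write buffer manager bound the off-heap memory used across all RocksDB instances of an application instance.
```java
import java.util.Map;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Cache;
import org.rocksdb.LRUCache;
import org.rocksdb.Options;
import org.rocksdb.WriteBufferManager;

public class BoundedMemoryRocksDBConfig implements RocksDBConfigSetter {
    private static final long TOTAL_OFF_HEAP_MEMORY = 512 * 1024 * 1024L; // illustrative
    private static final long TOTAL_MEMTABLE_MEMORY = 128 * 1024 * 1024L; // illustrative

    // static so the same objects are shared by every store in the instance
    private static final Cache cache = new LRUCache(TOTAL_OFF_HEAP_MEMORY, -1, false, 0.1);
    private static final WriteBufferManager writeBufferManager =
            new WriteBufferManager(TOTAL_MEMTABLE_MEMORY, cache);

    @Override
    public void setConfig(final String storeName, final Options options, final Map<String, Object> configs) {
        final BlockBasedTableConfig tableConfig = (BlockBasedTableConfig) options.tableFormatConfig();
        tableConfig.setBlockCache(cache);
        // count index and filter blocks against the block cache so they are bounded too
        tableConfig.setCacheIndexAndFilterBlocks(true);
        options.setWriteBufferManager(writeBufferManager);
        options.setTableFormatConfig(tableConfig);
    }

    @Override
    public void close(final String storeName, final Options options) {
        // the shared cache and write buffer manager must not be closed per store
    }
}
```
The setter would then be registered via the rocksdb.config.setter config (StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG).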
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jim Galasyn <jim.galasyn@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
A lot of confusion seems to have arisen from the StreamsBuilder#addGlobalStore(...ProcessorSupplier) method. Users have assumed they can safely use this to transform records before populating their global state store; unfortunately, this results in corrupted data because on restore the records are read directly from the source topic changelog, bypassing their custom processor.
We should probably provide a means to do this at some point, but for the time being we should clarify the proper use of #addGlobalStore as it currently functions.
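To illustrate the intended usage (a minimal sketch with illustrative store, topic, and class names): the processor supplied to addGlobalStore should write records into the store unmodified, because restoration reads the source topic directly and bypasses this processor entirely.
```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.Stores;

class GlobalStoreSketch {
    @SuppressWarnings("unchecked")
    static void build(final StreamsBuilder builder) {
        builder.addGlobalStore(
            Stores.keyValueStoreBuilder(
                Stores.inMemoryKeyValueStore("global-store"),
                Serdes.String(), Serdes.String()).withLoggingDisabled(),
            "global-topic",
            Consumed.with(Serdes.String(), Serdes.String()),
            () -> new Processor<String, String>() {
                private KeyValueStore<String, String> store;

                @Override
                public void init(final ProcessorContext context) {
                    store = (KeyValueStore<String, String>) context.getStateStore("global-store");
                }

                @Override
                public void process(final String key, final String value) {
                    // write records through unmodified; any transformation applied here is
                    // skipped on restore and would leave the store inconsistent
                    store.put(key, value);
                }

                @Override
                public void close() { }
            });
    }
}
```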
Reviewers: Matthias J. Sax <mjsax@apache.org>, Bruno Cadonna <bruno@confluent.io>