The rebalance exception withholding is no longer necessary, as we now have a better mechanism for catching and wrapping these exceptions. Throwing them directly should be fine and simplifies our current error handling.
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, John Roesler <john@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Upfront refactoring for KIP-447.
Introduces `StreamsProducer`, which allows sharing a producer over multiple tasks and tracking the transaction status.
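A minimal sketch of the idea, assuming illustrative field and method names rather than the actual `StreamsProducer` API: one producer instance is shared across tasks, and the transaction status is tracked so that a transaction is begun only once per commit cycle.

```java
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerGroupMetadata;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.common.TopicPartition;

// Sketch only: not the actual StreamsProducer implementation.
public class StreamsProducer {
    private final Producer<byte[], byte[]> producer; // shared across tasks
    private boolean transactionInFlight = false;     // the tracked TX status

    public StreamsProducer(final Producer<byte[], byte[]> producer) {
        this.producer = producer;
    }

    void maybeBeginTransaction() {
        if (!transactionInFlight) {
            producer.beginTransaction();
            transactionInFlight = true;
        }
    }

    void commitTransaction(final Map<TopicPartition, OffsetAndMetadata> offsets,
                           final ConsumerGroupMetadata groupMetadata) {
        maybeBeginTransaction();
        producer.sendOffsetsToTransaction(offsets, groupMetadata);
        producer.commitTransaction();
        transactionInFlight = false;
    }
}
```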
Reviewers: Boyang Chen <boyang@confluent.io>, Guozhang Wang <guozhang@confluent.io>
This is a minor fix for a regression introduced in the refactoring PR: in current trunk, StandbyTask#commitNeeded always returns false, which would cause standby tasks to never be committed until closed. To restore the old behavior, we return true when new data has been applied and the offsets have been updated.
Reviewers: Boyang Chen <boyang@confluent.io>, John Roesler <john@confluent.io>
1. Removed the task field from TaskMigrated; the only caller that encodes a task id from StreamTask actually does not throw, so we only log it. To handle it on the StreamThread we always enforce a rebalance (and call onPartitionsLost to remove all tasks as dirty).
2. Added TaskCorruptedException with a set of task ids. The first use of this is restoreConsumer.poll, which throws InvalidOffsetException, indicating that the logs are truncated / compacted. To handle it on the StreamThread we first close the corresponding tasks as dirty (if EOS is enabled we also wipe out their state stores), and then revive them into the CREATED state (see the sketch after this list).
3. Also fixed a bug while investigating KAFKA-9572: when suspending / closing a restoring task we should not commit the new offsets but only update the checkpoint file.
4. Re-enabled the unit test.
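A hedged sketch of the handling described in item 2; all names below are illustrative, not the exact StreamThread code:

```java
try {
    runOnce();
} catch (final TaskCorruptedException e) {
    // Close the corrupted tasks as dirty, wipe state under EOS, then revive.
    for (final TaskId taskId : e.corruptedTaskIds()) {    // assumed accessor
        final Task task = taskManager.taskForId(taskId);  // assumed lookup
        task.closeDirty();                                // no offsets committed
        if (eosEnabled) {
            task.wipeStateStores();                       // assumed helper
        }
        task.revive();                                    // back to CREATED
    }
}
```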
This fixes two issues which together caused the soak to crash / some tests to fail occasionally.
What happened was: in the main StreamThread loop we initialized a new task in TaskManager#checkForCompletedRestoration, which includes registering, but not initializing, its changelogs. We then completed the loop and called poll, which resulted in a rebalance that revoked the newly initialized task. In TaskManager#handleAssignment we then closed the task cleanly and went to remove the changelogs from the StoreChangelogReader, only to get an IllegalStateException because the changelog partitions were not in the restore consumer's assignment (due to being uninitialized).
This by itself should be a recoverable error, as we catch exceptions here and retry closing the task as unclean. Of course, the task actually was successfully closed (clean), so we then got an unexpected `Illegal state CLOSED while closing active task` exception.
The fix(es) I'd propose are:
1. Keep the restore consumer's assignment in sync with the registered changelogs, i.e. the set ChangelogReader#changelogs, but pause them until they are initialized. Edit: since the consumer still performs some actions (e.g. fetches) on paused partitions, we should avoid adding uninitialized changelogs to the restore consumer's assignment in the first place, and instead just skip them when removing.
2. Move the StoreChangelogReader#remove call to before task.closeClean, so that the task is only marked as closed if everything was successful. We should do so regardless, as we should (attempt to) remove the changelogs even if the clean close fails and we must close unclean (see the sketch below).
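A sketch of fix 2 with assumed method names: the removal now happens up front, so it is attempted no matter how the close ends.

```java
changelogReader.remove(task.changelogPartitions()); // before the close
try {
    task.closeClean();
} catch (final RuntimeException e) {
    task.closeDirty(); // the changelogs were already removed above
}
```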
Reviewers: John Roesler <john@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Fixes a bug where KStream#transformValues would forward null values returned by the provided ValueTransformer#transform operation.
A test was added for verification: KStreamTransformValuesTest#shouldEmitNoRecordIfTransformReturnsNull. A parallel test for the non-key ValueTransformer was not added, as they share the same code path.
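A minimal sketch of the guard, assuming simplified processor internals:

```java
// Inside the processor's process(): drop the record instead of
// forwarding when the transformer returns null.
final V newValue = valueTransformer.transform(readOnlyKey, value);
if (newValue != null) {
    context.forward(key, newValue);
}
```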
Reviewers: Bruno Cadonna <bruno@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
This PR fixes two bugs related to the streams refactoring:
1. The subscribed topics are not updated correctly when a topic gets removed from the broker.
2. The remainingPartitions computation doesn't account for the case where one task has a pattern subscription over multiple topics; an input partition change will then not be recognized by the containsAll check.
The bugs are exposed by the integration test testRegexMatchesTopicsAWhenDeleted, which can be used to verify that the fix works.
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
1. Delay the transaction initialization (producer.initTxn) from construction to maybeInitialize; if it times out, we just swallow the exception and retry in the next iteration (see the sketch after this list).
2. If completeRestoration (consumer.committed) times out, just swallow and retry in the next iteration.
3. For other calls (producer.partitionsFor, producer.commitTxn, consumer.commit), treat the timeout exception as fatal.
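A hedged sketch of item 1 (field names assumed; TimeoutException is org.apache.kafka.common.errors.TimeoutException):

```java
void maybeInitialize() {
    if (transactionInitialized) {
        return; // already succeeded in an earlier iteration
    }
    try {
        producer.initTransactions();
        transactionInitialized = true;
    } catch (final TimeoutException swallowAndRetry) {
        // not fatal: we simply try again in the next loop iteration
    }
}
```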
Reviewers: Matthias J. Sax <matthias@confluent.io>
Correct the process-rate (and total) sensor to measure throughput (and total record processing count).
Reviewers: Guozhang Wang <guozhang@confluent.io>
- Added additional synchronization and increased timeouts to handle flakiness
- Added some precautionary retries when trying to obtain the lag map
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Corrects a flaw leading to an exception while building topologies that include both:
* A foreign-key join with the result not explicitly materialized
* An operation after the join that requires source materialization
Also corrects a flaw in TopologyTestDriver leading to output records being enqueued in the wrong order under some (presumably rare) circumstances.
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Migrates TopologyTestDriver processing to be closer to the same processing/ordering
semantics as KafkaStreams. This corrects the output order for recursive topologies
as reported in KAFKA-9503, and also works similarly in the case of task idling.
* Added an init() method to RocksDBMetricsRecorder
* Added a call to RocksDBMetricsRecorder#init() in init() of the RocksDB store
* Added a call to RocksDBMetricsRecorder#init() in openExisting() of the segmented state stores
* Adapted unit tests
* Added integration test that reproduces the situation in which the bug occurred
Reviewers: Guozhang Wang <wangguoz@gmail.com>
During the discussion for KIP-213, we decided to pass "pseudo-topics"
to the internal serdes we use to construct the wrapper serdes for
CombinedKey and for hashing the left-hand-side value. However, during
the implementation, this strategy wasn't fully implemented, and we wound
up using the same topic name for a few different data types.
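To illustrate the idea (the pseudo-topic name below is made up for this example, not the actual naming scheme): each internal data type gets its own synthetic topic name when (de)serializing, so serdes that key off the topic name, e.g. schema-registry serdes, never see conflicting types for one topic.

```java
// Sketch: serialize against a per-type pseudo-topic instead of a shared,
// real topic name.
final String pseudoTopic = storeName + "-subscription-key"; // assumed naming
final byte[] raw = combinedKeySerde.serializer().serialize(pseudoTopic, combinedKey);
```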
Reviewers: Guozhang Wang <guozhang@confluent.io>
During the KIP-213 implementation and verification, we neglected to test the
code path for falling back to default serdes if none are given in the topology.
Reviewer: Bill Bejeck <bbejeck@gmail.com>
Relying on an integration test to catch an algorithm bug introduces more flakiness; this reduces the test to a unit test to cut the flakiness until we upgrade the Java/Scala libs.
Verified that the test fails with the older version of StreamsPartitionAssignor.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Found this bug through repeated flaky runs of system tests. It seems to have been lurking for a long time, but it would only happen when there are frequent rebalances / topic creations within a short time, which is exactly the case in some of our smoke system tests.
Also added a unit test.
Reviewers: Boyang Chen <boyang@confluent.io>, A. Sophie Blee-Goldman <sophie@confluent.io>, Matthias J. Sax <matthias@confluent.io>
Follows up on the original PR for KAFKA-9445 to address a final round of feedback
Reviewers: John Roesler <vvcephei@apache.org>, Matthias J. Sax <matthias@confluent.io>
Follow-up to the original PR #7985 for KIP-523 (adding the `KStream#toTable()` operator); a usage example follows the list below.
- improve JavaDocs
- add more unit tests
- fix bug for auto-repartitioning
- some code cleanup
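For reference, a usage example of the operator (topic and store names are illustrative):

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;

final StreamsBuilder builder = new StreamsBuilder();
final KTable<String, String> table = builder
    .stream("input-topic", Consumed.with(Serdes.String(), Serdes.String()))
    .toTable(Materialized.as("stream-to-table-store"));
```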
Reviewers: High Lee <yello1109@daum.net>, John Roesler <john@confluent.io>
1. StoreChangelogReaderTest.shouldRequestCommittedOffsetsAndHandleTimeoutException[1]: this fails due to stricter ternary-operator type casting in J11.
2. KStreamImplTest.shouldSupportTriggerMaterializedWithKTableFromKStream: this was added recently and uses String-typed values where <String, Integer> is expected; J8 allowed this but J11 does not.
Reviewers: John Roesler <john@confluent.io>
Follow-up to KAFKA-7317 and KAFKA-9113; there is some additional cleanup we can do in InternalTopologyBuilder. This mostly refactors the subscription code to make the initialization more explicit and to reduce some duplicated code in the update logic.
Also some minor cleanup of the build method.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Instead of always trying to update the committed offset limits whenever there are buffered records for standby tasks, we leverage the commit interval to reduce the frequency of consumer.committed calls (see the sketch below).
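A sketch of the idea with assumed bookkeeping names:

```java
// Only refresh the committed-offset limits once per commit interval,
// rather than on every iteration that has buffered standby records.
if (now >= lastOffsetLimitUpdateMs + commitIntervalMs) {
    final Map<TopicPartition, OffsetAndMetadata> committed =
        consumer.committed(standbyPartitionsWithBufferedData);
    updateOffsetLimits(committed); // assumed helper
    lastOffsetLimitUpdateMs = now;
}
```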
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, John Roesler <john@confluent.io>
Since ValueAndTimestampSerializer wraps an unknown Serializer, the output of that Serializer can be null, in which case the line
`.allocate(rawTimestamp.length + rawValue.length)`
would throw a NullPointerException. This pull request returns null instead (see the sketch below).
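A hedged sketch of the fix, simplified from the description (the real class wraps both a timestamp and a value serializer):

```java
import java.nio.ByteBuffer;

public byte[] serialize(final String topic, final ValueAndTimestamp<V> data) {
    if (data == null) {
        return null;
    }
    final byte[] rawValue = valueSerializer.serialize(topic, data.value());
    if (rawValue == null) {
        return null; // the wrapped serializer returned null: propagate it, no NPE
    }
    final byte[] rawTimestamp = timestampSerializer.serialize(topic, data.timestamp());
    return ByteBuffer
        .allocate(rawTimestamp.length + rawValue.length)
        .put(rawTimestamp)
        .put(rawValue)
        .array();
}
```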
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
This PR is a collaboration between Guozhang Wang and John Roesler. It is a significant tech-debt cleanup of task management and state management, broken down into the sub-tasks listed below:
1. Extract the embedded clients (producer and consumer) out of StreamTask into RecordCollector. (guozhangwang#2, guozhangwang#5)
2. Consolidate the standby updating and active restoring logic into ChangelogReader and extract it out of StreamThread. (guozhangwang#3, guozhangwang#4)
3. Introduce a Task state life cycle (created, restoring, running, suspended, closing) and refactor the task operations based on the current state; a sketch follows this list. (guozhangwang#6, guozhangwang#7)
4. Consolidate AssignedTasks into TaskManager and simplify the logic of changelog management and task management (since those were already moved in steps 2 and 3). (guozhangwang#8, guozhangwang#9)
5. Simplify the StreamThread logic a bit, as the embedded clients / changelog restoration logic was moved out in steps 1 and 2. (guozhangwang#10)
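A sketch of the task state life cycle from step 3 (the validation shown is illustrative, not the exact transition table):

```java
enum State {
    CREATED, RESTORING, RUNNING, SUSPENDED, CLOSED
}

private State state = State.CREATED;

void transitionTo(final State newState) {
    if (!isValidTransition(state, newState)) { // assumed validation helper
        throw new IllegalStateException(
            "Invalid transition from " + state + " to " + newState);
    }
    state = newState;
}
```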
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Bruno Cadonna <bruno@confluent.io>, Boyang Chen <boyang@confluent.io>
This ticket improves two aspects of the retrieval of sensors:
https://issues.apache.org/jira/browse/KAFKA-9152
1. Currently, when a sensor is retrieved with *Metrics.*Sensor() (e.g. ThreadMetrics.createTaskSensor()) after it was created by the same method, the sensor is added again to the corresponding queue in Sensors (e.g. threadLevelSensors) in StreamsMetricsImpl. Those queues are used to remove the sensors when removeAllLevelSensors() is called. Having the same sensor in a queue multiple times is not a correctness issue, but storing each sensor only once would reduce the footprint.
2. When a sensor is retrieved, the current code attempts to create a new sensor and to add the corresponding metrics to it again. This could be avoided.
Both aspects can be improved by calling getSensor() on the Metrics object and checking the return value to determine whether the sensor already exists (see the sketch below).
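A sketch of the proposed retrieval (Metrics#getSensor is the real accessor; the helper and queue names below are assumptions for illustration):

```java
Sensor sensor = metrics.getSensor(fullSensorName);
if (sensor == null) {
    // First retrieval: create the sensor, attach its metrics, and register
    // it in the removal queue exactly once.
    sensor = metrics.sensor(fullSensorName, recordingLevel);
    addMetricsToSensor(sensor);             // assumed helper
    threadLevelSensors.add(fullSensorName); // registered once, not per call
}
return sensor;
```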
Reviewers: Bruno Cadonna <bruno@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
Also addresses KAFKA-8821
Note that we still have to fall back to using pattern subscription if the user has added any regex-based source nodes to the topology. Includes some minor cleanup on the side
Reviewers: Bill Bejeck <bbejeck@gmail.com>
Add a new method to KafkaStreams to return an estimate of the lags for
all partitions of all local stores.
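Example usage of the new method (`streams` is assumed to be a started KafkaStreams instance):

```java
import java.util.Map;
import org.apache.kafka.streams.LagInfo;

final Map<String, Map<Integer, LagInfo>> lags = streams.allLocalStorePartitionLags();
lags.forEach((store, partitionLags) ->
    partitionLags.forEach((partition, lag) ->
        System.out.printf("%s[%d]: offsetLag=%d%n", store, partition, lag.offsetLag())));
```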
Implements: KIP-535
Co-authored-by: Navinder Pal Singh Brar <navinder_brar@yahoo.com>
Reviewed-by: John Roesler <vvcephei@apache.org>
Add a new overload of KafkaStreams#store that allows users
to query standby and restoring stores in addition to active ones.
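A sketch of the new overload in use, assuming the boolean flag described by KIP-535 for including standby and restoring stores:

```java
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

// true = also match standby and restoring stores, not only active ones
final ReadOnlyKeyValueStore<String, Long> store =
    streams.store("counts-store", QueryableStoreTypes.keyValueStore(), true);
final Long valueOrNull = store.get("some-key"); // may be slightly stale
```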
Closes: #7962
Implements: KIP-535
Co-authored-by: Navinder Pal Singh Brar <navinder_brar@yahoo.com>
Reviewed-by: John Roesler <vvcephei@apache.org>
Deprecate existing metadata query APIs in favor of new
ones that include standby hosts as well as partition
information.
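Example usage of the new key-metadata API (names per KIP-535; the store name is illustrative):

```java
import java.util.Set;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyQueryMetadata;
import org.apache.kafka.streams.state.HostInfo;

final KeyQueryMetadata metadata =
    streams.queryMetadataForKey("counts-store", "some-key", Serdes.String().serializer());
final HostInfo activeHost = metadata.getActiveHost();          // fresh reads
final Set<HostInfo> standbyHosts = metadata.getStandbyHosts(); // possibly stale reads
final int partition = metadata.getPartition();
```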
Closes: #7960
Implements: KIP-535
Co-authored-by: Navinder Pal Singh Brar <navinder_brar@yahoo.com>
Reviewed-by: John Roesler <vvcephei@apache.org>
Do not wait until updateAssignmentMetadataIfNeeded returns true; instead, call it only once with a zero timeout. Also, do not return empty if we are in a rebalance.
Trim the pre-fetched records after the long polling, since the assignment may have changed.
We also need to update SubscriptionState to retain the existing state in assignFromSubscribed (similar to assignFromUser), so that we do not need the transition from INITIALIZING to FETCHING.
Unit test: this actually took me the most time :)
Reviewers: John Roesler <john@confluent.io>, Bill Bejeck <bill@confluent.io>, Bruno Cadonna <bruno@confluent.io>, Sophie Blee-Goldman <sophie@confluent.io>, Jason Gustafson <jason@confluent.io>, Richard Yu <yohan.richard.yu@gmail.com>, dengziming <dengziming1993@gmail.com>