This is part 1 of a series of PRs for task management cleanup:
1. Primarily clean up MockRecordCollectors: remove the unnecessary anonymous subclasses and consolidate on the NoOpRecordCollector (renamed to MockRecordCollector). The most relevant changes are in the unit tests that rely on this MockRecordCollector.
2. Let StandbyContextImpl#recordCollector() return null instead of a no-op collector, since standby tasks should ALWAYS bypass the logging logic and only use the inner store for restoreBatch. Returning null surfaces a violation of this assumption as an NPE as early as possible, whereas a no-op collector just hides the bug.
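As a minimal sketch of the second point (placeholder types only, not the actual internal classes):

```java
// Placeholder for the internal Streams interface; only the contract matters here.
interface RecordCollector { /* send(), flush(), close(), ... */ }

class StandbyContextSketch {
    RecordCollector recordCollector() {
        // Standby tasks only restore via the inner store, so any code path that
        // reaches for the collector is a bug. Returning null fails fast with an
        // NPE instead of silently succeeding against a no-op collector.
        return null;
    }
}
```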
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Fixes a case where multiple children merged from a key-changing node cause an NPE.
Reviewers: Matthias J. Sax <mjsax@apache.org>, Boyang Chen <boyang@confluent.io>
As proposed in KIP-444, the user-customizable metrics shall be refactored. The refactoring consists of:
* adding methods addLatencyRateTotalSensor and addRateTotalSensor to the interface StreamsMetrics (sketched below)
* implementing the newly added methods in StreamsMetricsImpl
* deprecating the methods recordThroughput() and recordLatency() in StreamsMetrics
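A sketch of the interface change, with signatures approximated from KIP-444 (the committed interface may differ slightly):

```java
import org.apache.kafka.common.metrics.Sensor;

// Approximation of the KIP-444 additions; not the verbatim StreamsMetrics interface.
public interface StreamsMetricsSketch {

    Sensor addLatencyRateTotalSensor(String scopeName, String entityName, String operationName,
                                     Sensor.RecordingLevel recordingLevel, String... tags);

    Sensor addRateTotalSensor(String scopeName, String entityName, String operationName,
                              Sensor.RecordingLevel recordingLevel, String... tags);

    /** @deprecated superseded by the add*Sensor methods above */
    @Deprecated
    void recordThroughput(Sensor sensor, long value);

    /** @deprecated superseded by the add*Sensor methods above */
    @Deprecated
    void recordLatency(Sensor sensor, long startNs, long endNs);
}
```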
Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
The two missing } in the javadoc of flatTransform() lead to the
following javadoc compile warnings:
.../KStream.java:1156: warning - End Delimiter } missing for possible See Tag in comment string
Reviewers: Bill Bejeck <bbejeck@gmail.com>
Remove in catch clause and move it to the callback.
Reviewers: John Roesler <john@confluent.io>, Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>
Fix three independent causes of threads dying:
1. `ProducerFencedException` isn't properly handled while suspending a task, and leads to the thread dying.
2. `IllegalStateException`: an internal assertion is violated because a store can get orphaned when an exception is thrown during initialization, again leading to the thread dying.
3. `UnknownProducerIdException`: This exception isn't expected by the Streams code, so when we get it, the relevant thread dies. It's not clear whether we always need to catch this, and in the future, we won't expect it at all, but we are catching it now to be sure we're resilient if/when it happens. Important note: this might actually harm performance if the errors turn out to be ignorable, and we will now rebalance instead of ignoring them.
Also, add missing test coverage for the exception handling code.
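A simplified, self-contained sketch of the recovery pattern for (1) and (3); TaskMigratedSignal is a placeholder for Streams' internal task-migrated exception, not the real class:

```java
import org.apache.kafka.common.errors.ProducerFencedException;
import org.apache.kafka.common.errors.UnknownProducerIdException;

final class ProducerErrorHandlingSketch {

    // Placeholder for Streams' internal "task migrated" exception.
    static final class TaskMigratedSignal extends RuntimeException {
        TaskMigratedSignal(final String message, final Throwable cause) {
            super(message, cause);
        }
    }

    static void callProducer(final Runnable producerCall) {
        try {
            producerCall.run();
        } catch (final ProducerFencedException | UnknownProducerIdException recoverable) {
            // Instead of letting the stream thread die, translate the error into a
            // "task migrated" signal so the thread rejoins the group and rebalances.
            throw new TaskMigratedSignal("treating producer error as task migration", recoverable);
        }
    }
}
```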
Reviewers: Boyang Chen <boyang@confluent.io>, Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bill@confluent.io>
This ends up spamming the logs and doesn't seem to provide much useful information. Rather than logging the full task (which includes the entire topology description), we should just log the task id.
Reviewers: Matthias J. Sax <mjsax@apache.org>, Bruno Cadonna <bruno@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
Convert StreamTableJoinIntegrationTest to use the TopologyTestDriver to eliminate flakiness and speed up the build.
Reviewers: John Roesler <john@confluent.io>
After a number of last minute bugs were found stemming from the incremental closing of lost tasks in StreamsRebalanceListener#onPartitionsLost, a safer approach to this edge case seems warranted. We initially wanted to be as "future-proof" as possible, and avoid baking further protocol assumptions into the code that may be broken as the protocol evolves. This meant that rather than simply closing all active tasks and clearing all associated state in #onPartitionsLost(lostPartitions) we would loop through the lostPartitions/lost tasks and remove them one by one from the various data structures/assignments, then verify that everything was empty in the end. This verification in particular has caused us significant trouble, as it turns out to be nontrivial to determine what should in fact be empty, and if so whether it is also being correctly updated.
Therefore, before worrying about it being "future-proof" it seems we should make sure it is "present-day-proof" and implement this callback in the safest possible way, by blindly clearing and closing all active task state. We log all the relevant state (at debug level) before clearing it, so we can at least tell from the logs whether/which emptiness checks were being violated.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bbejeck@gmail.com>, Andrew Choi <andchoi@linkedin.com>
Refactors metrics according to KIP-444
Introduces ProcessorNodeMetrics as a central provider for processor node metrics
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bbejeck@gmail.com>
This is a follow-up PR for #7520 to address the issue of multiple calls to get(), as pointed out by @bbejeck in #7520 (comment).
Reviewers: Bill Bejeck <bbejeck@gmail.com>
Instead of caching the checkpoint map during StandbyTask
initialization, use the latest checkpoints (which would have
been updated during suspend).
Reviewers: Bill Bejeck <bill@confluent.io>
When users specify a name for a repartition topic, we should use the same name for the repartition filter, source, and sink nodes. With the addition of KIP-307, if users go to the effort of naming every node in the topology, having processor nodes with generated names is inconsistent behavior.
Updated tests in the streams test suite.
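For illustration, a small example of the user-facing naming via the public DSL (the generated node-name format in the comment is illustrative, not exact):

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;

public class RepartitionNamingExample {
    public static void main(final String[] args) {
        final StreamsBuilder builder = new StreamsBuilder();
        final KStream<String, String> input =
            builder.stream("input", Consumed.with(Serdes.String(), Serdes.String()));

        // Naming the repartition topic; with this change the internal filter/source/sink
        // nodes around it reuse "orders-repartition" instead of generated names such as
        // KSTREAM-FILTER-0000000002 (name format shown for illustration only).
        input.groupBy((key, value) -> value,
                      Grouped.with("orders-repartition", Serdes.String(), Serdes.String()))
             .count();

        System.out.println(builder.build().describe());
    }
}
```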
Reviewers: John Roesler <john@confluent.io>, Christopher Pettitt <cpettitt@confluent.io>
Added one test and some comments to clarify how EOS is "enabled" for some of the tests.
Ran all streams tests.
Reviewers: Matthias J. Sax <mjsax@apache.org>
On-the-side cleanups extracted from the PR for KAFKA-9103, so that the actual PR can be as small as possible.
Reviewers: Christopher Pettitt <cpettitt@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
Minor follow-up to #7608: for some reason the AssignedStreamTasks#updateRestored method only updates the restoring and restoredPartitions data structures, but there is a third map holding restored tasks & partitions: restoringByPartitions.
Also improves the TaskManager#closeLostTasks logging, by separating by case and logging the specific failure before throwing.
Reviewers: Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>
Currently when we identify version probing we return early from onAssignment and never get to updating the TaskManager and general state with the new assignment. Since we do actually give out "real" assignments even during version probing, a StreamThread should take real ownership of its tasks/partitions including cleaning them up in onPartitionsRevoked which gets invoked when we call onLeavePrepare as part of triggering the follow-up rebalance.
Every member will always get an assignment encoded with the lowest common version, so there should be no problem decoding a VP assignment. We should just allow onAssignment to proceed as usual so that the TaskManager is in a consistent state, and knows what all its tasks/partitions are when the first rebalance completes and the next one is triggered.
Reviewers: Boyang Chen <boyang@confluent.io>, Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>
We should not use StreamsMetricsImpl#threadLevelSensor directly, since it only retrieves the sensor but does not add any metrics to it. Generally speaking, we should always use the corresponding-level metrics class (e.g. ThreadMetrics) to get sensors that are populated with metrics.
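Purely for illustration (hypothetical names, not the real ThreadMetrics/StreamsMetricsImpl API), the intended pattern is that the level-specific metrics class both retrieves the sensor and attaches metrics to it:

```java
// Hypothetical sketch: the level-specific metrics class is the only place that
// combines sensor retrieval with actually registering metrics on the sensor.
final class ThreadMetricsSketch {

    interface Sensor { }

    interface StreamsMetricsLike {
        Sensor threadLevelSensor(String operationName);                     // retrieval only
        void addInvocationRateAndCountToSensor(Sensor sensor, String name); // adds metrics
    }

    static Sensor commitSensor(final StreamsMetricsLike metrics) {
        final Sensor sensor = metrics.threadLevelSensor("commit");
        metrics.addInvocationRateAndCountToSensor(sensor, "commit");
        return sensor;
    }
}
```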
Reviewers: Guozhang Wang <wangguoz@gmail.com>, John Roesler <john@confluent.io>, Bill Bejeck <bbejeck@gmail.com>, Matthias J. Sax <matthias@confluent.io>
Rather than maintain hand coded protocol serialization code, Streams could use the same code-generation framework as Clients/Core.
There isn't a perfect match, since the code generation framework includes an assumption that you're generating "protocol messages", rather than just arbitrary blobs, but I think it's close enough to justify using it, and improving it over time.
Using the code generation allows us to drop a lot of detail-oriented, brittle, and hard-to-maintain serialization logic in favor of a schema spec.
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Boyang Chen <boyang@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Refactors metrics according to KIP-444
Introduces StateStoreMetrics as a central provider for state store metrics
Adds metric scope (a.k.a. store type) to the in-memory suppression buffer
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bbejeck@gmail.com>
Third bugfix for the failing broker bounce system test with cooperative rebalancing:
tl;dr We need to remove everything associated with a task when it is closed, but in some cases (e.g. AssignedTasks#commit) on a `TaskMigratedException` we would close it as a zombie and then (only) remove the taskId from the `running` map. This left its partitions, restorers, state stores, etc. around and in an undefined state, causing exceptions when closing and/or opening the stores again.
Longer explanation:
In AssignedTasks (the abstract class from which the standby and active task variations extend), a commit failure (even one due to the broker being down/unavailable) is treated as a TaskMigratedException, after which the failed task is closed as a zombie and removed from `running` -- the remaining tasks (i.e. those still in `running`) are then also closed as zombies in the subsequent onPartitionsLost.
However we do not remove the closed task from runningByPartition nor do we remove the corresponding changelogs, if restoring, from the StoreChangelogReader since that applies only to active tasks, and AssignedTasks is generic/abstract. The changelog reader then retains a mapping from the closed task's changelog partition to its CompositeRestoreListener (and does not replace this when the new one comes along after the rebalance). The restore listener has a reference to a specific RocksDBStore instance, one which was closed when the task was closed as a zombie, so it accidentally tries to restore to the "old" RocksDBStore instance rather than the new one that was just opened.
Although technically this bug existed before KIP-429, it was only uncovered now that we remove tasks and clear their state/partitions/etc. one at a time. We don't technically need to cherry-pick the fix back to earlier branches, since before this change we just blindly cleared all data structures entirely during an eager rebalance.
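A hypothetical, self-contained sketch of the cleanup rule (the names do not match the real AssignedTasks/StoreChangelogReader code): closing a task as a zombie must also drop its per-partition bookkeeping and its changelog restore listeners, otherwise the next incarnation of the task restores into the already-closed store.

```java
import java.util.Map;
import java.util.Set;

final class ZombieCloseSketch {

    interface Task {
        Set<String> partitions();
        Set<String> changelogPartitions();
        void closeAsZombie();
    }

    private final Map<String, Task> running;
    private final Map<String, Task> runningByPartition;
    private final Map<String, Object> restoreListenersByChangelogPartition;

    ZombieCloseSketch(final Map<String, Task> running,
                      final Map<String, Task> runningByPartition,
                      final Map<String, Object> restoreListenersByChangelogPartition) {
        this.running = running;
        this.runningByPartition = runningByPartition;
        this.restoreListenersByChangelogPartition = restoreListenersByChangelogPartition;
    }

    void closeAsZombie(final String taskId) {
        final Task task = running.remove(taskId);
        if (task == null) {
            return;
        }
        task.closeAsZombie();
        // The part that was missing: forget the task's partitions and changelogs too.
        task.partitions().forEach(runningByPartition::remove);
        task.changelogPartitions().forEach(restoreListenersByChangelogPartition::remove);
    }
}
```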
Reviewers: Boyang Chen <boyang@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Inside onLeavePrepare we would look into the assignment and try to revoke the owned tasks and notify users via RebalanceListener#onPartitionsRevoked, and then clear the assignment.
However, the subscription's assignment is already cleared in this.subscriptions.unsubscribe(), which means the user's rebalance listener would never be triggered. In other words, from the consumer client's POV nothing is owned after unsubscribe, but from the user caller's POV the partitions have not been revoked yet. For callers like Kafka Streams which rely on the rebalance listener to maintain their internal state, this leads to inconsistent state management and failure cases.
Before KIP-429 this issue was hidden away, since every time the consumer re-joined the group later it would still revoke everything anyway, regardless of the parameters passed to the rebalance listener; with KIP-429 it is now easier to reproduce.
Our fixes are as follows:
• Inside unsubscribe, first do onLeavePrepare / maybeLeaveGroup and then subscription.unsubscribe. This way we are guaranteed that the Streams tasks are all closed as revoked by then (see the sketch after this list).
• [Optimization] If the generation is reset due to a fatal error from a join / heartbeat response etc., then we know that all partitions are lost, and we should not trigger onPartitionsRevoked, but instead just onPartitionsLost inside onLeavePrepare. This is because we don't want to commit for lost tasks during a rebalance, which is doomed to fail since we don't have any generation info.
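A simplified, self-contained sketch of the reordering in the first fix (Coordinator and SubscriptionState are placeholders for the real KafkaConsumer internals):

```java
final class UnsubscribeSketch {

    interface Coordinator {
        void onLeavePrepare();               // fires onPartitionsRevoked / onPartitionsLost
        void maybeLeaveGroup(String reason);
    }

    interface SubscriptionState {
        void unsubscribe();                  // clears the subscription and assignment
    }

    private final Coordinator coordinator;
    private final SubscriptionState subscriptions;

    UnsubscribeSketch(final Coordinator coordinator, final SubscriptionState subscriptions) {
        this.coordinator = coordinator;
        this.subscriptions = subscriptions;
    }

    void unsubscribe() {
        // 1. Run the rebalance callbacks while the assignment is still known.
        coordinator.onLeavePrepare();
        coordinator.maybeLeaveGroup("the consumer unsubscribed from all topics");
        // 2. Only now drop the subscription/assignment state.
        subscriptions.unsubscribe();
    }
}
```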
Reviewers: Matthias J. Sax <matthias@confluent.io>, A. Sophie Blee-Goldman <sophie@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <guozhang@confluent.io>
Introduces TaskMetrics class
Introduces dropped-records
Replaces skipped-records with dropped-records in the latest built-in metrics version
Does not add standby-process-ratio and active-process-ratio
Does not refactor parent sensors for processor node metrics
Reviewers: Guozhang Wang <wangguoz@gmail.com>, John Roesler <john@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
In the case of an unclean close, we still need to make sure all the stores are flushed before closing any of them.
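A minimal sketch of the ordering (not the actual ProcessorStateManager code): flush every store before closing any, so that a flush that forwards data into another store never hits one that is already closed.

```java
import java.util.List;
import org.apache.kafka.streams.processor.StateStore;

final class CloseAllStoresSketch {

    static void closeUnclean(final List<StateStore> stores) {
        for (final StateStore store : stores) {
            store.flush();   // flush everything first
        }
        for (final StateStore store : stores) {
            store.close();   // only then close
        }
    }
}
```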
Reviewers: Matthias J. Sax <matthias@confluent.io>, Boyang Chen <boyang@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <guozhang@confluent.io>
This is not guaranteed to actually fix queryOnRebalance, since the
failure could never be reproduced locally. I did not bump timeouts
because it looks like that has been done in the past for this test
without success. Instead this change makes the following improvements:
It waits for the application to be in a RUNNING state before
proceeding with the test.
It waits for the remaining instance to return to RUNNING state
within a timeout after rebalance. I observed once that we were able to
do the KV queries but the instance was still in REBALANCING, so this
should reduce some opportunity for flakiness.
The meat of this change: we now iterate over all keys in one shot
(vs. one at a time with a timeout) and collect various failures, all of
which are reported at the end. This should help us to narrow down the
cause of flakiness if it shows up again.
Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
For scenarios where the incoming traffic of all input partitions is small, there's a pitfall that the enforced processing timer is not reset after we have enforce-processed ALL records. The fix itself is pretty simple: we just reset the timer when there are no buffered records.
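A minimal sketch of the timer-reset idea (hypothetical names, not the actual StreamTask/PartitionGroup fields):

```java
// Hypothetical sketch: once the buffer is drained, clear the enforced-processing
// deadline so the next lone record does not immediately look "overdue".
final class EnforcedProcessingTimerSketch {

    private long waitStartMs = -1L; // -1 means "not currently waiting"

    boolean shouldEnforceProcessing(final int bufferedRecords, final long nowMs, final long maxTaskIdleMs) {
        if (bufferedRecords == 0) {
            waitStartMs = -1L;      // the fix: reset the timer when nothing is buffered
            return false;
        }
        if (waitStartMs == -1L) {
            waitStartMs = nowMs;    // start the idle clock when records are first buffered
        }
        return nowMs - waitStartMs >= maxTaskIdleMs;
    }
}
```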
Reviewers: Javier Holguera <javier.holguera@gmail.com>, Boyang Chen <boyang@confluent.io>, Bill Bejeck <bill@confluent.io>