This is part 1 of a series of PRs for task management cleanup:
1. Primarily clean up MockRecordCollectors: remove the unnecessary anonymous subclasses and consolidate on the NoOpRecordCollector (renamed to MockRecordCollector). The most relevant changes are in the unit tests that rely on this MockRecordCollector.
2. Let StandbyContextImpl#recordCollector() return null instead of a no-op collector, since standby tasks should ALWAYS bypass the logging logic and only use the inner store for restoreBatch. Returning null surfaces a violation of this assumption as an NPE as early as possible, whereas a no-op collector just hides the bug.
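As a minimal sketch of the second point (placeholder types only, not the actual internal classes):

```java
// Placeholder for the internal Streams interface; only the contract matters here.
interface RecordCollector { /* send(), flush(), close(), ... */ }

class StandbyContextSketch {
    RecordCollector recordCollector() {
        // Standby tasks only restore via the inner store, so any code path that
        // reaches for the collector is a bug. Returning null fails fast with an
        // NPE instead of silently succeeding against a no-op collector.
        return null;
    }
}
```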
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Fixes a case where multiple children merged from a key-changing node cause an NPE.
Reviewers: Matthias J. Sax <mjsax@apache.org>, Boyang Chen <boyang@confluent.io>
As proposed in KIP-444, the user-customizable metrics shall be refactored. The refactoring consists of:
* adding methods addLatencyRateTotalSensor and addRateTotalSensor to the interface StreamsMetrics (sketched below)
* implementing the newly added methods in StreamsMetricsImpl
* deprecating the methods recordThroughput() and recordLatency() in StreamsMetrics
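A sketch of the interface change, with signatures approximated from KIP-444 (the committed interface may differ slightly):

```java
import org.apache.kafka.common.metrics.Sensor;

// Approximation of the KIP-444 additions; not the verbatim StreamsMetrics interface.
public interface StreamsMetricsSketch {

    Sensor addLatencyRateTotalSensor(String scopeName, String entityName, String operationName,
                                     Sensor.RecordingLevel recordingLevel, String... tags);

    Sensor addRateTotalSensor(String scopeName, String entityName, String operationName,
                              Sensor.RecordingLevel recordingLevel, String... tags);

    /** @deprecated superseded by the add*Sensor methods above */
    @Deprecated
    void recordThroughput(Sensor sensor, long value);

    /** @deprecated superseded by the add*Sensor methods above */
    @Deprecated
    void recordLatency(Sensor sensor, long startNs, long endNs);
}
```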
Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
The two missing } in the javadoc of flatTransform() lead to the
following javadoc compile warnings:
.../KStream.java:1156: warning - End Delimiter } missing for possible See Tag in comment string
Reviewers: Bill Bejeck <bbejeck@gmail.com>
Remove in catch clause and move it to the callback.
Reviewers: John Roesler <john@confluent.io>, Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>
Fix three independent causes of threads dying:
1. `ProducerFencedException` isn't properly handled while suspending a task, and leads to the thread dying.
2. `IllegalStateException`: an internal assertion is violated because a store can get orphaned when an exception is thrown during initialization, again leading to the thread dying.
3. `UnknownProducerIdException`: This exception isn't expected by the Streams code, so when we get it, the relevant thread dies. It's not clear whether we always need to catch this, and in the future, we won't expect it at all, but we are catching it now to be sure we're resilient if/when it happens. Important note: this might actually harm performance if the errors turn out to be ignorable, and we will now rebalance instead of ignoring them.
Also, add missing test coverage for the exception handling code.
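A simplified, self-contained sketch of the recovery pattern for (1) and (3); TaskMigratedSignal is a placeholder for Streams' internal task-migrated exception, not the real class:

```java
import org.apache.kafka.common.errors.ProducerFencedException;
import org.apache.kafka.common.errors.UnknownProducerIdException;

final class ProducerErrorHandlingSketch {

    // Placeholder for Streams' internal "task migrated" exception.
    static final class TaskMigratedSignal extends RuntimeException {
        TaskMigratedSignal(final String message, final Throwable cause) {
            super(message, cause);
        }
    }

    static void callProducer(final Runnable producerCall) {
        try {
            producerCall.run();
        } catch (final ProducerFencedException | UnknownProducerIdException recoverable) {
            // Instead of letting the stream thread die, translate the error into a
            // "task migrated" signal so the thread rejoins the group and rebalances.
            throw new TaskMigratedSignal("treating producer error as task migration", recoverable);
        }
    }
}
```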
Reviewers: Boyang Chen <boyang@confluent.io>, Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bill@confluent.io>
This ends up spamming the logs and doesn't seem to provide much useful information. Rather than logging the full task (which includes the entire topology description), we should just log the task id.
Reviewers: Matthias J. Sax <mjsax@apache.org>, Bruno Cadonna <bruno@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
Convert StreamTableJoinIntegrationTest to use the TopologyTestDriver to eliminate flakiness and speed up the build.
Reviewers: John Roesler <john@confluent.io>
After a number of last minute bugs were found stemming from the incremental closing of lost tasks in StreamsRebalanceListener#onPartitionsLost, a safer approach to this edge case seems warranted. We initially wanted to be as "future-proof" as possible, and avoid baking further protocol assumptions into the code that may be broken as the protocol evolves. This meant that rather than simply closing all active tasks and clearing all associated state in #onPartitionsLost(lostPartitions) we would loop through the lostPartitions/lost tasks and remove them one by one from the various data structures/assignments, then verify that everything was empty in the end. This verification in particular has caused us significant trouble, as it turns out to be nontrivial to determine what should in fact be empty, and if so whether it is also being correctly updated.
Therefore, before worrying about it being "future-proof" it seems we should make sure it is "present-day-proof" and implement this callback in the safest possible way, by blindly clearing and closing all active task state. We log all the relevant state (at debug level) before clearing it, so we can at least tell from the logs whether/which emptiness checks were being violated.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bbejeck@gmail.com>, Andrew Choi <andchoi@linkedin.com>
Refactors metrics according to KIP-444
Introduces ProcessorNodeMetrics as a central provider for processor node metrics
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bbejeck@gmail.com>
This is a follow-up PR for #7520 to address the issue of multiple calls to get(), as pointed out by @bbejeck in #7520 (comment).
Reviewers: Bill Bejeck <bbejeck@gmail.com>
Instead of caching the checkpoint map during StandbyTask
initialization, use the latest checkpoints (which would have
been updated during suspend).
Reviewers: Bill Bejeck <bill@confluent.io>
When users specify a name for a repartition topic, we should use the same name for the repartition filter, source, and sink nodes. With the addition of KIP-307, if users go to the effort of naming every node in the topology, having processor nodes with generated names is inconsistent behavior.
Updated tests in the streams test suite.
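For illustration, a small example of the user-facing naming via the public DSL (the generated node-name format in the comment is illustrative, not exact):

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;

public class RepartitionNamingExample {
    public static void main(final String[] args) {
        final StreamsBuilder builder = new StreamsBuilder();
        final KStream<String, String> input =
            builder.stream("input", Consumed.with(Serdes.String(), Serdes.String()));

        // Naming the repartition topic; with this change the internal filter/source/sink
        // nodes around it reuse "orders-repartition" instead of generated names such as
        // KSTREAM-FILTER-0000000002 (name format shown for illustration only).
        input.groupBy((key, value) -> value,
                      Grouped.with("orders-repartition", Serdes.String(), Serdes.String()))
             .count();

        System.out.println(builder.build().describe());
    }
}
```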
Reviewers: John Roesler <john@confluent.io>, Christopher Pettitt <cpettitt@confluent.io>
Added one test and some comments to clarify how EOS is "enabled" for some of the tests.
Ran all streams tests.
Reviewers: Matthias J. Sax <mjsax@apache.org>
On-the-side cleanups extracted from the PR for KAFKA-9103, so that the actual PR can be as small as possible.
Reviewers: Christopher Pettitt <cpettitt@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
Minor follow-up to #7608: for some reason the AssignedStreamTasks#updateRestored method only updates the restoring and restoredPartitions data structures, but there is a third map holding restored tasks & partitions: restoringByPartitions.
Also improves the TaskManager#closeLostTasks logging, by separating by case and logging the specific failure before throwing.
Reviewers: Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>
Currently when we identify version probing we return early from onAssignment and never get to updating the TaskManager and general state with the new assignment. Since we do actually give out "real" assignments even during version probing, a StreamThread should take real ownership of its tasks/partitions including cleaning them up in onPartitionsRevoked which gets invoked when we call onLeavePrepare as part of triggering the follow-up rebalance.
Every member will always get an assignment encoded with the lowest common version, so there should be no problem decoding a VP assignment. We should just allow onAssignment to proceed as usual so that the TaskManager is in a consistent state, and knows what all its tasks/partitions are when the first rebalance completes and the next one is triggered.
Reviewers: Boyang Chen <boyang@confluent.io>, Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>
We should not use StreamsMetricsImpl#threadLevelSensor directly, since it only retrieves the sensor but does not add any metrics to it. Generally speaking, we should always use the corresponding-level metrics class (e.g. ThreadMetrics) to get sensors that are populated with metrics.
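Purely for illustration (hypothetical names, not the real ThreadMetrics/StreamsMetricsImpl API), the intended pattern is that the level-specific metrics class both retrieves the sensor and attaches metrics to it:

```java
// Hypothetical sketch: the level-specific metrics class is the only place that
// combines sensor retrieval with actually registering metrics on the sensor.
final class ThreadMetricsSketch {

    interface Sensor { }

    interface StreamsMetricsLike {
        Sensor threadLevelSensor(String operationName);                     // retrieval only
        void addInvocationRateAndCountToSensor(Sensor sensor, String name); // adds metrics
    }

    static Sensor commitSensor(final StreamsMetricsLike metrics) {
        final Sensor sensor = metrics.threadLevelSensor("commit");
        metrics.addInvocationRateAndCountToSensor(sensor, "commit");
        return sensor;
    }
}
```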
Reviewers: Guozhang Wang <wangguoz@gmail.com>, John Roesler <john@confluent.io>, Bill Bejeck <bbejeck@gmail.com>, Matthias J. Sax <matthias@confluent.io>
Rather than maintain hand coded protocol serialization code, Streams could use the same code-generation framework as Clients/Core.
There isn't a perfect match, since the code generation framework includes an assumption that you're generating "protocol messages", rather than just arbitrary blobs, but I think it's close enough to justify using it, and improving it over time.
Using the code generation allows us to drop a lot of detail-oriented, brittle, and hard-to-maintain serialization logic in favor of a schema spec.
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Boyang Chen <boyang@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Refactors metrics according to KIP-444
Introduces StateStoreMetrics as a central provider for state store metrics
Adds metric scope (a.k.a. store type) to the in-memory suppression buffer
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bbejeck@gmail.com>
Third bugfix for the failing broker bounce system test with cooperative rebalancing:
tl;dr We need to remove everything associated with a task when it is closed, but in some cases (e.g. AssignedTasks#commit) on a `TaskMigratedException` we would close it as a zombie and then (only) remove the taskId from the `running` map. This left its partitions, restorers, state stores, etc. around and in an undefined state, causing exceptions when closing and/or opening the stores again.
Longer explanation:
In AssignedTasks (the abstract class from which the standby and active task variations extend), a commit failure (even one due to the broker being down/unavailable) is treated as a TaskMigratedException, after which the failed task is closed as a zombie and removed from `running` -- the remaining tasks (i.e. those still in `running`) are then also closed as zombies in the subsequent onPartitionsLost.
However we do not remove the closed task from runningByPartition nor do we remove the corresponding changelogs, if restoring, from the StoreChangelogReader since that applies only to active tasks, and AssignedTasks is generic/abstract. The changelog reader then retains a mapping from the closed task's changelog partition to its CompositeRestoreListener (and does not replace this when the new one comes along after the rebalance). The restore listener has a reference to a specific RocksDBStore instance, one which was closed when the task was closed as a zombie, so it accidentally tries to restore to the "old" RocksDBStore instance rather than the new one that was just opened.
Although technically this bug existed before KIP-429, it was only uncovered now that we remove tasks and clear their state/partitions/etc. one at a time. We don't technically need to cherry-pick the fix back to earlier branches, since before this change we just blindly cleared all data structures entirely during an eager rebalance.
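A hypothetical, self-contained sketch of the cleanup rule (the names do not match the real AssignedTasks/StoreChangelogReader code): closing a task as a zombie must also drop its per-partition bookkeeping and its changelog restore listeners, otherwise the next incarnation of the task restores into the already-closed store.

```java
import java.util.Map;
import java.util.Set;

final class ZombieCloseSketch {

    interface Task {
        Set<String> partitions();
        Set<String> changelogPartitions();
        void closeAsZombie();
    }

    private final Map<String, Task> running;
    private final Map<String, Task> runningByPartition;
    private final Map<String, Object> restoreListenersByChangelogPartition;

    ZombieCloseSketch(final Map<String, Task> running,
                      final Map<String, Task> runningByPartition,
                      final Map<String, Object> restoreListenersByChangelogPartition) {
        this.running = running;
        this.runningByPartition = runningByPartition;
        this.restoreListenersByChangelogPartition = restoreListenersByChangelogPartition;
    }

    void closeAsZombie(final String taskId) {
        final Task task = running.remove(taskId);
        if (task == null) {
            return;
        }
        task.closeAsZombie();
        // The part that was missing: forget the task's partitions and changelogs too.
        task.partitions().forEach(runningByPartition::remove);
        task.changelogPartitions().forEach(restoreListenersByChangelogPartition::remove);
    }
}
```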
Reviewers: Boyang Chen <boyang@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Inside onLeavePrepare we would look into the assignment and try to revoke the owned tasks and notify users via RebalanceListener#onPartitionsRevoked, and then clear the assignment.
However, the subscription's assignment is already cleared in this.subscriptions.unsubscribe(), which means the user's rebalance listener would never be triggered. In other words, from the consumer client's POV nothing is owned after unsubscribe, but from the user caller's POV the partitions have not been revoked yet. For callers like Kafka Streams which rely on the rebalance listener to maintain their internal state, this leads to inconsistent state management and failure cases.
Before KIP-429 this issue was hidden away, since every time the consumer re-joined the group later it would still revoke everything anyway, regardless of the parameters passed to the rebalance listener; with KIP-429 it is now easier to reproduce.
Our fixes are as follows:
• Inside unsubscribe, first do onLeavePrepare / maybeLeaveGroup and then subscription.unsubscribe. This way we are guaranteed that the Streams tasks are all closed as revoked by then (see the sketch after this list).
• [Optimization] If the generation is reset due to a fatal error from a join / heartbeat response etc., then we know that all partitions are lost, and we should not trigger onPartitionsRevoked, but instead just onPartitionsLost inside onLeavePrepare. This is because we don't want to commit for lost tasks during a rebalance, which is doomed to fail since we don't have any generation info.
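A simplified, self-contained sketch of the reordering in the first fix (Coordinator and SubscriptionState are placeholders for the real KafkaConsumer internals):

```java
final class UnsubscribeSketch {

    interface Coordinator {
        void onLeavePrepare();               // fires onPartitionsRevoked / onPartitionsLost
        void maybeLeaveGroup(String reason);
    }

    interface SubscriptionState {
        void unsubscribe();                  // clears the subscription and assignment
    }

    private final Coordinator coordinator;
    private final SubscriptionState subscriptions;

    UnsubscribeSketch(final Coordinator coordinator, final SubscriptionState subscriptions) {
        this.coordinator = coordinator;
        this.subscriptions = subscriptions;
    }

    void unsubscribe() {
        // 1. Run the rebalance callbacks while the assignment is still known.
        coordinator.onLeavePrepare();
        coordinator.maybeLeaveGroup("the consumer unsubscribed from all topics");
        // 2. Only now drop the subscription/assignment state.
        subscriptions.unsubscribe();
    }
}
```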
Reviewers: Matthias J. Sax <matthias@confluent.io>, A. Sophie Blee-Goldman <sophie@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <guozhang@confluent.io>
Introduces TaskMetrics class
Introduces dropped-records
Replaces skipped-records with dropped-records in the latest built-in metrics version
Does not add standby-process-ratio and active-process-ratio
Does not refactor parent sensors for processor node metrics
Reviewers: Guozhang Wang <wangguoz@gmail.com>, John Roesler <john@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
In the case of an unclean close, we still need to make sure all the stores are flushed before closing any of them.
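A minimal sketch of the ordering (not the actual ProcessorStateManager code): flush every store before closing any, so that a flush that forwards data into another store never hits one that is already closed.

```java
import java.util.List;
import org.apache.kafka.streams.processor.StateStore;

final class CloseAllStoresSketch {

    static void closeUnclean(final List<StateStore> stores) {
        for (final StateStore store : stores) {
            store.flush();   // flush everything first
        }
        for (final StateStore store : stores) {
            store.close();   // only then close
        }
    }
}
```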
Reviewers: Matthias J. Sax <matthias@confluent.io>, Boyang Chen <boyang@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <guozhang@confluent.io>
This is not guaranteed to actually fix queryOnRebalance, since the
failure could never be reproduced locally. I did not bump timeouts
because it looks like that has been done in the past for this test
without success. Instead this change makes the following improvements:
It waits for the application to be in a RUNNING state before
proceeding with the test.
It waits for the remaining instance to return to RUNNING state
within a timeout after rebalance. I observed once that we were able to
do the KV queries but the instance was still in REBALANCING, so this
should reduce some opportunity for flakiness.
The meat of this change: we now iterate over all keys in one shot
(vs. one at a time with a timeout) and collect various failures, all of
which are reported at the end. This should help us to narrow down the
cause of flakiness if it shows up again.
Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
For scenarios where the incoming traffic of all input partitions is small, there's a pitfall that the enforced processing timer is not reset after we have enforce-processed ALL records. The fix itself is pretty simple: we just reset the timer when there are no buffered records.
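A minimal sketch of the timer-reset idea (hypothetical names, not the actual StreamTask/PartitionGroup fields):

```java
// Hypothetical sketch: once the buffer is drained, clear the enforced-processing
// deadline so the next lone record does not immediately look "overdue".
final class EnforcedProcessingTimerSketch {

    private long waitStartMs = -1L; // -1 means "not currently waiting"

    boolean shouldEnforceProcessing(final int bufferedRecords, final long nowMs, final long maxTaskIdleMs) {
        if (bufferedRecords == 0) {
            waitStartMs = -1L;      // the fix: reset the timer when nothing is buffered
            return false;
        }
        if (waitStartMs == -1L) {
            waitStartMs = nowMs;    // start the idle clock when records are first buffered
        }
        return nowMs - waitStartMs >= maxTaskIdleMs;
    }
}
```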
Reviewers: Javier Holguera <javier.holguera@gmail.com>, Boyang Chen <boyang@confluent.io>, Bill Bejeck <bill@confluent.io>