After many runs of reproducing the failure (on my local MP5 it takes about 100 - 200 run to get one) I think it is more likely a flaky one and not exposing a real bug in rebalance protocol.
What I've observed is that, when the verifying consumer is trying to fetch from the output topics (there are 11 of them), it poll(1sec) each time, and retries 30 times if there's no more data to fetch and stop. It means that if there are no data fetched from the output topics for 30 * 1 = 30 seconds then the verification would stop (potentially too early). And for the failure cases, we observe consistent rebalancing among the closing / newly created clients since the closing is async, i.e. while new clients are added it is possible that closing clients triggered rebalance are not completed yet (note that each instance is configured with 3 threads, and in the worst case there are 6 instances running / pending shutdown at the same time, so a group fo 3 * 6 = 18 members is possible).
However, there's still a possible bug that in KIP-429, somehow the rebalance can never stabilize and members keep re-rejoining and hence cause it to fail. We have another unit test that have bumped up to 3 rebalance by @ableegoldman and if that failed again then it may be a better confirmation such bug may exist.
So what I've done so far for this test:
1. When closing a client, wait for it to complete closure before moving on to the next iteration and starting a new instance to reduce the rebalance churns.
2. Poll for 5 seconds instead of 1 to wait for longer time: 5 * 30 = 150 seconds, and locally my laptop finished this test in about 50 seconds.
3. Minor debug logging improvement; in fact some of them is to reduce redundant debug logging since it is too long and sometimes hides the key information.
Reviewers: Bill Bejeck <bill@confluent.io>, John Roesler <john@confluent.io>
* Made commit-over-tasks sensor and skipped-records sensor optional since they are removed in the latest version
* Refactored methods for sensor creation
* Adapted unit and integration tests
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Migrate this integration test to use TopologyTestDriver instead of running 3 Streams instances.
Dropped one test that was attempting to produce specific interleavings. If anything, these should be verified deterministically by unit testing.
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Also put in some additional logging that makes sense to add, and proved helpful in debugging this particular issue.
Unit tests verifying the encoded supported version were added.
This should get cherry-picked back to 2.1
Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Fix the formatting and wording of the foreign-key join javadoc
Optimize handling of null extracted foreign keys
Reviewers: Sophie Blee-Goldman <sophie@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
In a KTable context, null record values have a special "tombstone" significance. We should always bypass the serdes for such tombstones, since otherwise the serde could violate Streams' table semantics.
Added test coverage for this case and fixed the code accordingly.
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bill@confluent.io>
I looked into the logs of the above tickets, and I think for a couple fo them it is due to the fact that the threads takes time to restore, or just stabilize the rebalance since there are multi-threads. Adding the hook to wait for state to transit to RUNNING upon starting.
Reviewers: Chris Pettitt <cpettitt@confluent.io>, Matthias J. Sax <matthias@confluent.io>
Subtopologies are currently ordered alphabetically by source node, which prior to KIP-307 happened to always result in the "correct" (ie topological) order. Now that users may name their nodes anything they want, we must explicitly order them so that upstream node groups/subtopologies come first and the downstream ones come after.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bbejeck@gmail.com>
All four flavors of the repartition/optimization tests have been reported as flaky and failed in one place or another:
* RepartitionOptimizingIntegrationTest.shouldSendCorrectRecords_OPTIMIZED
* RepartitionOptimizingIntegrationTest.shouldSendCorrectRecords_NO_OPTIMIZATION
* RepartitionWithMergeOptimizingIntegrationTest.shouldSendCorrectRecords_OPTIMIZED
* RepartitionWithMergeOptimizingIntegrationTest.shouldSendCorrectRecords_NO_OPTIMIZATION
They're pretty similar so it makes sense to knock them all out at once. This PR does three things:
* Switch to in-memory stores wherever possible
* Name all operators and update the Topology accordingly (not really a flaky test fix, but had to update the topology names anyway because of the IM stores so figured might as well)
* Port to TopologyTestDriver -- this is the "real" fix, should make a big difference as these repartition tests required multiple roundtrips with the Kafka cluster (while using only the default timeout)
Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
After merged #7412 we realized it does not necessarily need that long time: instead of putting 2 million records, we can just have a single put followed by a flush, to make sure that rocksDB file exists locally (verified that after flush the sst file always exist).
Now the RocksDBStoreTest takes about 2.5 seconds, and removing the integration annotation from it.
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Bill Bejeck <bill@confluent.io>
* Renamed tag client-id to thread-id for thread-level metrics and below
* Corrected metrics tag keys for state store that had suffix "-id" instead of "state-id"
Reviewers: John Roesler <john@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Key improvements with this PR:
* tasks will remain available for IQ during a rebalance (but not during restore)
* continue restoring and processing standby tasks during a rebalance
* continue processing active tasks during rebalance until the RecordQueue is empty*
* only revoked tasks must suspended/closed
* StreamsPartitionAssignor tries to return tasks to their previous consumers within a client
* but do not try to commit, for now (pending KAFKA-7312)
Reviewers: John Roesler <john@confluent.io>, Boyang Chen <boyang@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
1. Moves StreamsMetricsImpl from StreamThread to KafkaStreams
2. Adds instance-level metrics as specified in KIP-444, i.e.:
-- version
-- commit-id
-- application-id
-- topology-description
-- state
Reviewers: Guozhang Wang <wangguoz@gmail.com>, John Roesler <john@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
https://issues.apache.org/jira/browse/KAFKA-3705
Allows for a KTable to map its value to a given foreign key and join on another KTable keyed on that foreign key. Applies the joiner, then returns the tuples keyed on the original key. This supports updates from both sides of the join.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <mjsax@apache.org>, John Roesler <john@confluent.io>, Boyang Chen <boyang@confluent.io>, Christopher Pettitt <cpettitt@confluent.io>, Bill Bejeck <bbejeck@gmail.com>, Jan Filipiak <Jan.Filipiak@trivago.com>, pgwhalen, Alexei Daniline
Instead of sending the leader's version and having older members try to blindly upgrade.
The only other real change here is that we will also set the VERSION_PROBING error code and return early from onAssignment when we are upgrading our used subscription version (not just downgrading it) since this implies the whole group has finished the rolling upgrade and all members should rejoin with the new subscription version.
Also piggy-backing on a fix for a potentially dangerous edge case, where every thread of an instance is assigned the same set of active tasks.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
added shutdown for thread that triggers recording of RocksDBMetrics
added unit tests to verify the start and shutdown of the thread
refactored a bit of code
Reviewers: Christopher Pettitt <cpettitt@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
A minor refactor to explicitly verify that Processor#close is only called once.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Sophie Blee-Goldman <sophie@confluent.io>, Bruno Cadonna <bruno@confluent.io>,
shouldNotThrowExceptionOnRestoreWhenThereIsPreExistingRocksDbFiles takes
1m30s, which is too long for a unit test.
`RocksDBTimestampedStoreTest` inherits from `RocksDBStoreTest` and it's
implicitly considered an integration test too.
Reviewers: Guozhang Wang <guozhang@confluent.io>
A minor change in logic to account for repartition topics where we might not have the num partitions yet in the metadata.
Ran all existing tests plus all streams system tests.
Reviewers: John Roesler <vvcephei@users.noreply.github.com>, Guozhang Wang <wangguoz@gmail.com>
Detailed info is available in the ticket: https://issues.apache.org/jira/browse/KAFKA-8911
Briefly, implicit defs are calling empty constructors, which exists only for reflection object creation.
Therefore, while using the implicit definitons, a NPE occurs when Serde is called.
Reviewers: John Roesler <john@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
Replaced UpdateMetadata{Request, Response}, LeaderAndIsr{Request, Response}
and StopReplica{Request, Response} with the automated protocol classes.
Updated the JSON schema for the 3 request types to be more consistent and
less strict (if needed to avoid duplication).
The general approach is to avoid generating new collections in the request
classes. Normalization happens in the constructor to make this possible. Builders
still have to group by topic to maintain the external ungrouped view.
Introduced new tests for LeaderAndIsrRequest and UpdateMetadataRequest to
verify that the new logic is correct.
A few other clean-ups/fixes in code that was touched due to these changes:
* KAFKA-8956: Refactor DelayedCreatePartitions#updateWaiting to avoid modifying
collection in foreach.
* Avoid unnecessary allocation for state change trace logging if trace logging is not enabled
* Use `toBuffer` instead of `toList`, `toIndexedSeq` or `toSeq` as it generally performs
better and it matches the performance characteristics of `java.util.ArrayList`. This is
particularly important when passing such instances to Java code.
* Minor refactoring for clarity and readability.
* Removed usage of deprecated `/:`, unused imports and unnecessary `var`s.
* Include exception in `AdminClientIntegrationTest` failure message.
* Move StopReplicaRequest verification in `AuthorizerIntegrationTest` to the end
to match the comment.
Reviewers: Colin Patrick McCabe <cmccabe@apache.org>
Previous KafkaStreamsTest takes 2min20s on my local laptop, because lots of its integration test which is producing / consuming records, and checking state directory file system takes lots of time. On the other hand, these tests should be well simplified with mocks.
This test reduces the test from a clumsy integration test class into a unit tests with mocks of its internal modules. And some other test functions should not be in KafkaStreamsTest actually and have been moved to other modular test classes. Now it takes 2s.
Also it helps removing the potential flakiness of the following (some of them are claimed resolved only because we have not seen them recently, but after looking at the test code I can verify they are still flaky):
* KAFKA-5818 (the original JIRA ticket indeed exposed a real issue that has been fixed, but the test itself remains flaky)
* KAFKA-6215
* KAFKA-7921
* KAFKA-7990
* KAFKA-8319
* KAFKA-8427
Reviewers: Bill Bejeck <bill@confluent.io>, John Roesler <john@confluent.io>, Bruno Cadonna <bruno@confluent.io>