During the discussion for KIP-213, we decided to pass "pseudo-topics"
to the internal serdes we use to construct the wrapper serdes for
CombinedKey and hashing the left-hand-side value. However, during
the implementation, this strategy wasn't fully implemented, and we wound
up using the same topic name for a few different data types.
Reviewers: Guozhang Wang <guozhang@confluent.io>
During the KIP-213 implementation and verification, we neglected to test the
code path for falling back to default serdes if none are given in the topology.
Reviewer: Bill Bejeck <bbejeck@gmail.com>
Relying on integration test to catch an algorithm bug introduces more flakiness, reduce the test into a unit test to reduce the flakiness until we upgrade Java/Scala libs.
Checked the test shall fail with older version of StreamsPartitionAssignor.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Found this bug from the repeated flaky runs of system tests, it seems to be long lurking but also would only happen if there are frequent rebalances / topic creation within a short time, which is exactly the case in some of our smoke system tests.
Also added a unit test.
Reviewers: Boyang Chen <boyang@confluent.io>, A. Sophie Blee-Goldman <sophie@confluent.io>, Matthias J. Sax <matthias@confluent.io>
Follows up on the original PR for KAFKA-9445 to address a final round of feedback
Reviewers: John Roesler <vvcephei@apache.org>, Matthias J. Sax <matthias@confluent.io>
Follow up to original PR #7985 for KIP-523 (adding `KStream#toTable()` operator)
- improve JavaDocs
- add more unit tests
- fix bug for auto-repartitioning
- some code cleanup
Reviewers: High Lee <yello1109@daum.net>, John Roesler <john@confluent.io>
1. StoreChangelogReaderTest.shouldRequestCommittedOffsetsAndHandleTimeoutException[1] This is due to stricter ternary operator type casting
2. KStreamImplTest.shouldSupportTriggerMaterializedWithKTableFromKStream
This is added recently where String typed values for <String, Integer>, in J8 it is allowed but in J11 it is not allowed.
Reviewers: John Roesler <john@confluent.io>
Followup to KAFKA-7317 and KAFKA-9113, there's some additional cleanup we can do in InternalTopologyBuilder. Mostly refactors the subscription code to make the initialization more explicit and reduce some duplicated code in the update logic.
Also some minor cleanup of the build method.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Instead of always try to update committed offset limits as long as there are buffered records for standby tasks, we leverage on the commit interval to reduce our consumer.committed frequency.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, John Roesler <john@confluent.io>
Since ValueAndTimestampSerializer wraps an unknown Serializer, the output of that Serializer can be null. In which case the line
.allocate(rawTimestamp.length + rawValue.length)
will throw a NullPointerException.
This pull request returns null instead.
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
This PR is collaborated by Guozhang Wang and John Roesler. It is a significant tech debt cleanup on task management and state management, and is broken down by several sub-tasks listed below:
Extract embedded clients (producer and consumer) into RecordCollector from StreamTask.
guozhangwang#2
guozhangwang#5
Consolidate the standby updating and active restoring logic into ChangelogReader and extract out of StreamThread.
guozhangwang#3
guozhangwang#4
Introduce Task state life cycle (created, restoring, running, suspended, closing), and refactor the task operations based on the current state.
guozhangwang#6
guozhangwang#7
Consolidate AssignedTasks into TaskManager and simplify the logic of changelog management and task management (since they are already moved in step 2) and 3)).
guozhangwang#8
guozhangwang#9
Also simplified the StreamThread logic a bit as the embedded clients / changelog restoration logic has been moved into step 1) and 2).
guozhangwang#10
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Bruno Cadonna <bruno@confluent.io>, Boyang Chen <boyang@confluent.io>
This ticket shall improve two aspects of the retrieval of sensors:
https://issues.apache.org/jira/browse/KAFKA-9152
Currently, when a sensor is retrieved with *Metrics.*Sensor() (e.g. ThreadMetrics.createTaskSensor()) after it was created with the same method *Metrics.*Sensor(), the sensor is added again to the corresponding queue in Sensors (e.g. threadLevelSensors) in StreamsMetricsImpl. Those queues are used to remove the sensors when removeAllLevelSensors() is called. Having multiple times the same sensors in this queue is not an issue from a correctness point of view. However, it would reduce the footprint to only store a sensor once in those queues.
When a sensor is retrieved, the current code attempts to create a new sensor and to add to it again the corresponding metrics. This could be avoided.
Both aspects could be improved by checking whether a sensor already exists by calling getSensor() on the Metrics object and checking the return value.
Reviewers: Bruno Cadonna <bruno@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
Also addresses KAFKA-8821
Note that we still have to fall back to using pattern subscription if the user has added any regex-based source nodes to the topology. Includes some minor cleanup on the side
Reviewers: Bill Bejeck <bbejeck@gmail.com>
Add a new method to KafkaStreams to return an estimate of the lags for
all partitions of all local stores.
Implements: KIP-535
Co-authored-by: Navinder Pal Singh Brar <navinder_brar@yahoo.com>
Reviewed-by: John Roesler <vvcephei@apache.org>
Add a new overload of KafkaStreams#store that allows users
to query standby and restoring stores in addition to active ones.
Closes: #7962
Implements: KIP-535
Co-authored-by: Navinder Pal Singh Brar <navinder_brar@yahoo.com>
Reviewed-by: John Roesler <vvcephei@apache.org>
Deprecate existing metadata query APIs in favor of new
ones that include standby hosts as well as partition
information.
Closes: #7960
Implements: KIP-535
Co-authored-by: Navinder Pal Singh Brar <navinder_brar@yahoo.com>
Reviewed-by: John Roesler <vvcephei@apache.org>
Not wait until updateAssignmentMetadataIfNeeded returns true, but only call it once with 0 timeout. Also do not return empty if in rebalance.
Trim the pre-fetched records after long polling since assignment may have been changed.
Also need to update SubscriptionState to retain the state in assignFromSubscribed if it already exists (similar to assignFromUser), so that we do not need the transition of INITIALIZING to FETCHING.
Unit test: this actually took me the most time :)
Reviewers: John Roesler <john@confluent.io>, Bill Bejeck <bill@confluent.io>, Bruno Cadonna <bruno@confluent.io>, Sophie Blee-Goldman <sophie@confluent.io>, Jason Gustafson <jason@confluent.io>, Richard Yu <yohan.richard.yu@gmail.com>, dengziming <dengziming1993@gmail.com>
This PR fixes the regression introduced in 2.4 from 2 refactoring PRs:
#7249#7419
The bug was introduced by having a logical path leading numPartitionsCandidate to be 0, which is assigned to numPartitions and later being checked by setNumPartitions. In the subsequent check we will throw illegal argument if the numPartitions is 0.
This bug is both impacting new 2.4 application and upgrades to 2.4 in certain types of topology. The example in original JIRA was imported as a new integration test to guard against such regression. We also verify that without the bug fix application will still fail by running this integration test.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
* Adjust build and documentation.
* Use lambda syntax for SAM types in `core`, `streams-scala` and
`connect-runtime` modules.
* Remove `runnable` and `newThread` from `CoreUtils` as lambda
syntax for SAM types make them unnecessary.
* Remove stale comment in `FunctionsCompatConversions`,
`KGroupedStream`, `KGroupedTable' and `KStream` about Scala 2.11,
the conversions are needed for Scala 2.12 too.
* Deprecate `org.apache.kafka.streams.scala.kstream.Suppressed`
and use `org.apache.kafka.streams.kstream.Suppressed` instead.
* Use `Admin.create` instead of `AdminClient.create`. Static methods
in Java interfaces can be invoked since Scala 2.12. I noticed that
MirrorMaker 2 uses `AdminClient.create`, but I did not change them
as Connectors have restrictions on newer client APIs.
* Improve efficiency in a few `Gauge` implementations by avoiding
unnecessary intermediate collections.
* Remove pointless `Option.apply` in `ZookeeperClient`
`SessionState` metric.
* Fix unused import/variable and other compiler warnings.
* Reduce visibility of some vals/defs.
Reviewers: Manikumar Reddy <manikumar@confluent.io>, Guozhang Wang <wangguoz@gmail.com>, Gwen Shapira <gwen@confluent.io>
In some system tests a Streams app is started and then prints a message to stdout, which the system test waits for to confirm the node has successfully been brought up. It then greps for certain log messages in a retriable loop.
But waiting on the Streams app to start/print to stdout does not mean the log file has been created yet, so the grep may return an error. Although this occurs in a retriable loop it is assumed that grep will not fail, and the result is piped to wc and then blindly converted to an int in the python function, which fails since the error message is a string (throws ValueError)
We should catch the ValueError and return a 0 so it can try again rather than immediately crash
Reviewers: Bill Bejeck <bbejeck@gmail.com>, John Roesler <vvcephei@users.noreply.github.com>, Guozhang Wang <wangguoz@gmail.com>
This is part1 of a series of PRs for task management cleanup:
1. Primarily cleanup MockRecordCollectors: remove unnecessary anonymous inheritance but just consolidate on the NoOpRecordCollector -> renamed to MockRecordCollector. Most relevant changes are unit tests that would be relying on this MockRecordCollector.
2. Let StandbyContextImpl#recordCollector() to return null instead of returning a no-op collector, since in standby tasks we should ALWAYS bypass the logging logic and only use the inner store for restoreBatch. Returning null helps us to realize this assertion failed as NPE as early as possible whereas a no-op collector just hides the bug.
Reviewers: Guozhang Wang <wangguoz@gmail.com>