#5468 introduced a breaking API change that was actually avoidable. This PR re-introduces the old API as deprecated and alters the API introduced by #5468 to be consistent with the other methods
also, fixed misc syntax problems
- fix log statement in Topology Builder.
- addressed some warnings shown by Intellij
Reviewers: Viktor Somogyi <viktorsomogyi@gmail.com>, Satish Duggana <satishd@apache.org>, Matthias J. Sax <matthias@confluent.io>
While working on 4th PR, I noticed that I had missed adding stores via the graph vs. directly via the InternalStreamsBuilder. Probably ok to do so, but we should be consistent.
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
While debugging the reported issue, I found that our current unit test lacks coverage to actually expose the underlying root cause.
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
1. In each iteration, decide if a task is processable if all of its partitions contains data, so it can decide which record to process next.
1.a Add one exception that, if the task indeed have data on some but not all of its partitions, we only consider as not processable for some finite round of iterations.
1.b Add a task-level metric to record whenever we are forced to process a task that is only "partially data available", since it may leads to non-determinism.
2. Break the main loop on put-raw-data and process-them. Since now not all data put into the queue would be processed completely within a single iteration.
3. NOTE that within an iteration, if a task has exhausted one of its queue it will still be processed, since we only update processable list once in each iteration, I'm improving on this on the follow-up part III PR.
4. Found and fixed a bug in metrics recording: the taskName and sensorName parameters were exchanged.
5. Optimized task stream time computation again since our current partition stream time reasoning has been simplified.
6. Added unit tests.
Reviewers: Matthias J. Sax <matthias@confluent.io>, John Roesler <vvcephei@users.noreply.github.com>, Bill Bejeck <bbejeck@gmail.com>
The specific changes in this PR from the second PR include:
1. Changed the types of graph nodes to names conveying more context
2. Build the entire physical plan from the graph, after StreamsBuilder.build() is called.
Other changes are addressed directly as review comments on the PR.
Testing consists of using all existing streams tests to validate building the physical plan with graph
Reviewers: Matthias J. Sax <matthias@confluent.io>, John Roesler <vvcephei@users.noreply.github.com>, Guozhang Wang <wangguoz@gmail.com>
Use delivery timeout instead of retries when possible and remove various TODOs associated with completion of KIP-91.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Guozhang Wang <wangguoz@gmail.com>
* new minimum is 0, just like window size
* refactor tests to use smaller segment sizes as well
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
1. When we reinitialize the state store due to no CHECKPOINT with EOS turned on, we should update the checkpoint to consumer.seekToBeginnning() / consumer.position() to avoid falling into endless iterations.
2. Fixed a few other logic bugs around needsInitializing and needsRestoring.
Reviewers: Jason Gustafson <jason@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
1. As titled and as described in comments.
2. Modified unit test slightly to insert for new keys in committed data to expose this issue.
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
This PR now justs removes the check in TaskPairs.hasNewPair that was causing the task assignment issue.
This was done as we need to further refine task assignment strategy and this approach needs to include the statefulness of tasks and is best done in one pass vs taking a "patchy" approach.
Updated current tests and ran locally
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
ZooKeeper client from version 3.4.13 doesn't handle connections to localhost very well. If ZooKeeper is started on 127.0.0.1 on a machine that has both ipv4 and ipv6 and a client is created using localhost rather than the IP address in the connection string, ZooKeeper client attempts to connect to ipv4 or ipv6 randomly with a fixed one second backoff if connection fails. Use 127.0.0.1 instead of localhost in streams tests to avoid intermittent test failures due to ZK client connection timeouts if ipv6 is chosen in consecutive address selections. Also add note to upgrade docs for 2.0.0.
Reviewers: Ismael Juma <github@juma.me.uk>, Matthias J. Sax <matthias@confluent.io>
1. At the beginning of assign, we first check that all the non-repartition source topics are included in the metadata. If not, we log an error at the leader and set an error in the Assignment userData bytes, indicating that leader cannot complete assignment and the error code would indicate the root cause of it.
2. Upon receiving the assignment, if the error is not NONE the streams will shutdown itself with a log entry re-stating the root cause interpreted from the error code.
Author: tedyu <yuzhihong@gmail.com>
Reviewers: Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>
Closes#5322 from tedyu/trunk
1. Remove MinTimestampTracker and its TimestampTracker interface.
2. In RecordQueue, keep track of the head record (deserialized) while put the rest raw bytes records in the fifo queue, the head record as well as the partition timestamp will be updated accordingly.
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
This PR uses bulk loading for recovering RocksDBWindowStore, same as RocksDBStore.
Reviewers: Boyang Chen <bchen11@outlook.com>, Shawn Nguyen <shnguyen@pinterest.com>, Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
#5253 broke standby restoration for windowed stores.
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
1. extend isWindowStore to consider session store as well.
2. extend the existing unit test accordingly.
Reviewers: John Roesler <john@confluent.io>, Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
See also KIP-319.
Replace number-of-segments parameters with segment-interval-ms parameters in various places. The latter was always the parameter that several components needed, and we accidentally supplied the former because it was the one available.
Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
KAFKA-6986:Export Admin Client metrics through Stream Threads
We already exported producer and consumer metrics through KafkaStreams class:
#4998
It makes sense to also export the Admin client metrics.
I didn't add a separate unittest case for this. Let me know if it's needed.
This is my first contribution, feel free to point out any mistakes that I did.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
1. Rename metrics scope of rocksDB window and session stores; also modify the store metrics accordingly with guidance on its correlations to metricsScope.
2. Add the missing total metrics for per-thread, per-task, per-node and per-store sensors.
Reviewers: John Roesler <john@confluent.io>, Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
Before KIP-266, consumer.poll(0) would call updateAssignmentMetadataIfNeeded(Long.MAX_VALUE), which makes sure that the rebalance is definitely completed, i.e. both onPartitionRevoked and onPartitionAssigned called within this poll(0). After KIP-266, however, it is possible that only onPartitionRevoked will be called if timeout is elapsed. And hence we need to double check that state is still PARTITIONS_ASSIGNED after the consumer.poll(duration) call.
Reviewers: Ted Yu <yuzhihong@gmail.com>, Matthias J. Sax <matthias@confluent.io>
They rely on finalizers (before Java 11), which create
unnecessary GC load. The alternatives are as easy to
use and don't have this issue.
Also use FileChannel directly instead of retrieving
it from RandomAccessFile whenever possible
since the indirection is unnecessary.
Finally, add a few try/finally blocks.
Reviewers: Colin Patrick McCabe <colin@cmccabe.xyz>, Rajini Sivaram <rajinisivaram@googlemail.com>
Enforce window retention times strictly:
* records for windows that are expired get dropped
* queries for timestamps old enough to be expired immediately answered with null
Reviewers: Bill Bejeck <bill@confluent.io>, Damian Guy <damian@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Significant refactor of Segments to use stream-time as the basis of segment expiration.
Previously Segments assumed that the current record time was representative of stream time.
In the event of a "future" event (one whose record time is greater than the stream time), this
would inappropriately drop live segments. Now, Segments will provision the new segment
to house the future event and drop old segments only after they expire.
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>