1. When we reinitialize the state store due to no CHECKPOINT with EOS turned on, we should update the checkpoint to consumer.seekToBeginnning() / consumer.position() to avoid falling into endless iterations.
2. Fixed a few other logic bugs around needsInitializing and needsRestoring.
Reviewers: Jason Gustafson <jason@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
1. As titled and as described in comments.
2. Modified unit test slightly to insert for new keys in committed data to expose this issue.
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
This PR now justs removes the check in TaskPairs.hasNewPair that was causing the task assignment issue.
This was done as we need to further refine task assignment strategy and this approach needs to include the statefulness of tasks and is best done in one pass vs taking a "patchy" approach.
Updated current tests and ran locally
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
ZooKeeper client from version 3.4.13 doesn't handle connections to localhost very well. If ZooKeeper is started on 127.0.0.1 on a machine that has both ipv4 and ipv6 and a client is created using localhost rather than the IP address in the connection string, ZooKeeper client attempts to connect to ipv4 or ipv6 randomly with a fixed one second backoff if connection fails. Use 127.0.0.1 instead of localhost in streams tests to avoid intermittent test failures due to ZK client connection timeouts if ipv6 is chosen in consecutive address selections. Also add note to upgrade docs for 2.0.0.
Reviewers: Ismael Juma <github@juma.me.uk>, Matthias J. Sax <matthias@confluent.io>
1. At the beginning of assign, we first check that all the non-repartition source topics are included in the metadata. If not, we log an error at the leader and set an error in the Assignment userData bytes, indicating that leader cannot complete assignment and the error code would indicate the root cause of it.
2. Upon receiving the assignment, if the error is not NONE the streams will shutdown itself with a log entry re-stating the root cause interpreted from the error code.
Author: tedyu <yuzhihong@gmail.com>
Reviewers: Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>
Closes#5322 from tedyu/trunk
1. Remove MinTimestampTracker and its TimestampTracker interface.
2. In RecordQueue, keep track of the head record (deserialized) while put the rest raw bytes records in the fifo queue, the head record as well as the partition timestamp will be updated accordingly.
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
This PR uses bulk loading for recovering RocksDBWindowStore, same as RocksDBStore.
Reviewers: Boyang Chen <bchen11@outlook.com>, Shawn Nguyen <shnguyen@pinterest.com>, Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
#5253 broke standby restoration for windowed stores.
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
1. extend isWindowStore to consider session store as well.
2. extend the existing unit test accordingly.
Reviewers: John Roesler <john@confluent.io>, Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
See also KIP-319.
Replace number-of-segments parameters with segment-interval-ms parameters in various places. The latter was always the parameter that several components needed, and we accidentally supplied the former because it was the one available.
Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
KAFKA-6986:Export Admin Client metrics through Stream Threads
We already exported producer and consumer metrics through KafkaStreams class:
#4998
It makes sense to also export the Admin client metrics.
I didn't add a separate unittest case for this. Let me know if it's needed.
This is my first contribution, feel free to point out any mistakes that I did.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
1. Rename metrics scope of rocksDB window and session stores; also modify the store metrics accordingly with guidance on its correlations to metricsScope.
2. Add the missing total metrics for per-thread, per-task, per-node and per-store sensors.
Reviewers: John Roesler <john@confluent.io>, Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
Before KIP-266, consumer.poll(0) would call updateAssignmentMetadataIfNeeded(Long.MAX_VALUE), which makes sure that the rebalance is definitely completed, i.e. both onPartitionRevoked and onPartitionAssigned called within this poll(0). After KIP-266, however, it is possible that only onPartitionRevoked will be called if timeout is elapsed. And hence we need to double check that state is still PARTITIONS_ASSIGNED after the consumer.poll(duration) call.
Reviewers: Ted Yu <yuzhihong@gmail.com>, Matthias J. Sax <matthias@confluent.io>
They rely on finalizers (before Java 11), which create
unnecessary GC load. The alternatives are as easy to
use and don't have this issue.
Also use FileChannel directly instead of retrieving
it from RandomAccessFile whenever possible
since the indirection is unnecessary.
Finally, add a few try/finally blocks.
Reviewers: Colin Patrick McCabe <colin@cmccabe.xyz>, Rajini Sivaram <rajinisivaram@googlemail.com>
Enforce window retention times strictly:
* records for windows that are expired get dropped
* queries for timestamps old enough to be expired immediately answered with null
Reviewers: Bill Bejeck <bill@confluent.io>, Damian Guy <damian@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Significant refactor of Segments to use stream-time as the basis of segment expiration.
Previously Segments assumed that the current record time was representative of stream time.
In the event of a "future" event (one whose record time is greater than the stream time), this
would inappropriately drop live segments. Now, Segments will provision the new segment
to house the future event and drop old segments only after they expire.
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
- Removed Scala consumers (`SimpleConsumer` and `ZooKeeperConsumerConnector`)
and their tests.
- Removed Scala request/response/message classes.
- Removed any mention of new consumer or new producer in the code
with the exception of MirrorMaker where the new.consumer option was
never deprecated so we have to keep it for now. The non-code
documentation has not been updated either, that will be done
separately.
- Removed a number of tools that only made sense in the context
of the Scala consumers (see upgrade notes).
- Updated some tools that worked with both Scala and Java consumers
so that they only support the latter (see upgrade notes).
- Removed `BaseConsumer` and related classes apart from `BaseRecord`
which is used in `MirrorMakerMessageHandler`. The latter is a pluggable
interface so effectively public API.
- Removed `ZkUtils` methods that were only used by the old consumers.
- Removed `ZkUtils.registerBroker` and `ZKCheckedEphemeral` since
the broker now uses the methods in `KafkaZkClient` and no-one else
should be using that method.
- Updated system tests so that they don't use the Scala consumers except
for multi-version tests.
- Updated LogDirFailureTest so that the consumer offsets topic would
continue to be available after all the failures. This was necessary for it
to work with the Java consumer.
- Some multi-version system tests had not been updated to include
recently released Kafka versions, fixed it.
- Updated findBugs and checkstyle configs not to refer to deleted
classes and packages.
Reviewers: Dong Lin <lindong28@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
This version is a WIP and intentionally leaves out some additional required changes to keep the reviewing effort more manageable. This version of the process includes
1. Cleaning up the graph objects to reduce the number of parameters and make the naming conventions more clear.
2. Intercepting all calls to the InternalToplogyBuilder and capturing all details required for possible optimizations and building the final topology.
This PR does not include writing out the current physical plan, so no tests included. The next PR will include additional changes to building the graph and writing the topology out without optimizations, using the current streams tests.
Reviewers: John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
The old timeout configs no longer take effect, as of
53ca52f855e903907378188d29224b3f9cefa6cb. They are replaced
by the new one.
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <guozhang@confluent.io>, Ted Yu <yuzhihong@gmail.com>
* KAFKA-6474: Rewrite tests to use new public TopologyTestDriver [part 2]
* Refactor:
-KTableFilterTest.java
-KTableImplTest.java
-KTableMapValuesTest.java
-KTableSourceTest.java
* Add access to task, processorTopology, and globalTopology in TopologyTestDriver via TopologyTestDriverWrapper
* Remove unnecessary constructor in TopologyTestDriver
* Change how TopologyTestDriverWrapper#getProcessorContext sets the current node
Reviewers: John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Add a unit test that validates after restoreStart, the options are set with bulk loading configs; and after restoreEnd, it resumes to the customized configs
Reviewers: Matthias J. Sax <matthias@confluent.io>
This PR actually contains two changes:
1. leverage on the TOPOLOGY_OPTIMIZATION config to "adjust" the topology internally to reuse the source topic.
2. fixed a long dangling bug that whenever source topic is reused as changelog topic, write the checkpoint file for the consumed offset, this is done by union the ackedOffset from the producer, plus the consumed offset from the consumer, note we will priori ackedOffset since the same topic may show up in both (think about repartition topic), by doing this the consumed offset from source topics can be treated as checkpointed offset when reuse happens.
3. added a few unit and integration tests with / wo the reusing, and make sure the restoration, standby task, and internal topic creation behaviors are all correct.
Reviewers: John Roesler <john@confluent.io>, Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
*Summary
options.prepareForBulkLoad() and then use the configs from the customized customized RocksDBConfigSetter. This may overwrite the configs set in prepareBulkLoad call. The fix is to move prepareBulkLoad call after applying configs customized RocksDBConfigSetter.
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <guozhang@confluent.io>, Bill Bejeck <bill@confluent.io>
Make use of the new Consumer#poll(Duration) to avoid getting stuck in poll when the broker is unavailable.
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <guozhang@confluent.io>, Bill Bejeck <bill@confluent.io>
Adding configuration to StreamsConfig allowing for making topology optimization optional.
Added unit tests are verifying default values, setting correct value and failure on invalid values.
Reviewers: John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
* KAFKA-6993: Fix defective documentations for KStream/KTable methods
1. Fix the documentation of following methods, e.g., making more detailed description for the overloaded methods:
- KStream#join
- KStream#leftJoin
- KStream#outerJoin
- KTable#filter
- KTable#filterNot
- KTable#mapValues
- KTable#transformValues
- KTable#join
- KTable#leftJoin
- KTable#outerJoin
2. (trivial) with possible new type -> with possibly new type.
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <guozhang@confluent.io>, Bill Bejeck <bill@confluent.io>