Updates KStream JavaDoc and web page documentations using new State Store API
Author: Yu Liu <yu.liu003@gmail.com>
Reviewers: Guozhang Wang <guozhang@confluent.io>, Matthias J. Sax <matthias@confluent.io>
The ChangeLoggingKeyValueBytesStore used to write null to its underlying store instead of propagating the delete, which has two drawbacks:
* an iterator will see null values
* unbounded memory growth of the underlying in-memory keyvalue store
The fix will just propagate the delete instead of performing put(key, null).
The changes to the tests:
*extra test whether the key is really gone after delete by calling the approximateEntries on the underlying store. This number is exact because we know the underlying store in the test is of type InMemoryKeyValueStore
* extra test to check a delete is logged as <key, null> (the existing test would also succeed if the key is just absent)
While also updating the corresponding tests of the ChangeLoggingKeyValueStore I noticed the class is nowhere used anymore so I removed it from the source code for clarity.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
*More detailed description of your change,
if necessary. The PR title and PR message become
the squashed commit message, so use a separate
comment to ping reviewers.*
*Summary of testing strategy (including rationale)
for the feature or bug fix. Unit and/or integration
tests are expected for any behaviour change and
system tests should be considered for larger changes.*
Author: Dmitry Minkovsky <dminkovsky@gmail.com>
Reviewers: Joel Hamill <joel-hamill@users.noreply.github.com>, Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>
Closes#4483 from dminkovsky/fix-javadoc-typo
github comments
* In Checkpoint.write(), if the offset map passed in is empty, skip the writing of the file which would only contain version number and the empty size. From the reading pov, it is the same as no file existed.
* Add related unit tests.
* Minor fixes on log4j messages.
Reviewers: Bill Bejeck <bill@confluent.io>, Damian Guy <damian@confluent.io>, Matthias J. Sax <matthias@confluent.io>
This PR avoids unnecessary punctuation calls if punctuations are missed due to large time advances. It also aligns punctuation schedules to the epoch.
Author: Frederic Arno
Reviewers: Michal Borowiecki <michal.borowiecki@openbet.com>, Guozhang Wang <wangguoz@gmail.com>, Damian Guy <damian.guy@gmail.com>, Matthias J. Sax <matthias@confluent.io>
* removed round-robin approach, try to assign tasks to consumers in a more even manner, added unit test.
* better interleaved task approach, updated tests
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Damian Guy <damian.guy@gmail.com>, Matthias J. Sax <mjsax@apache.org>
* Implement method to get custom properties
* Add custom properties to getConsumerConfigs and getProducerConfigs
* Add tests
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
1. Rename KTableKTableJoin to KTableKTableInnerJoin. Also removed abstract from other joins.
2. Merge KTableKTableJoinValueGetter.java into KTableKTableInnerJoin.
3. Use set instead of arrays in the stores function, to avoid duplicate stores to be connected to processors.
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
[KAFKA-6451](https://issues.apache.org/jira/browse/KAFKA-6451)
Simplified KStreamReduce and KStreamAggregate.
Updated comments in KStreamAggregate.
Author: Tanvi Jaywant <tanvijaywant@Tanvis-Air.home>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <mjsax@apache.org>
Closes#4477 from tanvijaywant31/KAFKA-6451
* KAFKA-3625: Add public test utils for Kafka Streams
- add new artifact test-utils
- add TopologyTestDriver
- add MockTime, TestRecord, add TestRecordFactory
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Damian Guy <damian.guy@gmail.com>, Bill Bejeck <bill@confluent.io>
This is something I did after my working hours, I would ask people reviewing this do the same, don't take time for this during your work hours.
I try to keep such a PR as limited as possible, for clarity of reading.
==========
Using an empty string concat in order to achieve the String representation of the value you want is bad for 2 reasons, as explained here: (https://stackoverflow.com/questions/1572708/is-conversion-to-string-using-int-value-bad-practice
Readability: it shows what you're trying to do.
Depending on your compiler, it might attempt to create your String by first creating a StringBuffer, appending your value to it and then doing .toString() on that. Which is inefficient.
Also, the Metrics.java file had an empty string being added for the sole reason that the page width forced a string to continue on a new line. Removed that.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Initialize topology after state store restoration.
Although IMHO updating some of the existing tests demonstrates the correct order of operations, I'll probably add an integration test, but I wanted to get this PR in for feedback on the approach.
Author: Bill Bejeck <bill@confluent.io>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <mjsax@apache.org>
Closes#4415 from bbejeck/KAFKA-6205_restore_state_stores_before_initializing_topology
minor log4j edits
This is a bug fix that is composed of two parts:
1. The major part is, for all operators that is generating a KTable, we should construct its value getter based on whether the KTable itself is materialized.
1.a If yes, then query the materialized store directly for value getter.
1.b If not, then hand over to its parents value getter (recursively) and apply the computation to return.
2. The minor part is, in KStreamImpl, when joining with a table, we should connect with table's `valueGetterSupplier().storeNames()`, not the `internalStoreName()` as the latter always assume that the KTable is materialized, but that is not always true.
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Damian Guy <damian.guy@gmail.com>, Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
Closes#4421 from guozhangwang/K6398-KTableValueGetter
Author: RichardYuSTUG <yohan.richard.yu2@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#4339 from ConcurrencyPractitioner/kafka-6238
Minor edits on description
1. Replaced KStreamKTableJoinIntegrationTest with the abstract based StreamTableJoinIntegrationTest. Added details on per-step verifications.
2. Minor renaming on GlobalKTableIntegrationTest.
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Damian Guy <damian.guy@gmail.com>, Bill Bejeck <bill@confluent.io>
Closes#4419 from guozhangwang/KMinor-join-integration-tests-II
This PR is the partial implementation for KIP-149. As the discussion for this KIP is still ongoing, I made a PR on the "safe" portions of the KIP (so that it can be included in the next release) which are 1) `ValueMapperWithKey`, 2) `ValueTransformerWithKeySupplier`, and 3) `ValueTransformerWithKey`.
Author: Jeyhun Karimov <je.karimov@gmail.com>
Reviewers: Damian Guy <damian.guy@gmail.com>, Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#4309 from jeyhunkarimov/KIP-149_hope_last
0. Rename `JoinIntegrationTest` to `StreamStreamJoinIntegrationTest`, which is only for KStream-KStream joins.
1. Extract the `AbstractJoinIntegrationTest` which is going to be used for all the join integration test classes, parameterized with and without caching.
2. Merge `KStreamRepartitionJoinTest.java` into `StreamStreamJoinIntegrationTest.java` with augmented stream-stream join.
3. Add `TableTableJoinIntegrationTest` with detailed per-step expected results and removed `KTableKTableJoinIntegrationTest`.
Findings of the integration test:
1. Confirmed KAFKA-4309 with caching turned on.
2. Found bug KAFKA-6398.
3. Found bug KAFKA-6443.
4. Found a bug that in CachingKeyValueStore, we would flush before putting the record into the underlying store, when the store is going to be used in the downstream processors with flushing it would result in incorrect results, fixed the issue along with this PR.
5. Consider a new optimization described in KAFKA-6286.
Future works including stream-table joins will be in other PRs.
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Damian Guy <damian.guy@gmail.com>, Bill Bejeck <bill@confluent.io>
Closes#4331 from guozhangwang/KMinor-join-integration-tests
A spinoff of original pull request #4340 for resolving conflicts.
Author: RichardYuSTUG <yohan.richard.yu2@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#4413 from ConcurrencyPractitioner/kafka-6265-2
*More detailed description of your change,
if necessary. The PR title and PR message become
the squashed commit message, so use a separate
comment to ping reviewers.*
*Summary of testing strategy (including rationale)
for the feature or bug fix. Unit and/or integration
tests are expected for any behaviour change and
system tests should be considered for larger changes.*
Author: Rohan Desai <desai.p.rohan@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#4382 from rodesai/KAFKA-6383
1. Include the parent's queryable store name in KTable.filter if this operator is not materialized.
2. Augment InternalTopologyBuilder checking on null processor / store names from the enum.
3. Unit test.
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Damian Guy <damian.guy@gmail.com>, Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
Closes#4384 from guozhangwang/K6398-topology-builder-exception
Currently CachingKeyValueStore methods are synchronized at method level.
It seems we can use read lock for getter and write lock for put / delete methods.
For getInternal(), if the underlying thread is streamThread, the getInternal() may trigger eviction. This can be handled by obtaining write lock at the beginning of the method for streamThread.
The jmh patch is attached to JIRA:
https://issues.apache.org/jira/secure/attachment/12905140/6412-jmh.v1.txt
Author: tedyu <yuzhihong@gmail.com>
Reviewers: Damian Guy <damian.guy@gmail.com>, Bill Bejeck <bill@confluent.io>
Closes#4372 from tedyu/6412
* Implement MockAdminClient.deleteTopics
* Use MockAdminClient instead of MockKafkaAdminClientEnv in StreamsResetterTest
* Rename MockKafkaAdminClientEnv to AdminClientUnitTestEnv
* Use MockAdminClient instead of MockKafkaAdminClientEnv in TopicAdminTest
* Rename KafkaAdminClient to AdminClientUnitTestEnv in KafkaAdminClientTest.java
* Migrate StreamThreadTest to MockAdminClient
* Fix style errors
* Address review comments
* Fix MockAdminClient call
Reviewers: Matthias J. Sax <matthias@confluent.io>, Konstantine Karantasis <konstantine@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
- Fix zk session state and session change rate metric names: type
should be SessionExpireListener instead of KafkaHealthCheck. Test
verifying the fix was included.
- Handle missing controller in controlled shutdown in the same way as if
the broker is not registered (i.e. retry after backoff).
- Restructure BrokerInfo to reduce duplication. It now contains a
Broker instance and the JSON serde is done in BrokerIdZNode
since `Broker` does not contain all the fields.
- Remove dead code from `ZooKeeperClient.initialize` and remove
redundant `close` calls.
- Move ACL handling and persistent paths definition from ZkUtils to
ZkData (and call ZkData from ZkUtils).
- Remove ZooKeeperClientWrapper and ZooKeeperClientMetrics from
ZkUtils (avoids metrics clash if third party users create a ZkUtils
instance in the same process as the broker).
- Introduce factory method in KafkaZkClient that creates
ZooKeeperClient and remove metric name defaults from
ZooKeeperClient.
- Fix a few instances where ZooKeeperClient was not closed in tests.
- Update a few TestUtils methods to use KafkaZkClient instead of
ZkUtils.
- Add test verifying SessionState metric.
- Various clean-ups.
Testing: mostly relying on existing tests, but added a couple
of new tests as mentioned above.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Jun Rao <junrao@gmail.com>
Closes#4359 from ijuma/kafka-6320-kafka-health-zk-metrics-follow-up
Increase commit interval to make it less likely that we flush the cache in-between.
To make it fool-proof, only compare the "final" result records if cache is enabled.
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#4364 from mjsax/kafka-6256-flaky-kstream-ktable-join-with-caching-test
* Removed code duplicate from GlobalProcessorContextImpl and ProcessorContextImpl to parent class AbstractProcessorContext
* Exchanged concrete implementations with interfaces to make code more maintainable
* Refactored major code duplicates in InternalTopologyBuilder
* Formatted function parameters as per code review
Added final to code introduced in this PR
* Added missing finals to putNodeGroupName function
Rearranged parameters for resetTopicsPattern function
Reviewers: Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bbejeck@gmail.com>
* ensure topics are created with correct partitions BEFORE building the metadata for our stream tasks
* Added a test case. The test should fail with the old logic, because:
While stream-partition-assignor-test-KSTREAM-MAP-0000000001-repartition is created correctly with four partitions, the StreamPartitionAssignor will only assign three tasks to the topic. Test passes with the new logic.
Reviewers: Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>, Ted Yu <yuzhihong@gmail.com>
* Return offset of next record of records left after restore completed
* Changed check for restoring partition to remove the "+1" in the guard condition
Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>
* KAFKA-6383: complete shutdown for CREATED StreamThreads
When transitioning StreamThread from CREATED to PENDING_SHUTDOWN
free up resources from the caller, rather than the stream thread,
since in this case the stream thread was never actually started.
Have StreamThread.setState return the old state. If the old state is
CREATED in StreamThread.shutdown() then start the thread so that it
can free the resources owned by the StreamThread.
Add a KafkaStreams test to verify that the producer gets closed even
if KafkaStreams was not started
Reviewers: Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>
1. added functions for KafkaStreams and KafkaClientSupplier.
2. added prefix for admin client in StreamsConfig.
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Jason Gustafson <jason@confluent.io>, Matthias J. Sax <matthias@confluent.io>
Closes#4338 from guozhangwang/K6150-doc-changes
* Moved metrics in KafkaHealthCheck to ZookeeperClient.
* Converted remaining ZkUtils usage in KafkaServer to ZookeeperClient and removed ZkUtils from KafkaServer.
* Made the re-creation of ZooKeeper during ZK session expiration with infinite retries.
* Added unit tests for all new methods in KafkaZkClient.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Ismael Juma <ismael@juma.me.uk>