We would like to export the producer metrics from StreamThread as well, just like the consumer metrics, to gain more visibility into Streams applications. The approach is to pass the threadProducer into the StreamThread so that its metrics can be exported dynamically.
Note that this is a pure internal change that doesn't require a KIP, and in the future we also want to export admin client metrics. A followup KIP for admin client will be created once this is merged.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
1. Remove TopologyBuilder, TopologyBuilderException, KStreamBuilder,
2. Completed the leftover work of https://issues.apache.org/jira/browse/KAFKA-5660, when we remove TopologyBuilderException.
3. Added MockStoreBuilder to replace MockStateStoreSupplier, remove all XXStoreSupplier except StateStoreSupplier as it is still referenced in the logical streams graph.
4. Minor: rename KStreamsFineGrainedAutoResetIntegrationTest.java to FineGrainedAutoResetIntegrationTest.java.
Reviewers: Matthias J. Sax <matthias@confluent.io>
Also removed the InternalValueTransformerWithKey / Supplier, which were used to mock away the deprecated punctuate function.
Reviewers: Matthias J. Sax <matthias@confluent.io>
1. Remove the deprecated StateStoreSuppliers, and the corresponding Stores.create() functions and factories: only the base StateStoreSupplier and MockStoreSupplier were still preserved as they are needed by the deprecated TopologyBuilder and KStreamBuilder. Will remove them in a follow-up PR.
2. Add TopologyWrapper.java as the original InternalTopologyBuilderAccessor was removed, but I realized it is still needed as of now.
3. Minor: removed StateStoreTestUtils.java and inlined its logic in its callers, since with StoreBuilder it is now just a one-liner.
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
This is a continuation of #4978.
From Guozhang:
I think to fix this issue, in init we could consider switching the steps of 1 and 2:
initInternal(context);
underlying.init(context, root);
since
volatile boolean open = false;
it should be sufficient. In this case the check on step 3) will fail if underlying.init is not completed and we will throw InvalidStateStoreException.
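A minimal sketch of the ordering argument above, assuming a wrapper store that keeps its own state (a cache) next to the underlying store. All class names here are hypothetical stand-ins, and IllegalStateException stands in for InvalidStateStoreException. Because the wrapper state is assigned before the volatile write to open, a reader that sees open == true also sees the initialized wrapper state.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the inner store; only the volatile flag matters here.
class UnderlyingStoreSketch {
    private volatile boolean open = false;
    private final Map<String, String> data = new ConcurrentHashMap<>();

    void init() {
        data.put("k", "v"); // pretend to restore some state
        open = true;        // volatile write, published last
    }

    boolean isOpen() {
        return open;
    }

    String get(String key) {
        return data.get(key);
    }
}

// Hypothetical wrapper that owns extra state set up by initInternal().
class CachingStoreSketch {
    private final UnderlyingStoreSketch underlying = new UnderlyingStoreSketch();
    private Map<String, String> cache; // null until initInternal() runs

    void init() {
        initInternal();    // step 1: wrapper state first
        underlying.init(); // step 2: flips the volatile open flag
    }

    private void initInternal() {
        cache = new ConcurrentHashMap<>();
    }

    String get(String key) {
        // step 3: if this check passes, step 1 has already completed
        if (!underlying.isOpen()) {
            throw new IllegalStateException("store is not open");
        }
        return cache.computeIfAbsent(key, underlying::get);
    }
}
```

With the steps in the opposite order, a concurrent reader could pass the open check while cache is still null.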
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Serdes are confusing in the Scala wrapper:
* We have wrappers around Serializer, Deserializer and Serde which are not very useful.
* We have Serdes in 2 places, org.apache.kafka.common.serialization.Serde and DefaultSerdes; instead there should be only one place to find all the Serdes.
I wanted to do this PR before the release as this is a breaking change.
This shouldn't add more so the current tests should be enough.
Reviewers: Debasish Ghosh <dghosh@acm.org>, Guozhang Wang <guozhang@confluent.io>
I'm breaking KAFKA-6813 into a couple of "smaller" PRs and this is the first one. It focuses on:
Remove deprecated APIs in KStream, KTable, KGroupedStream, KGroupedTable, SessionWindowedKStream, TimeWindowedKStream.
Also found a couple of overlooked bugs while working on them:
2.a) In KTable.filter / mapValues without the additional parameter indicating the materialized stores, originally we will not materialize the store. After KIP-182 we mistakenly diverge the semantics: for KTable.mapValues it is still the case, for KTable.filter we will always materialize.
2.b) In XXStream/Table.reduce/count, we used to try to reuse the serdes since their types are pre-known (for reduce it is the same types for both key / value, for count it is the same types for key, and Long for value). This was somehow lost in the past refactoring.
2.c) We are forcing a cast from Serde<V> to Serde<VR> for XXStream / Table.aggregate, for which the returned value type is NOT known, so the enforced cast should not be applied and we should require users to provide the value serde if they believe the default ones are not applicable.
2.d) Whenever we create a new MaterializedInternal we effectively increment the suffix index for the store / processor-node names. However, in some places this MaterializedInternal is only used for validation, so the resulting processor-node / store suffix is not monotonic.
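To illustrate the effect described in 2.d, here is a rough sketch with a hypothetical name generator (not the actual MaterializedInternal or InternalStreamsBuilder code). Any call that takes a name from the shared counter consumes an index, even when the generated name is discarded after validation, which leaves gaps in the suffixes that end up in the topology.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical generator; the name format is only an assumption for illustration.
class NameIndexSketch {
    private final AtomicInteger index = new AtomicInteger(0);

    // Returns a name with a zero-padded suffix and advances the shared counter.
    String newName(String prefix) {
        return String.format("%s-%010d", prefix, index.getAndIncrement());
    }

    // Simulates a validation-only code path: a name is generated and then thrown away.
    void validateOnly() {
        newName("STORE");
    }
}
```

Calling newName, then validateOnly, then newName again yields suffixes 0 and 2 for the names that are actually used.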
Reviewers: Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>
Updated RocksDBSegmentedBytesStoreTest class to include time window serdes.
Reviewers: Guozhang Wang <guozhang@confluent.io>, Bill Bejeck <bill@confluent.io>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Debasish Ghosh <dghosh@acm.org>, Guozhang Wang <guozhang@confluent.io>, Bill Bejeck <bill@confluent.io>
Updated the upgrade doc as well, since we did not have an overloaded function without the deprecated parameter before. Also renamed the 1.2 release version to 2.0.
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
Several build and documentation updates were required after the merge of KAFKA-6670: Implement a Scala wrapper library for Kafka Streams.
Encode Scala major version into streams-scala artifacts.
To differentiate versions of the kafka-streams-scala artifact across Scala major versions, the version must be encoded into the artifact name before it's published to a maven repository. This is accomplished by following a release process similar to that of kafka core, which encodes the Scala major version and then runs the build for each supported major version of Scala. This is considered standard practice when releasing Scala libraries, but it is not handled for us automatically by the basic Scala for Gradle support.
After this change you can generate and install the kafka-streams-scala artifact into the local maven repository:
$ ./gradlew -PscalaVersion=2.11 install
$ ./gradlew -PscalaVersion=2.12 install
Reviewers: Ismael Juma <ismael@juma.me.uk>, Guozhang Wang <wangguoz@gmail.com>
Remove the deprecated KafkaStreams#toString function. Also override toString() for internal classes for debugging purposes.
Reviewers: Bill Bejeck <bill@confluent.io>, Damian Guy <damian@confluent.io>, Matthias J. Sax <matthias@confluent.io>
Moved the shutdown of GlobalStreamThread to after all StreamThread instances have stopped.
There can be a race condition where shutdown is called on a StreamThread and then on the GlobalStreamThread, but if the StreamThread is delayed in shutting down, the GlobalStreamThread can shut down first.
If the StreamThread tries to access a GlobalStateStore before it finishes closing, the user can get an exception stating "..Store xxx is currently closed".
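The ordering can be sketched roughly as follows, using plain Java threads and flags as hypothetical stand-ins for the StreamThread instances and the global state store (this is not the KafkaStreams#close implementation). The shared store is closed only after every worker has been joined, so no worker can observe it closed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

class ShutdownOrderSketch {
    // Returns true if no worker ever saw the shared store closed.
    static boolean run() {
        AtomicBoolean globalStoreOpen = new AtomicBoolean(true);
        AtomicBoolean sawClosedStore = new AtomicBoolean(false);
        AtomicBoolean running = new AtomicBoolean(true);

        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            Thread t = new Thread(() -> {
                while (running.get()) {
                    if (!globalStoreOpen.get()) sawClosedStore.set(true);
                }
                // one last access on the way out, as a slow-to-stop thread might do
                if (!globalStoreOpen.get()) sawClosedStore.set(true);
            });
            workers.add(t);
            t.start();
        }

        running.set(false); // signal all workers to stop
        for (Thread t : workers) {
            try {
                t.join(); // wait until each worker has fully stopped
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        globalStoreOpen.set(false); // only now close the shared store

        return !sawClosedStore.get();
    }
}
```

Closing the store before the joins would reintroduce the race described above.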
Tested by running all current streams tests.
Reviewers: Ted Yu <yuzhihong@gmail.com>, John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Wakeup consumers during shutdown to break them out of any internally blocking calls.
Semantically, it should be fine to treat a WakeupException as "no work to do", which will then continue the threads' polling loops, leading them to discover that they are supposed to shut down, which they will do gracefully.
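A simplified, single-threaded sketch of that control flow. FakeConsumer and the nested WakeupException are hypothetical stand-ins defined here, not the Kafka client classes, and the shutdown request is simulated inside the loop rather than coming from another thread.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class WakeupSketch {
    // Hypothetical stand-in for the consumer's WakeupException.
    static class WakeupException extends RuntimeException {}

    // Hypothetical consumer whose next poll() throws after wakeup() was called.
    static class FakeConsumer {
        private final AtomicBoolean wakeupRequested = new AtomicBoolean(false);

        void wakeup() {
            wakeupRequested.set(true);
        }

        int poll() {
            if (wakeupRequested.getAndSet(false)) throw new WakeupException();
            return 1; // pretend one record was fetched
        }
    }

    private final FakeConsumer consumer = new FakeConsumer();
    private volatile boolean running = true;
    int processed = 0;
    int wakeups = 0;

    void shutdown() {
        running = false;
        consumer.wakeup(); // break the consumer out of a blocking poll
    }

    void runLoop(int shutdownAfter) {
        while (running) {
            // simulates a shutdown request arriving while the loop is mid-iteration
            if (processed == shutdownAfter) shutdown();
            int records;
            try {
                records = consumer.poll();
            } catch (WakeupException e) {
                records = 0; // treat as "no work to do"
                wakeups++;
            }
            processed += records;
        }
        // the loop re-checked the running flag and exited gracefully
    }
}
```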
The existing tests should be sufficient to verify no regressions.
Author: John Roesler <john@confluent.io>
Reviewers: Bill Bejeck <bbejeck@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
Closes #4930 from vvcephei/streams-client-wakeup-on-shutdown
minor javadoc updates
While working on this, I also refactored the MockProcessor out of the MockProcessorSupplier to cleanup the unit test paths.
Reviewers: John Roesler <john@confluent.io>, Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
This PR supersedes PR #4654 as it was growing too large. All comments in that PR should be addressed here.
I will attempt to break the PRs for the topology optimization effort into 3 PRs total and will follow this general plan:
1. This PR only adds the graph nodes and graph. The graph nodes will hold the information used to make calls to the InternalTopologyBuilder when using the DSL. Graph nodes are stored in the StreamsTopologyGraph until the final topology needs building; at that point the graph is traversed and optimizations are made. There are no tests in this PR; it relies on the follow-up PR, which will use all current streams tests, and those should suffice.
2. PR 2 will intercept all DSL calls and build the graph. The InternalStreamsBuilder uses the graph to provide the required info to the InternalTopologyBuilder and build a topology. The condition of satisfaction for this PR is that all current unit, integration and system tests pass using the graph.
3. PR 3 adds some optimizations, mainly automatic repartitioning for operations that may modify a key and have child operations that would each normally create a separate repartition topic, which saves possibly unnecessary repartition topics. For example, the following topology:
```
KStream<String, String> mappedStreamOther = inputStream.map(new KeyValueMapper<String, String, KeyValue<? extends String, ? extends String>>() {
    @Override
    public KeyValue<? extends String, ? extends String> apply(String key, String value) {
        return KeyValue.pair(key.substring(0, 3), value);
    }
});
mappedStreamOther.groupByKey().windowedBy(TimeWindows.of(5000)).count().toStream().to("count-one-out");
mappedStreamOther.groupByKey().windowedBy(TimeWindows.of(10000)).count().toStream().to("count-two-out");
mappedStreamOther.groupByKey().windowedBy(TimeWindows.of(15000)).count().toStream().to("count-three-out");
```
would create 3 repartition topics, but after applying an optimization strategy, only one is created.
Reviewers: John Roesler <john@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
This pull request is for JIRA 6657, for KIP-276.
Added unit tests for the new getGlobalConsumerConfigs API and made sure the existing restore consumer tests are passing.
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
This PR does the following:
* Remove the StreamsRepeatingIntegerKeyProducerService and the associated Java class
* Add a parameter to VerifiableProducer.java to enable sending keys when specified
* Update the corresponding Python file verifiable_producer.py to support the new parameter.
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Removed the following: "zookeeper.connect", "key.serde", "value.serde", "timestamp.extractor"
Reviewers: Bill Bejeck <bill@confluent.io>, John Roesler <john@confluent.io>, Jason Gustafson <jason@confluent.io>
* Fixes a bug in which all NamedCache instances in a process shared
one parent metric.
* Also fixes a bug which incorrectly computed the per-cache metric tag
(which was undetected due to the former bug).
* Drop the StreamsMetricsConventions#xLevelSensorName convention
in favor of StreamsMetricsImpl#xLevelSensor to allow StreamsMetricsImpl
to track thread- and cache-level metrics, so that they may be cleanly declared
from anywhere but still unloaded at the appropriate time. This was necessary
right now so that the NamedCache could register a thread-level parent sensor
to be unloaded when the thread, not the cache, is closed.
* The above changes made it mostly unnecessary for the StreamsMetricsImpl to
expose a reference to the underlying Metrics registry, so I did a little extra work
to remove that reference, including removing inconsistently-used and unnecessary
calls to Metrics#close() in the tests.
The existing tests should be sufficient to verify this change.
Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>, John Roesler <john@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
This PR implements a Scala wrapper library for Kafka Streams. The library is implemented as a project under streams, namely `:streams:streams-scala`. The PR contains the following:
* the library implementation of the wrapper abstractions
* the test suite
* the changes in `build.gradle` to build the library jar
The library has been tested running the tests as follows:
```
$ ./gradlew -Dtest.single=StreamToTableJoinScalaIntegrationTestImplicitSerdes streams:streams-scala:test
$ ./gradlew -Dtest.single=StreamToTableJoinScalaIntegrationTestImplicitSerdesWithAvro streams:streams-scala:test
$ ./gradlew -Dtest.single=WordCountTest streams:streams-scala:test
```
Author: Debasish Ghosh <ghosh.debasish@gmail.com>
Author: Sean Glover <seglo@randonom.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Ismael Juma <ismael@juma.me.uk>, John Roesler <john@confluent.io>, Damian Guy <damian@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes #4756 from debasishg/scala-streams
* unify skipped records metering
* log warnings when things get skipped
* tighten up metrics usage a bit
### Testing strategy:
Unit testing of the metrics and the logs should be sufficient.
Author: John Roesler <john@confluent.io>
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes #4812 from vvcephei/kip-274-streams-skip-metrics
There are a couple minor additions in this PR:
1. Add a new test for window store, to range query upon receiving each record.
2. In the non-windowed state store case, add a get call before the put call.
3. Enable caching by default to be consistent with other Join / Aggregate cases, where caching is enabled by default.
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
In the AbstractResetIntegrationTest we can hit a transient error when setting the time for the test, where the new time is less than the original time. For those cases we should catch the exception and retry setting the time once rather than letting the test fail.
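A minimal sketch of the retry-once pattern. The helper name and the use of IllegalArgumentException as the transient failure are assumptions for illustration, not the actual test code.

```java
import java.util.function.LongConsumer;

class RetryOnceSketch {
    // Calls the setter once, and a second time if the first attempt throws.
    // A failure on the second attempt propagates to the caller.
    static void setTimeWithSingleRetry(LongConsumer setter, long newTimeMs) {
        try {
            setter.accept(newTimeMs);
        } catch (IllegalArgumentException firstFailure) {
            setter.accept(newTimeMs);
        }
    }
}
```

Retrying exactly once keeps a persistent failure visible while absorbing the transient one.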
For testing, ran the entire streams test suite.
Reviewers: Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>
If users don't create all topics before starting a streams application, they could get unexpected results. For example, when a state store is shared between sub-topologies and one input topic is not created ahead of time, the resulting log message "Partition X is not assigned to any tasks" gives no clue as to how this could have occurred.
Also, this PR changes the log level from INFO to WARN when metadata does not have partitions for a given topic.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Some anonymous subclasses of AbstractProcessor didn't initialize their superclass. As a result, the ProcessorContext held by AbstractProcessor was never set up.
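The failure mode can be reproduced with simplified, hypothetical stand-ins for AbstractProcessor and ProcessorContext (not the real Kafka Streams types): an anonymous subclass that overrides init without calling super.init leaves the stored context null.

```java
// Hypothetical stand-in for ProcessorContext.
class SketchContext {
    final String taskId;

    SketchContext(String taskId) {
        this.taskId = taskId;
    }
}

// Hypothetical stand-in for AbstractProcessor: init() stores the context.
abstract class SketchBaseProcessor {
    private SketchContext context;

    public void init(SketchContext context) {
        this.context = context;
    }

    protected SketchContext context() {
        return context;
    }
}

class InitSketch {
    static SketchBaseProcessor broken() {
        return new SketchBaseProcessor() {
            @Override
            public void init(SketchContext context) {
                // forgot super.init(context), so context() stays null
            }
        };
    }

    static SketchBaseProcessor fixed() {
        return new SketchBaseProcessor() {
            @Override
            public void init(SketchContext context) {
                super.init(context); // lets the base class record the context
            }
        };
    }

    static boolean hasContextAfterInit(SketchBaseProcessor processor) {
        processor.init(new SketchContext("0_0"));
        return processor.context() != null;
    }
}
```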
Reviewers: Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>
- adds Streams upgrade tests for 1.1 release
- introduces metadata version 3
Reviewers: John Roesler <john@confluent.io>, Guozhang Wang <guozhang@confluent.io>
General cleanup of Streams code, mostly resolving compiler warnings and re-formatting.
The regular testing suite should be sufficient.
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
While TopologyTestDriver works well with stores created from KTable it does not with stores from GlobalKTable.
Moreover, I need access to the MockProducer inside TopologyTestDriver for my own testing purposes, and I think it can be useful to others as well.
I have added 4 new tests to TopologyTestDriverTest, two for stores from KTable and two for stores from GlobalKTable.
While I was changing the TopologyTestDriver I've also made it implement java.io.Closeable.
Author: Valentino Proietti <valentino.proietti@kydea.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, John Roesler <john@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes #4823 from Vale68/KAFKA-6742
minor renaming
SimpleBenchmark:
1.a Do not rely on manual num.records / bytes collection on atomic integers.
1.b Rely on config files for num.threads, bootstrap.servers, etc.
1.c Add parameters for key skewness and value size.
1.d Refactor the tests for loading phase, adding tumbling-windowed count.
1.e For consumer / consumeproduce, collect metrics on consumer instead.
1.f Force stop the test after 3 minutes; this is based on empirical numbers for 10M records.
Other tests: use config for kafka bootstrap servers.
streams_simple_benchmark.py: only use scale 1 for system test, remove yahoo from benchmark tests.
Note that the JMX-based metrics are more accurate than the manually collected metrics.
Reviewers: John Roesler <john@confluent.io>, Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>