Author: Florian Hussonnois <florian.hussonnois@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes #3811 from fhussonnois/KAFKA-5839
minor fixes
pull/3758/merge
Florian Hussonnois, 7 years ago, committed by Guozhang Wang
@@ -2904,6 +2904,16 @@ Note that in the <code>WordCountProcessor</code> implementation, users need to r
);
</pre>
<p>
To retrieve information about the local running threads, you can use the <code>localThreadsMetadata()</code> method after you start the application.
</p>
<pre class="brush: java;">
// For instance, use this method to print/monitor the partitions assigned to each local task.
With 1.0 a major API refactoring was accomplished and the new API is cleaner and easier to use.
This change includes the five main classes <code>KafkaStreams</code>, <code>KStreamBuilder</code>,
<code>KStream</code>, <code>KTable</code>, and <code>TopologyBuilder</code> (and several others).
All changes are fully backward compatible because the old API is only deprecated, not removed.
We recommend moving to the new API as soon as you can.
@@ -59,7 +59,7 @@
<p>
The two main classes to specify a topology via the DSL (<code>KStreamBuilder</code>)
or the Processor API (<code>TopologyBuilder</code>) were deprecated and replaced by
<code>StreamsBuilder</code> and <code>Topology</code> (both new classes are located in
package <code>org.apache.kafka.streams</code>).
Note that <code>StreamsBuilder</code> does not extend <code>Topology</code>, i.e.,
the class hierarchy is different now.
@@ -74,7 +74,7 @@
</p>
<p>
Changing how a topology is specified also affects <code>KafkaStreams</code> constructors,
that now only accept a <code>Topology</code>.
Using the DSL builder class <code>StreamsBuilder</code> one can get the constructed
<code>Topology</code> via <code>StreamsBuilder#build()</code>.
@@ -86,33 +86,61 @@
</p>
<p>
With the introduction of <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-182%3A+Reduce+Streams+DSL+overloads+and+allow+easier+use+of+custom+storage+engines">KIP-182</a>
you should no longer pass in <code>Serde</code> to <code>KStream#print</code> operations.
If you can't rely on using <code>toString</code> to print your keys and values, you should instead provide a custom <code>KeyValueMapper</code> via the <code>Printed#withKeyValueMapper</code> call.
New methods in <code>KafkaStreams</code>:
</p>
<ul>
<li> retrieve the current runtime information about the local threads via <code>#localThreadsMetadata()</code></li>
Windowed aggregations have moved from <code>KGroupedStream</code> to <code>WindowedKStream</code>.
You can now perform a windowed aggregation by, for example, using <code>KGroupedStream#windowedBy(Windows)#reduce(Reducer)</code>.
Note: the previous aggregate functions on <code>KGroupedStream</code> still work, but have been deprecated.
Previously the above methods were used to return static and runtime information.
They have been deprecated in favor of using the new classes/methods <code>#localThreadsMetadata()</code> / <code>ThreadMetadata</code> (returning runtime information) and
The Processor API was extended to allow users to schedule <code>punctuate</code> functions either based on data-driven <b>stream time</b> or wall-clock time.
As a result, the original <code>ProcessorContext#schedule</code> is deprecated in favor of a new overloaded function that accepts a user-customizable <code>Punctuator</code> callback interface, which triggers its <code>punctuate</code> API method periodically based on the <code>PunctuationType</code>.
The <code>PunctuationType</code> determines what notion of time is used for the punctuation scheduling: either <a href="/{{version}}/documentation/streams/core-concepts#streams_time">stream time</a> or wall-clock time (by default, <b>stream time</b> is configured to represent event time via <code>TimestampExtractor</code>).
In addition, the <code>punctuate</code> function inside <code>Processor</code> is also deprecated.
More deprecated methods in <code>KafkaStreams</code>:
</p>
<ul>
<li>With the introduction of <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-182%3A+Reduce+Streams+DSL+overloads+and+allow+easier+use+of+custom+storage+engines">KIP-182</a>
you should no longer pass in <code>Serde</code> to <code>KStream#print</code> operations.
If you can't rely on using <code>toString</code> to print your keys and values, you should instead provide a custom <code>KeyValueMapper</code> via the <code>Printed#withKeyValueMapper</code> call.
</li>
<li>
Windowed aggregations have moved from <code>KGroupedStream</code> to <code>WindowedKStream</code>.
You can now perform a windowed aggregation by, for example, using <code>KGroupedStream#windowedBy(Windows)#reduce(Reducer)</code>.
Note: the previous aggregate functions on <code>KGroupedStream</code> still work, but have been deprecated.
</li>
</ul>
<p>
Before this, users could only schedule based on stream time (i.e. <code>PunctuationType.STREAM_TIME</code>) and hence the <code>punctuate</code> function was data-driven only because stream time is determined (and advanced forward) by the timestamps derived from the input data.
If no data arrives at the processor, stream time does not advance and hence punctuation is not triggered.
On the other hand, when wall-clock time (i.e. <code>PunctuationType.WALL_CLOCK_TIME</code>) is used, <code>punctuate</code> will be triggered purely based on wall-clock time.
For example, if the <code>Punctuator</code> function is scheduled based on <code>PunctuationType.WALL_CLOCK_TIME</code> and these 60 records were processed within 20 seconds,
<code>punctuate</code> would be called 2 times (one time every 10 seconds);
if these 60 records were processed within 5 seconds, then no <code>punctuate</code> would be called at all.
Users can schedule multiple <code>Punctuator</code> callbacks with different <code>PunctuationType</code>s within the same processor by simply calling <code>ProcessorContext#schedule</code> multiple times inside the processor's <code>init()</code> method.
Modified methods in <code>Processor</code>:
</p>
<ul>
<li>
<p>
The Processor API was extended to allow users to schedule <code>punctuate</code> functions either based on data-driven <b>stream time</b> or wall-clock time.
As a result, the original <code>ProcessorContext#schedule</code> is deprecated in favor of a new overloaded function that accepts a user-customizable <code>Punctuator</code> callback interface, which triggers its <code>punctuate</code> API method periodically based on the <code>PunctuationType</code>.
The <code>PunctuationType</code> determines what notion of time is used for the punctuation scheduling: either <a href="/{{version}}/documentation/streams/core-concepts#streams_time">stream time</a> or wall-clock time (by default, <b>stream time</b> is configured to represent event time via <code>TimestampExtractor</code>).
In addition, the <code>punctuate</code> function inside <code>Processor</code> is also deprecated.
</p>
<p>
Before this, users could only schedule based on stream time (i.e. <code>PunctuationType.STREAM_TIME</code>) and hence the <code>punctuate</code> function was data-driven only because stream time is determined (and advanced forward) by the timestamps derived from the input data.
If no data arrives at the processor, stream time does not advance and hence punctuation is not triggered.
On the other hand, when wall-clock time (i.e. <code>PunctuationType.WALL_CLOCK_TIME</code>) is used, <code>punctuate</code> will be triggered purely based on wall-clock time.
For example, if the <code>Punctuator</code> function is scheduled based on <code>PunctuationType.WALL_CLOCK_TIME</code> and these 60 records were processed within 20 seconds,
<code>punctuate</code> would be called 2 times (one time every 10 seconds);
if these 60 records were processed within 5 seconds, then no <code>punctuate</code> would be called at all.
Users can schedule multiple <code>Punctuator</code> callbacks with different <code>PunctuationType</code>s within the same processor by simply calling <code>ProcessorContext#schedule</code> multiple times inside the processor's <code>init()</code> method.
</p>
</li>
</ul>
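<p>
The difference between the two punctuation types can be checked with a small arithmetic model. The sketch below is plain Java and deliberately does not call the Kafka Streams API: the class <code>PunctuationSketch</code> and both of its methods are hypothetical names introduced only for illustration. It assumes a 10-second interval, treats stream time as the highest record timestamp seen so far, and counts one callback per interval boundary crossed; the real scheduler may differ in details such as the time origin.
</p>

```java
/**
 * Simplified, illustrative model of punctuation timing.
 * Assumption: this is NOT the Kafka Streams implementation, only the arithmetic
 * described in the text (one callback per elapsed interval).
 */
final class PunctuationSketch {

    /** Wall-clock model: one callback per full interval of elapsed processing time. */
    static long wallClockPunctuations(long elapsedMs, long intervalMs) {
        if (intervalMs <= 0) {
            throw new IllegalArgumentException("intervalMs must be positive");
        }
        return Math.max(0L, elapsedMs) / intervalMs;
    }

    /**
     * Stream-time model: stream time only moves when a record arrives.
     * Assumption: stream time starts at 0 and equals the highest timestamp seen.
     */
    static long streamTimePunctuations(long[] recordTimestampsMs, long intervalMs) {
        if (intervalMs <= 0) {
            throw new IllegalArgumentException("intervalMs must be positive");
        }
        long streamTime = 0L;
        long fired = 0L;
        for (long ts : recordTimestampsMs) {
            streamTime = Math.max(streamTime, ts);
            // Fire once for every interval boundary that stream time has now passed.
            while ((fired + 1) * intervalMs <= streamTime) {
                fired++;
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        long intervalMs = 10_000L;
        // Records processed within 20 seconds of wall-clock time.
        System.out.println(wallClockPunctuations(20_000L, intervalMs));
        // Records processed within 5 seconds of wall-clock time.
        System.out.println(wallClockPunctuations(5_000L, intervalMs));
        // No records arrive, so stream time never advances.
        System.out.println(streamTimePunctuations(new long[0], intervalMs));
    }
}
```

<p>
Under these assumptions the wall-clock count depends only on how long processing takes, while the stream-time count depends only on the record timestamps, which is why a processor that receives no data is never punctuated under <code>PunctuationType.STREAM_TIME</code>.
</p>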
<p>
If you are monitoring Streams metrics at the task level or at the processor-node / state-store level, please note that the metrics sensor names and hierarchy were changed: