MINOR: Kafka Streams code samples formatting unification (#10651)

Code samples are now unified and correctly formatted.
Samples under Streams consistently use the Prism library.

Reviewers: Bruno Cadonna <cadonna@apache.org>
Josep Prat authored 4 years ago, committed by GitHub
commit aa25176e77
14 changed files:

   28  docs/streams/developer-guide/app-reset-tool.html
  299  docs/streams/developer-guide/config-streams.html
   62  docs/streams/developer-guide/datatypes.html
 1896  docs/streams/developer-guide/dsl-api.html
  186  docs/streams/developer-guide/dsl-topology-naming.html
  426  docs/streams/developer-guide/interactive-queries.html
   99  docs/streams/developer-guide/memory-mgmt.html
  208  docs/streams/developer-guide/processor-api.html
    7  docs/streams/developer-guide/running-app.html
   63  docs/streams/developer-guide/security.html
   42  docs/streams/developer-guide/testing.html
  140  docs/streams/developer-guide/write-streams.html
  172  docs/streams/index.html
  491  docs/streams/tutorial.html

28  docs/streams/developer-guide/app-reset-tool.html

@@ -78,44 +78,43 @@
<h2>Step 1: Run the application reset tool<a class="headerlink" href="#step-1-run-the-application-reset-tool" title="Permalink to this headline"></a></h2>
<p>Invoke the application reset tool from the command line:</p>
<p>Warning! This tool makes irreversible changes to your application. It is strongly recommended that you run this once with <code class="docutils literal"><span class="pre">--dry-run</span></code> to preview your changes before making them.</p>
<div class="highlight-bash"><div class="highlight"><pre><span></span><code>&lt;path-to-kafka&gt;/bin/kafka-streams-application-reset</code></pre></div>
</div>
<pre class="line-numbers"><code class="language-bash">&lt;path-to-kafka&gt;/bin/kafka-streams-application-reset</code></pre>
<p>The tool accepts the following parameters:</p>
<div class="highlight-bash"><div class="highlight"><pre><span>Option</span><code> <span class="o">(</span>* <span class="o">=</span> required<span class="o">)</span> Description
<pre class="line-numbers"><code class="language-text">Option (* = required) Description
--------------------- -----------
* --application-id &lt;String: id&gt; The Kafka Streams application ID
<span class="o">(</span>application.id<span class="o">)</span>.
(application.id).
--bootstrap-servers &lt;String: urls&gt; Comma-separated list of broker urls with
format: HOST1:PORT1,HOST2:PORT2
<span class="o">(</span>default: localhost:9092<span class="o">)</span>
--by-duration &lt;String: urls&gt; Reset offsets to offset by duration from
current timestamp. Format: '<span>PnDTnHnMnS</span>'
(default: localhost:9092)
--by-duration &lt;String: urls&gt; Reset offsets to offset by duration from
current timestamp. Format: &#39;PnDTnHnMnS&#39;
--config-file &lt;String: file name&gt; Property file containing configs to be
passed to admin clients and embedded
consumer.
--dry-run Display the actions that would be
performed without executing the reset
commands.
--from-file &lt;String: urls&gt; Reset offsets to values defined in CSV
--from-file &lt;String: urls&gt; Reset offsets to values defined in CSV
file.
--input-topics &lt;String: list&gt; Comma-separated list of user input
topics. For these topics, the tool will
reset the offset to the earliest
available offset.
--intermediate-topics &lt;String: list&gt; Comma-separated list of intermediate user
topics <span class="o">(</span>topics used in the through<span class="o">()</span>
method<span class="o">)</span>. For these topics, the tool
topics (topics used in the through()
method). For these topics, the tool
will skip to the end.
--internal-topics &lt;String: list&gt; Comma-separated list of internal topics
to delete. Must be a subset of the
internal topics marked for deletion by
the default behaviour (do a dry-run without
this option to view these topics).
--shift-by &lt;Long: number-of-offsets&gt; Reset offsets shifting current offset by
'n', where 'n' can be positive or
--shift-by &lt;Long: number-of-offsets&gt; Reset offsets shifting current offset by
&#39;n&#39;, where &#39;n&#39; can be positive or
negative
--to-datetime &lt;String&gt; Reset offsets to offset from datetime.
Format: 'YYYY-MM-DDTHH:mm:SS.sss'
Format: &#39;YYYY-MM-DDTHH:mm:SS.sss&#39;
--to-earliest Reset offsets to earliest offset.
--to-latest Reset offsets to latest offset.
--to-offset &lt;Long&gt; Reset offsets to a specific offset.
@@ -125,8 +124,7 @@
directly.
--force Force removing members of the consumer group
(intended to remove left-over members if
long session timeout was configured).</code></pre></div>
</div>
long session timeout was configured).</code></pre>
<p>Consider the following reset-offset scenarios for <code>input-topics</code>:</p>
<ul>
<li> by-duration</li>

299  docs/streams/developer-guide/config-streams.html

@@ -39,16 +39,15 @@
<li><p class="first">Create a <code class="docutils literal"><span class="pre">java.util.Properties</span></code> instance.</p>
</li>
<li><p class="first">Set the <a class="reference internal" href="#streams-developer-guide-required-configs"><span class="std std-ref">parameters</span></a>. For example:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.Properties</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.StreamsConfig</span><span class="o">;</span>
<span class="n">Properties</span> <span class="n">settings</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
<span class="c1">// Set a few key parameters</span>
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">APPLICATION_ID_CONFIG</span><span class="o">,</span> <span class="s">&quot;my-first-streams-application&quot;</span><span class="o">);</span>
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">BOOTSTRAP_SERVERS_CONFIG</span><span class="o">,</span> <span class="s">&quot;kafka-broker1:9092&quot;</span><span class="o">);</span>
<span class="c1">// Any further settings</span>
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(...</span> <span class="o">,</span> <span class="o">...);</span></code></pre></div>
</div>
<pre class="line-numbers"><code class="language-java">import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;
Properties settings = new Properties();
// Set a few key parameters
settings.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-first-streams-application");
settings.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-broker1:9092");
// Any further settings
settings.put(... , ...);</code></pre>
</li>
</ol>
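<p>Once both parameters are set, the <code class="docutils literal"><span class="pre">Properties</span></code> instance is typically handed to the <code class="docutils literal"><span class="pre">KafkaStreams</span></code> constructor together with a topology. The following is a minimal sketch, not part of the original sample; the pass-through topology and the topic names are placeholders for illustration only:</p>
<pre class="line-numbers"><code class="language-java">import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;

StreamsBuilder builder = new StreamsBuilder();
// Hypothetical pass-through topology, for illustration only
builder.stream("input-topic").to("output-topic");

// `settings` is the Properties instance configured above
KafkaStreams streams = new KafkaStreams(builder.build(), settings);
streams.start();</code></pre>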
<div class="section" id="configuration-parameter-reference">
@@ -396,31 +395,31 @@
The drawback of this approach is that "manual" writes are side effects that are invisible to the Kafka Streams runtime library,
so they do not benefit from the end-to-end processing guarantees of the Streams API:</p>
<pre class="line-numbers"><code class="language-java"> public class SendToDeadLetterQueueExceptionHandler implements DeserializationExceptionHandler {
KafkaProducer&lt;byte[], byte[]&gt; dlqProducer;
String dlqTopic;
<pre class="line-numbers"><code class="language-java">public class SendToDeadLetterQueueExceptionHandler implements DeserializationExceptionHandler {
KafkaProducer&lt;byte[], byte[]&gt; dlqProducer;
String dlqTopic;
@Override
public DeserializationHandlerResponse handle(final ProcessorContext context,
final ConsumerRecord&lt;byte[], byte[]&gt; record,
final Exception exception) {
@Override
public DeserializationHandlerResponse handle(final ProcessorContext context,
final ConsumerRecord&lt;byte[], byte[]&gt; record,
final Exception exception) {
log.warn("Exception caught during Deserialization, sending to the dead queue topic; " +
"taskId: {}, topic: {}, partition: {}, offset: {}",
context.taskId(), record.topic(), record.partition(), record.offset(),
exception);
log.warn("Exception caught during Deserialization, sending to the dead queue topic; " +
"taskId: {}, topic: {}, partition: {}, offset: {}",
context.taskId(), record.topic(), record.partition(), record.offset(),
exception);
dlqProducer.send(new ProducerRecord&lt;&gt;(dlqTopic, record.timestamp(), record.key(), record.value(), record.headers())).get();
dlqProducer.send(new ProducerRecord&lt;&gt;(dlqTopic, record.timestamp(), record.key(), record.value(), record.headers())).get();
return DeserializationHandlerResponse.CONTINUE;
}
return DeserializationHandlerResponse.CONTINUE;
}
@Override
public void configure(final Map&lt;String, ?&gt; configs) {
dlqProducer = .. // get a producer from the configs map
dlqTopic = .. // get the topic name from the configs map
}
}</code></pre>
@Override
public void configure(final Map&lt;String, ?&gt; configs) {
dlqProducer = .. // get a producer from the configs map
dlqTopic = .. // get the topic name from the configs map
}
}</code></pre>
</div></blockquote>
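<p>To put the handler above into effect, it would be registered as the default deserialization exception handler in the Streams configuration. A brief sketch, reusing the class defined above:</p>
<pre class="line-numbers"><code class="language-java">Properties settings = new Properties();
// other Streams settings (bootstrap servers, application id, etc.)
settings.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG,
             SendToDeadLetterQueueExceptionHandler.class);</code></pre>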
</div>
@@ -434,32 +433,31 @@
<p>Each exception handler can return a <code>FAIL</code> or <code>CONTINUE</code> depending on the record and the exception thrown. Returning <code>FAIL</code> will signal that Streams should shut down and <code>CONTINUE</code> will signal that Streams
should ignore the issue and continue processing. If you want to provide an exception handler that always ignores records that are too large, you could implement something like the following:</p>
<pre class="line-numbers"><code class="language-java">
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.common.errors.RecordTooLargeException;
import org.apache.kafka.streams.errors.ProductionExceptionHandler;
import org.apache.kafka.streams.errors.ProductionExceptionHandler.ProductionExceptionHandlerResponse;
<pre class="line-numbers"><code class="language-java">import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.common.errors.RecordTooLargeException;
import org.apache.kafka.streams.errors.ProductionExceptionHandler;
import org.apache.kafka.streams.errors.ProductionExceptionHandler.ProductionExceptionHandlerResponse;
public class IgnoreRecordTooLargeHandler implements ProductionExceptionHandler {
public void configure(Map&lt;String, Object&gt; config) {}
public class IgnoreRecordTooLargeHandler implements ProductionExceptionHandler {
public void configure(Map&lt;String, Object&gt; config) {}
public ProductionExceptionHandlerResponse handle(final ProducerRecord&lt;byte[], byte[]&gt; record,
final Exception exception) {
if (exception instanceof RecordTooLargeException) {
return ProductionExceptionHandlerResponse.CONTINUE;
} else {
return ProductionExceptionHandlerResponse.FAIL;
}
}
}
public ProductionExceptionHandlerResponse handle(final ProducerRecord&lt;byte[], byte[]&gt; record,
final Exception exception) {
if (exception instanceof RecordTooLargeException) {
return ProductionExceptionHandlerResponse.CONTINUE;
} else {
return ProductionExceptionHandlerResponse.FAIL;
}
}
}
Properties settings = new Properties();
Properties settings = new Properties();
// other various kafka streams settings, e.g. bootstrap servers, application id, etc
// other various kafka streams settings, e.g. bootstrap servers, application id, etc
settings.put(StreamsConfig.DEFAULT_PRODUCTION_EXCEPTION_HANDLER_CLASS_CONFIG,
IgnoreRecordTooLargeHandler.class);</code></pre></div>
settings.put(StreamsConfig.DEFAULT_PRODUCTION_EXCEPTION_HANDLER_CLASS_CONFIG,
IgnoreRecordTooLargeHandler.class);</code></pre></div>
</blockquote>
</div>
<div class="section" id="timestamp-extractor">
@@ -512,42 +510,39 @@
processed but silently dropped. If you want to estimate a new timestamp, you can use the value provided via
<code class="docutils literal"><span class="pre">previousTimestamp</span></code> (i.e., a Kafka Streams timestamp estimation). Here is an example of a custom
<code class="docutils literal"><span class="pre">TimestampExtractor</span></code> implementation:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.kafka.clients.consumer.ConsumerRecord</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.processor.TimestampExtractor</span><span class="o">;</span>
<span class="c1">// Extracts the embedded timestamp of a record (giving you &quot;event-time&quot; semantics).</span>
<span class="kd">public</span> <span class="kd">class</span> <span class="nc">MyEventTimeExtractor</span> <span class="kd">implements</span> <span class="n">TimestampExtractor</span> <span class="o">{</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="kt">long</span> <span class="nf">extract</span><span class="o">(</span><span class="kd">final</span> <span class="n">ConsumerRecord</span><span class="o">&lt;</span><span class="n">Object</span><span class="o">,</span> <span class="n">Object</span><span class="o">&gt;</span> <span class="n">record</span><span class="o">,</span> <span class="kd">final</span> <span class="kt">long</span> <span class="n">previousTimestamp</span><span class="o">)</span> <span class="o">{</span>
<span class="c1">// `Foo` is your own custom class, which we assume has a method that returns</span>
<span class="c1">// the embedded timestamp (milliseconds since midnight, January 1, 1970 UTC).</span>
<span class="kt">long</span> <span class="n">timestamp</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span><span class="o">;</span>
<span class="kd">final</span> <span class="n">Foo</span> <span class="n">myPojo</span> <span class="o">=</span> <span class="o">(</span><span class="n">Foo</span><span class="o">)</span> <span class="n">record</span><span class="o">.</span><span class="na">value</span><span class="o">();</span>
<span class="k">if</span> <span class="o">(</span><span class="n">myPojo</span> <span class="o">!=</span> <span class="kc">null</span><span class="o">)</span> <span class="o">{</span>
<span class="n">timestamp</span> <span class="o">=</span> <span class="n">myPojo</span><span class="o">.</span><span class="na">getTimestampInMillis</span><span class="o">();</span>
<span class="o">}</span>
<span class="k">if</span> <span class="o">(</span><span class="n">timestamp</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="o">)</span> <span class="o">{</span>
<span class="c1">// Invalid timestamp! Attempt to estimate a new timestamp,</span>
<span class="c1">// otherwise fall back to wall-clock time (processing-time).</span>
<span class="k">if</span> <span class="o">(</span><span class="n">previousTimestamp</span> <span class="o">&gt;=</span> <span class="mi">0</span><span class="o">)</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">previousTimestamp</span><span class="o">;</span>
<span class="o">}</span> <span class="k">else</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">System</span><span class="o">.</span><span class="na">currentTimeMillis</span><span class="o">();</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="o">}</span>
</pre></div>
</div>
<pre class="line-numbers"><code class="language-java">import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;
// Extracts the embedded timestamp of a record (giving you &quot;event-time&quot; semantics).
public class MyEventTimeExtractor implements TimestampExtractor {
@Override
public long extract(final ConsumerRecord&lt;Object, Object&gt; record, final long previousTimestamp) {
// `Foo` is your own custom class, which we assume has a method that returns
// the embedded timestamp (milliseconds since midnight, January 1, 1970 UTC).
long timestamp = -1;
final Foo myPojo = (Foo) record.value();
if (myPojo != null) {
timestamp = myPojo.getTimestampInMillis();
}
if (timestamp &lt; 0) {
// Invalid timestamp! Attempt to estimate a new timestamp,
// otherwise fall back to wall-clock time (processing-time).
if (previousTimestamp &gt;= 0) {
return previousTimestamp;
} else {
return System.currentTimeMillis();
}
}
}
}</code></pre>
<p>You would then define the custom timestamp extractor in your Streams configuration as follows:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.Properties</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.StreamsConfig</span><span class="o">;</span>
<pre class="line-numbers"><code class="language-java">import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;
<span class="n">Properties</span> <span class="n">streamsConfiguration</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
<span class="n">streamsConfiguration</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG</span><span class="o">,</span> <span class="n">MyEventTimeExtractor</span><span class="o">.</span><span class="na">class</span><span class="o">);</span>
</pre></div>
Properties streamsConfiguration = new Properties();
streamsConfiguration.put(StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG, MyEventTimeExtractor.class);</code></pre>
</div>
</div></blockquote>
</div>
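<p>If event-time semantics are not required, Kafka Streams also ships with built-in extractors; for example, <code class="docutils literal"><span class="pre">WallclockTimestampExtractor</span></code> yields processing-time semantics. A short sketch of configuring it instead of the custom class above:</p>
<pre class="line-numbers"><code class="language-java">import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.processor.WallclockTimestampExtractor;

Properties streamsConfiguration = new Properties();
// Use wall-clock time (processing-time) instead of embedded record timestamps
streamsConfiguration.put(StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG, WallclockTimestampExtractor.class);</code></pre>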
@@ -707,38 +702,33 @@
<div><p>The RocksDB configuration. Kafka Streams uses RocksDB as the default storage engine for persistent stores. To change the default
configuration for RocksDB, you can implement <code class="docutils literal"><span class="pre">RocksDBConfigSetter</span></code> and provide your custom class via <a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/streams/state/RocksDBConfigSetter.html">rocksdb.config.setter</a>.</p>
<p>Here is an example that adjusts the memory size consumed by RocksDB.</p>
<div class="highlight-java"><div class="highlight">
<pre>
<span class="kd">public</span> <span class="kd">static</span> <span class="kd">class</span> <span class="nc">CustomRocksDBConfig</span> <span class="kd">implements</span> <span class="n">RocksDBConfigSetter</span> <span class="o">{</span>
<span class="c1">// This object should be a member variable so it can be closed in RocksDBConfigSetter#close.</span>
<span class="kd">private</span> <span class="n">org.rocksdb.Cache</span> <span class="n">cache</span> <span class="o">=</span> <span class="k">new</span> <span class="n">org</span><span class="o">.</span><span class="na">rocksdb</span><span class="o">.</span><span class="na">LRUCache</span><span class="o">(</span><span class="mi">16</span> <span class="o">*</span> <span class="mi">1024L</span> <span class="o">*</span> <span class="mi">1024L</span><span class="o">);</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">setConfig</span><span class="o">(</span><span class="kd">final</span> <span class="n">String</span> <span class="n">storeName</span><span class="o">,</span> <span class="kd">final</span> <span class="n">Options</span> <span class="n">options</span><span class="o">,</span> <span class="kd">final</span> <span class="n">Map</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Object</span><span class="o">&gt;</span> <span class="n">configs</span><span class="o">)</span> <span class="o">{</span>
<span class="c1">// See #1 below.</span>
<span class="n">BlockBasedTableConfig</span> <span class="n">tableConfig</span> <span class="o">=</span> <span class="k">(BlockBasedTableConfig)</span> <span class="n">options</span><span><span class="o">.</span><span class="na">tableFormatConfig</span><span class="o">();</span>
<span class="n">tableConfig</span><span class="o">.</span><span class="na">setBlockCache</span><span class="o">(</span><span class="mi">cache</span></span><span class="o">);</span>
<span class="c1">// See #2 below.</span>
<span class="n">tableConfig</span><span class="o">.</span><span class="na">setBlockSize</span><span class="o">(</span><span class="mi">16</span> <span class="o">*</span> <span class="mi">1024L</span><span class="o">);</span>
<span class="c1">// See #3 below.</span>
<span class="n">tableConfig</span><span class="o">.</span><span class="na">setCacheIndexAndFilterBlocks</span><span class="o">(</span><span class="kc">true</span><span class="o">);</span>
<span class="n">options</span><span class="o">.</span><span class="na">setTableFormatConfig</span><span class="o">(</span><span class="n">tableConfig</span><span class="o">);</span>
<span class="c1">// See #4 below.</span>
<span class="n">options</span><span class="o">.</span><span class="na">setMaxWriteBufferNumber</span><span class="o">(</span><span class="mi">2</span><span class="o">);</span>
<span class="o">}</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">close</span><span class="o">(</span><span class="kd">final</span> <span class="n">String</span> <span class="n">storeName</span><span class="o">,</span> <span class="kd">final</span> <span class="n">Options</span> <span class="n">options</span><span class="o">)</span> <span class="o">{</span>
<span class="c1">// See #5 below.</span>
<span class="n">cache</span><span class="o">.</span><span class="na">close</span><span class="o">();</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="n">Properties</span> <span class="n">streamsSettings</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
<span class="n">streamsConfig</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">ROCKSDB_CONFIG_SETTER_CLASS_CONFIG</span><span class="o">,</span> <span class="n">CustomRocksDBConfig</span><span class="o">.</span><span class="na">class</span><span class="o">);</span>
</pre>
</div>
</div>
<pre class="line-numbers"><code class="language-java">public static class CustomRocksDBConfig implements RocksDBConfigSetter {
// This object should be a member variable so it can be closed in RocksDBConfigSetter#close.
private org.rocksdb.Cache cache = new org.rocksdb.LRUCache(16 * 1024L * 1024L);
@Override
public void setConfig(final String storeName, final Options options, final Map&lt;String, Object&gt; configs) {
// See #1 below.
BlockBasedTableConfig tableConfig = (BlockBasedTableConfig) options.tableFormatConfig();
tableConfig.setBlockCache(cache);
// See #2 below.
tableConfig.setBlockSize(16 * 1024L);
// See #3 below.
tableConfig.setCacheIndexAndFilterBlocks(true);
options.setTableFormatConfig(tableConfig);
// See #4 below.
options.setMaxWriteBufferNumber(2);
}
@Override
public void close(final String storeName, final Options options) {
// See #5 below.
cache.close();
}
}
Properties streamsSettings = new Properties();
streamsSettings.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, CustomRocksDBConfig.class);</code></pre>
<dl class="docutils">
<dt>Notes for example:</dt>
<dd><ol class="first last arabic simple">
@@ -798,12 +788,12 @@
and <a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/clients/admin/package-summary.html">admin client</a> that are used internally.
The consumer, producer and admin client settings are defined by specifying parameters in a <code class="docutils literal"><span class="pre">StreamsConfig</span></code> instance.</p>
<p>In this example, the Kafka <a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/clients/consumer/ConsumerConfig.html#SESSION_TIMEOUT_MS_CONFIG">consumer session timeout</a> is configured to be 60000 milliseconds in the Streams settings:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">Properties</span> <span class="n">streamsSettings</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
<span class="c1">// Example of a &quot;normal&quot; setting for Kafka Streams</span>
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">BOOTSTRAP_SERVERS_CONFIG</span><span class="o">,</span> <span class="s">&quot;kafka-broker-01:9092&quot;</span><span class="o">);</span>
<span class="c1">// Customize the Kafka consumer settings of your Streams application</span>
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">ConsumerConfig</span><span class="o">.</span><span class="na">SESSION_TIMEOUT_MS_CONFIG</span><span class="o">,</span> <span class="mi">60000</span><span class="o">);</span>
</pre></div>
<pre class="line-numbers"><code class="language-java">Properties streamsSettings = new Properties();
// Example of a &quot;normal&quot; setting for Kafka Streams
streamsSettings.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, &quot;kafka-broker-01:9092&quot;);
// Customize the Kafka consumer settings of your Streams application
streamsSettings.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 60000);</code></pre>
</div>
<div class="section" id="naming">
<h4><a class="toc-backref" href="#id17">Naming</a><a class="headerlink" href="#naming" title="Permalink to this headline"></a></h4>
@@ -811,18 +801,17 @@
<code class="docutils literal"><span class="pre">receive.buffer.bytes</span></code> are used to configure TCP buffers; <code class="docutils literal"><span class="pre">request.timeout.ms</span></code> and <code class="docutils literal"><span class="pre">retry.backoff.ms</span></code> control retries for client request;
<code class="docutils literal"><span class="pre">retries</span></code> are used to configure how many retries are allowed when handling retriable errors from broker request responses.
You can avoid duplicate names by prefixing parameter names with <code class="docutils literal"><span class="pre">consumer.</span></code>, <code class="docutils literal"><span class="pre">producer.</span></code>, or <code class="docutils literal"><span class="pre">admin.</span></code> (e.g., <code class="docutils literal"><span class="pre">consumer.send.buffer.bytes</span></code> and <code class="docutils literal"><span class="pre">producer.send.buffer.bytes</span></code>).</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">Properties</span> <span class="n">streamsSettings</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
<span class="c1">// same value for consumer, producer, and admin client</span>
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">&quot;PARAMETER_NAME&quot;</span><span class="o">,</span> <span class="s">&quot;value&quot;</span><span class="o">);</span>
<span class="c1">// different values for consumer and producer</span>
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">&quot;consumer.PARAMETER_NAME&quot;</span><span class="o">,</span> <span class="s">&quot;consumer-value&quot;</span><span class="o">);</span>
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">&quot;producer.PARAMETER_NAME&quot;</span><span class="o">,</span> <span class="s">&quot;producer-value&quot;</span><span class="o">);</span>
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">&quot;admin.PARAMETER_NAME&quot;</span><span class="o">,</span> <span class="s">&quot;admin-value&quot;</span><span class="o">);</span>
<span class="c1">// alternatively, you can use</span>
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">consumerPrefix</span><span class="o">(</span><span class="s">&quot;PARAMETER_NAME&quot;</span><span class="o">),</span> <span class="s">&quot;consumer-value&quot;</span><span class="o">);</span>
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">producerPrefix</span><span class="o">(</span><span class="s">&quot;PARAMETER_NAME&quot;</span><span class="o">),</span> <span class="s">&quot;producer-value&quot;</span><span class="o">);</span>
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">adminClientPrefix</span><span class="o">(</span><span class="s">&quot;PARAMETER_NAME&quot;</span><span class="o">),</span> <span class="s">&quot;admin-value&quot;</span><span class="o">);</span>
</pre></div>
<pre class="line-numbers"><code class="language-java">Properties streamsSettings = new Properties();
// same value for consumer, producer, and admin client
streamsSettings.put(&quot;PARAMETER_NAME&quot;, &quot;value&quot;);
// different values for consumer and producer
streamsSettings.put(&quot;consumer.PARAMETER_NAME&quot;, &quot;consumer-value&quot;);
streamsSettings.put(&quot;producer.PARAMETER_NAME&quot;, &quot;producer-value&quot;);
streamsSettings.put(&quot;admin.PARAMETER_NAME&quot;, &quot;admin-value&quot;);
// alternatively, you can use
streamsSettings.put(StreamsConfig.consumerPrefix(&quot;PARAMETER_NAME&quot;), &quot;consumer-value&quot;);
streamsSettings.put(StreamsConfig.producerPrefix(&quot;PARAMETER_NAME&quot;), &quot;producer-value&quot;);
streamsSettings.put(StreamsConfig.adminClientPrefix(&quot;PARAMETER_NAME&quot;), &quot;admin-value&quot;);</code></pre>
<p>You could further separate consumer configuration by adding different prefixes:</p>
<ul class="simple">
<li><code class="docutils literal"><span class="pre">main.consumer.</span></code> for main consumer which is the default consumer of stream source.</li>
@@ -830,25 +819,21 @@
<li><code class="docutils literal"><span class="pre">global.consumer.</span></code> for global consumer which is used in global KTable construction.</li>
</ul>
<p>For example, if you only want to set restore consumer config without touching other consumers' settings, you could simply use <code class="docutils literal"><span class="pre">restore.consumer.</span></code> to set the config.</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">Properties</span> <span class="n">streamsSettings</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
<span class="c1">// same config value for all consumer types</span>
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">&quot;consumer.PARAMETER_NAME&quot;</span><span class="o">,</span> <span class="s">&quot;general-consumer-value&quot;</span><span class="o">);</span>
<span class="c1">// set a different restore consumer config. This would make restore consumer take restore-consumer-value,</span>
<span>// while main consumer and global consumer stay with general-consumer-value</span>
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">&quot;restore.consumer.PARAMETER_NAME&quot;</span><span class="o">,</span> <span class="s">&quot;restore-consumer-value&quot;</span><span class="o">);</span>
<span class="c1">// alternatively, you can use</span>
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">restoreConsumerPrefix</span><span class="o">(</span><span class="s">&quot;PARAMETER_NAME&quot;</span><span class="o">),</span> <span class="s">&quot;restore-consumer-value&quot;</span><span class="o">);</span>
</pre></div>
</div>
<pre class="line-numbers"><code class="language-java">Properties streamsSettings = new Properties();
// same config value for all consumer types
streamsSettings.put("consumer.PARAMETER_NAME", "general-consumer-value");
// set a different restore consumer config. This would make restore consumer take restore-consumer-value,
// while main consumer and global consumer stay with general-consumer-value
streamsSettings.put("restore.consumer.PARAMETER_NAME", "restore-consumer-value");
// alternatively, you can use
streamsSettings.put(StreamsConfig.restoreConsumerPrefix("PARAMETER_NAME"), "restore-consumer-value");</code></pre>
<p>The same applies to <code class="docutils literal"><span class="pre">main.consumer.</span></code> and <code class="docutils literal"><span class="pre">global.consumer.</span></code>, if you only want to specify the config for one consumer type.</p>
<p> Additionally, to configure the internal repartition/changelog topics, you could use the <code class="docutils literal"><span class="pre">topic.</span></code> prefix, followed by any of the standard topic configs.</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">Properties</span> <span class="n">streamsSettings</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
<span class="c1">// Override default for both changelog and repartition topics</span>
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">&quot;topic.PARAMETER_NAME&quot;</span><span class="o">,</span> <span class="s">&quot;topic-value&quot;</span><span class="o">);</span>
<span class="c1">// alternatively, you can use</span>
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">topicPrefix</span><span class="o">(</span><span class="s">&quot;PARAMETER_NAME&quot;</span><span class="o">),</span> <span class="s">&quot;topic-value&quot;</span><span class="o">);</span>
</pre></div>
</div>
<pre class="line-numbers"><code class="language-java">Properties streamsSettings = new Properties();
// Override default for both changelog and repartition topics
streamsSettings.put("topic.PARAMETER_NAME", "topic-value");
// alternatively, you can use
streamsSettings.put(StreamsConfig.topicPrefix("PARAMETER_NAME"), "topic-value");</code></pre>
</div>
</div>
<div class="section" id="default-values">
@@ -977,11 +962,11 @@
<h4><a class="toc-backref" href="#id23">replication.factor</a><a class="headerlink" href="#id2" title="Permalink to this headline"></a></h4>
<blockquote>
<div>See the <a class="reference internal" href="#replication-factor-parm"><span class="std std-ref">description here</span></a>.</div></blockquote>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">Properties</span> <span class="n">streamsSettings</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">REPLICATION_FACTOR_CONFIG</span><span class="o">,</span> <span class="mi">3</span><span class="o">);</span>
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">topicPrefix</span><span class="o">(</span><span class="n">TopicConfig</span><span class="o">.</span><span class="na">MIN_IN_SYNC_REPLICAS_CONFIG</span><span class="o">),</span> <span class="mi">2</span><span class="o">);</span>
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">producerPrefix</span><span class="o">(</span><span class="n">ProducerConfig</span><span class="o">.</span><span class="na">ACKS_CONFIG</span><span class="o">),</span> <span class="s">&quot;all&quot;</span><span class="o">);</span></code></pre></div>
</div>
<pre class="line-numbers"><code class="language-java">Properties streamsSettings = new Properties();
streamsSettings.put(StreamsConfig.REPLICATION_FACTOR_CONFIG, 3);
streamsSettings.put(StreamsConfig.topicPrefix(TopicConfig.MIN_IN_SYNC_REPLICAS_CONFIG), 2);
streamsSettings.put(StreamsConfig.producerPrefix(ProducerConfig.ACKS_CONFIG), "all");</code></pre>
</div>
</div>
</div>

62  docs/streams/developer-guide/datatypes.html

@@ -55,40 +55,37 @@
<div class="section" id="configuring-serdes">
<h2>Configuring SerDes<a class="headerlink" href="#configuring-serdes" title="Permalink to this headline"></a></h2>
<p>SerDes specified in the Streams configuration are used as the default in your Kafka Streams application.</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.kafka.common.serialization.Serdes</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.StreamsConfig</span><span class="o">;</span>
<span class="n">Properties</span> <span class="n">settings</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
<span class="c1">// Default serde for keys of data records (here: built-in serde for String type)</span>
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">DEFAULT_KEY_SERDE_CLASS_CONFIG</span><span class="o">,</span> <span class="n">Serdes</span><span class="o">.</span><span class="na">String</span><span class="o">().</span><span class="na">getClass</span><span class="o">().</span><span class="na">getName</span><span class="o">());</span>
<span class="c1">// Default serde for values of data records (here: built-in serde for Long type)</span>
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">DEFAULT_VALUE_SERDE_CLASS_CONFIG</span><span class="o">,</span> <span class="n">Serdes</span><span class="o">.</span><span class="na">Long</span><span class="o">().</span><span class="na">getClass</span><span class="o">().</span><span class="na">getName</span><span class="o">());</span></code></pre></div>
</div>
<pre class="line-numbers"><code class="language-java">import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsConfig;
Properties settings = new Properties();
// Default serde for keys of data records (here: built-in serde for String type)
settings.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
// Default serde for values of data records (here: built-in serde for Long type)
settings.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.Long().getClass().getName());</code></pre>
</div>
<div class="section" id="overriding-default-serdes">
<h2>Overriding default SerDes<a class="headerlink" href="#overriding-default-serdes" title="Permalink to this headline"></a></h2>
<p>You can also specify SerDes explicitly by passing them to the appropriate API methods, which overrides the default serde settings:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.kafka.common.serialization.Serde</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.common.serialization.Serdes</span><span class="o">;</span>
<pre class="line-numbers"><code class="language-java">import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
<span class="kd">final</span> <span class="n">Serde</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">stringSerde</span> <span class="o">=</span> <span class="n">Serdes</span><span class="o">.</span><span class="na">String</span><span class="o">();</span>
<span class="kd">final</span> <span class="n">Serde</span><span class="o">&lt;</span><span class="n">Long</span><span class="o">&gt;</span> <span class="n">longSerde</span> <span class="o">=</span> <span class="n">Serdes</span><span class="o">.</span><span class="na">Long</span><span class="o">();</span>
final Serde&lt;String&gt; stringSerde = Serdes.String();
final Serde&lt;Long&gt; longSerde = Serdes.Long();
<span class="c1">// The stream userCountByRegion has type `String` for record keys (for region)</span>
<span class="c1">// and type `Long` for record values (for user counts).</span>
<span class="n">KStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;</span> <span class="n">userCountByRegion</span> <span class="o">=</span> <span class="o">...;</span>
<span class="n">userCountByRegion</span><span class="o">.</span><span class="na">to</span><span class="o">(</span><span class="s">&quot;RegionCountsTopic&quot;</span><span class="o">,</span> <span class="n">Produced</span><span class="o">.</span><span class="na">with</span><span class="o">(</span><span class="n">stringSerde</span><span class="o">,</span> <span class="n">longSerde</span><span class="o">));</span></code></pre></div>
</div>
// The stream userCountByRegion has type `String` for record keys (for region)
// and type `Long` for record values (for user counts).
KStream&lt;String, Long&gt; userCountByRegion = ...;
userCountByRegion.to(&quot;RegionCountsTopic&quot;, Produced.with(stringSerde, longSerde));</code></pre>
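<p>The same pattern applies on the consuming side: explicit SerDes can be passed to <code class="docutils literal"><span class="pre">builder.stream()</span></code> via <code class="docutils literal"><span class="pre">Consumed.with()</span></code>. A short sketch, reusing the serdes declared above; the topic name and stream variable are assumptions for illustration:</p>
<pre class="line-numbers"><code class="language-java">import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;

StreamsBuilder builder = new StreamsBuilder();
// Read the topic with explicit key and value serdes, overriding the configured defaults
KStream&lt;String, Long&gt; userCountsByRegion =
    builder.stream("RegionCountsTopic", Consumed.with(stringSerde, longSerde));</code></pre>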
<p>If you want to override serdes selectively, i.e., keep the defaults for some fields, then don&#8217;t specify the serde whenever you want to leverage the default settings:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.kafka.common.serialization.Serde</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.common.serialization.Serdes</span><span class="o">;</span>
<span class="c1">// Use the default serializer for record keys (here: region as String) by not specifying the key serde,</span>
<span class="c1">// but override the default serializer for record values (here: userCount as Long).</span>
<span class="kd">final</span> <span class="n">Serde</span><span class="o">&lt;</span><span class="n">Long</span><span class="o">&gt;</span> <span class="n">longSerde</span> <span class="o">=</span> <span class="n">Serdes</span><span class="o">.</span><span class="na">Long</span><span class="o">();</span>
<span class="n">KStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;</span> <span class="n">userCountByRegion</span> <span class="o">=</span> <span class="o">...;</span>
<span class="n">userCountByRegion</span><span class="o">.</span><span class="na">to</span><span class="o">(</span><span class="s">&quot;RegionCountsTopic&quot;</span><span class="o">,</span> <span class="n">Produced</span><span class="o">.</span><span class="na">valueSerde</span><span class="o">(</span><span class="n">Serdes</span><span class="o">.</span><span class="na">Long</span><span class="o">()));</span></code></pre></div>
</div>
<pre class="line-numbers"><code class="language-java">import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
// Use the default serializer for record keys (here: region as String) by not specifying the key serde,
// but override the default serializer for record values (here: userCount as Long).
final Serde&lt;Long&gt; longSerde = Serdes.Long();
KStream&lt;String, Long&gt; userCountByRegion = ...;
userCountByRegion.to(&quot;RegionCountsTopic&quot;, Produced.valueSerde(Serdes.Long()));</code></pre>
<p>If some of your incoming records are corrupted or ill-formatted, they will cause the deserializer class to report an error.
Since 1.0.x we have introduced a <code>DeserializationExceptionHandler</code> interface which allows
you to customize how to handle such records. The customized implementation of the interface can be specified via the <code>StreamsConfig</code>.
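<p>For example, to merely log corrupted records and keep processing, the built-in <code>LogAndContinueExceptionHandler</code> can be set as the default handler; a minimal sketch:</p>
<pre class="line-numbers"><code class="language-java">import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.errors.LogAndContinueExceptionHandler;

Properties settings = new Properties();
// Log bad records and continue processing instead of failing the application
settings.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG,
             LogAndContinueExceptionHandler.class);</code></pre>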
@@ -101,12 +98,11 @@
<h3>Primitive and basic types<a class="headerlink" href="#primitive-and-basic-types" title="Permalink to this headline"></a></h3>
<p>Apache Kafka includes several built-in serde implementations for Java primitives and basic types such as <code class="docutils literal"><span class="pre">byte[]</span></code> in
its <code class="docutils literal"><span class="pre">kafka-clients</span></code> Maven artifact:</p>
<div class="highlight-xml"><div class="highlight"><pre><code><span class="nt">&lt;dependency&gt;</span>
<span class="nt">&lt;groupId&gt;</span>org.apache.kafka<span class="nt">&lt;/groupId&gt;</span>
<span class="nt">&lt;artifactId&gt;</span>kafka-clients<span class="nt">&lt;/artifactId&gt;</span>
<span class="nt">&lt;version&gt;</span>{{fullDotVersion}}<span class="nt">&lt;/version&gt;</span>
<span class="nt">&lt;/dependency&gt;</span></code></pre></div>
</div>
<pre class="line-numbers"><code class="language-xml-doc">&lt;dependency&gt;
&lt;groupId&gt;org.apache.kafka&lt;/groupId&gt;
&lt;artifactId&gt;kafka-clients&lt;/artifactId&gt;
&lt;version&gt;2.8.0&lt;/version&gt;
&lt;/dependency&gt;</code></pre>
<p>This artifact provides the following serde implementations under the package <a class="reference external" href="https://github.com/apache/kafka/blob/{{dotVersion}}/clients/src/main/java/org/apache/kafka/common/serialization">org.apache.kafka.common.serialization</a>, which you can leverage when, for example, defining default serializers in your Streams configuration.</p>
<table border="1" class="docutils">
<colgroup>

1896  docs/streams/developer-guide/dsl-api.html

File diff suppressed because it is too large.

186  docs/streams/developer-guide/dsl-topology-naming.html

@@ -71,28 +71,28 @@
For example, consider the following simple topology:
<br/>
<pre class="line-numbers"><code class="language-text"> KStream&lt;String,String&gt; stream = builder.stream("input");
stream.filter((k,v) -> !v.equals("invalid_txn"))
.mapValues((v) -> v.substring(0,5))
.to("output")</code></pre>
<pre class="line-numbers"><code class="language-java">KStream&lt;String,String&gt; stream = builder.stream("input");
stream.filter((k,v) -> !v.equals("invalid_txn"))
.mapValues((v) -> v.substring(0,5))
.to("output");</code></pre>
</p>
<p>
Running <code>Topology#describe()</code> yields this string:
<pre class="line-numbers"><code class="language-text"> Topologies:
Sub-topology: 0
Source: KSTREAM-SOURCE-0000000000 (topics: [input])
--> KSTREAM-FILTER-0000000001
Processor: KSTREAM-FILTER-0000000001 (stores: [])
--> KSTREAM-MAPVALUES-0000000002
<-- KSTREAM-SOURCE-0000000000
Processor: KSTREAM-MAPVALUES-0000000002 (stores: [])
--> KSTREAM-SINK-0000000003
<-- KSTREAM-FILTER-0000000001
Sink: KSTREAM-SINK-0000000003 (topic: output)
<-- KSTREAM-MAPVALUES-0000000002</code></pre>
<pre class="line-numbers"><code class="language-text">Topologies:
Sub-topology: 0
Source: KSTREAM-SOURCE-0000000000 (topics: [input])
--> KSTREAM-FILTER-0000000001
Processor: KSTREAM-FILTER-0000000001 (stores: [])
--> KSTREAM-MAPVALUES-0000000002
<-- KSTREAM-SOURCE-0000000000
Processor: KSTREAM-MAPVALUES-0000000002 (stores: [])
--> KSTREAM-SINK-0000000003
<-- KSTREAM-FILTER-0000000001
Sink: KSTREAM-SINK-0000000003 (topic: output)
<-- KSTREAM-MAPVALUES-0000000002</code></pre>
From this report, you can see what the different operators are, but what is the broader context here?
For example, consider <code>KSTREAM-FILTER-0000000001</code>: we can see that it's a
@@ -112,24 +112,24 @@
</p>
<p>
Now let's take a look at your topology with all the processors named:
<pre class="line-numbers"><code class="language-text"> KStream&lt;String,String&gt; stream =
builder.stream("input", Consumed.as("Customer_transactions_input_topic"));
stream.filter((k,v) -> !v.equals("invalid_txn"), Named.as("filter_out_invalid_txns"))
.mapValues((v) -> v.substring(0,5), Named.as("Map_values_to_first_6_characters"))
.to("output", Produced.as("Mapped_transactions_output_topic"));</code></pre>
<pre class="line-numbers"><code class="language-text"> Topologies:
Sub-topology: 0
Source: Customer_transactions_input_topic (topics: [input])
--> filter_out_invalid_txns
Processor: filter_out_invalid_txns (stores: [])
--> Map_values_to_first_6_characters
<-- Customer_transactions_input_topic
Processor: Map_values_to_first_6_characters (stores: [])
--> Mapped_transactions_output_topic
<-- filter_out_invalid_txns
Sink: Mapped_transactions_output_topic (topic: output)
<-- Map_values_to_first_6_characters</code></pre>
<pre class="line-numbers"><code class="language-java">KStream&lt;String,String&gt; stream =
builder.stream("input", Consumed.as("Customer_transactions_input_topic"));
stream.filter((k,v) -> !v.equals("invalid_txn"), Named.as("filter_out_invalid_txns"))
.mapValues((v) -> v.substring(0,5), Named.as("Map_values_to_first_6_characters"))
.to("output", Produced.as("Mapped_transactions_output_topic"));</code></pre>
<pre class="line-numbers"><code class="language-text">Topologies:
Sub-topology: 0
Source: Customer_transactions_input_topic (topics: [input])
--> filter_out_invalid_txns
Processor: filter_out_invalid_txns (stores: [])
--> Map_values_to_first_6_characters
<-- Customer_transactions_input_topic
Processor: Map_values_to_first_6_characters (stores: [])
--> Mapped_transactions_output_topic
<-- filter_out_invalid_txns
Sink: Mapped_transactions_output_topic (topic: output)
<-- Map_values_to_first_6_characters</code></pre>
Now you can look at the topology description and easily understand what role each processor
plays in the topology. But there's another reason for naming your processor nodes when you
@@ -151,52 +151,52 @@
shifting does have implications for topologies with stateful operators or repartition topics.
Here's a different topology with some state:
<pre class="line-numbers"><code class="language-text"> KStream&lt;String,String&gt; stream = builder.stream("input");
stream.groupByKey()
.count()
.toStream()
.to("output");</code></pre>
<pre class="line-numbers"><code class="language-java">KStream&lt;String,String&gt; stream = builder.stream("input");
stream.groupByKey()
.count()
.toStream()
.to("output");</code></pre>
This topology description yields the following:
<pre class="line-numbers"><code class="language-text"> Topologies:
Sub-topology: 0
Source: KSTREAM-SOURCE-0000000000 (topics: [input])
--> KSTREAM-AGGREGATE-0000000002
Processor: KSTREAM-AGGREGATE-0000000002 (stores: [KSTREAM-AGGREGATE-STATE-STORE-0000000001])
--> KTABLE-TOSTREAM-0000000003
<-- KSTREAM-SOURCE-0000000000
Processor: KTABLE-TOSTREAM-0000000003 (stores: [])
--> KSTREAM-SINK-0000000004
<-- KSTREAM-AGGREGATE-0000000002
Sink: KSTREAM-SINK-0000000004 (topic: output)
<-- KTABLE-TOSTREAM-0000000003</code></pre>
<pre class="line-numbers"><code class="language-text">Topologies:
Sub-topology: 0
Source: KSTREAM-SOURCE-0000000000 (topics: [input])
--> KSTREAM-AGGREGATE-0000000002
Processor: KSTREAM-AGGREGATE-0000000002 (stores: [KSTREAM-AGGREGATE-STATE-STORE-0000000001])
--> KTABLE-TOSTREAM-0000000003
<-- KSTREAM-SOURCE-0000000000
Processor: KTABLE-TOSTREAM-0000000003 (stores: [])
--> KSTREAM-SINK-0000000004
<-- KSTREAM-AGGREGATE-0000000002
Sink: KSTREAM-SINK-0000000004 (topic: output)
<-- KTABLE-TOSTREAM-0000000003</code></pre>
</p>
<p>
You can see from the topology description above that the state store is named
<code>KSTREAM-AGGREGATE-STATE-STORE-0000000001</code>. Here's what happens when you
add a filter to keep some of the records out of the aggregation:
<pre class="line-numbers"><code class="language-text"> KStream&lt;String,String&gt; stream = builder.stream("input");
stream.filter((k,v)-> v !=null && v.length() >= 6 )
.groupByKey()
.count()
.toStream()
.to("output");</code></pre>
<pre class="line-numbers"><code class="language-java">KStream&lt;String,String&gt; stream = builder.stream("input");
stream.filter((k, v) -> v != null && v.length() >= 6)
.groupByKey()
.count()
.toStream()
.to("output");</code></pre>
And the corresponding topology:
<pre class="line-numbers"><code class="language-text"> Topologies:
Sub-topology: 0
Source: KSTREAM-SOURCE-0000000000 (topics: [input])
--> KSTREAM-FILTER-0000000001
Processor: KSTREAM-FILTER-0000000001 (stores: [])
--> KSTREAM-AGGREGATE-0000000003
<-- KSTREAM-SOURCE-0000000000
Processor: KSTREAM-AGGREGATE-0000000003 (stores: [KSTREAM-AGGREGATE-STATE-STORE-0000000002])
--> KTABLE-TOSTREAM-0000000004
<-- KSTREAM-FILTER-0000000001
Processor: KTABLE-TOSTREAM-0000000004 (stores: [])
--> KSTREAM-SINK-0000000005
<-- KSTREAM-AGGREGATE-0000000003
Sink: KSTREAM-SINK-0000000005 (topic: output)
<-- KTABLE-TOSTREAM-0000000004</code></pre>
<pre class="line-numbers"><code class="language-text">Topologies:
Sub-topology: 0
Source: KSTREAM-SOURCE-0000000000 (topics: [input])
--> KSTREAM-FILTER-0000000001
Processor: KSTREAM-FILTER-0000000001 (stores: [])
--> KSTREAM-AGGREGATE-0000000003
<-- KSTREAM-SOURCE-0000000000
Processor: KSTREAM-AGGREGATE-0000000003 (stores: [KSTREAM-AGGREGATE-STATE-STORE-0000000002])
--> KTABLE-TOSTREAM-0000000004
<-- KSTREAM-FILTER-0000000001
Processor: KTABLE-TOSTREAM-0000000004 (stores: [])
--> KSTREAM-SINK-0000000005
<-- KSTREAM-AGGREGATE-0000000003
Sink: KSTREAM-SINK-0000000005 (topic: output)
<-- KTABLE-TOSTREAM-0000000004</code></pre>
</p>
<p>
Notice that since you've added an operation <em>before</em> the <code>count</code> operation, the state
@@ -216,30 +216,30 @@
But it's worth reiterating the importance of naming these DSL topology operations.
Here's how your DSL code looks now, giving a specific name to your state store:
<pre class="line-numbers"><code class="language-text"> KStream&lt;String,String&gt; stream = builder.stream("input");
stream.filter((k, v) -> v != null && v.length() >= 6)
.groupByKey()
.count(Materialized.as("Purchase_count_store"))
.toStream()
.to("output");</code></pre>
<pre class="line-numbers"><code class="language-java">KStream&lt;String,String&gt; stream = builder.stream("input");
stream.filter((k, v) -> v != null && v.length() >= 6)
.groupByKey()
.count(Materialized.as("Purchase_count_store"))
.toStream()
.to("output");</code></pre>
And here's the topology:
<pre class="line-numbers"><code class="language-text"> Topologies:
Sub-topology: 0
Source: KSTREAM-SOURCE-0000000000 (topics: [input])
--> KSTREAM-FILTER-0000000001
Processor: KSTREAM-FILTER-0000000001 (stores: [])
--> KSTREAM-AGGREGATE-0000000002
<-- KSTREAM-SOURCE-0000000000
Processor: KSTREAM-AGGREGATE-0000000002 (stores: [Purchase_count_store])
--> KTABLE-TOSTREAM-0000000003
<-- KSTREAM-FILTER-0000000001
Processor: KTABLE-TOSTREAM-0000000003 (stores: [])
--> KSTREAM-SINK-0000000004
<-- KSTREAM-AGGREGATE-0000000002
Sink: KSTREAM-SINK-0000000004 (topic: output)
<-- KTABLE-TOSTREAM-0000000003</code></pre>
<pre class="line-numbers"><code class="language-text">Topologies:
Sub-topology: 0
Source: KSTREAM-SOURCE-0000000000 (topics: [input])
--> KSTREAM-FILTER-0000000001
Processor: KSTREAM-FILTER-0000000001 (stores: [])
--> KSTREAM-AGGREGATE-0000000002
<-- KSTREAM-SOURCE-0000000000
Processor: KSTREAM-AGGREGATE-0000000002 (stores: [Purchase_count_store])
--> KTABLE-TOSTREAM-0000000003
<-- KSTREAM-FILTER-0000000001
Processor: KTABLE-TOSTREAM-0000000003 (stores: [])
--> KSTREAM-SINK-0000000004
<-- KSTREAM-AGGREGATE-0000000002
Sink: KSTREAM-SINK-0000000004 (topic: output)
<-- KTABLE-TOSTREAM-0000000003</code></pre>
</p>
<p>
Now, even though you've added processors before your state store, the store name and its changelog
426
docs/streams/developer-guide/interactive-queries.html
@@ -129,95 +129,87 @@
<span id="streams-developer-guide-interactive-queries-local-key-value-stores"></span><h3><a class="toc-backref" href="#id4">Querying local key-value stores</a><a class="headerlink" href="#querying-local-key-value-stores" title="Permalink to this headline"></a></h3>
<p>To query a local key-value store, you must first create a topology with a key-value store. This example creates a key-value
store named &#8220;CountsKeyValueStore&#8221;. This store will hold the latest count for any word that is found on the topic &#8220;word-count-input&#8221;.</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">Properties </span> <span class="n">props</span> <span class="o">=</span> <span class="o">...;</span>
<span class="n">StreamsBuilder</span> <span class="n">builder</span> <span class="o">=</span> <span class="o">...;</span>
<span class="n">KStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;</span> <span class="n">textLines</span> <span class="o">=</span> <span class="o">...;</span>
<pre class="line-numbers"><code class="language-java">Properties props = ...;
StreamsBuilder builder = ...;
KStream&lt;String, String&gt; textLines = ...;
<span class="c1">// Define the processing topology (here: WordCount)</span>
<span class="n">KGroupedStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;</span> <span class="n">groupedByWord</span> <span class="o">=</span> <span class="n">textLines</span>
<span class="o">.</span><span class="na">flatMapValues</span><span class="o">(</span><span class="n">value</span> <span class="o">-&gt;</span> <span class="n">Arrays</span><span class="o">.</span><span class="na">asList</span><span class="o">(</span><span class="n">value</span><span class="o">.</span><span class="na">toLowerCase</span><span class="o">().</span><span class="na">split</span><span class="o">(</span><span class="s">&quot;\\W+&quot;</span><span class="o">)))</span>
<span class="o">.</span><span class="na">groupBy</span><span class="o">((</span><span class="n">key</span><span class="o">,</span> <span class="n">word</span><span class="o">)</span> <span class="o">-&gt;</span> <span class="n">word</span><span class="o">,</span> <span class="n">Grouped</span><span class="o">.</span><span class="na">with</span><span class="o">(</span><span class="n">stringSerde</span><span class="o">,</span> <span class="n">stringSerde</span><span class="o">));</span>
// Define the processing topology (here: WordCount)
KGroupedStream&lt;String, String&gt; groupedByWord = textLines
.flatMapValues(value -&gt; Arrays.asList(value.toLowerCase().split(&quot;\\W+&quot;)))
.groupBy((key, word) -&gt; word, Grouped.with(stringSerde, stringSerde));
<span class="c1">// Create a key-value store named &quot;CountsKeyValueStore&quot; for the all-time word counts</span>
<span class="n">groupedByWord</span><span class="o">.</span><span class="na">count</span><span class="o">(</span><span class="n">Materialized</span><span class="o">.&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">,</span> <span class="n">KeyValueStore</span><span class="o">&lt;</span><span class="n">Bytes</span><span class="o">,</span> <span class="kt">byte</span><span class="o">[]&gt;</span><span class="n">as</span><span class="o">(</span><span class="s">&quot;CountsKeyValueStore&quot;</span><span class="o">));</span>
// Create a key-value store named &quot;CountsKeyValueStore&quot; for the all-time word counts
groupedByWord.count(Materialized.&lt;String, String, KeyValueStore&lt;Bytes, byte[]&gt;as(&quot;CountsKeyValueStore&quot;));
<span class="c1">// Start an instance of the topology</span>
<span class="n">KafkaStreams</span> <span class="n">streams</span> <span class="o">=</span> <span class="k">new</span> <span class="n">KafkaStreams</span><span class="o">(</span><span class="n">builder</span><span class="o">,</span> <span class="n">props</span><span class="o">);</span>
<span class="n">streams</span><span class="o">.</span><span class="na">start</span><span class="o">();</span></code></pre></div>
</div>
// Start an instance of the topology
KafkaStreams streams = new KafkaStreams(builder, props);
streams.start();</code></pre>
<p>After the application has started, you can get access to &#8220;CountsKeyValueStore&#8221; and then query it via the <a class="reference external" href="https://github.com/apache/kafka/blob/{{dotVersion}}/streams/src/main/java/org/apache/kafka/streams/state/ReadOnlyKeyValueStore.java">ReadOnlyKeyValueStore</a> API:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Get the key-value store CountsKeyValueStore</span>
<span class="n">ReadOnlyKeyValueStore</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;</span> <span class="n">keyValueStore</span> <span class="o">=</span>
<span class="n">streams</span><span class="o">.</span><span class="na">store</span><span class="o">(</span><span class="s">&quot;CountsKeyValueStore&quot;</span><span class="o">,</span> <span class="n">QueryableStoreTypes</span><span class="o">.</span><span class="na">keyValueStore</span><span class="o">());</span>
<span class="c1">// Get value by key</span>
<span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="s">&quot;count for hello:&quot;</span> <span class="o">+</span> <span class="n">keyValueStore</span><span class="o">.</span><span class="na">get</span><span class="o">(</span><span class="s">&quot;hello&quot;</span><span class="o">));</span>
<span class="c1">// Get the values for a range of keys available in this application instance</span>
<span class="n">KeyValueIterator</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;</span> <span class="n">range</span> <span class="o">=</span> <span class="n">keyValueStore</span><span class="o">.</span><span class="na">range</span><span class="o">(</span><span class="s">&quot;all&quot;</span><span class="o">,</span> <span class="s">&quot;streams&quot;</span><span class="o">);</span>
<span class="k">while</span> <span class="o">(</span><span class="n">range</span><span class="o">.</span><span class="na">hasNext</span><span class="o">())</span> <span class="o">{</span>
<span class="n">KeyValue</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;</span> <span class="n">next</span> <span class="o">=</span> <span class="n">range</span><span class="o">.</span><span class="na">next</span><span class="o">();</span>
<span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="s">&quot;count for &quot;</span> <span class="o">+</span> <span class="n">next</span><span class="o">.</span><span class="na">key</span> <span class="o">+</span> <span class="s">&quot;: &quot;</span> <span class="o">+</span> <span class="n">next</span><span class="o">.</span><span class="na">value</span><span class="o">);</span>
<span class="o">}</span>
<span class="c1">// Get the values for all of the keys available in this application instance</span>
<span class="n">KeyValueIterator</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;</span> <span class="n">range</span> <span class="o">=</span> <span class="n">keyValueStore</span><span class="o">.</span><span class="na">all</span><span class="o">();</span>
<span class="k">while</span> <span class="o">(</span><span class="n">range</span><span class="o">.</span><span class="na">hasNext</span><span class="o">())</span> <span class="o">{</span>
<span class="n">KeyValue</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;</span> <span class="n">next</span> <span class="o">=</span> <span class="n">range</span><span class="o">.</span><span class="na">next</span><span class="o">();</span>
<span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="s">&quot;count for &quot;</span> <span class="o">+</span> <span class="n">next</span><span class="o">.</span><span class="na">key</span> <span class="o">+</span> <span class="s">&quot;: &quot;</span> <span class="o">+</span> <span class="n">next</span><span class="o">.</span><span class="na">value</span><span class="o">);</span>
<span class="o">}</span></code></pre></div>
</div>
<pre class="line-numbers"><code class="language-java">// Get the key-value store CountsKeyValueStore
ReadOnlyKeyValueStore&lt;String, Long&gt; keyValueStore =
streams.store(&quot;CountsKeyValueStore&quot;, QueryableStoreTypes.keyValueStore());
// Get value by key
System.out.println(&quot;count for hello: &quot; + keyValueStore.get(&quot;hello&quot;));
// Get the values for a range of keys available in this application instance
KeyValueIterator&lt;String, Long&gt; range = keyValueStore.range(&quot;all&quot;, &quot;streams&quot;);
while (range.hasNext()) {
KeyValue&lt;String, Long&gt; next = range.next();
System.out.println(&quot;count for &quot; + next.key + &quot;: &quot; + next.value);
}
// Get the values for all of the keys available in this application instance
KeyValueIterator&lt;String, Long&gt; allEntries = keyValueStore.all();
while (allEntries.hasNext()) {
  KeyValue&lt;String, Long&gt; next = allEntries.next();
  System.out.println(&quot;count for &quot; + next.key + &quot;: &quot; + next.value);
}</code></pre>
<p>You can also materialize the results of stateless operators by using the overloaded methods that take a <code class="docutils literal"><span class="pre">queryableStoreName</span></code>
as shown in the example below:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span>
<span class="n">StreamsBuilder</span> <span class="n">builder</span> <span class="o">=</span> <span class="o">...;</span>
<span class="n">KTable</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Integer</span><span class="o">&gt;</span> <span class="n">regionCounts</span> <span class="o">=</span> <span class="o">...;</span>
<span class="c1">// materialize the result of filtering corresponding to odd numbers</span>
<span class="c1">// the &quot;queryableStoreName&quot; can be subsequently queried.</span>
<span class="n">KTable</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Integer</span><span class="o">&gt;</span> <span class="n">oddCounts</span> <span class="o">=</span> <span class="n">numberLines</span><span class="o">.</span><span class="na">filter</span><span class="o">((</span><span class="n">region</span><span class="o">,</span> <span class="n">count</span><span class="o">)</span> <span class="o">-&gt;</span> <span class="o">(</span><span class="n">count</span> <span class="o">%</span> <span class="mi">2</span> <span class="o">!=</span> <span class="mi">0</span><span class="o">),</span>
<span class="n">Materialized</span><span class="o">.&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Integer</span><span class="o">,</span> <span class="n">KeyValueStore</span><span class="o">&lt;</span><span class="n">Bytes</span><span class="o">,</span> <span class="kt">byte</span><span class="o">[]&gt;</span><span class="n">as</span><span class="o">(</span><span class="s">&quot;queryableStoreName&quot;</span><span class="o">));</span>
<span class="c1">// do not materialize the result of filtering corresponding to even numbers</span>
<span class="c1">// this means that these results will not be materialized and cannot be queried.</span>
<span class="n">KTable</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Integer</span><span class="o">&gt;</span> <span class="n">oddCounts</span> <span class="o">=</span> <span class="n">numberLines</span><span class="o">.</span><span class="na">filter</span><span class="o">((</span><span class="n">region</span><span class="o">,</span> <span class="n">count</span><span class="o">)</span> <span class="o">-&gt;</span> <span class="o">(</span><span class="n">count</span> <span class="o">%</span> <span class="mi">2</span> <span class="o">==</span> <span class="mi">0</span><span class="o">));</span></code></pre></div>
</div>
</div>
<pre class="line-numbers"><code class="language-java">StreamsBuilder builder = ...;
KTable&lt;String, Integer&gt; regionCounts = ...;
// materialize the result of filtering corresponding to odd numbers
// the &quot;queryableStoreName&quot; can be subsequently queried.
KTable&lt;String, Integer&gt; oddCounts = numberLines.filter((region, count) -&gt; (count % 2 != 0),
Materialized.&lt;String, Integer, KeyValueStore&lt;Bytes, byte[]&gt;as(&quot;queryableStoreName&quot;));
// do not materialize the result of filtering corresponding to even numbers
// this means that these results will not be materialized and cannot be queried.
KTable&lt;String, Integer&gt; oddCounts = numberLines.filter((region, count) -&gt; (count % 2 == 0));</code></pre>
<div class="section" id="querying-local-window-stores">
<span id="streams-developer-guide-interactive-queries-local-window-stores"></span><h3><a class="toc-backref" href="#id5">Querying local window stores</a><a class="headerlink" href="#querying-local-window-stores" title="Permalink to this headline"></a></h3>
<p>A window store will potentially have many results for any given key because the key can be present in multiple windows.
However, there is only one result per window for a given key.</p>
<p>To query a local window store, you must first create a topology with a window store. This example creates a window store
named &#8220;CountsWindowStore&#8221; that contains the counts for words in 1-minute windows.</p>
<div class="highlight-java"><div class="highlight"><pre><span></span>
<span class="n">StreamsBuilder</span> <span class="n">builder</span> <span class="o">=</span> <span class="o">...;</span>
<span class="n">KStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;</span> <span class="n">textLines</span> <span class="o">=</span> <span class="o">...;</span>
<span class="c1">// Define the processing topology (here: WordCount)</span>
<span class="n">KGroupedStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;</span> <span class="n">groupedByWord</span> <span class="o">=</span> <span class="n">textLines</span>
<span class="o">.</span><span class="na">flatMapValues</span><span class="o">(</span><span class="n">value</span> <span class="o">-&gt;</span> <span class="n">Arrays</span><span class="o">.</span><span class="na">asList</span><span class="o">(</span><span class="n">value</span><span class="o">.</span><span class="na">toLowerCase</span><span class="o">().</span><span class="na">split</span><span class="o">(</span><span class="s">&quot;\\W+&quot;</span><span class="o">)))</span>
<span class="o">.</span><span class="na">groupBy</span><span class="o">((</span><span class="n">key</span><span class="o">,</span> <span class="n">word</span><span class="o">)</span> <span class="o">-&gt;</span> <span class="n">word</span><span class="o">,</span> <span class="n">Grouped</span><span class="o">.</span><span class="na">with</span><span class="o">(</span><span class="n">stringSerde</span><span class="o">,</span> <span class="n">stringSerde</span><span class="o">));</span>
<span class="c1">// Create a window state store named &quot;CountsWindowStore&quot; that contains the word counts for every minute</span>
<span class="n">groupedByWord</span><span class="o">.</span><span class="na">windowedBy</span><span class="o">(</span><span class="n">TimeWindows</span><span class="o">.</span><span class="na">of</span><span class="o">(<span class="n">Duration</span><span class="o">.</span><span class="na">ofSeconds</span><span class="o">(</span><span class="mi">60</span><span class="o">)))</span>
<span class="o">.</span><span class="na">count</span><span class="o">(</span><span class="n">Materialized</span><span class="o">.&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">,</span> <span class="n">WindowStore</span><span class="o">&lt;</span><span class="n">Bytes</span><span class="o">,</span> <span class="kt">byte</span><span class="o">[]&gt;</span><span class="n">as</span><span class="o">(</span><span class="s">&quot;CountsWindowStore&quot;</span><span class="o">));</span></code></pre></div>
</div>
<pre class="line-numbers"><code class="language-java">StreamsBuilder builder = ...;
KStream&lt;String, String&gt; textLines = ...;
// Define the processing topology (here: WordCount)
KGroupedStream&lt;String, String&gt; groupedByWord = textLines
.flatMapValues(value -&gt; Arrays.asList(value.toLowerCase().split(&quot;\\W+&quot;)))
.groupBy((key, word) -&gt; word, Grouped.with(stringSerde, stringSerde));
// Create a window state store named &quot;CountsWindowStore&quot; that contains the word counts for every minute
groupedByWord.windowedBy(TimeWindows.of(Duration.ofSeconds(60)))
       .count(Materialized.&lt;String, Long, WindowStore&lt;Bytes, byte[]&gt;&gt;as(&quot;CountsWindowStore&quot;));</code></pre>
<p>After the application has started, you can get access to &#8220;CountsWindowStore&#8221; and then query it via the <a class="reference external" href="https://github.com/apache/kafka/blob/{{dotVersion}}/streams/src/main/java/org/apache/kafka/streams/state/ReadOnlyWindowStore.java">ReadOnlyWindowStore</a> API:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Get the window store named &quot;CountsWindowStore&quot;</span>
<span class="n">ReadOnlyWindowStore</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;</span> <span class="n">windowStore</span> <span class="o">=</span>
<span class="n">streams</span><span class="o">.</span><span class="na">store</span><span class="o">(</span><span class="s">&quot;CountsWindowStore&quot;</span><span class="o">,</span> <span class="n">QueryableStoreTypes</span><span class="o">.</span><span class="na">windowStore</span><span class="o">());</span>
<span class="c1">// Fetch values for the key &quot;world&quot; for all of the windows available in this application instance.</span>
<span class="c1">// To get *all* available windows we fetch windows from the beginning of time until now.</span>
<span class="kt">Instant</span> <span class="n">timeFrom</span> <span class="o">=</span> <span class="na">Instant</span><span class="o">.</span><span class="na">ofEpochMilli<span class="o">(</span><span class="mi">0</span><span class="o">);</span> <span class="c1">// beginning of time = oldest available</span>
<span class="kt">Instant</span> <span class="n">timeTo</span> <span class="o">=</span> <span class="n">Instant</span><span class="o">.</span><span class="na">now</span><span class="o">();</span> <span class="c1">// now (in processing-time)</span>
<span class="n">WindowStoreIterator</span><span class="o">&lt;</span><span class="n">Long</span><span class="o">&gt;</span> <span class="n">iterator</span> <span class="o">=</span> <span class="n">windowStore</span><span class="o">.</span><span class="na">fetch</span><span class="o">(</span><span class="s">&quot;world&quot;</span><span class="o">,</span> <span class="n">timeFrom</span><span class="o">,</span> <span class="n">timeTo</span><span class="o">);</span>
<span class="k">while</span> <span class="o">(</span><span class="n">iterator</span><span class="o">.</span><span class="na">hasNext</span><span class="o">())</span> <span class="o">{</span>
<span class="n">KeyValue</span><span class="o">&lt;</span><span class="n">Long</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;</span> <span class="n">next</span> <span class="o">=</span> <span class="n">iterator</span><span class="o">.</span><span class="na">next</span><span class="o">();</span>
<span class="kt">long</span> <span class="n">windowTimestamp</span> <span class="o">=</span> <span class="n">next</span><span class="o">.</span><span class="na">key</span><span class="o">;</span>
<span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="s">&quot;Count of &#39;world&#39; @ time &quot;</span> <span class="o">+</span> <span class="n">windowTimestamp</span> <span class="o">+</span> <span class="s">&quot; is &quot;</span> <span class="o">+</span> <span class="n">next</span><span class="o">.</span><span class="na">value</span><span class="o">);</span>
<span class="o">}</span></code></pre></div>
</div>
<pre class="line-numbers"><code class="language-java">// Get the window store named &quot;CountsWindowStore&quot;
ReadOnlyWindowStore&lt;String, Long&gt; windowStore =
streams.store(&quot;CountsWindowStore&quot;, QueryableStoreTypes.windowStore());
// Fetch values for the key &quot;world&quot; for all of the windows available in this application instance.
// To get *all* available windows we fetch windows from the beginning of time until now.
Instant timeFrom = Instant.ofEpochMilli(0); // beginning of time = oldest available
Instant timeTo = Instant.now(); // now (in processing-time)
WindowStoreIterator&lt;Long&gt; iterator = windowStore.fetch(&quot;world&quot;, timeFrom, timeTo);
while (iterator.hasNext()) {
KeyValue&lt;Long, Long&gt; next = iterator.next();
long windowTimestamp = next.key;
System.out.println(&quot;Count of &#39;world&#39; @ time &quot; + windowTimestamp + &quot; is &quot; + next.value);
}</code></pre>
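<p>Because there is only one result per window for a given key, you can also look up a single window's value directly
by the window's start timestamp. A minimal sketch, assuming the <code>windowStore</code> from the example above
(<code>fetch</code> returns <code>null</code> if the key has no window starting at exactly that time):</p>
<pre class="line-numbers"><code class="language-java">// Point lookup: the count of &quot;world&quot; in the window that starts at epoch time 0.
// Assumes the windowStore obtained above; returns null if no such window exists.
Long countForWorld = windowStore.fetch(&quot;world&quot;, Instant.ofEpochMilli(0));</code></pre>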
</div>
<div class="section" id="querying-local-custom-state-stores">
<span id="streams-developer-guide-interactive-queries-custom-stores"></span><h3><a class="toc-backref" href="#id6">Querying local custom state stores</a><a class="headerlink" href="#querying-local-custom-state-stores" title="Permalink to this headline"></a></h3>
@@ -233,43 +225,41 @@
<li>It is recommended that you provide an interface that restricts access to read-only operations. This prevents users of this API from mutating the state of your running Kafka Streams application out-of-band.</li>
</ul>
<p>The class/interface hierarchy for your custom store might look something like:</p>
<div class="highlight-java"><div class="highlight"><pre><code><span></span><span class="kd">public</span> <span class="kd">class</span> <span class="nc">MyCustomStore</span><span class="o">&lt;</span><span class="n">K</span><span class="o">,</span><span class="n">V</span><span class="o">&gt;</span> <span class="kd">implements</span> <span class="n">StateStore</span><span class="o">,</span> <span class="n">MyWriteableCustomStore</span><span class="o">&lt;</span><span class="n">K</span><span class="o">,</span><span class="n">V</span><span class="o">&gt;</span> <span class="o">{</span>
<span class="c1">// implementation of the actual store</span>
<span class="o">}</span>
<span class="c1">// Read-write interface for MyCustomStore</span>
<span class="kd">public</span> <span class="kd">interface</span> <span class="nc">MyWriteableCustomStore</span><span class="o">&lt;</span><span class="n">K</span><span class="o">,</span><span class="n">V</span><span class="o">&gt;</span> <span class="kd">extends</span> <span class="n">MyReadableCustomStore</span><span class="o">&lt;</span><span class="n">K</span><span class="o">,</span><span class="n">V</span><span class="o">&gt;</span> <span class="o">{</span>
<span class="kt">void</span> <span class="nf">write</span><span class="o">(</span><span class="n">K</span> <span class="n">Key</span><span class="o">,</span> <span class="n">V</span> <span class="n">value</span><span class="o">);</span>
<span class="o">}</span>
<span class="c1">// Read-only interface for MyCustomStore</span>
<span class="kd">public</span> <span class="kd">interface</span> <span class="nc">MyReadableCustomStore</span><span class="o">&lt;</span><span class="n">K</span><span class="o">,</span><span class="n">V</span><span class="o">&gt;</span> <span class="o">{</span>
<span class="n">V</span> <span class="nf">read</span><span class="o">(</span><span class="n">K</span> <span class="n">key</span><span class="o">);</span>
<span class="o">}</span>
<span class="kd">public</span> <span class="kd">class</span> <span class="nc">MyCustomStoreBuilder</span> <span class="kd">implements</span> <span class="n">StoreBuilder</span> <span class="o">{</span>
<span class="c1">// implementation of the supplier for MyCustomStore</span>
<span class="o">}</span></code></pre></div>
</div>
<pre class="line-numbers"><code class="language-java">public class MyCustomStore&lt;K,V&gt; implements StateStore, MyWriteableCustomStore&lt;K,V&gt; {
// implementation of the actual store
}
// Read-write interface for MyCustomStore
public interface MyWriteableCustomStore&lt;K,V&gt; extends MyReadableCustomStore&lt;K,V&gt; {
    void write(K key, V value);
}
// Read-only interface for MyCustomStore
public interface MyReadableCustomStore&lt;K,V&gt; {
V read(K key);
}
public class MyCustomStoreBuilder implements StoreBuilder {
// implementation of the supplier for MyCustomStore
}</code></pre>
<p>To make this store queryable you must:</p>
<ul class="simple">
<li>Provide an implementation of <a class="reference external" href="https://github.com/apache/kafka/blob/{{dotVersion}}/streams/src/main/java/org/apache/kafka/streams/state/QueryableStoreType.java">QueryableStoreType</a>.</li>
<li>Provide a wrapper class that has access to all of the underlying instances of the store and is used for querying.</li>
</ul>
<p>Here is how to implement <code class="docutils literal"><span class="pre">QueryableStoreType</span></code>:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kd">public</span> <span class="kd">class</span> <span class="nc">MyCustomStoreType</span><span class="o">&lt;</span><span class="n">K</span><span class="o">,</span><span class="n">V</span><span class="o">&gt;</span> <span class="kd">implements</span> <span class="n">QueryableStoreType</span><span class="o">&lt;</span><span class="n">MyReadableCustomStore</span><span class="o">&lt;</span><span class="n">K</span><span class="o">,</span><span class="n">V</span><span class="o">&gt;&gt;</span> <span class="o">{</span>
<pre class="line-numbers"><code class="language-java">public class MyCustomStoreType&lt;K,V&gt; implements QueryableStoreType&lt;MyReadableCustomStore&lt;K,V&gt;&gt; {
<span class="c1">// Only accept StateStores that are of type MyCustomStore</span>
<span class="kd">public</span> <span class="kt">boolean</span> <span class="nf">accepts</span><span class="o">(</span><span class="kd">final</span> <span class="n">StateStore</span> <span class="n">stateStore</span><span class="o">)</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">stateStore</span> <span class="n">instanceOf</span> <span class="n">MyCustomStore</span><span class="o">;</span>
<span class="o">}</span>
// Only accept StateStores that are of type MyCustomStore
public boolean accepts(final StateStore stateStore) {
return stateStore instanceOf MyCustomStore;
}
<span class="kd">public</span> <span class="n">MyReadableCustomStore</span><span class="o">&lt;</span><span class="n">K</span><span class="o">,</span><span class="n">V</span><span class="o">&gt;</span> <span class="nf">create</span><span class="o">(</span><span class="kd">final</span> <span class="n">StateStoreProvider</span> <span class="n">storeProvider</span><span class="o">,</span> <span class="kd">final</span> <span class="n">String</span> <span class="n">storeName</span><span class="o">)</span> <span class="o">{</span>
<span class="k">return</span> <span class="k">new</span> <span class="n">MyCustomStoreTypeWrapper</span><span class="o">(</span><span class="n">storeProvider</span><span class="o">,</span> <span class="n">storeName</span><span class="o">,</span> <span class="k">this</span><span class="o">);</span>
<span class="o">}</span>
public MyReadableCustomStore&lt;K,V&gt; create(final StateStoreProvider storeProvider, final String storeName) {
return new MyCustomStoreTypeWrapper(storeProvider, storeName, this);
}
<span class="o">}</span></code></pre></div>
</div>
}</code></pre>
<p>A wrapper class is required because each instance of a Kafka Streams application may run multiple stream tasks and manage
multiple local instances of a particular state store. The wrapper class hides this complexity and lets you query a &#8220;logical&#8221;
state store by name without having to know about all of the underlying local instances of that state store.</p>
@@ -279,56 +269,53 @@
<code class="docutils literal"><span class="pre">StateStoreProvider#stores(String</span> <span class="pre">storeName,</span> <span class="pre">QueryableStoreType&lt;T&gt;</span> <span class="pre">queryableStoreType)</span></code> returns a <code class="docutils literal"><span class="pre">List</span></code> of state
stores with the given storeName and of the type as defined by <code class="docutils literal"><span class="pre">queryableStoreType</span></code>.</p>
<p>Here is an example implementation of the wrapper (Java 8+):</p>
<div class="highlight-java"><div class="highlight"><pre><code><span></span><span class="c1">// We strongly recommended implementing a read-only interface</span>
<span class="c1">// to restrict usage of the store to safe read operations!</span>
<span class="kd">public</span> <span class="kd">class</span> <span class="nc">MyCustomStoreTypeWrapper</span><span class="o">&lt;</span><span class="n">K</span><span class="o">,</span><span class="n">V</span><span class="o">&gt;</span> <span class="kd">implements</span> <span class="n">MyReadableCustomStore</span><span class="o">&lt;</span><span class="n">K</span><span class="o">,</span><span class="n">V</span><span class="o">&gt;</span> <span class="o">{</span>
<span class="kd">private</span> <span class="kd">final</span> <span class="n">QueryableStoreType</span><span class="o">&lt;</span><span class="n">MyReadableCustomStore</span><span class="o">&lt;</span><span class="n">K</span><span class="o">,</span> <span class="n">V</span><span class="o">&gt;&gt;</span> <span class="n">customStoreType</span><span class="o">;</span>
<span class="kd">private</span> <span class="kd">final</span> <span class="n">String</span> <span class="n">storeName</span><span class="o">;</span>
<span class="kd">private</span> <span class="kd">final</span> <span class="n">StateStoreProvider</span> <span class="n">provider</span><span class="o">;</span>
<span class="kd">public</span> <span class="nf">CustomStoreTypeWrapper</span><span class="o">(</span><span class="kd">final</span> <span class="n">StateStoreProvider</span> <span class="n">provider</span><span class="o">,</span>
<span class="kd">final</span> <span class="n">String</span> <span class="n">storeName</span><span class="o">,</span>
<span class="kd">final</span> <span class="n">QueryableStoreType</span><span class="o">&lt;</span><span class="n">MyReadableCustomStore</span><span class="o">&lt;</span><span class="n">K</span><span class="o">,</span> <span class="n">V</span><span class="o">&gt;&gt;</span> <span class="n">customStoreType</span><span class="o">)</span> <span class="o">{</span>
<span class="c1">// ... assign fields ...</span>
<span class="o">}</span>
<span class="c1">// Implement a safe read method</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="n">V</span> <span class="nf">read</span><span class="o">(</span><span class="kd">final</span> <span class="n">K</span> <span class="n">key</span><span class="o">)</span> <span class="o">{</span>
<span class="c1">// Get all the stores with storeName and of customStoreType</span>
<span class="kd">final</span> <span class="n">List</span><span class="o">&lt;</span><span class="n">MyReadableCustomStore</span><span class="o">&lt;</span><span class="n">K</span><span class="o">,</span> <span class="n">V</span><span class="o">&gt;&gt;</span> <span class="n">stores</span> <span class="o">=</span> <span class="n">provider</span><span class="o">.</span><span class="na">getStores</span><span class="o">(</span><span class="n">storeName</span><span class="o">,</span> <span class="n">customStoreType</span><span class="o">);</span>
<span class="c1">// Try and find the value for the given key</span>
<span class="kd">final</span> <span class="n">Optional</span><span class="o">&lt;</span><span class="n">V</span><span class="o">&gt;</span> <span class="n">value</span> <span class="o">=</span> <span class="n">stores</span><span class="o">.</span><span class="na">stream</span><span class="o">().</span><span class="na">filter</span><span class="o">(</span><span class="n">store</span> <span class="o">-&gt;</span> <span class="n">store</span><span class="o">.</span><span class="na">read</span><span class="o">(</span><span class="n">key</span><span class="o">)</span> <span class="o">!=</span> <span class="kc">null</span><span class="o">).</span><span class="na">findFirst</span><span class="o">();</span>
<span class="c1">// Return the value if it exists</span>
<span class="k">return</span> <span class="n">value</span><span class="o">.</span><span class="na">orElse</span><span class="o">(</span><span class="kc">null</span><span class="o">);</span>
<span class="o">}</span>
<span class="o">}</span></code></pre></div>
</div>
<pre class="line-numbers"><code class="language-java">// We strongly recommended implementing a read-only interface
// to restrict usage of the store to safe read operations!
public class MyCustomStoreTypeWrapper&lt;K,V&gt; implements MyReadableCustomStore&lt;K,V&gt; {
private final QueryableStoreType&lt;MyReadableCustomStore&lt;K, V&gt;&gt; customStoreType;
private final String storeName;
private final StateStoreProvider provider;
public CustomStoreTypeWrapper(final StateStoreProvider provider,
final String storeName,
final QueryableStoreType&lt;MyReadableCustomStore&lt;K, V&gt;&gt; customStoreType) {
// ... assign fields ...
}
// Implement a safe read method
@Override
public V read(final K key) {
// Get all the stores with storeName and of customStoreType
final List&lt;MyReadableCustomStore&lt;K, V&gt;&gt; stores = provider.getStores(storeName, customStoreType);
// Try and find the value for the given key
final Optional&lt;V&gt; value = stores.stream().filter(store -&gt; store.read(key) != null).findFirst();
// Return the value if it exists
return value.orElse(null);
}
}</code></pre>
<p>You can now find and query your custom store:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span>
<span class="n">Topology</span> <span class="n">topology</span> <span class="o">=</span> <span class="o">...;</span>
<span class="n">ProcessorSupplier</span> <span class="n">processorSuppler</span> <span class="o">=</span> <span class="o">...;</span>
<span class="c1">// Create CustomStoreSupplier for store name the-custom-store</span>
<span class="n">MyCustomStoreBuilder</span> <span class="n">customStoreBuilder</span> <span class="o">=</span> <span class="k">new</span> <span class="n">MyCustomStoreBuilder</span><span class="o">(</span><span class="s">&quot;the-custom-store&quot;</span><span class="o">)</span> <span class="c1">//...;</span>
<span class="c1">// Add the source topic</span>
<span class="n">topology</span><span class="o">.</span><span class="na">addSource</span><span class="o">(</span><span class="s">&quot;input&quot;</span><span class="o">,</span> <span class="s">&quot;inputTopic&quot;</span><span class="o">);</span>
<span class="c1">// Add a custom processor that reads from the source topic</span>
<span class="n">topology</span><span class="o">.</span><span class="na">addProcessor</span><span class="o">(</span><span class="s">&quot;the-processor&quot;</span><span class="o">,</span> <span class="n">processorSupplier</span><span class="o">,</span> <span class="s">&quot;input&quot;</span><span class="o">);</span>
<span class="c1">// Connect your custom state store to the custom processor above</span>
<span class="n">topology</span><span class="o">.</span><span class="na">addStateStore</span><span class="o">(</span><span class="n">customStoreBuilder</span><span class="o">,</span> <span class="s">&quot;the-processor&quot;</span><span class="o">);</span>
<span class="n">KafkaStreams</span> <span class="n">streams</span> <span class="o">=</span> <span class="k">new</span> <span class="n">KafkaStreams</span><span class="o">(</span><span class="n">topology</span><span class="o">,</span> <span class="n">config</span><span class="o">);</span>
<span class="n">streams</span><span class="o">.</span><span class="na">start</span><span class="o">();</span>
<span class="c1">// Get access to the custom store</span>
<span class="n">MyReadableCustomStore</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">store</span> <span class="o">=</span> <span class="n">streams</span><span class="o">.</span><span class="na">store</span><span class="o">(</span><span class="s">&quot;the-custom-store&quot;</span><span class="o">,</span> <span class="k">new</span> <span class="n">MyCustomStoreType</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span><span class="n">String</span><span class="o">&gt;());</span>
<span class="c1">// Query the store</span>
<span class="n">String</span> <span class="n">value</span> <span class="o">=</span> <span class="n">store</span><span class="o">.</span><span class="na">read</span><span class="o">(</span><span class="s">&quot;key&quot;</span><span class="o">);</span></code></pre></div>
</div>
<pre class="line-numbers"><code class="language-java">Topology topology = ...;
ProcessorSupplier processorSupplier = ...;
// Create CustomStoreSupplier for store name the-custom-store
MyCustomStoreBuilder customStoreBuilder = new MyCustomStoreBuilder(&quot;the-custom-store&quot;) //...;
// Add the source topic
topology.addSource(&quot;input&quot;, &quot;inputTopic&quot;);
// Add a custom processor that reads from the source topic
topology.addProcessor(&quot;the-processor&quot;, processorSupplier, &quot;input&quot;);
// Connect your custom state store to the custom processor above
topology.addStateStore(customStoreBuilder, &quot;the-processor&quot;);
KafkaStreams streams = new KafkaStreams(topology, config);
streams.start();
// Get access to the custom store
MyReadableCustomStore&lt;String,String&gt; store = streams.store(&quot;the-custom-store&quot;, new MyCustomStoreType&lt;String,String&gt;());
// Query the store
String value = store.read(&quot;key&quot;);</code></pre>
</div>
</div>
<div class="section" id="querying-remote-state-stores-for-the-entire-app">
@@ -369,41 +356,39 @@ interactive queries</span></p>
piggybacking additional inter-application communication that goes beyond interactive queries.</p>
</div>
<p>This example shows how to configure and run a Kafka Streams application that supports the discovery of its state stores.</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">Properties</span> <span class="n">props</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
<span class="c1">// Set the unique RPC endpoint of this application instance through which it</span>
<span class="c1">// can be interactively queried. In a real application, the value would most</span>
<span class="c1">// probably not be hardcoded but derived dynamically.</span>
<span class="n">String</span> <span class="n">rpcEndpoint</span> <span class="o">=</span> <span class="s">&quot;host1:4460&quot;</span><span class="o">;</span>
<span class="n">props</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">APPLICATION_SERVER_CONFIG</span><span class="o">,</span> <span class="n">rpcEndpoint</span><span class="o">);</span>
<span class="c1">// ... further settings may follow here ...</span>
<span class="n">StreamsBuilder</span> <span class="n">builder</span> <span class="o">=</span> <span class="k">new</span> <span class="n">StreamsBuilder</span><span class="o">();</span>
<span class="n">KStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;</span> <span class="n">textLines</span> <span class="o">=</span> <span class="n">builder</span><span class="o">.</span><span class="na">stream</span><span class="o">(</span><span class="n">stringSerde</span><span class="o">,</span> <span class="n">stringSerde</span><span class="o">,</span> <span class="s">&quot;word-count-input&quot;</span><span class="o">);</span>
<span class="kd">final</span> <span class="n">KGroupedStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;</span> <span class="n">groupedByWord</span> <span class="o">=</span> <span class="n">textLines</span>
<span class="o">.</span><span class="na">flatMapValues</span><span class="o">(</span><span class="n">value</span> <span class="o">-&gt;</span> <span class="n">Arrays</span><span class="o">.</span><span class="na">asList</span><span class="o">(</span><span class="n">value</span><span class="o">.</span><span class="na">toLowerCase</span><span class="o">().</span><span class="na">split</span><span class="o">(</span><span class="s">&quot;\\W+&quot;</span><span class="o">)))</span>
<span class="o">.</span><span class="na">groupBy</span><span class="o">((</span><span class="n">key</span><span class="o">,</span> <span class="n">word</span><span class="o">)</span> <span class="o">-&gt;</span> <span class="n">word</span><span class="o">,</span> <span class="n">Grouped</span><span class="o">.</span><span class="na">with</span><span class="o">(</span><span class="n">stringSerde</span><span class="o">,</span> <span class="n">stringSerde</span><span class="o">));</span>
<span class="c1">// This call to `count()` creates a state store named &quot;word-count&quot;.</span>
<span class="c1">// The state store is discoverable and can be queried interactively.</span>
<span class="n">groupedByWord</span><span class="o">.</span><span class="na">count</span><span class="o">(</span><span class="n">Materialized</span><span class="o">.&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">,</span> <span class="n">KeyValueStore</span><span class="o">&lt;</span><span class="n">Bytes</span><span class="o">,</span> <span class="kt">byte</span><span class="o">[]&gt;</span><span class="n">as</span><span class="o">(</span><span class="s">&quot;word-count&quot;</span><span class="o">));</span>
<span class="c1">// Start an instance of the topology</span>
<span class="n">KafkaStreams</span> <span class="n">streams</span> <span class="o">=</span> <span class="k">new</span> <span class="n">KafkaStreams</span><span class="o">(</span><span class="n">builder</span><span class="o">,</span> <span class="n">props</span><span class="o">);</span>
<span class="n">streams</span><span class="o">.</span><span class="na">start</span><span class="o">();</span>
<span class="c1">// Then, create and start the actual RPC service for remote access to this</span>
<span class="c1">// application instance&#39;s local state stores.</span>
<span class="c1">//</span>
<span class="c1">// This service should be started on the same host and port as defined above by</span>
<span class="c1">// the property `StreamsConfig.APPLICATION_SERVER_CONFIG`. The example below is</span>
<span class="c1">// fictitious, but we provide end-to-end demo applications (such as KafkaMusicExample)</span>
<span class="c1">// that showcase how to implement such a service to get you started.</span>
<span class="n">MyRPCService</span> <span class="n">rpcService</span> <span class="o">=</span> <span class="o">...;</span>
<span class="n">rpcService</span><span class="o">.</span><span class="na">listenAt</span><span class="o">(</span><span class="n">rpcEndpoint</span><span class="o">);</span></code></pre></div>
</div>
</div>
<pre class="line-numbers"><code class="language-java">Properties props = new Properties();
// Set the unique RPC endpoint of this application instance through which it
// can be interactively queried. In a real application, the value would most
// probably not be hardcoded but derived dynamically.
String rpcEndpoint = &quot;host1:4460&quot;;
props.put(StreamsConfig.APPLICATION_SERVER_CONFIG, rpcEndpoint);
// ... further settings may follow here ...
StreamsBuilder builder = new StreamsBuilder();
KStream&lt;String, String&gt; textLines = builder.stream(&quot;word-count-input&quot;, Consumed.with(stringSerde, stringSerde));
final KGroupedStream&lt;String, String&gt; groupedByWord = textLines
.flatMapValues(value -&gt; Arrays.asList(value.toLowerCase().split(&quot;\\W+&quot;)))
.groupBy((key, word) -&gt; word, Grouped.with(stringSerde, stringSerde));
// This call to `count()` creates a state store named &quot;word-count&quot;.
// The state store is discoverable and can be queried interactively.
groupedByWord.count(Materialized.&lt;String, Long, KeyValueStore&lt;Bytes, byte[]&gt;&gt;as(&quot;word-count&quot;));
// Start an instance of the topology
KafkaStreams streams = new KafkaStreams(builder.build(), props);
streams.start();
// Then, create and start the actual RPC service for remote access to this
// application instance&#39;s local state stores.
//
// This service should be started on the same host and port as defined above by
// the property `StreamsConfig.APPLICATION_SERVER_CONFIG`. The example below is
// fictitious, but we provide end-to-end demo applications (such as KafkaMusicExample)
// that showcase how to implement such a service to get you started.
MyRPCService rpcService = ...;
rpcService.listenAt(rpcEndpoint);</code></pre>
</div>
<div class="section" id="discovering-and-accessing-application-instances-and-their-local-state-stores">
<span id="streams-developer-guide-interactive-queries-discover-app-instances-and-stores"></span><h3><a class="toc-backref" href="#id10">Discovering and accessing application instances and their local state stores</a><a class="headerlink" href="#discovering-and-accessing-application-instances-and-their-local-state-stores" title="Permalink to this headline"></a></h3>
<p>The following methods return <a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/streams/state/StreamsMetadata.html">StreamsMetadata</a> objects, which provide meta-information about application instances such as their RPC endpoint and locally available state stores.</p>
@@ -419,39 +404,38 @@ interactive queries</span></p>
</div>
<p>For example, we can now find the <code class="docutils literal"><span class="pre">StreamsMetadata</span></code> for the state store named &#8220;word-count&#8221; that we defined in the
code example shown in the previous section:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">KafkaStreams</span> <span class="n">streams</span> <span class="o">=</span> <span class="o">...;</span>
<span class="c1">// Find all the locations of local instances of the state store named &quot;word-count&quot;</span>
<span class="n">Collection</span><span class="o">&lt;</span><span class="n">StreamsMetadata</span><span class="o">&gt;</span> <span class="n">wordCountHosts</span> <span class="o">=</span> <span class="n">streams</span><span class="o">.</span><span class="na">allMetadataForStore</span><span class="o">(</span><span class="s">&quot;word-count&quot;</span><span class="o">);</span>
<span class="c1">// For illustrative purposes, we assume using an HTTP client to talk to remote app instances.</span>
<span class="n">HttpClient</span> <span class="n">http</span> <span class="o">=</span> <span class="o">...;</span>
<span class="c1">// Get the word count for word (aka key) &#39;alice&#39;: Approach 1</span>
<span class="c1">//</span>
<span class="c1">// We first find the one app instance that manages the count for &#39;alice&#39; in its local state stores.</span>
<span class="n">StreamsMetadata</span> <span class="n">metadata</span> <span class="o">=</span> <span class="n">streams</span><span class="o">.</span><span class="na">metadataForKey</span><span class="o">(</span><span class="s">&quot;word-count&quot;</span><span class="o">,</span> <span class="s">&quot;alice&quot;</span><span class="o">,</span> <span class="n">Serdes</span><span class="o">.</span><span class="na">String</span><span class="o">().</span><span class="na">serializer</span><span class="o">());</span>
<span class="c1">// Then, we query only that single app instance for the latest count of &#39;alice&#39;.</span>
<span class="c1">// Note: The RPC URL shown below is fictitious and only serves to illustrate the idea. Ultimately,</span>
<span class="c1">// the URL (or, in general, the method of communication) will depend on the RPC layer you opted to</span>
<span class="c1">// implement. Again, we provide end-to-end demo applications (such as KafkaMusicExample) that showcase</span>
<span class="c1">// how to implement such an RPC layer.</span>
<span class="n">Long</span> <span class="n">result</span> <span class="o">=</span> <span class="n">http</span><span class="o">.</span><span class="na">getLong</span><span class="o">(</span><span class="s">&quot;http://&quot;</span> <span class="o">+</span> <span class="n">metadata</span><span class="o">.</span><span class="na">host</span><span class="o">()</span> <span class="o">+</span> <span class="s">&quot;:&quot;</span> <span class="o">+</span> <span class="n">metadata</span><span class="o">.</span><span class="na">port</span><span class="o">()</span> <span class="o">+</span> <span class="s">&quot;/word-count/alice&quot;</span><span class="o">);</span>
<span class="c1">// Get the word count for word (aka key) &#39;alice&#39;: Approach 2</span>
<span class="c1">//</span>
<span class="c1">// Alternatively, we could also choose (say) a brute-force approach where we query every app instance</span>
<span class="c1">// until we find the one that happens to know about &#39;alice&#39;.</span>
<span class="n">Optional</span><span class="o">&lt;</span><span class="n">Long</span><span class="o">&gt;</span> <span class="n">result</span> <span class="o">=</span> <span class="n">streams</span><span class="o">.</span><span class="na">allMetadataForStore</span><span class="o">(</span><span class="s">&quot;word-count&quot;</span><span class="o">)</span>
<span class="o">.</span><span class="na">stream</span><span class="o">()</span>
<span class="o">.</span><span class="na">map</span><span class="o">(</span><span class="n">streamsMetadata</span> <span class="o">-&gt;</span> <span class="o">{</span>
<span class="c1">// Construct the (fictituous) full endpoint URL to query the current remote application instance</span>
<span class="n">String</span> <span class="n">url</span> <span class="o">=</span> <span class="s">&quot;http://&quot;</span> <span class="o">+</span> <span class="n">streamsMetadata</span><span class="o">.</span><span class="na">host</span><span class="o">()</span> <span class="o">+</span> <span class="s">&quot;:&quot;</span> <span class="o">+</span> <span class="n">streamsMetadata</span><span class="o">.</span><span class="na">port</span><span class="o">()</span> <span class="o">+</span> <span class="s">&quot;/word-count/alice&quot;</span><span class="o">;</span>
<span class="c1">// Read and return the count for &#39;alice&#39;, if any.</span>
<span class="k">return</span> <span class="n">http</span><span class="o">.</span><span class="na">getLong</span><span class="o">(</span><span class="n">url</span><span class="o">);</span>
<span class="o">})</span>
<span class="o">.</span><span class="na">filter</span><span class="o">(</span><span class="n">s</span> <span class="o">-&gt;</span> <span class="n">s</span> <span class="o">!=</span> <span class="kc">null</span><span class="o">)</span>
<span class="o">.</span><span class="na">findFirst</span><span class="o">();</span></code></pre></div>
</div>
<pre class="line-numbers"><code class="language-java">KafkaStreams streams = ...;
// Find all the locations of local instances of the state store named &quot;word-count&quot;
Collection&lt;StreamsMetadata&gt; wordCountHosts = streams.allMetadataForStore(&quot;word-count&quot;);
// For illustrative purposes, we assume using an HTTP client to talk to remote app instances.
HttpClient http = ...;
// Get the word count for word (aka key) &#39;alice&#39;: Approach 1
//
// We first find the one app instance that manages the count for &#39;alice&#39; in its local state stores.
StreamsMetadata metadata = streams.metadataForKey(&quot;word-count&quot;, &quot;alice&quot;, Serdes.String().serializer());
// Then, we query only that single app instance for the latest count of &#39;alice&#39;.
// Note: The RPC URL shown below is fictitious and only serves to illustrate the idea. Ultimately,
// the URL (or, in general, the method of communication) will depend on the RPC layer you opted to
// implement. Again, we provide end-to-end demo applications (such as KafkaMusicExample) that showcase
// how to implement such an RPC layer.
Long result = http.getLong(&quot;http://&quot; + metadata.host() + &quot;:&quot; + metadata.port() + &quot;/word-count/alice&quot;);
// Get the word count for word (aka key) &#39;alice&#39;: Approach 2
//
// Alternatively, we could also choose (say) a brute-force approach where we query every app instance
// until we find the one that happens to know about &#39;alice&#39;.
Optional&lt;Long&gt; result2 = streams.allMetadataForStore(&quot;word-count&quot;)
.stream()
.map(streamsMetadata -&gt; {
      // Construct the (fictitious) full endpoint URL to query the current remote application instance
String url = &quot;http://&quot; + streamsMetadata.host() + &quot;:&quot; + streamsMetadata.port() + &quot;/word-count/alice&quot;;
// Read and return the count for &#39;alice&#39;, if any.
return http.getLong(url);
})
.filter(s -&gt; s != null)
.findFirst();</code></pre>
<p>At this point the full state of the application is interactively queryable:</p>
<ul class="simple">
<li>You can discover the running instances of the application and the state stores they manage locally.</li>
99
docs/streams/developer-guide/memory-mgmt.html
@@ -80,10 +80,9 @@
</ul>
<p>The cache size is specified through the <code class="docutils literal"><span class="pre">cache.max.bytes.buffering</span></code> parameter, which is a global setting per
processing topology:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Enable record cache of size 10 MB.</span>
<span class="n">Properties</span> <span class="n">props</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
<span class="n">props</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">CACHE_MAX_BYTES_BUFFERING_CONFIG</span><span class="o">,</span> <span class="mi">10</span> <span class="o">*</span> <span class="mi">1024</span> <span class="o">*</span> <span class="mi">1024L</span><span class="o">);</span></code></pre></div>
</div>
<pre class="line-numbers"><code class="language-java">// Enable record cache of size 10 MB.
Properties props = new Properties();
props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 10 * 1024 * 1024L);</code></pre>
<p>This parameter controls the number of bytes allocated for caching. Specifically, for a processor topology instance with
<code class="docutils literal"><span class="pre">T</span></code> threads and <code class="docutils literal"><span class="pre">C</span></code> bytes allocated for caching, each thread will have an even <code class="docutils literal"><span class="pre">C/T</span></code> bytes to construct its own
cache and use as it sees fit among its tasks. This means that there are as many caches as there are threads, but no sharing of
@@ -103,27 +102,16 @@
<p>Here are example settings for both parameters based on desired scenarios.</p>
<ul>
<li><p class="first">To turn off caching the cache size can be set to zero:</p>
<blockquote>
<div><div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Disable record cache</span>
<span class="n">Properties</span> <span class="n">props</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
<span class="n">props</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">CACHE_MAX_BYTES_BUFFERING_CONFIG</span><span class="o">,</span> <span class="mi">0</span><span class="o">);</span></code></pre></div>
</div>
<p>Turning off caching might result in high write traffic for the underlying RocksDB store.
With default settings caching is enabled within Kafka Streams but RocksDB caching is disabled.
Thus, to avoid high write traffic it is recommended to enable RocksDB caching if Kafka Streams caching is turned off.</p>
<p>For example, the RocksDB Block Cache could be set to 100 MB and the Write Buffer size to 32 MB; a minimal sketch of such a configuration is shown after this list. For more information, see
the <a class="reference internal" href="config-streams.html#streams-developer-guide-rocksdb-config"><span class="std std-ref">RocksDB config</span></a>.</p>
</div></blockquote>
<pre class="line-numbers"><code class="language-java">// Disable record cache
Properties props = new Properties();
props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);</code></pre>
</li>
<li><p class="first">To enable caching but still have an upper bound on how long records will be cached, you can set the commit interval. In this example, it is set to 1000 milliseconds:</p>
<blockquote>
<div><div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">Properties</span> <span class="n">props</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
<span class="c1">// Enable record cache of size 10 MB.</span>
<span class="n">props</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">CACHE_MAX_BYTES_BUFFERING_CONFIG</span><span class="o">,</span> <span class="mi">10</span> <span class="o">*</span> <span class="mi">1024</span> <span class="o">*</span> <span class="mi">1024L</span><span class="o">);</span>
<span class="c1">// Set commit interval to 1 second.</span>
<span class="n">props</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">COMMIT_INTERVAL_MS_CONFIG</span><span class="o">,</span> <span class="mi">1000</span><span class="o">);</span></code></pre></div>
</div>
</div></blockquote>
<pre class="line-numbers"><code class="language-java">Properties props = new Properties();
// Enable record cache of size 10 MB.
props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 10 * 1024 * 1024L);
// Set commit interval to 1 second.
props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 1000);</code></pre>
</li>
</ul>
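<p>For the first scenario, a minimal sketch of such a RocksDB configuration is shown below. The class name, the size constants, and the <code class="docutils literal"><span class="pre">props</span></code> registration are illustrative assumptions; <code class="docutils literal"><span class="pre">RocksDBConfigSetter</span></code> and the RocksDB calls themselves are the actual APIs:</p>
<pre class="line-numbers"><code class="language-java">// A sketch of enabling RocksDB caching when the Streams record cache is disabled.
public static class CompensatingRocksDBConfig implements RocksDBConfigSetter {

    // Illustrative sizes from the recommendation above: 100 MB Block Cache, 32 MB Write Buffer.
    private static final long BLOCK_CACHE_SIZE = 100 * 1024 * 1024L;
    private static final long WRITE_BUFFER_SIZE = 32 * 1024 * 1024L;

    @Override
    public void setConfig(final String storeName, final Options options, final Map&lt;String, Object&gt; configs) {
        final BlockBasedTableConfig tableConfig = (BlockBasedTableConfig) options.tableFormatConfig();
        // Note: each store instance gets its own cache in this sketch; see the
        // memory-bounding example in the RocksDB section below for sharing one cache.
        tableConfig.setBlockCache(new org.rocksdb.LRUCache(BLOCK_CACHE_SIZE));
        options.setTableFormatConfig(tableConfig);
        options.setWriteBufferSize(WRITE_BUFFER_SIZE);
    }

    @Override
    public void close(final String storeName, final Options options) {
        // nothing shared to close in this sketch
    }
}

// Register the setter through the rocksdb.config.setter config:
// props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, CompensatingRocksDBConfig.class);</code></pre>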
<p>The effect of these two configurations is described in the figure below. The records are shown using 4 keys: blue, red, yellow, and green. Assume the cache has space for only 3 keys.</p>
@@ -156,13 +144,12 @@
<p>Following from the example first shown in section <a class="reference internal" href="processor-api.html#streams-developer-guide-state-store"><span class="std std-ref">State Stores</span></a>, to disable caching, you can
add the <code class="docutils literal"><span class="pre">withCachingDisabled</span></code> call (note that caches are enabled by default, however there is an explicit <code class="docutils literal"><span class="pre">withCachingEnabled</span></code>
call).</p>
<div class="highlight-java"><div class="highlight"><pre><code><span></span><span class="n">StoreBuilder</span> <span class="n">countStoreBuilder</span> <span class="o">=</span>
<span class="n">Stores</span><span class="o">.</span><span class="na">keyValueStoreBuilder</span><span class="o">(</span>
<span class="n">Stores</span><span class="o">.</span><span class="na">persistentKeyValueStore</span><span class="o">(</span><span class="s">&quot;Counts&quot;</span><span class="o">),</span>
<span class="n">Serdes</span><span class="o">.</span><span class="na">String</span><span class="o">(),</span>
<span class="n">Serdes</span><span class="o">.</span><span class="na">Long</span><span class="o">())</span>
<span class="o">.</span><span class="na">withCachingEnabled</span><span class="o">()</span></code></pre></div>
</div>
<pre class="line-numbers"><code class="language-java">StoreBuilder countStoreBuilder =
Stores.keyValueStoreBuilder(
Stores.persistentKeyValueStore("Counts"),
Serdes.String(),
Serdes.Long())
  .withCachingDisabled();</code></pre>
</div>
<div class="section" id="rocksdb">
<h2><a class="toc-backref" href="#id3">RocksDB</a><a class="headerlink" href="#rocksdb" title="Permalink to this headline"></a></h2>
@@ -171,44 +158,42 @@
<code class="docutils literal"><span class="pre">rocksdb.config.setter</span></code> configuration.</p>
<p>Also, we recommend changing RocksDB's default memory allocator, because the default allocator may lead to increased memory consumption.
To change the memory allocator to <code>jemalloc</code>, you need to set the environment variable <code>LD_PRELOAD</code> before you start your Kafka Streams application:</p>
<pre>
# example: install jemalloc (on Debian)
<pre class="line-numbers"><code class="language-bash"># example: install jemalloc (on Debian)
$ apt install -y libjemalloc-dev
# set LD_PRELOAD before you start your Kafka Streams application
$ export LD_PRELOAD="/usr/lib/x86_64-linux-gnu/libjemalloc.so"
</pre>
</code></pre>
<p> As of 2.3.0 the memory usage across all instances can be bounded, limiting the total off-heap memory of your Kafka Streams application. To do so you must configure RocksDB to cache the index and filter blocks in the block cache, limit the memtable memory through a shared <a class="reference external" href="https://github.com/facebook/rocksdb/wiki/Write-Buffer-Manager">WriteBufferManager</a> and count its memory against the block cache, and then pass the same Cache object to each instance. See <a class="reference external" href="https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB">RocksDB Memory Usage</a> for details. An example RocksDBConfigSetter implementing this is shown below:</p>
<pre class="line-numbers"><code class="language-java">public static class BoundedMemoryRocksDBConfig implements RocksDBConfigSetter {
<div class="highlight-java"><div class="highlight"><pre><span></span> <span class="kd">public</span> <span class="kd">static</span> <span class="kd">class</span> <span class="nc">BoundedMemoryRocksDBConfig</span> <span class="kd">implements</span> <span class="n">RocksDBConfigSetter</span> <span class="o">{</span>
<span class="kd">private</span> <span class="kt">static</span> <span class="n">org.rocksdb.Cache</span> <span class="n">cache</span> <span class="o">=</span> <span class="k">new</span> <span class="n">org</span><span class="o">.</span><span class="na">rocksdb</span><span class="o">.</span><span class="na">LRUCache</span><span class="o">(</span><span class="mi">TOTAL_OFF_HEAP_MEMORY</span><span class="o">,</span> <span class="n">-1</span><span class="o">,</span> <span class="n">false</span><span class="o">,</span> <span class="n">INDEX_FILTER_BLOCK_RATIO</span><span class="o">);</span><sup><a href="#fn1" id="ref1">1</a></sup>
<span class="kd">private</span> <span class="kt">static</span> <span class="n">org.rocksdb.WriteBufferManager</span> <span class="n">writeBufferManager</span> <span class="o">=</span> <span class="k">new</span> <span class="n">org</span><span class="o">.</span><span class="na">rocksdb</span><span class="o">.</span><span class="na">WriteBufferManager</span><span class="o">(</span><span class="mi">TOTAL_MEMTABLE_MEMORY</span><span class="o">,</span> cache<span class="o">);</span>
private static org.rocksdb.Cache cache = new org.rocksdb.LRUCache(TOTAL_OFF_HEAP_MEMORY, -1, false, INDEX_FILTER_BLOCK_RATIO);</code><a href="#fn1" id="ref1"><sup>1</sup></a><code>
private static org.rocksdb.WriteBufferManager writeBufferManager = new org.rocksdb.WriteBufferManager(TOTAL_MEMTABLE_MEMORY, cache);
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">setConfig</span><span class="o">(</span><span class="kd">final</span> <span class="n">String</span> <span class="n">storeName</span><span class="o">,</span> <span class="kd">final</span> <span class="n">Options</span> <span class="n">options</span><span class="o">,</span> <span class="kd">final</span> <span class="n">Map</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Object</span><span class="o">&gt;</span> <span class="n">configs</span><span class="o">)</span> <span class="o">{</span>
@Override
public void setConfig(final String storeName, final Options options, final Map&lt;String, Object&gt; configs) {
<span class="n">BlockBasedTableConfig</span> <span class="n">tableConfig</span> <span class="o">=</span> <span class="k">(BlockBasedTableConfig)</span> <span class="n">options</span><span><span class="o">.</span><span class="na">tableFormatConfig</span><span class="o">();</span>
BlockBasedTableConfig tableConfig = (BlockBasedTableConfig) options.tableFormatConfig();
<span class="c1"> // These three options in combination will limit the memory used by RocksDB to the size passed to the block cache (TOTAL_OFF_HEAP_MEMORY)</span>
<span class="n">tableConfig</span><span class="o">.</span><span class="na">setBlockCache</span><span class="o">(</span><span class="mi">cache</span><span class="o">);</span>
<span class="n">tableConfig</span><span class="o">.</span><span class="na">setCacheIndexAndFilterBlocks</span><span class="o">(</span><span class="kc">true</span><span class="o">);</span>
<span class="n">options</span><span class="o">.</span><span class="na">setWriteBufferManager</span><span class="o">(</span><span class="mi">writeBufferManager</span><span class="o">);</span>
// These three options in combination will limit the memory used by RocksDB to the size passed to the block cache (TOTAL_OFF_HEAP_MEMORY)
tableConfig.setBlockCache(cache);
tableConfig.setCacheIndexAndFilterBlocks(true);
options.setWriteBufferManager(writeBufferManager);
<span class="c1"> // These options are recommended to be set when bounding the total memory</span>
<span class="n">tableConfig</span><span class="o">.</span><span class="na">setCacheIndexAndFilterBlocksWithHighPriority</span><span class="o">(</span><span class="mi">true</span><span class="o">);</span><sup><a href="#fn2" id="ref2">2</a></sup>
<span class="n">tableConfig</span><span class="o">.</span><span class="na">setPinTopLevelIndexAndFilter</span><span class="o">(</span><span class="mi">true</span><span class="o">);</span>
<span class="n">tableConfig</span><span class="o">.</span><span class="na">setBlockSize</span><span class="o">(</span><span class="mi">BLOCK_SIZE</span><span class="o">);</span><sup><a href="#fn3" id="ref3">3</a></sup>
<span class="n">options</span><span class="o">.</span><span class="na">setMaxWriteBufferNumber</span><span class="o">(</span><span class="mi">N_MEMTABLES</span><span class="o">);</span>
<span class="n">options</span><span class="o">.</span><span class="na">setWriteBufferSize</span><span class="o">(</span><span class="mi">MEMTABLE_SIZE</span><span class="o">);</span>
// These options are recommended to be set when bounding the total memory
tableConfig.setCacheIndexAndFilterBlocksWithHighPriority(true);</code><a href="#fn2" id="ref2"><sup>2</sup></a><code>
tableConfig.setPinTopLevelIndexAndFilter(true);
tableConfig.setBlockSize(BLOCK_SIZE);</code><a href="#fn3" id="ref3"><sup>3</sup></a><code>
options.setMaxWriteBufferNumber(N_MEMTABLES);
options.setWriteBufferSize(MEMTABLE_SIZE);
<span class="n">options</span><span class="o">.</span><span class="na">setTableFormatConfig</span><span class="o">(</span><span class="n">tableConfig</span><span class="o">);</span>
<span class="o">}</span>
options.setTableFormatConfig(tableConfig);
}
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">close</span><span class="o">(</span><span class="kd">final</span> <span class="n">String</span> <span class="n">storeName</span><span class="o">,</span> <span class="kd">final</span> <span class="n">Options</span> <span class="n">options</span><span class="o">)</span> <span class="o">{</span>
<span class="c1">// Cache and WriteBufferManager should not be closed here, as the same objects are shared by every store instance.</span>
<span class="o">}</span>
<span class="o">}</span>
@Override
public void close(final String storeName, final Options options) {
// Cache and WriteBufferManager should not be closed here, as the same objects are shared by every store instance.
}
}</code></pre>
<div>
<sup id="fn1">1. INDEX_FILTER_BLOCK_RATIO can be used to set a fraction of the block cache to set aside for "high priority" (aka index and filter) blocks, preventing them from being evicted by data blocks. See the full signature of the <a class="reference external" href="https://github.com/facebook/rocksdb/blob/master/java/src/main/java/org/rocksdb/LRUCache.java#L72">LRUCache constructor</a>.
NOTE: the boolean parameter in the cache constructor lets you control whether the cache should enforce a strict memory limit by failing the read or iteration in the rare cases where it might go larger than its capacity. Due to a

208
docs/streams/developer-guide/processor-api.html

@@ -119,47 +119,46 @@
<li>In the <code class="docutils literal"><span class="pre">process()</span></code> method, upon each received record, split the value string into words, and update their counts into the state store (we will talk about this later in this section).</li>
<li>In the <code class="docutils literal"><span class="pre">punctuate()</span></code> method, iterate the local state store and send the aggregated counts to the downstream processor (we will talk about downstream processors later in this section), and commit the current stream state.</li>
</ul>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kd">public</span> <span class="kd">class</span> <span class="nc">WordCountProcessor</span> <span class="kd">implements</span> <span class="n">Processor</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;</span> <span class="o">{</span>
<pre class="line-numbers"><code class="language-java">public class WordCountProcessor implements Processor&lt;String, String&gt; {
<span class="kd">private</span> <span class="n">ProcessorContext</span> <span class="n">context</span><span class="o">;</span>
<span class="kd">private</span> <span class="n">KeyValueStore</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;</span> <span class="n">kvStore</span><span class="o">;</span>
private ProcessorContext context;
private KeyValueStore&lt;String, Long&gt; kvStore;
<span class="nd">@Override</span>
<span class="nd">@SuppressWarnings</span><span class="o">(</span><span class="s">&quot;unchecked&quot;</span><span class="o">)</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">init</span><span class="o">(</span><span class="n">ProcessorContext</span> <span class="n">context</span><span class="o">)</span> <span class="o">{</span>
<span class="c1">// keep the processor context locally because we need it in punctuate() and commit()</span>
<span class="k">this</span><span class="o">.</span><span class="na">context</span> <span class="o">=</span> <span class="n">context</span><span class="o">;</span>
@Override
@SuppressWarnings(&quot;unchecked&quot;)
public void init(ProcessorContext context) {
// keep the processor context locally because we need it in punctuate() and commit()
this.context = context;
<span class="c1">// retrieve the key-value store named &quot;Counts&quot;</span>
<span class="n">kvStore</span> <span class="o">=</span> <span class="o">(</span><span class="n">KeyValueStore</span><span class="o">)</span> <span class="n">context</span><span class="o">.</span><span class="na">getStateStore</span><span class="o">(</span><span class="s">&quot;Counts&quot;</span><span class="o">);</span>
// retrieve the key-value store named &quot;Counts&quot;
kvStore = (KeyValueStore) context.getStateStore(&quot;Counts&quot;);
<span class="c1">// schedule a punctuate() method every second based on stream-time</span>
<span class="k">this</span><span class="o">.</span><span class="na">context</span><span class="o">.</span><span class="na">schedule</span><span class="o">(</span><span class="na">Duration</span><span class="o">.</span><span class="na">ofSeconds</span><span class="o">(</span><span class="mi">1000</span><span class="o">),</span> <span class="n">PunctuationType</span><span class="o">.</span><span class="na">STREAM_TIME</span><span class="o">,</span> <span class="o">(</span><span class="n">timestamp</span><span class="o">)</span> <span class="o">-&gt;</span> <span class="o">{</span>
<span class="n">KeyValueIterator</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;</span> <span class="n">iter</span> <span class="o">=</span> <span class="k">this</span><span class="o">.</span><span class="na">kvStore</span><span class="o">.</span><span class="na">all</span><span class="o">();</span>
<span class="k">while</span> <span class="o">(</span><span class="n">iter</span><span class="o">.</span><span class="na">hasNext</span><span class="o">())</span> <span class="o">{</span>
<span class="n">KeyValue</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;</span> <span class="n">entry</span> <span class="o">=</span> <span class="n">iter</span><span class="o">.</span><span class="na">next</span><span class="o">();</span>
<span class="n">context</span><span class="o">.</span><span class="na">forward</span><span class="o">(</span><span class="n">entry</span><span class="o">.</span><span class="na">key</span><span class="o">,</span> <span class="n">entry</span><span class="o">.</span><span class="na">value</span><span class="o">.</span><span class="na">toString</span><span class="o">());</span>
<span class="o">}</span>
<span class="n">iter</span><span class="o">.</span><span class="na">close</span><span class="o">();</span>
// schedule a punctuate() method every second based on stream-time
        this.context.schedule(Duration.ofSeconds(1), PunctuationType.STREAM_TIME, (timestamp) -&gt; {
KeyValueIterator&lt;String, Long&gt; iter = this.kvStore.all();
while (iter.hasNext()) {
KeyValue&lt;String, Long&gt; entry = iter.next();
context.forward(entry.key, entry.value.toString());
}
iter.close();
<span class="c1">// commit the current processing progress</span>
<span class="n">context</span><span class="o">.</span><span class="na">commit</span><span class="o">();</span>
<span class="o">});</span>
<span class="o">}</span>
// commit the current processing progress
context.commit();
});
}
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">punctuate</span><span class="o">(</span><span class="kt">long</span> <span class="n">timestamp</span><span class="o">)</span> <span class="o">{</span>
<span class="c1">// this method is deprecated and should not be used anymore</span>
<span class="o">}</span>
@Override
public void punctuate(long timestamp) {
// this method is deprecated and should not be used anymore
}
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">close</span><span class="o">()</span> <span class="o">{</span>
<span class="c1">// close any resources managed by this processor</span>
<span class="c1">// Note: Do not close any StateStores as these are managed by the library</span>
<span class="o">}</span>
@Override
public void close() {
// close any resources managed by this processor
// Note: Do not close any StateStores as these are managed by the library
}
<span class="o">}</span></code></pre></div>
</div>
}</code></pre>
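<p>The excerpt above omits the <code class="docutils literal"><span class="pre">process()</span></code> method described in the first bullet point. A minimal sketch of it, assuming the same <code class="docutils literal"><span class="pre">kvStore</span></code> field and whitespace-separated words, might look as follows:</p>
<pre class="line-numbers"><code class="language-java">@Override
public void process(String key, String value) {
    // split the value string into words
    final String[] words = value.toLowerCase().split(" ");
    for (final String word : words) {
        // update the count for each word in the "Counts" state store
        final Long oldValue = this.kvStore.get(word);
        this.kvStore.put(word, oldValue == null ? 1L : oldValue + 1L);
    }
}</code></pre>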
<div class="admonition note">
<p><b>Note</b></p>
<p class="last"><strong>Stateful processing with state stores:</strong>
@@ -234,19 +233,18 @@
<li>Use <a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/streams/state/Stores.html#persistentTimestampedWindowStore-java.lang.String-java.time.Duration-java.time.Duration-boolean-">persistentTimestampedWindowStore</a>
when you need a persistent windowedKey-(value/timestamp) store.</li>
</ul>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Creating a persistent key-value store:</span>
<span class="c1">// here, we create a `KeyValueStore&lt;String, Long&gt;` named &quot;persistent-counts&quot;.</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.state.StoreBuilder</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.state.Stores</span><span class="o">;</span>
<pre class="line-numbers"><code class="language-java">// Creating a persistent key-value store:
// here, we create a `KeyValueStore&lt;String, Long&gt;` named &quot;persistent-counts&quot;.
import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;
<span class="c1">// Using a `KeyValueStoreBuilder` to build a `KeyValueStore`.</span>
<span class="n">StoreBuilder</span><span class="o">&lt;</span><span class="n">KeyValueStore</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;&gt;</span> <span class="n">countStoreSupplier</span> <span class="o">=</span>
<span class="n">Stores</span><span class="o">.</span><span class="na">keyValueStoreBuilder</span><span class="o">(</span>
<span class="n">Stores</span><span class="o">.</span><span class="na">persistentKeyValueStore</span><span class="o">(</span><span class="s">&quot;persistent-counts&quot;</span><span class="o">),</span>
<span class="n">Serdes</span><span class="o">.</span><span class="na">String</span><span class="o">(),</span>
<span class="n">Serdes</span><span class="o">.</span><span class="na">Long</span><span class="o">());</span>
<span class="n">KeyValueStore</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;</span> <span class="n">countStore</span> <span class="o">=</span> <span class="n">countStoreSupplier</span><span class="o">.</span><span class="na">build</span><span class="o">();</span></code></pre></div>
</div>
// Using a `KeyValueStoreBuilder` to build a `KeyValueStore`.
StoreBuilder&lt;KeyValueStore&lt;String, Long&gt;&gt; countStoreSupplier =
Stores.keyValueStoreBuilder(
Stores.persistentKeyValueStore(&quot;persistent-counts&quot;),
Serdes.String(),
Serdes.Long());
KeyValueStore&lt;String, Long&gt; countStore = countStoreSupplier.build();</code></pre>
</td>
</tr>
<tr class="row-odd"><td>In-memory
@@ -268,19 +266,18 @@
<li>Use <a class="reference external" href="/{{version}}/javadoc/org/apache/kafka/streams/state/TimestampedWindowStore.html">TimestampedWindowStore</a>
when you need to store windowedKey-(value/timestamp) pairs.</li>
</ul>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Creating an in-memory key-value store:</span>
<span class="c1">// here, we create a `KeyValueStore&lt;String, Long&gt;` named &quot;inmemory-counts&quot;.</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.state.StoreBuilder</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.state.Stores</span><span class="o">;</span>
<pre class="line-numbers"><code class="language-java">// Creating an in-memory key-value store:
// here, we create a `KeyValueStore&lt;String, Long&gt;` named &quot;inmemory-counts&quot;.
import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;
<span class="c1">// Using a `KeyValueStoreBuilder` to build a `KeyValueStore`.</span>
<span class="n">StoreBuilder</span><span class="o">&lt;</span><span class="n">KeyValueStore</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;&gt;</span> <span class="n">countStoreSupplier</span> <span class="o">=</span>
<span class="n">Stores</span><span class="o">.</span><span class="na">keyValueStoreBuilder</span><span class="o">(</span>
<span class="n">Stores</span><span class="o">.</span><span class="na">inMemoryKeyValueStore</span><span class="o">(</span><span class="s">&quot;inmemory-counts&quot;</span><span class="o">),</span>
<span class="n">Serdes</span><span class="o">.</span><span class="na">String</span><span class="o">(),</span>
<span class="n">Serdes</span><span class="o">.</span><span class="na">Long</span><span class="o">());</span>
<span class="n">KeyValueStore</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;</span> <span class="n">countStore</span> <span class="o">=</span> <span class="n">countStoreSupplier</span><span class="o">.</span><span class="na">build</span><span class="o">();</span></code></pre></div>
</div>
// Using a `KeyValueStoreBuilder` to build a `KeyValueStore`.
StoreBuilder&lt;KeyValueStore&lt;String, Long&gt;&gt; countStoreSupplier =
Stores.keyValueStoreBuilder(
Stores.inMemoryKeyValueStore(&quot;inmemory-counts&quot;),
Serdes.String(),
Serdes.Long());
KeyValueStore&lt;String, Long&gt; countStore = countStoreSupplier.build();</code></pre>
</td>
</tr>
</tbody>
@@ -317,15 +314,14 @@
of the store through <code class="docutils literal"><span class="pre">enableLogging()</span></code> and <code class="docutils literal"><span class="pre">disableLogging()</span></code>.
You can also fine-tune the associated topic&#8217;s configuration if needed.</p>
<p>Example for disabling fault-tolerance:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.kafka.streams.state.StoreBuilder</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.state.Stores</span><span class="o">;</span>
<pre class="line-numbers"><code class="language-java">import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;
<span class="n">StoreBuilder</span><span class="o">&lt;</span><span class="n">KeyValueStore</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;&gt;</span> <span class="n">countStoreSupplier</span> <span class="o">=</span> <span class="n">Stores</span><span class="o">.</span><span class="na">keyValueStoreBuilder</span><span class="o">(</span>
<span class="n">Stores</span><span class="o">.</span><span class="na">persistentKeyValueStore</span><span class="o">(</span><span class="s">&quot;Counts&quot;</span><span class="o">),</span>
<span class="n">Serdes</span><span class="o">.</span><span class="na">String</span><span class="o">(),</span>
<span class="n">Serdes</span><span class="o">.</span><span class="na">Long</span><span class="o">())</span>
<span class="o">.</span><span class="na">withLoggingDisabled</span><span class="o">();</span> <span class="c1">// disable backing up the store to a changelog topic</span></code></pre></div>
</div>
StoreBuilder&lt;KeyValueStore&lt;String, Long&gt;&gt; countStoreSupplier = Stores.keyValueStoreBuilder(
Stores.persistentKeyValueStore(&quot;Counts&quot;),
Serdes.String(),
Serdes.Long())
.withLoggingDisabled(); // disable backing up the store to a changelog topic</code></pre>
<div class="admonition attention">
<p class="first admonition-title">Attention</p>
<p class="last">If the changelog is disabled then the attached state store is no longer fault tolerant and it can&#8217;t have any <a class="reference internal" href="config-streams.html#streams-developer-guide-standby-replicas"><span class="std std-ref">standby replicas</span></a>.</p>
@@ -333,19 +329,18 @@
<p>Here is an example for enabling fault tolerance, with additional changelog-topic configuration:
You can add any log config from <a class="reference external" href="https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/LogConfig.scala">kafka.log.LogConfig</a>.
Unrecognized configs will be ignored.</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.kafka.streams.state.StoreBuilder</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.state.Stores</span><span class="o">;</span>
<pre class="line-numbers"><code class="language-java">import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;
<span class="n">Map</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;</span> <span class="n">changelogConfig</span> <span class="o">=</span> <span class="k">new</span> <span class="n">HashMap</span><span class="o">();</span>
<span class="c1">// override min.insync.replicas</span>
<span class="n">changelogConfig</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">TopicConfig</span><span class="o">.</span><span class="na">MIN_IN_SYNC_REPLICAS_CONFIG</span><span class="o">,</span> <span class="s">&quot;1&quot;</span><span class="o">)</span>
Map&lt;String, String&gt; changelogConfig = new HashMap&lt;&gt;();
// override min.insync.replicas
changelogConfig.put(TopicConfig.MIN_IN_SYNC_REPLICAS_CONFIG, &quot;1&quot;);
<span class="n">StoreBuilder</span><span class="o">&lt;</span><span class="n">KeyValueStore</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;&gt;</span> <span class="n">countStoreSupplier</span> <span class="o">=</span> <span class="n">Stores</span><span class="o">.</span><span class="na">keyValueStoreBuilder</span><span class="o">(</span>
<span class="n">Stores</span><span class="o">.</span><span class="na">persistentKeyValueStore</span><span class="o">(</span><span class="s">&quot;Counts&quot;</span><span class="o">),</span>
<span class="n">Serdes</span><span class="o">.</span><span class="na">String</span><span class="o">(),</span>
<span class="n">Serdes</span><span class="o">.</span><span class="na">Long</span><span class="o">())</span>
<span class="o">.</span><span class="na">withLoggingEnabled</span><span class="o">(</span><span class="n">changlogConfig</span><span class="o">);</span> <span class="c1">// enable changelogging, with custom changelog settings</span></code></pre></div>
</div>
StoreBuilder&lt;KeyValueStore&lt;String, Long&gt;&gt; countStoreSupplier = Stores.keyValueStoreBuilder(
Stores.persistentKeyValueStore(&quot;Counts&quot;),
Serdes.String(),
Serdes.Long())
  .withLoggingEnabled(changelogConfig); // enable changelogging, with custom changelog settings</code></pre>
</div>
<div class="section" id="timestamped-state-stores">
<span id="streams-developer-guide-state-store-timestamps"></span><h3><a class="toc-backref" href="#id11">Timestamped State Stores</a><a class="headerlink" href="#timestamped-state-stores" title="Permalink to this headline"></a></h3>
@@ -389,12 +384,11 @@
<code class="docutils literal"><span class="pre">partition</span></code>, <code class="docutils literal"><span class="pre">offset</span></code>, <code class="docutils literal"><span class="pre">timestamp</span></code> and
<code class="docutils literal"><span class="pre">headers</span></code>.</p>
<p>Here is an example implementation of how to add a new header to the record:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">public void process(String key, String value) {</span>
<pre class="line-numbers"><code class="language-java">public void process(String key, String value) {
<span class="c1">// add a header to the elements</span>
<span class="n">context()</span><span class="o">.</span><span class="na">headers</span><span class="o">()</span><span class="o">.</span><span class="na">add</span><span class="o">.</span><span class="o">(</span><span class="s">&quot;key&quot;</span><span class="o">,</span> <span class="s">&quot;key&quot;</span>
<span class="o">}</span></code></pre></div>
</div>
// add a header to the elements
    context().headers().add(&quot;key&quot;, &quot;value&quot;.getBytes());
}</code></pre>
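<p>Analogously, here is a minimal sketch (an illustration, not part of the original example) of reading the other metadata fields listed above from the context:</p>
<pre class="line-numbers"><code class="language-java">public void process(String key, String value) {
    // read the metadata of the record currently being processed
    final String topic = context().topic();
    final int partition = context().partition();
    final long offset = context().offset();
    final long timestamp = context().timestamp();
    // e.g., forward the record enriched with its origin
    context().forward(key, value + " (from " + topic + "/" + partition + "@" + offset + " at " + timestamp + ")");
}</code></pre>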
<div class="section" id="connecting-processors-and-state-stores">
<h2><a class="toc-backref" href="#id8">Connecting Processors and State Stores</a><a class="headerlink" href="#connecting-processors-and-state-stores" title="Permalink to this headline"></a></h2>
<p>Now that a <a class="reference internal" href="#streams-developer-guide-stream-processor"><span class="std std-ref">processor</span></a> (WordCountProcessor) and the
@@ -403,16 +397,16 @@
to generate input data streams into the topology, and sink processors with the specified Kafka topics to generate
output data streams out of the topology.</p>
<p>Here is an example implementation:</p>
<pre class="line-numbers"><code class="language-java"> Topology builder = new Topology();
// add the source processor node that takes Kafka topic "source-topic" as input
builder.addSource("Source", "source-topic")
// add the WordCountProcessor node which takes the source processor as its upstream processor
.addProcessor("Process", () -> new WordCountProcessor(), "Source")
// add the count store associated with the WordCountProcessor processor
.addStateStore(countStoreBuilder, "Process")
// add the sink processor node that takes Kafka topic "sink-topic" as output
// and the WordCountProcessor node as its upstream processor
.addSink("Sink", "sink-topic", "Process");</code></pre>
<pre class="line-numbers"><code class="language-java">Topology builder = new Topology();
// add the source processor node that takes Kafka topic "source-topic" as input
builder.addSource("Source", "source-topic")
// add the WordCountProcessor node which takes the source processor as its upstream processor
.addProcessor("Process", () -> new WordCountProcessor(), "Source")
// add the count store associated with the WordCountProcessor processor
.addStateStore(countStoreBuilder, "Process")
// add the sink processor node that takes Kafka topic "sink-topic" as output
// and the WordCountProcessor node as its upstream processor
.addSink("Sink", "sink-topic", "Process");</code></pre>
<p>Here is a quick explanation of this example:</p>
<ul class="simple">
<li>A source processor node named <code class="docutils literal"><span class="pre">&quot;Source&quot;</span></code> is added to the topology using the <code class="docutils literal"><span class="pre">addSource</span></code> method, with one Kafka topic
@@ -429,22 +423,22 @@
This can be done by implementing <code class="docutils literal"><span class="pre">ConnectedStoreProvider#stores()</span></code> on the <code class="docutils literal"><span class="pre">ProcessorSupplier</span></code>
instead of calling <code class="docutils literal"><span class="pre">Topology#addStateStore()</span></code>, like this:
</p>
<pre class="line-numbers"><code class="language-java"> Topology builder = new Topology();
// add the source processor node that takes Kafka "source-topic" as input
builder.addSource("Source", "source-topic")
// add the WordCountProcessor node which takes the source processor as its upstream processor.
// the ProcessorSupplier provides the count store associated with the WordCountProcessor
.addProcessor("Process", new ProcessorSupplier&ltString, String&gt() {
public Processor&ltString, String&gt get() {
return new WordCountProcessor();
}
public Set&ltStoreBuilder&lt?&gt&gt stores() {
return countStoreBuilder;
}
}, "Source")
// add the sink processor node that takes Kafka topic "sink-topic" as output
// and the WordCountProcessor node as its upstream processor
.addSink("Sink", "sink-topic", "Process");</code></pre>
<pre class="line-numbers"><code class="language-java">Topology builder = new Topology();
// add the source processor node that takes Kafka "source-topic" as input
builder.addSource("Source", "source-topic")
// add the WordCountProcessor node which takes the source processor as its upstream processor.
// the ProcessorSupplier provides the count store associated with the WordCountProcessor
.addProcessor("Process", new ProcessorSupplier&ltString, String&gt() {
public Processor&ltString, String&gt get() {
return new WordCountProcessor();
}
public Set&ltStoreBuilder&lt?&gt&gt stores() {
return countStoreBuilder;
}
}, "Source")
// add the sink processor node that takes Kafka topic "sink-topic" as output
// and the WordCountProcessor node as its upstream processor
.addSink("Sink", "sink-topic", "Process");</code></pre>
<p>This allows for a processor to "own" state stores, effectively encapsulating their usage from the user wiring the topology.
Multiple processors that share a state store may provide the same store with this technique, as long as the <code class="docutils literal"><span class="pre">StoreBuilder</span></code> is the same <code class="docutils literal"><span class="pre">instance</span></code>.</p>
<p>In these topologies, the <code class="docutils literal"><span class="pre">&quot;Process&quot;</span></code> stream processor node is considered a downstream processor of the <code class="docutils literal"><span class="pre">&quot;Source&quot;</span></code> node, and an

7
docs/streams/developer-guide/running-app.html

@@ -51,10 +51,9 @@
<div class="section" id="starting-a-kafka-streams-application">
<span id="streams-developer-guide-execution-starting"></span><h2><a class="toc-backref" href="#id3">Starting a Kafka Streams application</a><a class="headerlink" href="#starting-a-kafka-streams-application" title="Permalink to this headline"></a></h2>
<p>You can package your Java application as a fat JAR file and then start the application like this:</p>
<div class="highlight-bash"><div class="highlight"><pre><code><span></span><span class="c1"># Start the application in class `com.example.MyStreamsApp`</span>
<span class="c1"># from the fat JAR named `path-to-app-fatjar.jar`.</span>
$ java -cp path-to-app-fatjar.jar com.example.MyStreamsApp</code></pre></div>
</div>
<pre class="line-numbers"><code class="language-bash"># Start the application in class `com.example.MyStreamsApp`
# from the fat JAR named `path-to-app-fatjar.jar`.
$ java -cp path-to-app-fatjar.jar com.example.MyStreamsApp</code></pre>
<p>When you start your application you are launching a Kafka Streams instance of your application. You can run multiple
instances of your application. A common scenario is that there are multiple instances of your application running in
parallel. For more information, see <a class="reference internal" href="../architecture.html#streams_architecture_tasks"><span class="std std-ref">Parallelism Model</span></a>.</p>

63
docs/streams/developer-guide/security.html

@@ -98,47 +98,44 @@
then you must also include these SSL certificates in the correct locations within the Docker image.</p>
<p>The snippet below shows the settings to enable client authentication and SSL encryption for data-in-transit between your
Kafka Streams application and the Kafka cluster it is reading and writing from:</p>
<div class="highlight-bash"><div class="highlight"><pre><code><span></span><span class="c1"># Essential security settings to enable client authentication and SSL encryption</span>
bootstrap.servers<span class="o">=</span>kafka.example.com:9093
security.protocol<span class="o">=</span>SSL
ssl.truststore.location<span class="o">=</span>/etc/security/tls/kafka.client.truststore.jks
ssl.truststore.password<span class="o">=</span>test1234
ssl.keystore.location<span class="o">=</span>/etc/security/tls/kafka.client.keystore.jks
ssl.keystore.password<span class="o">=</span>test1234
ssl.key.password<span class="o">=</span>test1234</code></pre></div>
</div>
<pre class="line-numbers"><code class="language-bash"># Essential security settings to enable client authentication and SSL encryption
bootstrap.servers=kafka.example.com:9093
security.protocol=SSL
ssl.truststore.location=/etc/security/tls/kafka.client.truststore.jks
ssl.truststore.password=test1234
ssl.keystore.location=/etc/security/tls/kafka.client.keystore.jks
ssl.keystore.password=test1234
ssl.key.password=test1234</code></pre>
<p>Configure these settings in the application for your <code class="docutils literal"><span class="pre">Properties</span></code> instance. These settings will encrypt any
data-in-transit that is being read from or written to Kafka, and your application will authenticate itself against the
Kafka brokers that it is communicating with. Note that this example does not cover client authorization.</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Code of your Java application that uses the Kafka Streams library</span>
<span class="n">Properties</span> <span class="n">settings</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">APPLICATION_ID_CONFIG</span><span class="o">,</span> <span class="s">&quot;secure-kafka-streams-app&quot;</span><span class="o">);</span>
<span class="c1">// Where to find secure Kafka brokers. Here, it&#39;s on port 9093.</span>
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">BOOTSTRAP_SERVERS_CONFIG</span><span class="o">,</span> <span class="s">&quot;kafka.example.com:9093&quot;</span><span class="o">);</span>
<span class="c1">//</span>
<span class="c1">// ...further non-security related settings may follow here...</span>
<span class="c1">//</span>
<span class="c1">// Security settings.</span>
<span class="c1">// 1. These settings must match the security settings of the secure Kafka cluster.</span>
<span class="c1">// 2. The SSL trust store and key store files must be locally accessible to the application.</span>
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">CommonClientConfigs</span><span class="o">.</span><span class="na">SECURITY_PROTOCOL_CONFIG</span><span class="o">,</span> <span class="s">&quot;SSL&quot;</span><span class="o">);</span>
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">SslConfigs</span><span class="o">.</span><span class="na">SSL_TRUSTSTORE_LOCATION_CONFIG</span><span class="o">,</span> <span class="s">&quot;/etc/security/tls/kafka.client.truststore.jks&quot;</span><span class="o">);</span>
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">SslConfigs</span><span class="o">.</span><span class="na">SSL_TRUSTSTORE_PASSWORD_CONFIG</span><span class="o">,</span> <span class="s">&quot;test1234&quot;</span><span class="o">);</span>
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">SslConfigs</span><span class="o">.</span><span class="na">SSL_KEYSTORE_LOCATION_CONFIG</span><span class="o">,</span> <span class="s">&quot;/etc/security/tls/kafka.client.keystore.jks&quot;</span><span class="o">);</span>
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">SslConfigs</span><span class="o">.</span><span class="na">SSL_KEYSTORE_PASSWORD_CONFIG</span><span class="o">,</span> <span class="s">&quot;test1234&quot;</span><span class="o">);</span>
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">SslConfigs</span><span class="o">.</span><span class="na">SSL_KEY_PASSWORD_CONFIG</span><span class="o">,</span> <span class="s">&quot;test1234&quot;</span><span class="o">);</span></code></pre></div>
</div>
<pre class="line-numbers"><code class="language-java">// Code of your Java application that uses the Kafka Streams library
Properties settings = new Properties();
settings.put(StreamsConfig.APPLICATION_ID_CONFIG, &quot;secure-kafka-streams-app&quot;);
// Where to find secure Kafka brokers. Here, it&#39;s on port 9093.
settings.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, &quot;kafka.example.com:9093&quot;);
//
// ...further non-security related settings may follow here...
//
// Security settings.
// 1. These settings must match the security settings of the secure Kafka cluster.
// 2. The SSL trust store and key store files must be locally accessible to the application.
settings.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, &quot;SSL&quot;);
settings.put(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG, &quot;/etc/security/tls/kafka.client.truststore.jks&quot;);
settings.put(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG, &quot;test1234&quot;);
settings.put(SslConfigs.SSL_KEYSTORE_LOCATION_CONFIG, &quot;/etc/security/tls/kafka.client.keystore.jks&quot;);
settings.put(SslConfigs.SSL_KEYSTORE_PASSWORD_CONFIG, &quot;test1234&quot;);
settings.put(SslConfigs.SSL_KEY_PASSWORD_CONFIG, &quot;test1234&quot;);</code></pre>
<p>If you incorrectly configure a security setting in your application, it will fail at runtime, typically right after you
start it. For example, if you enter an incorrect password for the <code class="docutils literal"><span class="pre">ssl.keystore.password</span></code> setting, an error message
similar to this would be logged and then the application would terminate:</p>
<div class="highlight-bash"><div class="highlight"><pre><code><span></span><span class="c1"># Misconfigured ssl.keystore.password</span>
Exception in thread <span class="s2">&quot;main&quot;</span> org.apache.kafka.common.KafkaException: Failed to construct kafka producer
<span class="o">[</span>...snip...<span class="o">]</span>
<pre class="line-numbers"><code class="language-text"># Misconfigured ssl.keystore.password
Exception in thread &quot;main&quot; org.apache.kafka.common.KafkaException: Failed to construct kafka producer
[...snip...]
Caused by: org.apache.kafka.common.KafkaException: org.apache.kafka.common.KafkaException:
java.io.IOException: Keystore was tampered with, or password was incorrect
<span class="o">[</span>...snip...<span class="o">]</span>
Caused by: java.security.UnrecoverableKeyException: Password verification failed</code></pre></div>
</div>
[...snip...]
Caused by: java.security.UnrecoverableKeyException: Password verification failed</code></pre>
<p>Monitor your Kafka Streams application log files for such error messages to spot any misconfigured applications quickly.</p>
</div>
</div>

42
docs/streams/developer-guide/testing.html

@@ -71,15 +71,15 @@
You can use the test driver to verify that your specified processor topology computes the correct result
with the manually piped-in data records.
The test driver captures the result records and allows you to query its embedded state stores.
<pre class="line-numbers"><code class="language-text">// Processor API
<pre class="line-numbers"><code class="language-java">// Processor API
Topology topology = new Topology();
topology.addSource("sourceProcessor", "input-topic");
topology.addProcessor("processor", ..., "sourceProcessor");
topology.addSink("sinkProcessor", "output-topic", "processor");
topology.addSource(&quot;sourceProcessor&quot;, &quot;input-topic&quot;);
topology.addProcessor(&quot;processor&quot;, ..., &quot;sourceProcessor&quot;);
topology.addSink(&quot;sinkProcessor&quot;, &quot;output-topic&quot;, &quot;processor&quot;);
// or
// using DSL
StreamsBuilder builder = new StreamsBuilder();
builder.stream("input-topic").filter(...).to("output-topic");
builder.stream(&quot;input-topic&quot;).filter(...).to(&quot;output-topic&quot;);
Topology topology = builder.build();
// create test driver
@@ -88,7 +88,7 @@ TopologyTestDriver testDriver = new TopologyTestDriver(topology);</code></pre>
With the test driver you can create a <code>TestInputTopic</code>, giving the topic name and the corresponding serializers.
<code>TestInputTopic</code> provides various methods to pipe new message values, keys and values, or lists of KeyValue objects.
</p>
<pre class="line-numbers"><code class="language-text">TestInputTopic&lt;String, Long&gt; inputTopic = testDriver.createInputTopic("input-topic", stringSerde.serializer(), longSerde.serializer());
<pre class="line-numbers"><code class="language-java">TestInputTopic&lt;String, Long&gt; inputTopic = testDriver.createInputTopic("input-topic", stringSerde.serializer(), longSerde.serializer());
inputTopic.pipeInput("key", 42L);</code></pre>
<p>
To verify the output, you can use <code>TestOutputTopic</code>
@@ -97,7 +97,7 @@ inputTopic.pipeInput("key", 42L);</code></pre>
For example, you can validate the returned <code>KeyValue</code> with standard assertions
if you only care about the key and value, but not the timestamp of the result record.
</p>
<pre class="line-numbers"><code class="language-text">TestOutputTopic&lt;String, Long&gt; outputTopic = testDriver.createOutputTopic("output-topic", stringSerde.deserializer(), longSerde.deserializer());
<pre class="line-numbers"><code class="language-java">TestOutputTopic&lt;String, Long&gt; outputTopic = testDriver.createOutputTopic("output-topic", stringSerde.deserializer(), longSerde.deserializer());
assertThat(outputTopic.readKeyValue(), equalTo(new KeyValue&lt;&gt;("key", 42L)));</code></pre>
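<p>If a test produces several output records, they can also be drained into a list and asserted in one step (a sketch, assuming the output above has not been read yet):</p>
<pre class="line-numbers"><code class="language-java">final List&lt;KeyValue&lt;String, Long&gt;&gt; allOutput = outputTopic.readKeyValuesToList();
assertThat(allOutput, equalTo(Collections.singletonList(new KeyValue&lt;&gt;("key", 42L))));</code></pre>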
<p>
<code>TopologyTestDriver</code> supports punctuations, too.
@@ -105,18 +105,18 @@ assertThat(outputTopic.readKeyValue(), equalTo(new KeyValue&lt;&gt;("key", 42L))
Wall-clock-time punctuations can also be triggered by advancing the test driver's wall-clock-time (the
driver mocks wall-clock-time internally to give users control over it).
</p>
<pre class="line-numbers"><code class="language-text">testDriver.advanceWallClockTime(Duration.ofSeconds(20));</code></pre>
<pre class="line-numbers"><code class="language-java">testDriver.advanceWallClockTime(Duration.ofSeconds(20));</code></pre>
<p>
Additionally, you can access state stores via the test driver before or after a test.
Accessing stores before a test is useful to pre-populate a store with some initial values.
After the data has been processed, you can verify the expected updates to the store.
</p>
<pre class="line-numbers"><code class="language-text">KeyValueStore store = testDriver.getKeyValueStore("store-name");</code></pre>
<pre class="line-numbers"><code class="language-java">KeyValueStore store = testDriver.getKeyValueStore("store-name");</code></pre>
<p>
Note that you should always close the test driver at the end to make sure all resources are released
properly.
</p>
<pre class="line-numbers"><code class="language-text">testDriver.close();</code></pre>
<pre class="line-numbers"><code class="language-java">testDriver.close();</code></pre>
<h3>Example</h3>
<p>
@@ -125,7 +125,7 @@ assertThat(outputTopic.readKeyValue(), equalTo(new KeyValue&lt;&gt;("key", 42L))
While processing, no output is generated; only the store is updated.
Output is only sent downstream based on event-time and wall-clock punctuations.
</p>
<pre class="line-numbers"><code class="language-text">private TopologyTestDriver testDriver;
<pre class="line-numbers"><code class="language-java">private TopologyTestDriver testDriver;
private TestInputTopic&lt;String, Long&gt; inputTopic;
private TestOutputTopic&lt;String, Long&gt; outputTopic;
private KeyValueStore&lt;String, Long&gt; store;
@@ -275,21 +275,21 @@ public class CustomMaxAggregator implements Processor&lt;String, Long&gt; {
<b>Construction</b>
<p>
To begin with, instantiate your processor and initialize it with the mock context:
<pre class="line-numbers"><code class="language-text">final Processor processorUnderTest = ...;
<pre class="line-numbers"><code class="language-java">final Processor processorUnderTest = ...;
final MockProcessorContext context = new MockProcessorContext();
processorUnderTest.init(context);</code></pre>
If you need to pass configuration to your processor or set the default serdes, you can create the mock with
config:
<pre class="line-numbers"><code class="language-text">final Properties props = new Properties();
<pre class="line-numbers"><code class="language-java">final Properties props = new Properties();
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.Long().getClass());
props.put("some.other.config", "some config value");
props.put(&quot;some.other.config&quot;, &quot;some config value&quot;);
final MockProcessorContext context = new MockProcessorContext(props);</code></pre>
</p>
<b>Captured data</b>
<p>
The mock will capture any values that your processor forwards. You can make assertions on them:
<pre class="line-numbers"><code class="language-text">processorUnderTest.process("key", "value");
<pre class="line-numbers"><code class="language-java">processorUnderTest.process("key", "value");
final Iterator&lt;CapturedForward&gt; forwarded = context.forwarded().iterator();
assertEquals(forwarded.next().keyValue(), new KeyValue&lt;&gt;(..., ...));
@@ -301,9 +301,9 @@ context.resetForwards();
assertEquals(context.forwarded().size(), 0);</code></pre>
If your processor forwards to specific child processors, you can query the context for captured data by
child name:
<pre class="line-numbers"><code class="language-text">final List&lt;CapturedForward&gt; captures = context.forwarded("childProcessorName");</code></pre>
<pre class="line-numbers"><code class="language-java">final List&lt;CapturedForward&gt; captures = context.forwarded("childProcessorName");</code></pre>
The mock also captures whether your processor has called <code>commit()</code> on the context:
<pre class="line-numbers"><code class="language-text">assertTrue(context.committed());
<pre class="line-numbers"><code class="language-java">assertTrue(context.committed());
// commit captures can also be reset.
context.resetCommit();
@@ -314,8 +314,8 @@ assertFalse(context.committed());</code></pre>
<p>
In case your processor logic depends on the record metadata (topic, partition, offset, or timestamp),
you can set them on the context, either all together or individually:
<pre class="line-numbers"><code class="language-text">context.setRecordMetadata("topicName", /*partition*/ 0, /*offset*/ 0L, /*timestamp*/ 0L);
context.setTopic("topicName");
<pre class="line-numbers"><code class="language-java">context.setRecordMetadata(&quot;topicName&quot;, /*partition*/ 0, /*offset*/ 0L, /*timestamp*/ 0L);
context.setTopic(&quot;topicName&quot;);
context.setPartition(0);
context.setOffset(0L);
context.setTimestamp(0L);</code></pre>
@ -327,7 +327,7 @@ context.setTimestamp(0L);</code></pre> @@ -327,7 +327,7 @@ context.setTimestamp(0L);</code></pre>
You're encouraged to use a simple in-memory store of the appropriate type (KeyValue, Windowed, or
Session), since the mock context does <i>not</i> manage changelogs, state directories, etc.
</p>
<pre class="line-numbers"><code class="language-text">final KeyValueStore&lt;String, Integer&gt; store =
<pre class="line-numbers"><code class="language-java">final KeyValueStore&lt;String, Integer&gt; store =
Stores.keyValueStoreBuilder(
Stores.inMemoryKeyValueStore("myStore"),
Serdes.String(),
@ -342,7 +342,7 @@ context.register(store, /*deprecated parameter*/ false, /*parameter unused in mo @@ -342,7 +342,7 @@ context.register(store, /*deprecated parameter*/ false, /*parameter unused in mo
Processors can schedule punctuators to handle periodic tasks.
The mock context does <i>not</i> automatically execute punctuators, but it does capture them to
allow you to unit test them as well:
<pre class="line-numbers"><code class="language-text">final MockProcessorContext.CapturedPunctuator capturedPunctuator = context.scheduledPunctuators().get(0);
<pre class="line-numbers"><code class="language-java">final MockProcessorContext.CapturedPunctuator capturedPunctuator = context.scheduledPunctuators().get(0);
final long interval = capturedPunctuator.getIntervalMs();
final PunctuationType type = capturedPunctuator.getType();
final boolean cancelled = capturedPunctuator.cancelled();
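// The captured Punctuator can also be invoked directly, letting you unit test
// the punctuation logic itself with a timestamp of your choosing (illustrative):
capturedPunctuator.getPunctuator().punctuate(/*timestamp*/ 0L);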

140
docs/streams/developer-guide/write-streams.html

@ -90,22 +90,22 @@ @@ -90,22 +90,22 @@
<p class="last">See the section <a class="reference internal" href="datatypes.html#streams-developer-guide-serdes"><span class="std std-ref">Data Types and Serialization</span></a> for more information about Serializers/Deserializers.</p>
</div>
<p>Example <code class="docutils literal"><span class="pre">pom.xml</span></code> snippet when using Maven:</p>
<pre class="line-numbers"><code class="language-xml"><dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-streams</artifactId>
<version>{{fullDotVersion}}</version>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>{{fullDotVersion}}</version>
</dependency>
<!-- Optionally include Kafka Streams DSL for Scala for Scala {{scalaVersion}} -->
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-streams-scala_{{scalaVersion}}</artifactId>
<version>{{fullDotVersion}}</version>
</dependency></code></pre>
<pre class="line-numbers"><code class="language-xml">&lt;dependency&gt;
&lt;groupId&gt;org.apache.kafka&lt;/groupId&gt;
&lt;artifactId&gt;kafka-streams&lt;/artifactId&gt;
&lt;version&gt;{{fullDotVersion}}&lt;/version&gt;
&lt;/dependency&gt;
&lt;dependency&gt;
&lt;groupId&gt;org.apache.kafka&lt;/groupId&gt;
&lt;artifactId&gt;kafka-clients&lt;/artifactId&gt;
&lt;version&gt;{{fullDotVersion}}&lt;/version&gt;
&lt;/dependency&gt;
&lt;!-- Optionally include Kafka Streams DSL for Scala for Scala {{scalaVersion}} --&gt;
&lt;dependency&gt;
&lt;groupId&gt;org.apache.kafka&lt;/groupId&gt;
&lt;artifactId&gt;kafka-streams-scala_{{scalaVersion}}&lt;/artifactId&gt;
&lt;version&gt;{{fullDotVersion}}&lt;/version&gt;
&lt;/dependency&gt;</code></pre>
</div>
<div class="section" id="using-kafka-streams-within-your-application-code">
<h2>Using Kafka Streams within your application code<a class="headerlink" href="#using-kafka-streams-within-your-application-code" title="Permalink to this headline"></a></h2>
@ -120,79 +120,69 @@ @@ -120,79 +120,69 @@
<li>The second argument is an instance of <code class="docutils literal"><span class="pre">java.util.Properties</span></code>, which defines the configuration for this specific topology.</li>
</ul>
<p>Code example:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.kafka.streams.KafkaStreams</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.kstream.StreamsBuilder</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.processor.Topology</span><span class="o">;</span>
<span class="c1">// Use the builders to define the actual processing topology, e.g. to specify</span>
<span class="c1">// from which input topics to read, which stream operations (filter, map, etc.)</span>
<span class="c1">// should be called, and so on. We will cover this in detail in the subsequent</span>
<span class="c1">// sections of this Developer Guide.</span>
<span class="n">StreamsBuilder</span> <span class="n">builder</span> <span class="o">=</span> <span class="o">...;</span> <span class="c1">// when using the DSL</span>
<span class="n">Topology</span> <span class="n">topology</span> <span class="o">=</span> <span class="n">builder</span><span class="o">.</span><span class="na">build</span><span class="o">();</span>
<span class="c1">//</span>
<span class="c1">// OR</span>
<span class="c1">//</span>
<span class="n">Topology</span> <span class="n">topology</span> <span class="o">=</span> <span class="o">...;</span> <span class="c1">// when using the Processor API</span>
<span class="c1">// Use the configuration to tell your application where the Kafka cluster is,</span>
<span class="c1">// which Serializers/Deserializers to use by default, to specify security settings,</span>
<span class="c1">// and so on.</span>
<span class="n">Properties</span> <span class="n">props</span> <span class="o">=</span> <span class="o">...;</span>
<span class="n">KafkaStreams</span> <span class="n">streams</span> <span class="o">=</span> <span class="k">new</span> <span class="n">KafkaStreams</span><span class="o">(</span><span class="n">topology</span><span class="o">,</span> <span class="n">props</span><span class="o">);</span></code></pre></div>
</div>
<pre class="line-numbers"><code class="language-java">import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.kstream.StreamsBuilder;
import org.apache.kafka.streams.processor.Topology;
// Use the builders to define the actual processing topology, e.g. to specify
// from which input topics to read, which stream operations (filter, map, etc.)
// should be called, and so on. We will cover this in detail in the subsequent
// sections of this Developer Guide.
StreamsBuilder builder = ...; // when using the DSL
Topology topology = builder.build();
//
// OR
//
Topology topology = ...; // when using the Processor API
// Use the configuration to tell your application where the Kafka cluster is,
// which Serializers/Deserializers to use by default, to specify security settings,
// and so on.
Properties props = ...;
KafkaStreams streams = new KafkaStreams(topology, props);</code></pre>
<p>At this point, internal structures are initialized, but the processing is not started yet.
You have to explicitly start the Kafka Streams thread by calling the <code class="docutils literal"><span class="pre">KafkaStreams#start()</span></code> method:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Start the Kafka Streams threads</span>
<span class="n">streams</span><span class="o">.</span><span class="na">start</span><span class="o">();</span></code></pre></div>
</div>
<pre class="line-numbers"><code class="language-java">// Start the Kafka Streams threads
streams.start();</code></pre>
<p>If there are other instances of this stream processing application running elsewhere (e.g., on another machine), Kafka
Streams transparently re-assigns tasks from the existing instances to the new instance that you just started.
For more information, see <a class="reference internal" href="../architecture.html#streams_architecture_tasks"><span class="std std-ref">Stream Partitions and Tasks</span></a> and <a class="reference internal" href="../architecture.html#streams_architecture_threads"><span class="std std-ref">Threading Model</span></a>.</p>
<p>To catch any unexpected exceptions, you can set a <code class="docutils literal"><span class="pre">java.lang.Thread.UncaughtExceptionHandler</span></code> before you start the
application. This handler is called whenever a stream thread is terminated by an unexpected exception:</p>
<div class="highlight-java"><div class="highlight"><pre><code><span></span><span class="c1">// Java 8+, using lambda expressions</span>
<span class="n">streams</span><span class="o">.</span><span class="na">setUncaughtExceptionHandler</span><span class="o">((</span><span class="n">Thread</span> <span class="n">thread</span><span class="o">,</span> <span class="n">Throwable</span> <span class="n">throwable</span><span class="o">)</span> <span class="o">-&gt;</span> <span class="o">{</span>
<span class="c1">// here you should examine the throwable/exception and perform an appropriate action!</span>
<span class="o">});</span>
<span class="c1">// Java 7</span>
<span class="n">streams</span><span class="o">.</span><span class="na">setUncaughtExceptionHandler</span><span class="o">(</span><span class="k">new</span> <span class="n">Thread</span><span class="o">.</span><span class="na">UncaughtExceptionHandler</span><span class="o">()</span> <span class="o">{</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">uncaughtException</span><span class="o">(</span><span class="n">Thread</span> <span class="n">thread</span><span class="o">,</span> <span class="n">Throwable</span> <span class="n">throwable</span><span class="o">)</span> <span class="o">{</span>
<span class="c1">// here you should examine the throwable/exception and perform an appropriate action!</span>
<span class="o">}</span>
<span class="o">});</span></code></pre></div>
</div>
<pre class="line-numbers"><code class="language-java">// Java 8+, using lambda expressions
streams.setUncaughtExceptionHandler((Thread thread, Throwable throwable) -&gt; {
// here you should examine the throwable/exception and perform an appropriate action!
});
// Java 7
streams.setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
public void uncaughtException(Thread thread, Throwable throwable) {
// here you should examine the throwable/exception and perform an appropriate action!
}
});</code></pre>
<p>To stop the application instance, call the <code class="docutils literal"><span class="pre">KafkaStreams#close()</span></code> method:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Stop the Kafka Streams threads</span>
<span class="n">streams</span><span class="o">.</span><span class="na">close</span><span class="o">();</span></code></pre></div>
</div>
<pre class="line-numbers"><code class="language-java">// Stop the Kafka Streams threads
streams.close();</code></pre>
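<p>If you do not want to block indefinitely, <code>close</code> also accepts a timeout (a minimal sketch, using <code>java.time.Duration</code>):</p>
<pre class="line-numbers"><code class="language-java">// Returns true if all threads stopped within the timeout, false otherwise.
final boolean stopped = streams.close(Duration.ofSeconds(30));</code></pre>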
<p>To allow your application to shut down gracefully in response to SIGTERM, it is recommended that you add a shutdown hook
and call <code class="docutils literal"><span class="pre">KafkaStreams#close</span></code>.</p>
<ul>
<li><p class="first">Here is a shutdown hook example in Java 8+:</p>
<blockquote>
<div><div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Add shutdown hook to stop the Kafka Streams threads.</span>
<span class="c1">// You can optionally provide a timeout to `close`.</span>
<span class="n">Runtime</span><span class="o">.</span><span class="na">getRuntime</span><span class="o">().</span><span class="na">addShutdownHook</span><span class="o">(</span><span class="k">new</span> <span class="n">Thread</span><span class="o">(</span><span class="n">streams</span><span class="o">::</span><span class="n">close</span><span class="o">));</span></code></pre></div>
</div>
</div></blockquote>
<pre class="line-numbers"><code class="language-java">// Add shutdown hook to stop the Kafka Streams threads.
// You can optionally provide a timeout to `close`.
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));</code></pre>
</li>
<li><p class="first">Here is a shutdown hook example in Java 7:</p>
<blockquote>
<div><div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Add shutdown hook to stop the Kafka Streams threads.</span>
<span class="c1">// You can optionally provide a timeout to `close`.</span>
<span class="n">Runtime</span><span class="o">.</span><span class="na">getRuntime</span><span class="o">().</span><span class="na">addShutdownHook</span><span class="o">(</span><span class="k">new</span> <span class="n">Thread</span><span class="o">(</span><span class="k">new</span> <span class="n">Runnable</span><span class="o">()</span> <span class="o">{</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">run</span><span class="o">()</span> <span class="o">{</span>
<span class="n">streams</span><span class="o">.</span><span class="na">close</span><span class="o">();</span>
<span class="o">}</span>
<span class="o">}));</span></code></pre></div>
</div>
</div></blockquote>
<pre class="line-numbers"><code class="language-java">// Add shutdown hook to stop the Kafka Streams threads.
// You can optionally provide a timeout to `close`.
Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
@Override
public void run() {
streams.close();
}
}));</code></pre>
</li>
</ul>
<p>After an application is stopped, Kafka Streams will migrate any tasks that had been running in this instance to available remaining

172
docs/streams/index.html

@ -154,95 +154,95 @@ @@ -154,95 +154,95 @@
</div>
<div class="code-example__snippet b-java-8 selected">
<pre class="line-numbers"><code class="language-java"> import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.state.KeyValueStore;
import java.util.Arrays;
import java.util.Properties;
public class WordCountApplication {
public static void main(final String[] args) throws Exception {
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-application");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-broker1:9092");
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
StreamsBuilder builder = new StreamsBuilder();
KStream&lt;String, String&gt; textLines = builder.stream("TextLinesTopic");
KTable&lt;String, Long&gt; wordCounts = textLines
.flatMapValues(textLine -> Arrays.asList(textLine.toLowerCase().split("\\W+")))
.groupBy((key, word) -> word)
.count(Materialized.&lt;String, Long, KeyValueStore&lt;Bytes, byte[]&gt;&gt;as("counts-store"));
wordCounts.toStream().to("WordsWithCountsTopic", Produced.with(Serdes.String(), Serdes.Long()));
KafkaStreams streams = new KafkaStreams(builder.build(), props);
streams.start();
}
}</code></pre>
<pre class="line-numbers"><code class="language-java">import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.state.KeyValueStore;
import java.util.Arrays;
import java.util.Properties;
public class WordCountApplication {
public static void main(final String[] args) throws Exception {
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-application");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-broker1:9092");
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
StreamsBuilder builder = new StreamsBuilder();
KStream&lt;String, String&gt; textLines = builder.stream("TextLinesTopic");
KTable&lt;String, Long&gt; wordCounts = textLines
.flatMapValues(textLine -> Arrays.asList(textLine.toLowerCase().split("\\W+")))
.groupBy((key, word) -> word)
.count(Materialized.&lt;String, Long, KeyValueStore&lt;Bytes, byte[]&gt;&gt;as("counts-store"));
wordCounts.toStream().to("WordsWithCountsTopic", Produced.with(Serdes.String(), Serdes.Long()));
KafkaStreams streams = new KafkaStreams(builder.build(), props);
streams.start();
}
}</code></pre>
</div>
<div class="code-example__snippet b-java-7">
<pre class="line-numbers"><code class="language-java"> import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.ValueMapper;
import org.apache.kafka.streams.kstream.KeyValueMapper;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.state.KeyValueStore;
import java.util.Arrays;
import java.util.Properties;
public class WordCountApplication {
public static void main(final String[] args) throws Exception {
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-application");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-broker1:9092");
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
StreamsBuilder builder = new StreamsBuilder();
KStream&lt;String, String&gt; textLines = builder.stream("TextLinesTopic");
KTable&lt;String, Long&gt; wordCounts = textLines
.flatMapValues(new ValueMapper&lt;String, Iterable&lt;String&gt;&gt;() {
@Override
public Iterable&lt;String&gt; apply(String textLine) {
return Arrays.asList(textLine.toLowerCase().split("\\W+"));
}
})
.groupBy(new KeyValueMapper&lt;String, String, String&gt;() {
@Override
public String apply(String key, String word) {
return word;
}
})
.count(Materialized.&lt;String, Long, KeyValueStore&lt;Bytes, byte[]&gt;&gt;as("counts-store"));
wordCounts.toStream().to("WordsWithCountsTopic", Produced.with(Serdes.String(), Serdes.Long()));
KafkaStreams streams = new KafkaStreams(builder.build(), props);
streams.start();
}
}</code></pre>
<pre class="line-numbers"><code class="language-java">import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.ValueMapper;
import org.apache.kafka.streams.kstream.KeyValueMapper;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.state.KeyValueStore;
import java.util.Arrays;
import java.util.Properties;
public class WordCountApplication {
public static void main(final String[] args) throws Exception {
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-application");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-broker1:9092");
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
StreamsBuilder builder = new StreamsBuilder();
KStream&lt;String, String&gt; textLines = builder.stream("TextLinesTopic");
KTable&lt;String, Long&gt; wordCounts = textLines
.flatMapValues(new ValueMapper&lt;String, Iterable&lt;String&gt;&gt;() {
@Override
public Iterable&lt;String&gt; apply(String textLine) {
return Arrays.asList(textLine.toLowerCase().split("\\W+"));
}
})
.groupBy(new KeyValueMapper&lt;String, String, String&gt;() {
@Override
public String apply(String key, String word) {
return word;
}
})
.count(Materialized.&lt;String, Long, KeyValueStore&lt;Bytes, byte[]&gt;&gt;as("counts-store"));
wordCounts.toStream().to("WordsWithCountsTopic", Produced.with(Serdes.String(), Serdes.Long()));
KafkaStreams streams = new KafkaStreams(builder.build(), props);
streams.start();
}
}</code></pre>
</div>
<div class="code-example__snippet b-scala">

491
docs/streams/tutorial.html

@ -42,32 +42,31 @@ @@ -42,32 +42,31 @@
We are going to use a Kafka Streams Maven Archetype for creating a Streams project structure with the following commands:
</p>
<pre class="line-numbers"><code class="language-bash"> mvn archetype:generate \
-DarchetypeGroupId=org.apache.kafka \
-DarchetypeArtifactId=streams-quickstart-java \
-DarchetypeVersion={{fullDotVersion}} \
-DgroupId=streams.examples \
-DartifactId=streams.examples \
-Dversion=0.1 \
-Dpackage=myapps</code></pre>
<pre class="line-numbers"><code class="language-bash">mvn archetype:generate \
-DarchetypeGroupId=org.apache.kafka \
-DarchetypeArtifactId=streams-quickstart-java \
-DarchetypeVersion={{fullDotVersion}} \
-DgroupId=streams.examples \
-DartifactId=streams.examples \
-Dversion=0.1 \
-Dpackage=myapps</code></pre>
<p>
You can use different values for the <code>groupId</code>, <code>artifactId</code> and <code>package</code> parameters if you like.
Assuming the above parameter values are used, this command will create a project structure that looks like this:
</p>
<pre class="line-numbers"><code class="language-bash"> &gt; tree streams.examples
streams-quickstart
|-- pom.xml
|-- src
|-- main
|-- java
| |-- myapps
| |-- LineSplit.java
| |-- Pipe.java
| |-- WordCount.java
|-- resources
|-- log4j.properties</code></pre>
<pre class="line-numbers"><code class="language-bash">&gt; tree streams.examples
streams-quickstart
|-- pom.xml
|-- src
|-- main
|-- java
| |-- myapps
| |-- LineSplit.java
| |-- Pipe.java
| |-- WordCount.java
|-- resources
|-- log4j.properties</code></pre>
<p>
The <code>pom.xml</code> file included in the project already has the Streams dependency defined.
@ -79,22 +78,22 @@ @@ -79,22 +78,22 @@
Since we are going to start writing such programs from scratch, we can now delete these examples:
</p>
<pre class="line-numbers"><code class="language-bash"> &gt; cd streams-quickstart
&gt; rm src/main/java/myapps/*.java</code></pre>
<pre class="line-numbers"><code class="language-bash">&gt; cd streams-quickstart
&gt; rm src/main/java/myapps/*.java</code></pre>
<h4><a id="tutorial_code_pipe" href="#tutorial_code_pipe">Writing a first Streams application: Pipe</a></h4>
It's coding time now! Feel free to open your favorite IDE and import this Maven project, or simply open a text editor and create a Java file under <code>src/main/java/myapps</code>.
Let's name it <code>Pipe.java</code>:
<pre class="line-numbers"><code class="language-java"> package myapps;
<pre class="line-numbers"><code class="language-java">package myapps;
public class Pipe {
public class Pipe {
public static void main(String[] args) throws Exception {
public static void main(String[] args) throws Exception {
}
}</code></pre>
}
}</code></pre>
<p>
We are going to fill in the <code>main</code> function to write this pipe program. Note that we will not list the import statements as we go since IDEs can usually add them automatically.
@ -107,16 +106,16 @@ @@ -107,16 +106,16 @@
and <code>StreamsConfig.APPLICATION_ID_CONFIG</code>, which gives the unique identifier of your Streams application, distinguishing it from other applications talking to the same Kafka cluster:
</p>
<pre class="line-numbers"><code class="language-java"> Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-pipe");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assuming that the Kafka broker this application is talking to runs on local machine with port 9092</code></pre>
<pre class="line-numbers"><code class="language-java">Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-pipe");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assuming that the Kafka broker this application is talking to runs on local machine with port 9092</code></pre>
<p>
In addition, you can customize other configurations in the same map, for example, default serialization and deserialization libraries for the record key-value pairs:
</p>
<pre class="line-numbers"><code class="language-java"> props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());</code></pre>
<pre class="line-numbers"><code class="language-java">props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());</code></pre>
<p>
For a full list of Kafka Streams configurations, please refer to this <a href="/{{version}}/documentation/#streamsconfigs">table</a>.
@ -128,13 +127,13 @@ @@ -128,13 +127,13 @@
We can use a topology builder to construct such a topology,
</p>
<pre class="line-numbers"><code class="language-java"> final StreamsBuilder builder = new StreamsBuilder();</code></pre>
<pre class="line-numbers"><code class="language-java">final StreamsBuilder builder = new StreamsBuilder();</code></pre>
<p>
And then create a source stream from a Kafka topic named <code>streams-plaintext-input</code> using this topology builder:
</p>
<pre class="line-numbers"><code class="language-java"> KStream&lt;String, String&gt; source = builder.stream("streams-plaintext-input");</code></pre>
<pre class="line-numbers"><code class="language-java">KStream&lt;String, String&gt; source = builder.stream("streams-plaintext-input");</code></pre>
<p>
Now we get a <code>KStream</code> that is continuously generating records from its source Kafka topic <code>streams-plaintext-input</code>.
@ -142,38 +141,38 @@ @@ -142,38 +141,38 @@
The simplest thing we can do with this stream is to write it into another Kafka topic, say one named <code>streams-pipe-output</code>:
</p>
<pre class="line-numbers"><code class="language-java"> source.to("streams-pipe-output");</code></pre>
<pre class="line-numbers"><code class="language-java">source.to("streams-pipe-output");</code></pre>
<p>
Note that we can also concatenate the above two lines into a single line:
</p>
<pre class="line-numbers"><code class="language-java"> builder.stream("streams-plaintext-input").to("streams-pipe-output");</code></pre>
<pre class="line-numbers"><code class="language-java">builder.stream("streams-plaintext-input").to("streams-pipe-output");</code></pre>
<p>
We can inspect what kind of <code>topology</code> is created from this builder by doing the following:
</p>
<pre class="line-numbers"><code class="language-java"> final Topology topology = builder.build();</code></pre>
<pre class="line-numbers"><code class="language-java">final Topology topology = builder.build();</code></pre>
<p>
And print its description to standard output as:
</p>
<pre class="line-numbers"><code class="language-java"> System.out.println(topology.describe());</code></pre>
<pre class="line-numbers"><code class="language-java">System.out.println(topology.describe());</code></pre>
<p>
If we just stop here, compile and run the program, it will output the following information:
</p>
<pre class="line-numbers"><code class="language-bash"> &gt; mvn clean package
&gt; mvn exec:java -Dexec.mainClass=myapps.Pipe
Sub-topologies:
Sub-topology: 0
Source: KSTREAM-SOURCE-0000000000(topics: streams-plaintext-input) --> KSTREAM-SINK-0000000001
Sink: KSTREAM-SINK-0000000001(topic: streams-pipe-output) <-- KSTREAM-SOURCE-0000000000
Global Stores:
none</code></pre>
<pre class="line-numbers"><code class="language-bash">&gt; mvn clean package
&gt; mvn exec:java -Dexec.mainClass=myapps.Pipe
Sub-topologies:
Sub-topology: 0
Source: KSTREAM-SOURCE-0000000000(topics: streams-plaintext-input) --> KSTREAM-SINK-0000000001
Sink: KSTREAM-SINK-0000000001(topic: streams-pipe-output) <-- KSTREAM-SOURCE-0000000000
Global Stores:
none</code></pre>
<p>
As shown above, the output illustrates that the constructed topology has two processor nodes, a source node <code>KSTREAM-SOURCE-0000000000</code> and a sink node <code>KSTREAM-SINK-0000000001</code>.
@ -189,7 +188,7 @@ @@ -189,7 +188,7 @@
we can now construct the Streams client with the two components we have just constructed above: the configuration map specified in a <code>java.util.Properties</code> instance and the <code>Topology</code> object.
</p>
<pre class="line-numbers"><code class="language-java"> final KafkaStreams streams = new KafkaStreams(topology, props);</code></pre>
<pre class="line-numbers"><code class="language-java">final KafkaStreams streams = new KafkaStreams(topology, props);</code></pre>
<p>
By calling its <code>start()</code> function we can trigger the execution of this client.
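<p>In code this is a one-liner, using the <code>streams</code> client constructed above:</p>
<pre class="line-numbers"><code class="language-java">streams.start();</code></pre>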
@ -197,76 +196,76 @@ @@ -197,76 +196,76 @@
We can, for example, add a shutdown hook with a countdown latch to capture a user interrupt and close the client upon terminating this program:
</p>
<pre class="line-numbers"><code class="language-java"> final CountDownLatch latch = new CountDownLatch(1);
<pre class="line-numbers"><code class="language-java">final CountDownLatch latch = new CountDownLatch(1);
// attach shutdown handler to catch control-c
Runtime.getRuntime().addShutdownHook(new Thread("streams-shutdown-hook") {
@Override
public void run() {
streams.close();
latch.countDown();
}
});
// attach shutdown handler to catch control-c
Runtime.getRuntime().addShutdownHook(new Thread("streams-shutdown-hook") {
@Override
public void run() {
streams.close();
latch.countDown();
}
});
try {
streams.start();
latch.await();
} catch (Throwable e) {
System.exit(1);
}
System.exit(0);</code></pre>
try {
streams.start();
latch.await();
} catch (Throwable e) {
System.exit(1);
}
System.exit(0);</code></pre>
<p>
The complete code so far looks like this:
</p>
<pre class="line-numbers"><code class="language-java"> package myapps;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
<pre class="line-numbers"><code class="language-java">package myapps;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
public class Pipe {
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
public static void main(String[] args) throws Exception {
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-pipe");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
public class Pipe {
final StreamsBuilder builder = new StreamsBuilder();
public static void main(String[] args) throws Exception {
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-pipe");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
builder.stream("streams-plaintext-input").to("streams-pipe-output");
final StreamsBuilder builder = new StreamsBuilder();
final Topology topology = builder.build();
builder.stream("streams-plaintext-input").to("streams-pipe-output");
final KafkaStreams streams = new KafkaStreams(topology, props);
final CountDownLatch latch = new CountDownLatch(1);
final Topology topology = builder.build();
// attach shutdown handler to catch control-c
Runtime.getRuntime().addShutdownHook(new Thread("streams-shutdown-hook") {
@Override
public void run() {
streams.close();
latch.countDown();
}
});
final KafkaStreams streams = new KafkaStreams(topology, props);
final CountDownLatch latch = new CountDownLatch(1);
try {
streams.start();
latch.await();
} catch (Throwable e) {
System.exit(1);
}
System.exit(0);
// attach shutdown handler to catch control-c
Runtime.getRuntime().addShutdownHook(new Thread("streams-shutdown-hook") {
@Override
public void run() {
streams.close();
latch.countDown();
}
}</code></pre>
});
try {
streams.start();
latch.await();
} catch (Throwable e) {
System.exit(1);
}
System.exit(0);
}
}</code></pre>
<p>
If you already have the Kafka broker up and running at <code>localhost:9092</code>,
@ -274,8 +273,8 @@ @@ -274,8 +273,8 @@
you can run this code in your IDE or on the command line, using Maven:
</p>
<pre class="line-numbers"><code class="language-brush"> &gt; mvn clean package
&gt; mvn exec:java -Dexec.mainClass=myapps.Pipe</code></pre>
<pre class="line-numbers"><code class="language-brush">&gt; mvn clean package
&gt; mvn exec:java -Dexec.mainClass=myapps.Pipe</code></pre>
<p>
For detailed instructions on how to run a Streams application and observe its computing results,
@ -291,33 +290,33 @@ @@ -291,33 +290,33 @@
We can create another program by first copying the existing <code>Pipe.java</code> class:
</p>
<pre class="line-numbers"><code class="language-brush"> &gt; cp src/main/java/myapps/Pipe.java src/main/java/myapps/LineSplit.java</code></pre>
<pre class="line-numbers"><code class="language-brush">&gt; cp src/main/java/myapps/Pipe.java src/main/java/myapps/LineSplit.java</code></pre>
<p>
Then change its class name as well as the application ID config to distinguish it from the original program:
</p>
<pre class="line-numbers"><code class="language-java"> public class LineSplit {
<pre class="line-numbers"><code class="language-java">public class LineSplit {
public static void main(String[] args) throws Exception {
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-linesplit");
// ...
}
}</code></pre>
public static void main(String[] args) throws Exception {
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-linesplit");
// ...
}
}</code></pre>
<p>
Since each of the source stream's records is a <code>String</code>-typed key-value pair,
let's treat the value string as a text line and split it into words with a <code>flatMapValues</code> operator:
</p>
<pre class="line-numbers"><code class="language-java"> KStream&lt;String, String&gt; source = builder.stream("streams-plaintext-input");
KStream&lt;String, String&gt; words = source.flatMapValues(new ValueMapper&lt;String, Iterable&lt;String&gt;&gt;() {
@Override
public Iterable&lt;String&gt; apply(String value) {
return Arrays.asList(value.split("\\W+"));
}
});</code></pre>
<pre class="line-numbers"><code class="language-java">KStream&lt;String, String&gt; source = builder.stream("streams-plaintext-input");
KStream&lt;String, String&gt; words = source.flatMapValues(new ValueMapper&lt;String, Iterable&lt;String&gt;&gt;() {
@Override
public Iterable&lt;String&gt; apply(String value) {
return Arrays.asList(value.split("\\W+"));
}
});</code></pre>
<p>
The operator will take the <code>source</code> stream as its input, and generate a new stream named <code>words</code>
@ -327,31 +326,31 @@ @@ -327,31 +326,31 @@
Note that if you are using JDK 8 you can use a lambda expression to simplify the above code:
</p>
<pre class="line-numbers"><code class="language-java"> KStream&lt;String, String&gt; source = builder.stream("streams-plaintext-input");
KStream&lt;String, String&gt; words = source.flatMapValues(value -> Arrays.asList(value.split("\\W+")));</code></pre>
<pre class="line-numbers"><code class="language-java">KStream&lt;String, String&gt; source = builder.stream("streams-plaintext-input");
KStream&lt;String, String&gt; words = source.flatMapValues(value -> Arrays.asList(value.split("\\W+")));</code></pre>
<p>
And finally we can write the word stream back into another Kafka topic, say <code>streams-linesplit-output</code>.
Again, these two steps can be concatenated as follows (assuming a lambda expression is used):
</p>
<pre class="line-numbers"><code class="language-java"> KStream&lt;String, String&gt; source = builder.stream("streams-plaintext-input");
source.flatMapValues(value -> Arrays.asList(value.split("\\W+")))
.to("streams-linesplit-output");</code></pre>
<pre class="line-numbers"><code class="language-java">KStream&lt;String, String&gt; source = builder.stream("streams-plaintext-input");
source.flatMapValues(value -> Arrays.asList(value.split("\\W+")))
.to("streams-linesplit-output");</code></pre>
<p>
If we now describe this augmented topology as <code>System.out.println(topology.describe())</code>, we will get the following:
</p>
<pre class="line-numbers"><code class="language-bash"> &gt; mvn clean package
&gt; mvn exec:java -Dexec.mainClass=myapps.LineSplit
Sub-topologies:
Sub-topology: 0
Source: KSTREAM-SOURCE-0000000000(topics: streams-plaintext-input) --> KSTREAM-FLATMAPVALUES-0000000001
Processor: KSTREAM-FLATMAPVALUES-0000000001(stores: []) --> KSTREAM-SINK-0000000002 <-- KSTREAM-SOURCE-0000000000
Sink: KSTREAM-SINK-0000000002(topic: streams-linesplit-output) <-- KSTREAM-FLATMAPVALUES-0000000001
Global Stores:
none</code></pre>
<pre class="line-numbers"><code class="language-bash">&gt; mvn clean package
&gt; mvn exec:java -Dexec.mainClass=myapps.LineSplit
Sub-topologies:
Sub-topology: 0
Source: KSTREAM-SOURCE-0000000000(topics: streams-plaintext-input) --> KSTREAM-FLATMAPVALUES-0000000001
Processor: KSTREAM-FLATMAPVALUES-0000000001(stores: []) --> KSTREAM-SINK-0000000002 <-- KSTREAM-SOURCE-0000000000
Sink: KSTREAM-SINK-0000000002(topic: streams-linesplit-output) <-- KSTREAM-FLATMAPVALUES-0000000001
Global Stores:
none</code></pre>
<p>
As we can see above, a new processor node <code>KSTREAM-FLATMAPVALUES-0000000001</code> is injected into the topology between the original source and sink nodes.
@ -365,41 +364,41 @@ @@ -365,41 +364,41 @@
The complete code looks like this (assuming lambda expressions are used):
</p>
<pre class="line-numbers"><code class="language-java"> package myapps;
<pre class="line-numbers"><code class="language-java">package myapps;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.KStream;
import java.util.Arrays;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
import java.util.Arrays;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
public class LineSplit {
public class LineSplit {
public static void main(String[] args) throws Exception {
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-linesplit");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
public static void main(String[] args) throws Exception {
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-linesplit");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
final StreamsBuilder builder = new StreamsBuilder();
final StreamsBuilder builder = new StreamsBuilder();
KStream&lt;String, String&gt; source = builder.stream("streams-plaintext-input");
source.flatMapValues(value -> Arrays.asList(value.split("\\W+")))
.to("streams-linesplit-output");
KStream&lt;String, String&gt; source = builder.stream("streams-plaintext-input");
source.flatMapValues(value -> Arrays.asList(value.split("\\W+")))
.to("streams-linesplit-output");
final Topology topology = builder.build();
final KafkaStreams streams = new KafkaStreams(topology, props);
final CountDownLatch latch = new CountDownLatch(1);
final Topology topology = builder.build();
final KafkaStreams streams = new KafkaStreams(topology, props);
final CountDownLatch latch = new CountDownLatch(1);
// ... same as Pipe.java above
}
}</code></pre>
// ... same as Pipe.java above
}
}</code></pre>
<h4><a id="tutorial_code_wordcount" href="#tutorial_code_wordcount">Writing a third Streams application: Wordcount</a></h4>
@ -408,47 +407,47 @@ @@ -408,47 +407,47 @@
Following similar steps, let's create another program based on the <code>LineSplit.java</code> class:
</p>
<pre class="line-numbers"><code class="language-java"> public class WordCount {
<pre class="line-numbers"><code class="language-java">public class WordCount {
public static void main(String[] args) throws Exception {
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-wordcount");
// ...
}
}</code></pre>
public static void main(String[] args) throws Exception {
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-wordcount");
// ...
}
}</code></pre>
<p>
In order to count the words, we can first modify the <code>flatMapValues</code> operator to convert all the words to lower case:
</p>
<pre class="line-numbers"><code class="language-java"> source.flatMapValues(new ValueMapper&lt;String, Iterable&lt;String&gt;&gt;() {
@Override
public Iterable&lt;String&gt; apply(String value) {
return Arrays.asList(value.toLowerCase(Locale.getDefault()).split("\\W+"));
}
});</code></pre>
<pre class="line-numbers"><code class="language-java">source.flatMapValues(new ValueMapper&lt;String, Iterable&lt;String&gt;&gt;() {
@Override
public Iterable&lt;String&gt; apply(String value) {
return Arrays.asList(value.toLowerCase(Locale.getDefault()).split("\\W+"));
}
});</code></pre>
<p>
In order to do the counting aggregation we have to first specify that we want to key the stream on the value string, i.e. the lower-cased word, with a <code>groupBy</code> operator.
This operator generates a new grouped stream, which can then be aggregated by a <code>count</code> operator that generates a running count on each of the grouped keys:
</p>
<pre class="line-numbers"><code class="language-java"> KTable&lt;String, Long&gt; counts =
source.flatMapValues(new ValueMapper&lt;String, Iterable&lt;String&gt;&gt;() {
@Override
public Iterable&lt;String&gt; apply(String value) {
return Arrays.asList(value.toLowerCase(Locale.getDefault()).split("\\W+"));
}
})
.groupBy(new KeyValueMapper&lt;String, String, String&gt;() {
@Override
public String apply(String key, String value) {
return value;
}
})
// Materialize the result into a KeyValueStore named "counts-store".
// The Materialized store is always of type &lt;Bytes, byte[]&gt; as this is the format of the inner most store.
.count(Materialized.&lt;String, Long, KeyValueStore&lt;Bytes, byte[]&gt;&gt; as("counts-store"));</code></pre>
<pre class="line-numbers"><code class="language-java">KTable&lt;String, Long&gt; counts =
source.flatMapValues(new ValueMapper&lt;String, Iterable&lt;String&gt;&gt;() {
@Override
public Iterable&lt;String&gt; apply(String value) {
return Arrays.asList(value.toLowerCase(Locale.getDefault()).split("\\W+"));
}
})
.groupBy(new KeyValueMapper&lt;String, String, String&gt;() {
@Override
public String apply(String key, String value) {
return value;
}
})
// Materialize the result into a KeyValueStore named "counts-store".
// The Materialized store is always of type &lt;Bytes, byte[]&gt; as this is the format of the inner most store.
.count(Materialized.&lt;String, Long, KeyValueStore&lt;Bytes, byte[]&gt;&gt; as("counts-store"));</code></pre>
<p>
Note that the <code>count</code> operator has a <code>Materialized</code> parameter that specifies that the
@ -463,7 +462,7 @@ @@ -463,7 +462,7 @@
We need to provide overridden serialization methods for <code>Long</code> types, otherwise a runtime exception will be thrown:
</p>
<pre class="line-numbers"><code class="language-java"> counts.toStream().to("streams-wordcount-output", Produced.with(Serdes.String(), Serdes.Long()));</code></pre>
<pre class="line-numbers"><code class="language-java">counts.toStream().to("streams-wordcount-output", Produced.with(Serdes.String(), Serdes.Long()));</code></pre>
<p>
Note that in order to read the changelog stream from topic <code>streams-wordcount-output</code>,
@ -472,33 +471,33 @@ @@ -472,33 +471,33 @@
Assuming lambda expressions from JDK 8 can be used, the above code can be simplified as:
</p>
<pre class="line-numbers"><code class="language-java"> KStream&lt;String, String&gt; source = builder.stream("streams-plaintext-input");
source.flatMapValues(value -> Arrays.asList(value.toLowerCase(Locale.getDefault()).split("\\W+")))
.groupBy((key, value) -> value)
.count(Materialized.&lt;String, Long, KeyValueStore&lt;Bytes, byte[]&gt;&gt;as("counts-store"))
.toStream()
.to("streams-wordcount-output", Produced.with(Serdes.String(), Serdes.Long()));</code></pre>
<pre class="line-numbers"><code class="language-java">KStream&lt;String, String&gt; source = builder.stream("streams-plaintext-input");
source.flatMapValues(value -> Arrays.asList(value.toLowerCase(Locale.getDefault()).split("\\W+")))
.groupBy((key, value) -> value)
.count(Materialized.&lt;String, Long, KeyValueStore&lt;Bytes, byte[]&gt;&gt;as("counts-store"))
.toStream()
.to("streams-wordcount-output", Produced.with(Serdes.String(), Serdes.Long()));</code></pre>
<p>
If we again describe this augmented topology as <code>System.out.println(topology.describe())</code>, we will get the following:
</p>
<pre class="line-numbers"><code class="language-bash"> &gt; mvn clean package
&gt; mvn exec:java -Dexec.mainClass=myapps.WordCount
Sub-topologies:
Sub-topology: 0
Source: KSTREAM-SOURCE-0000000000(topics: streams-plaintext-input) --> KSTREAM-FLATMAPVALUES-0000000001
Processor: KSTREAM-FLATMAPVALUES-0000000001(stores: []) --> KSTREAM-KEY-SELECT-0000000002 <-- KSTREAM-SOURCE-0000000000
Processor: KSTREAM-KEY-SELECT-0000000002(stores: []) --> KSTREAM-FILTER-0000000005 <-- KSTREAM-FLATMAPVALUES-0000000001
Processor: KSTREAM-FILTER-0000000005(stores: []) --> KSTREAM-SINK-0000000004 <-- KSTREAM-KEY-SELECT-0000000002
Sink: KSTREAM-SINK-0000000004(topic: Counts-repartition) <-- KSTREAM-FILTER-0000000005
Sub-topology: 1
Source: KSTREAM-SOURCE-0000000006(topics: Counts-repartition) --> KSTREAM-AGGREGATE-0000000003
Processor: KSTREAM-AGGREGATE-0000000003(stores: [Counts]) --> KTABLE-TOSTREAM-0000000007 <-- KSTREAM-SOURCE-0000000006
Processor: KTABLE-TOSTREAM-0000000007(stores: []) --> KSTREAM-SINK-0000000008 <-- KSTREAM-AGGREGATE-0000000003
Sink: KSTREAM-SINK-0000000008(topic: streams-wordcount-output) <-- KTABLE-TOSTREAM-0000000007
Global Stores:
none</code></pre>
<pre class="line-numbers"><code class="language-bash">&gt; mvn clean package
&gt; mvn exec:java -Dexec.mainClass=myapps.WordCount
Sub-topologies:
Sub-topology: 0
Source: KSTREAM-SOURCE-0000000000(topics: streams-plaintext-input) --> KSTREAM-FLATMAPVALUES-0000000001
Processor: KSTREAM-FLATMAPVALUES-0000000001(stores: []) --> KSTREAM-KEY-SELECT-0000000002 <-- KSTREAM-SOURCE-0000000000
Processor: KSTREAM-KEY-SELECT-0000000002(stores: []) --> KSTREAM-FILTER-0000000005 <-- KSTREAM-FLATMAPVALUES-0000000001
Processor: KSTREAM-FILTER-0000000005(stores: []) --> KSTREAM-SINK-0000000004 <-- KSTREAM-KEY-SELECT-0000000002
Sink: KSTREAM-SINK-0000000004(topic: Counts-repartition) <-- KSTREAM-FILTER-0000000005
Sub-topology: 1
Source: KSTREAM-SOURCE-0000000006(topics: Counts-repartition) --> KSTREAM-AGGREGATE-0000000003
Processor: KSTREAM-AGGREGATE-0000000003(stores: [Counts]) --> KTABLE-TOSTREAM-0000000007 <-- KSTREAM-SOURCE-0000000006
Processor: KTABLE-TOSTREAM-0000000007(stores: []) --> KSTREAM-SINK-0000000008 <-- KSTREAM-AGGREGATE-0000000003
Sink: KSTREAM-SINK-0000000008(topic: streams-wordcount-output) <-- KTABLE-TOSTREAM-0000000007
Global Stores:
none</code></pre>
<p>
As we can see above, the topology now contains two disconnected sub-topologies.
@ -517,49 +516,49 @@ @@ -517,49 +516,49 @@
The complete code looks like this (assuming lambda expressions are used):
</p>
<pre class="line-numbers"><code class="language-java"> package myapps;
<pre class="line-numbers"><code class="language-java">package myapps;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.state.KeyValueStore;
import java.util.Arrays;
import java.util.Locale;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
import java.util.Arrays;
import java.util.Locale;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
public class WordCount {
public class WordCount {
public static void main(String[] args) throws Exception {
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-wordcount");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
public static void main(String[] args) throws Exception {
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-wordcount");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
final StreamsBuilder builder = new StreamsBuilder();
final StreamsBuilder builder = new StreamsBuilder();
KStream&lt;String, String&gt; source = builder.stream("streams-plaintext-input");
source.flatMapValues(value -> Arrays.asList(value.toLowerCase(Locale.getDefault()).split("\\W+")))
.groupBy((key, value) -> value)
.count(Materialized.&lt;String, Long, KeyValueStore&lt;Bytes, byte[]&gt;&gt;as("counts-store"))
.toStream()
.to("streams-wordcount-output", Produced.with(Serdes.String(), Serdes.Long()));
KStream&lt;String, String&gt; source = builder.stream("streams-plaintext-input");
source.flatMapValues(value -> Arrays.asList(value.toLowerCase(Locale.getDefault()).split("\\W+")))
.groupBy((key, value) -> value)
.count(Materialized.&lt;String, Long, KeyValueStore&lt;Bytes, byte[]&gt;&gt;as("counts-store"))
.toStream()
.to("streams-wordcount-output", Produced.with(Serdes.String(), Serdes.Long()));
final Topology topology = builder.build();
final KafkaStreams streams = new KafkaStreams(topology, props);
final CountDownLatch latch = new CountDownLatch(1);
final Topology topology = builder.build();
final KafkaStreams streams = new KafkaStreams(topology, props);
final CountDownLatch latch = new CountDownLatch(1);
// ... same as Pipe.java above
}
}</code></pre>
// ... same as Pipe.java above
}
}</code></pre>
<div class="pagination">
<a href="/{{version}}/documentation/streams/quickstart" class="pagination__btn pagination__btn__prev">Previous</a>
