
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<script><!--#include virtual="../../js/templateData.js" --></script>
<script id="content-template" type="text/x-handlebars-template">
<!-- h1>Developer Guide for Kafka Streams</h1 -->
<div class="sub-nav-sticky">
<!-- div class="sticky-top">
<div style="height:35px">
<a href="/{{version}}/documentation/streams/">Introduction</a>
<a class="active-menu-item" href="/{{version}}/documentation/streams/developer-guide">Developer Guide</a>
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
</div>
</div -->
</div>
<div class="section" id="writing-a-streams-application">
<span id="streams-write-app"></span><h1>Writing a Streams Application<a class="headerlink" href="#writing-a-streams-application" title="Permalink to this headline"></a></h1>
<p class="topic-title first"><b>Table of Contents</b></p>
<ul class="simple">
<li><a class="reference internal" href="#libraries-and-maven-artifacts" id="id1">Libraries and Maven artifacts</a></li>
<li><a class="reference internal" href="#using-kafka-streams-within-your-application-code" id="id2">Using Kafka Streams within your application code</a></li>
<li><a class="reference internal" href="#testing-a-streams-app" id="id3">Testing a Streams application</a></li>
</ul>
<p>Any Java or Scala application that makes use of the Kafka Streams library is considered a Kafka Streams application.
The computational logic of a Kafka Streams application is defined as a <a class="reference internal" href="../core-concepts#streams_topology"><span class="std std-ref">processor topology</span></a>,
which is a graph of stream processors (nodes) and streams (edges).</p>
<p>You can define the processor topology with the Kafka Streams APIs:</p>
<dl class="docutils">
<dt><a class="reference internal" href="dsl-api.html#streams-developer-guide-dsl"><span class="std std-ref">Kafka Streams DSL</span></a></dt>
<dd>A high-level API that provides the most common data transformation operations such as <code class="docutils literal"><span class="pre">map</span></code>, <code class="docutils literal"><span class="pre">filter</span></code>, <code class="docutils literal"><span class="pre">join</span></code>, and <code class="docutils literal"><span class="pre">aggregations</span></code> out of the box. The DSL is the recommended starting point for developers new to Kafka Streams, and should cover many use cases and stream processing needs; a brief sketch follows after this list. If you're writing a Scala application, you can use the <a href="dsl-api.html#scala-dsl"><span class="std std-ref">Kafka Streams DSL for Scala</span></a> library, which removes much of the Java/Scala interoperability boilerplate that comes with using the Java DSL directly.</dd>
<dt><a class="reference internal" href="processor-api.html#streams-developer-guide-processor-api"><span class="std std-ref">Processor API</span></a></dt>
<dd>A low-level API that lets you add and connect processors as well as interact directly with state stores. The Processor API provides you with even more flexibility than the DSL but at the expense of requiring more manual work on the side of the application developer (e.g., more lines of code).</dd>
</dl>
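<p>For a first impression of the DSL, here is a minimal sketch that reads from one topic, transforms each record, and writes the results to another topic. The topic names <code class="docutils literal"><span class="pre">input-topic</span></code> and <code class="docutils literal"><span class="pre">output-topic</span></code> are placeholders chosen for illustration; serialization and configuration are covered later in this guide.</p>
<pre class="brush: java;">
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;

// Minimal DSL sketch; the topic names are placeholders.
StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> source = builder.stream("input-topic");
source.filter((key, value) -> value != null && !value.isEmpty()) // keep non-empty values only
      .mapValues(value -> value.toUpperCase())                   // transform each value
      .to("output-topic");                                       // write the results back to Kafka
</pre>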
<div class="section" id="libraries-and-maven-artifacts">
<span id="streams-developer-guide-maven"></span><h2>Libraries and Maven artifacts</h2>
<p>This section lists the Kafka Streams-related libraries that are available for writing your Kafka Streams applications. You can define dependencies on the following libraries.</p>
<table border="1" class="non-scrolling-table docutils">
<colgroup>
<col width="14%" />
<col width="19%" />
<col width="12%" />
<col width="55%" />
</colgroup>
<thead valign="bottom">
<tr class="row-odd"><th class="head">Group ID</th>
<th class="head">Artifact ID</th>
<th class="head">Version</th>
<th class="head">Description</th>
</tr>
</thead>
<tbody valign="top">
<tr class="row-even"><td><code class="docutils literal"><span class="pre">org.apache.kafka</span></code></td>
<td><code class="docutils literal"><span class="pre">kafka-streams</span></code></td>
<td><code class="docutils literal"><span class="pre">{{fullDotVersion}}</span></code></td>
<td>(Required) Base library for Kafka Streams.</td>
</tr>
<tr class="row-odd"><td><code class="docutils literal"><span class="pre">org.apache.kafka</span></code></td>
<td><code class="docutils literal"><span class="pre">kafka-clients</span></code></td>
<td><code class="docutils literal"><span class="pre">{{fullDotVersion}}</span></code></td>
<td>(Required) Kafka client library. Contains built-in serializers/deserializers.</td>
</tr>
<tr class="row-even"><td><code class="docutils literal"><span class="pre">org.apache.kafka</span></code></td>
<td><code class="docutils literal"><span class="pre">kafka-streams-scala</span></code></td>
<td><code class="docutils literal"><span class="pre">{{fullDotVersion}}</span></code></td>
<td>(Optional) Kafka Streams DSL for Scala library for writing Scala Kafka Streams applications. When not using SBT, you need to suffix the artifact ID with the Scala version your application uses (<code class="docutils literal"><span class="pre">_2.11</span></code>, <code class="docutils literal"><span class="pre">_2.12</span></code>).</td>
</tr>
</tbody>
</table>
<div class="admonition tip">
<p><b>Tip</b></p>
<p class="last">See the section <a class="reference internal" href="datatypes.html#streams-developer-guide-serdes"><span class="std std-ref">Data Types and Serialization</span></a> for more information about Serializers/Deserializers.</p>
</div>
<p>Example <code class="docutils literal"><span class="pre">pom.xml</span></code> snippet when using Maven:</p>
<pre class="brush: xml;">
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-streams</artifactId>
<version>{{fullDotVersion}}</version>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>{{fullDotVersion}}</version>
</dependency>
<!-- Optionally include Kafka Streams DSL for Scala for Scala 2.11 -->
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-streams-scala_2.11</artifactId>
<version>{{fullDotVersion}}</version>
</dependency>
</pre>
</div>
<div class="section" id="using-kafka-streams-within-your-application-code">
<h2>Using Kafka Streams within your application code<a class="headerlink" href="#using-kafka-streams-within-your-application-code" title="Permalink to this headline"></a></h2>
<p>You can call Kafka Streams from anywhere in your application code, but usually these calls are made within the <code class="docutils literal"><span class="pre">main()</span></code> method of
your application, or some variant thereof. The basic elements of defining a processing topology within your application
are described below.</p>
<p>First, you must create an instance of <code class="docutils literal"><span class="pre">KafkaStreams</span></code>.</p>
<ul class="simple">
<li>The first argument of the <code class="docutils literal"><span class="pre">KafkaStreams</span></code> constructor takes a topology (either <code class="docutils literal"><span class="pre">StreamsBuilder#build()</span></code> for the
<a class="reference internal" href="dsl-api.html#streams-developer-guide-dsl"><span class="std std-ref">DSL</span></a> or <code class="docutils literal"><span class="pre">Topology</span></code> for the
<a class="reference internal" href="processor-api.html#streams-developer-guide-processor-api"><span class="std std-ref">Processor API</span></a>) that defines the processing logic of the application.</li>
<li>The second argument is an instance of <code class="docutils literal"><span class="pre">StreamsConfig</span></code>, which defines the configuration for this Kafka Streams instance.</li>
</ul>
<p>Code example:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.kafka.streams.KafkaStreams</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.StreamsConfig</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.kstream.StreamsBuilder</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.processor.Topology</span><span class="o">;</span>
<span class="c1">// Use the builders to define the actual processing topology, e.g. to specify</span>
<span class="c1">// from which input topics to read, which stream operations (filter, map, etc.)</span>
<span class="c1">// should be called, and so on. We will cover this in detail in the subsequent</span>
<span class="c1">// sections of this Developer Guide.</span>
<span class="n">StreamsBuilder</span> <span class="n">builder</span> <span class="o">=</span> <span class="o">...;</span> <span class="c1">// when using the DSL</span>
<span class="n">Topology</span> <span class="n">topology</span> <span class="o">=</span> <span class="n">builder</span><span class="o">.</span><span class="na">build</span><span class="o">();</span>
<span class="c1">//</span>
<span class="c1">// OR</span>
<span class="c1">//</span>
<span class="n">Topology</span> <span class="n">topology</span> <span class="o">=</span> <span class="o">...;</span> <span class="c1">// when using the Processor API</span>
<span class="c1">// Use the configuration to tell your application where the Kafka cluster is,</span>
<span class="c1">// which Serializers/Deserializers to use by default, to specify security settings,</span>
<span class="c1">// and so on.</span>
<span class="n">StreamsConfig</span> <span class="n">config</span> <span class="o">=</span> <span class="o">...;</span>
<span class="n">KafkaStreams</span> <span class="n">streams</span> <span class="o">=</span> <span class="k">new</span> <span class="n">KafkaStreams</span><span class="o">(</span><span class="n">topology</span><span class="o">,</span> <span class="n">config</span><span class="o">);</span>
</pre></div>
</div>
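<p>One common way to create that configuration is from a <code class="docutils literal"><span class="pre">java.util.Properties</span></code> object. The following is a minimal sketch; the application id, the bootstrap server address, and the default serdes below are placeholder choices for illustration only:</p>
<pre class="brush: java;">
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsConfig;

// Placeholder values; adjust them for your environment.
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-application"); // unique id for this application
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");      // where the Kafka cluster is
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

StreamsConfig config = new StreamsConfig(props);
</pre>
<p>With a topology and a <code class="docutils literal"><span class="pre">StreamsConfig</span></code> in hand, the <code class="docutils literal"><span class="pre">KafkaStreams</span></code> instance can be constructed as shown above.</p>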
<p>At this point, internal structures are initialized, but the processing is not started yet.
You have to explicitly start the Kafka Streams threads by calling the <code class="docutils literal"><span class="pre">KafkaStreams#start()</span></code> method:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Start the Kafka Streams threads</span>
<span class="n">streams</span><span class="o">.</span><span class="na">start</span><span class="o">();</span>
</pre></div>
</div>
<p>If there are other instances of this stream processing application running elsewhere (e.g., on another machine), Kafka
Streams transparently re-assigns tasks from the existing instances to the new instance that you just started.
For more information, see <a class="reference internal" href="../architecture.html#streams-architecture-tasks"><span class="std std-ref">Stream Partitions and Tasks</span></a> and <a class="reference internal" href="../architecture.html#streams-architecture-threads"><span class="std std-ref">Threading Model</span></a>.</p>
<p>To catch any unexpected exceptions, you can set a <code class="docutils literal"><span class="pre">java.lang.Thread.UncaughtExceptionHandler</span></code> before you start the
application. This handler is called whenever a stream thread is terminated by an unexpected exception:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Java 8+, using lambda expressions</span>
<span class="n">streams</span><span class="o">.</span><span class="na">setUncaughtExceptionHandler</span><span class="o">((</span><span class="n">Thread</span> <span class="n">thread</span><span class="o">,</span> <span class="n">Throwable</span> <span class="n">throwable</span><span class="o">)</span> <span class="o">-&gt;</span> <span class="o">{</span>
<span class="c1">// here you should examine the throwable/exception and perform an appropriate action!</span>
<span class="o">});</span>
<span class="c1">// Java 7</span>
<span class="n">streams</span><span class="o">.</span><span class="na">setUncaughtExceptionHandler</span><span class="o">(</span><span class="k">new</span> <span class="n">Thread</span><span class="o">.</span><span class="na">UncaughtExceptionHandler</span><span class="o">()</span> <span class="o">{</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">uncaughtException</span><span class="o">(</span><span class="n">Thread</span> <span class="n">thread</span><span class="o">,</span> <span class="n">Throwable</span> <span class="n">throwable</span><span class="o">)</span> <span class="o">{</span>
<span class="c1">// here you should examine the throwable/exception and perform an appropriate action!</span>
<span class="o">}</span>
<span class="o">});</span>
</pre></div>
</div>
<p>To stop the application instance, call the <code class="docutils literal"><span class="pre">KafkaStreams#close()</span></code> method:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Stop the Kafka Streams threads</span>
<span class="n">streams</span><span class="o">.</span><span class="na">close</span><span class="o">();</span>
</pre></div>
</div>
<p>To allow your application to shut down gracefully in response to SIGTERM, it is recommended that you add a shutdown hook
and call <code class="docutils literal"><span class="pre">KafkaStreams#close</span></code>.</p>
<ul>
<li><p class="first">Here is a shutdown hook example in Java 8+:</p>
<blockquote>
<div><div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Add shutdown hook to stop the Kafka Streams threads.</span>
<span class="c1">// You can optionally provide a timeout to `close`.</span>
<span class="n">Runtime</span><span class="o">.</span><span class="na">getRuntime</span><span class="o">().</span><span class="na">addShutdownHook</span><span class="o">(</span><span class="k">new</span> <span class="n">Thread</span><span class="o">(</span><span class="n">streams</span><span class="o">::</span><span class="n">close</span><span class="o">));</span>
</pre></div>
</div>
</div></blockquote>
</li>
<li><p class="first">Here is a shutdown hook example in Java 7:</p>
<blockquote>
<div><div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Add shutdown hook to stop the Kafka Streams threads.</span>
<span class="c1">// You can optionally provide a timeout to `close`.</span>
<span class="n">Runtime</span><span class="o">.</span><span class="na">getRuntime</span><span class="o">().</span><span class="na">addShutdownHook</span><span class="o">(</span><span class="k">new</span> <span class="n">Thread</span><span class="o">(</span><span class="k">new</span> <span class="n">Runnable</span><span class="o">()</span> <span class="o">{</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">run</span><span class="o">()</span> <span class="o">{</span>
<span class="n">streams</span><span class="o">.</span><span class="na">close</span><span class="o">();</span>
<span class="o">}</span>
<span class="o">}));</span>
</pre></div>
</div>
</div></blockquote>
</li>
</ul>
<p>After an application instance is stopped, Kafka Streams migrates any tasks that had been running in that instance to the remaining available
instances.</p>
</div>
<div class="section" id="testing-a-streams-app">
<a class="headerlink" href="#testing-a-streams-app" title="Permalink to this headline"><h2>Testing a Streams application</a></h2>
Kafka Streams comes with a <code>test-utils</code> module to help you test your application; see <a href="testing.html">Testing a Streams application</a> for details.
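<p>As an illustration only, the sketch below uses <code>TopologyTestDriver</code> (from the <code>org.apache.kafka:kafka-streams-test-utils</code> artifact, added as a test dependency) to exercise a topology without a running Kafka cluster. It assumes a version of the test-utils API that provides <code>TestInputTopic</code> and <code>TestOutputTopic</code>, and the topic names are placeholders that must match your topology; consult the <a href="testing.html">testing documentation</a> for the API available in your version.</p>
<pre class="brush: java;">
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.TestInputTopic;
import org.apache.kafka.streams.TestOutputTopic;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.TopologyTestDriver;

// Sketch: drive the topology under test without any brokers.
Topology topology = ...; // the topology under test
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "test");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234"); // never contacted by the test driver

try (TopologyTestDriver testDriver = new TopologyTestDriver(topology, props)) {
    // "input-topic" and "output-topic" are placeholders that must match the topology.
    TestInputTopic<String, String> inputTopic =
        testDriver.createInputTopic("input-topic", Serdes.String().serializer(), Serdes.String().serializer());
    TestOutputTopic<String, String> outputTopic =
        testDriver.createOutputTopic("output-topic", Serdes.String().deserializer(), Serdes.String().deserializer());

    inputTopic.pipeInput("key", "value");
    // assert on outputTopic.readKeyValue() with the test framework of your choice
}
</pre>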
</div>
</div>
</div>
</div>
<div class="pagination">
<a href="/{{version}}/documentation/streams/developer-guide/" class="pagination__btn pagination__btn__prev">Previous</a>
<a href="/{{version}}/documentation/streams/developer-guide/config-streams" class="pagination__btn pagination__btn__next">Next</a>
</div>
</script>
<!--#include virtual="../../../includes/_header.htm" -->
<!--#include virtual="../../../includes/_top.htm" -->
<div class="content documentation documentation--current">
<!--#include virtual="../../../includes/_nav.htm" -->
<div class="right">
<!--#include virtual="../../../includes/_docs_banner.htm" -->
<ul class="breadcrumbs">
<li><a href="/documentation">Documentation</a></li>
<li><a href="/documentation/streams">Kafka Streams</a></li>
<li><a href="/documentation/streams/developer-guide/">Developer Guide</a></li>
</ul>
<div class="p-content"></div>
</div>
</div>
<!--#include virtual="../../../includes/_footer.htm" -->
<script>
$(function() {
// Show selected style on nav item
$('.b-nav__streams').addClass('selected');
//sticky secondary nav
var $navbar = $(".sub-nav-sticky"),
y_pos = $navbar.offset().top,
height = $navbar.height();
$(window).scroll(function() {
var scrollTop = $(window).scrollTop();
if (scrollTop > y_pos - height) {
$navbar.addClass("navbar-fixed")
} else if (scrollTop <= y_pos) {
$navbar.removeClass("navbar-fixed")
}
});
// Display docs subnav items
$('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
});
</script>