<!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements. See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License. You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

<p>Here is a description of a few of the popular use cases for Apache Kafka®.
For an overview of a number of these areas in action, see <a href="https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying/">this blog post</a>.</p>

<h4><a id="uses_messaging" href="#uses_messaging">Messaging</a></h4>

Kafka works well as a replacement for a more traditional message broker.
Message brokers are used for a variety of reasons (to decouple processing from data producers, to buffer unprocessed messages, etc.).
In comparison to most messaging systems, Kafka has better throughput, built-in partitioning, replication, and fault-tolerance, which makes it a good
solution for large-scale message processing applications.
<p>
In our experience messaging uses are often comparatively low-throughput, but may require low end-to-end latency and often depend on the strong
durability guarantees Kafka provides.
<p>
In this domain Kafka is comparable to traditional messaging systems such as <a href="http://activemq.apache.org">ActiveMQ</a> or
<a href="https://www.rabbitmq.com">RabbitMQ</a>.
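<p>
As a rough sketch of what queue-style messaging looks like with Kafka's Java client, a producer publishes records to a topic and any consumers
sharing a <code>group.id</code> divide the topic's partitions among themselves. The broker address, the "orders" topic, and the group name below are
invented for the example rather than taken from anything above.
<pre><code class="language-java">import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class MessagingSketch {
    public static void main(String[] args) {
        // Publish one message to a hypothetical "orders" topic.
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", StringSerializer.class.getName());
        producerProps.put("value.serializer", StringSerializer.class.getName());
        try (KafkaProducer&lt;String, String&gt; producer = new KafkaProducer&lt;&gt;(producerProps)) {
            producer.send(new ProducerRecord&lt;&gt;("orders", "order-42", "created"));
        }

        // Consumers sharing a group.id split the topic's partitions between them,
        // so the group behaves like a set of competing queue consumers while
        // Kafka buffers any messages they have not processed yet.
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "order-processors");
        consumerProps.put("auto.offset.reset", "earliest");
        consumerProps.put("key.deserializer", StringDeserializer.class.getName());
        consumerProps.put("value.deserializer", StringDeserializer.class.getName());
        try (KafkaConsumer&lt;String, String&gt; consumer = new KafkaConsumer&lt;&gt;(consumerProps)) {
            consumer.subscribe(List.of("orders"));
            for (var record : consumer.poll(Duration.ofSeconds(5)))
                System.out.printf("%s = %s%n", record.key(), record.value());
        }
    }
}</code></pre>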
<h4><a id="uses_website" href="#uses_website">Website Activity Tracking</a></h4>

The original use case for Kafka was to be able to rebuild a user activity tracking pipeline as a set of real-time publish-subscribe feeds.
This means site activity (page views, searches, or other actions users may take) is published to central topics with one topic per activity type.
These feeds are available for subscription for a range of use cases including real-time processing, real-time monitoring, and loading into Hadoop or
offline data warehousing systems for offline processing and reporting.
<p>
Activity tracking is often very high volume as many activity messages are generated for each user page view.
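<p>
As an illustration of that layout, a site might publish each action to a topic named after the activity type and key the record by user id, so that
one user's events stay ordered within a single partition. The topic and field names here are placeholders, not a prescribed schema.
<pre><code class="language-java">import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ActivityTracker {
    private final KafkaProducer&lt;String, String&gt; producer;

    public ActivityTracker(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        this.producer = new KafkaProducer&lt;&gt;(props);
    }

    // One topic per activity type ("page-views", "searches", ...); the user id is
    // the record key, so a given user's events land in the same partition in order.
    public void track(String activityType, String userId, String payloadJson) {
        producer.send(new ProducerRecord&lt;&gt;(activityType, userId, payloadJson));
    }

    public void close() {
        producer.close();
    }
}</code></pre>
<p>
A page view would then be reported as something like <code>tracker.track("page-views", "user-123", "{\"page\": \"/pricing\"}")</code>, and each
downstream consumer subscribes only to the activity types it cares about.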
<h4><a id="uses_metrics" href="#uses_metrics">Metrics</a></h4>

Kafka is often used for operational monitoring data.
This involves aggregating statistics from distributed applications to produce centralized feeds of operational data.
<h4><a id="uses_logs" href="#uses_logs">Log Aggregation</a></h4>

Many people use Kafka as a replacement for a log aggregation solution.
Log aggregation typically collects physical log files off servers and puts them in a central place (a file server or HDFS perhaps) for processing.
Kafka abstracts away the details of files and gives a cleaner abstraction of log or event data as a stream of messages.
This allows for lower-latency processing and easier support for multiple data sources and distributed data consumption.
<p>
In comparison to log-centric systems like Scribe or Flume, Kafka offers equally good performance, stronger durability guarantees due to replication,
and much lower end-to-end latency.
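<p>
The consuming side of such a pipeline can be a plain consumer group subscribed to the log topics; each additional instance in the group picks up a
share of the partitions, which is what makes distributed consumption cheap. The <code>app-logs</code> topic name below is an assumption for the
sketch.
<pre><code class="language-java">import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class LogTailer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "log-indexer");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer&lt;String, String&gt; consumer = new KafkaConsumer&lt;&gt;(props)) {
            consumer.subscribe(List.of("app-logs"));
            while (true) {
                // Each record is a single log or event message, not a chunk of a file.
                for (var record : consumer.poll(Duration.ofSeconds(1)))
                    System.out.printf("[%s] %s%n", record.key(), record.value());
            }
        }
    }
}</code></pre>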
<h4><a id="uses_streamprocessing" href="#uses_streamprocessing">Stream Processing</a></h4>

Many users of Kafka process data in processing pipelines consisting of multiple stages, where raw input data is consumed from Kafka topics and then
aggregated, enriched, or otherwise transformed into new topics for further consumption or follow-up processing.
For example, a processing pipeline for recommending news articles might crawl article content from RSS feeds and publish it to an "articles" topic;
further processing might normalize or deduplicate this content and publish the cleansed article content to a new topic;
a final processing stage might attempt to recommend this content to users.
Such processing pipelines create graphs of real-time data flows based on the individual topics.
Starting in 0.10.0.0, a light-weight but powerful stream processing library called <a href="/documentation/streams">Kafka Streams</a>
is available in Apache Kafka to perform such data processing as described above.
Apart from Kafka Streams, alternative open source stream processing tools include <a href="https://storm.apache.org/">Apache Storm</a> and
<a href="http://samza.apache.org/">Apache Samza</a>.
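<p>
A minimal Kafka Streams sketch of the middle stage in that example might read the raw articles, apply a stand-in normalization step, and write the
result to a cleansed topic. The topic names and the trivial transformation are placeholders rather than a recommended design.
<pre><code class="language-java">import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class ArticleCleanser {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "article-cleanser");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream&lt;String, String&gt; rawArticles = builder.stream("articles");

        // Stand-in for real normalization/deduplication logic.
        rawArticles
            .mapValues(content -&gt; content.trim().toLowerCase())
            .to("articles-cleansed");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}</code></pre>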
<h4><a id="uses_eventsourcing" href="#uses_eventsourcing">Event Sourcing</a></h4>

<a href="http://martinfowler.com/eaaDev/EventSourcing.html">Event sourcing</a> is a style of application design where state changes are logged as a
time-ordered sequence of records. Kafka's support for very large stored log data makes it an excellent backend for an application built in this style.
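<p>
As a small illustration of the pattern (the topic name and event format are invented for the example), state changes are appended to a topic as they
happen, and the current state can be rebuilt at any time by replaying that topic from the beginning.
<pre><code class="language-java">import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class AccountStateRebuilder {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        // Key = account id, value = an event such as "deposit:25" or "withdraw:10".
        Map&lt;String, Long&gt; balances = new HashMap&lt;&gt;();
        TopicPartition partition = new TopicPartition("account-events", 0);

        try (KafkaConsumer&lt;String, String&gt; consumer = new KafkaConsumer&lt;&gt;(props)) {
            consumer.assign(List.of(partition));
            consumer.seekToBeginning(List.of(partition));
            long end = consumer.endOffsets(List.of(partition)).get(partition);

            // Replay every stored state change, in order, to reconstruct current state.
            while (consumer.position(partition) &lt; end) {
                for (var record : consumer.poll(Duration.ofSeconds(1))) {
                    String[] event = record.value().split(":");
                    long delta = Long.parseLong(event[1]);
                    balances.merge(record.key(),
                            event[0].equals("deposit") ? delta : -delta, Long::sum);
                }
            }
        }
        System.out.println(balances);
    }
}</code></pre>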
<h4><a id="uses_commitlog" href="#uses_commitlog">Commit Log</a></h4>

Kafka can serve as a kind of external commit log for a distributed system. The log helps replicate data between nodes and acts as a re-syncing
mechanism for failed nodes to restore their data.
The <a href="/documentation.html#compaction">log compaction</a> feature in Kafka helps support this usage.
In this usage Kafka is similar to the <a href="https://bookkeeper.apache.org/">Apache BookKeeper</a> project.
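<p>
A topic used this way is typically created with log compaction enabled so that Kafka retains at least the latest record for every key. A sketch with
the Java <code>Admin</code> client follows; the topic name, partition count, and replication factor are placeholders.
<pre><code class="language-java">import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CreateCompactedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            // cleanup.policy=compact keeps at least the last value per key, which is
            // what a recovering node needs in order to restore its state.
            NewTopic topic = new NewTopic("node-state-changelog", 3, (short) 3)
                    .configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG,
                                    TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}</code></pre>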
|
|
|