Browse Source

MINOR: Fix indentation for several doc pages (#10766)

Fixes the indentation of the code listings for:

api.html
configuration.html
design.html
implementation.html
toc.html
These changes consist of whitespaces added or removed for consistency. It also contains a couple of fixes on unbalanced html tags.

Reviewers: Luke Chen <showuon@gmail.com>, Bill Bejeck <bbejeck@apache.org>
pull/10147/head
Josep Prat 3 years ago committed by GitHub
parent
commit
f5a94d913f
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
  1. 50
      docs/api.html
  2. 28
      docs/configuration.html
  3. 29
      docs/design.html
  4. 176
      docs/implementation.html
  5. 34
      docs/toc.html

50
docs/api.html

@ -35,11 +35,11 @@ @@ -35,11 +35,11 @@
<p>
To use the producer, you can use the following maven dependency:
<pre class="line-numbers"><code class="language-xml"> &lt;dependency&gt;
&lt;groupId&gt;org.apache.kafka&lt;/groupId&gt;
&lt;artifactId&gt;kafka-clients&lt;/artifactId&gt;
&lt;version&gt;{{fullDotVersion}}&lt;/version&gt;
&lt;/dependency&gt;</code></pre>
<pre class="line-numbers"><code class="language-xml">&lt;dependency&gt;
&lt;groupId&gt;org.apache.kafka&lt;/groupId&gt;
&lt;artifactId&gt;kafka-clients&lt;/artifactId&gt;
&lt;version&gt;{{fullDotVersion}}&lt;/version&gt;
&lt;/dependency&gt;</code></pre>
<h3 class="anchor-heading"><a id="consumerapi" class="anchor-link"></a><a href="#consumerapi">2.2 Consumer API</a></h3>
@ -49,11 +49,11 @@ @@ -49,11 +49,11 @@
<a href="/{{version}}/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html" title="Kafka {{dotVersion}} Javadoc">javadocs</a>.
<p>
To use the consumer, you can use the following maven dependency:
<pre class="line-numbers"><code class="language-xml"> &lt;dependency&gt;
&lt;groupId&gt;org.apache.kafka&lt;/groupId&gt;
&lt;artifactId&gt;kafka-clients&lt;/artifactId&gt;
&lt;version&gt;{{fullDotVersion}}&lt;/version&gt;
&lt;/dependency&gt;</code></pre>
<pre class="line-numbers"><code class="language-xml">&lt;dependency&gt;
&lt;groupId&gt;org.apache.kafka&lt;/groupId&gt;
&lt;artifactId&gt;kafka-clients&lt;/artifactId&gt;
&lt;version&gt;{{fullDotVersion}}&lt;/version&gt;
&lt;/dependency&gt;</code></pre>
<h3 class="anchor-heading"><a id="streamsapi" class="anchor-link"></a><a href="#streamsapi">2.3 Streams API</a></h3>
@ -66,22 +66,22 @@ @@ -66,22 +66,22 @@
<p>
To use Kafka Streams you can use the following maven dependency:
<pre class="line-numbers"><code class="language-xml"> &lt;dependency&gt;
&lt;groupId&gt;org.apache.kafka&lt;/groupId&gt;
&lt;artifactId&gt;kafka-streams&lt;/artifactId&gt;
&lt;version&gt;{{fullDotVersion}}&lt;/version&gt;
&lt;/dependency&gt;</code></pre>
<pre class="line-numbers"><code class="language-xml">&lt;dependency&gt;
&lt;groupId&gt;org.apache.kafka&lt;/groupId&gt;
&lt;artifactId&gt;kafka-streams&lt;/artifactId&gt;
&lt;version&gt;{{fullDotVersion}}&lt;/version&gt;
&lt;/dependency&gt;</code></pre>
<p>
When using Scala you may optionally include the <code>kafka-streams-scala</code> library. Additional documentation on using the Kafka Streams DSL for Scala is available <a href="/{{version}}/documentation/streams/developer-guide/dsl-api.html#scala-dsl">in the developer guide</a>.
<p>
To use Kafka Streams DSL for Scala for Scala {{scalaVersion}} you can use the following maven dependency:
<pre class="line-numbers"><code class="language-xml"> &lt;dependency&gt;
&lt;groupId&gt;org.apache.kafka&lt;/groupId&gt;
&lt;artifactId&gt;kafka-streams-scala_{{scalaVersion}}&lt;/artifactId&gt;
&lt;version&gt;{{fullDotVersion}}&lt;/version&gt;
&lt;/dependency&gt;</code></pre>
<pre class="line-numbers"><code class="language-xml">&lt;dependency&gt;
&lt;groupId&gt;org.apache.kafka&lt;/groupId&gt;
&lt;artifactId&gt;kafka-streams-scala_{{scalaVersion}}&lt;/artifactId&gt;
&lt;version&gt;{{fullDotVersion}}&lt;/version&gt;
&lt;/dependency&gt;</code></pre>
<h3 class="anchor-heading"><a id="connectapi" class="anchor-link"></a><a href="#connectapi">2.4 Connect API</a></h3>
@ -97,11 +97,11 @@ @@ -97,11 +97,11 @@
The Admin API supports managing and inspecting topics, brokers, acls, and other Kafka objects.
<p>
To use the Admin API, add the following Maven dependency:
<pre class="line-numbers"><code class="language-xml"> &lt;dependency&gt;
&lt;groupId&gt;org.apache.kafka&lt;/groupId&gt;
&lt;artifactId&gt;kafka-clients&lt;/artifactId&gt;
&lt;version&gt;{{fullDotVersion}}&lt;/version&gt;
&lt;/dependency&gt;</code></pre>
<pre class="line-numbers"><code class="language-xml">&lt;dependency&gt;
&lt;groupId&gt;org.apache.kafka&lt;/groupId&gt;
&lt;artifactId&gt;kafka-clients&lt;/artifactId&gt;
&lt;version&gt;{{fullDotVersion}}&lt;/version&gt;
&lt;/dependency&gt;</code></pre>
For more information about the Admin APIs, see the <a href="/{{version}}/javadoc/index.html?org/apache/kafka/clients/admin/Admin.html" title="Kafka {{dotVersion}} Javadoc">javadoc</a>.
<p>

28
docs/configuration.html

@ -43,21 +43,21 @@ @@ -43,21 +43,21 @@
</ul>
To alter the current broker configs for broker id 0 (for example, the number of log cleaner threads):
<pre class="line-numbers"><code class="language-bash"> &gt; bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers --entity-name 0 --alter --add-config log.cleaner.threads=2</code></pre>
<pre class="line-numbers"><code class="language-bash">&gt; bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers --entity-name 0 --alter --add-config log.cleaner.threads=2</code></pre>
To describe the current dynamic broker configs for broker id 0:
<pre class="line-numbers"><code class="language-bash"> &gt; bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers --entity-name 0 --describe</code></pre>
<pre class="line-numbers"><code class="language-bash">&gt; bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers --entity-name 0 --describe</code></pre>
To delete a config override and revert to the statically configured or default value for broker id 0 (for example,
the number of log cleaner threads):
<pre class="line-numbers"><code class="language-bash"> &gt; bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers --entity-name 0 --alter --delete-config log.cleaner.threads</code></pre>
<pre class="line-numbers"><code class="language-bash">&gt; bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers --entity-name 0 --alter --delete-config log.cleaner.threads</code></pre>
Some configs may be configured as a cluster-wide default to maintain consistent values across the whole cluster. All brokers
in the cluster will process the cluster default update. For example, to update log cleaner threads on all brokers:
<pre class="line-numbers"><code class="language-bash"> &gt; bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers --entity-default --alter --add-config log.cleaner.threads=2</code></pre>
<pre class="line-numbers"><code class="language-bash">&gt; bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers --entity-default --alter --add-config log.cleaner.threads=2</code></pre>
To describe the currently configured dynamic cluster-wide default configs:
<pre class="line-numbers"><code class="language-bash"> &gt; bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers --entity-default --describe</code></pre>
<pre class="line-numbers"><code class="language-bash">&gt; bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers --entity-default --describe</code></pre>
All configs that are configurable at cluster level may also be configured at per-broker level (e.g. for testing).
If a config value is defined at different levels, the following order of precedence is used:
@ -89,7 +89,7 @@ @@ -89,7 +89,7 @@
encoder configs will not be persisted in ZooKeeper. For example, to store SSL key password for listener <code>INTERNAL</code>
on broker 0:
<pre class="line-numbers"><code class="language-bash"> &gt; bin/kafka-configs.sh --zookeeper localhost:2182 --zk-tls-config-file zk_tls_config.properties --entity-type brokers --entity-name 0 --alter --add-config
<pre class="line-numbers"><code class="language-bash">&gt; bin/kafka-configs.sh --zookeeper localhost:2182 --zk-tls-config-file zk_tls_config.properties --entity-type brokers --entity-name 0 --alter --add-config
'listener.name.internal.ssl.key.password=key-password,password.encoder.secret=secret,password.encoder.iterations=8192'</code></pre>
The configuration <code>listener.name.internal.ssl.key.password</code> will be persisted in ZooKeeper in encrypted
@ -162,7 +162,7 @@ @@ -162,7 +162,7 @@
In Kafka version 1.1.x, changes to <code>unclean.leader.election.enable</code> take effect only when a new controller is elected.
Controller re-election may be forced by running:
<pre class="line-numbers"><code class="language-bash"> &gt; bin/zookeeper-shell.sh localhost
<pre class="line-numbers"><code class="language-bash">&gt; bin/zookeeper-shell.sh localhost
rmr /controller</code></pre>
<h5>Updating Log Cleaner Configs</h5>
@ -220,18 +220,18 @@ @@ -220,18 +220,18 @@
<h3 class="anchor-heading"><a id="topicconfigs" class="anchor-link"></a><a href="#topicconfigs">3.2 Topic-Level Configs</a></h3>
Configurations pertinent to topics have both a server default as well an optional per-topic override. If no per-topic configuration is given the server default is used. The override can be set at topic creation time by giving one or more <code>--config</code> options. This example creates a topic named <i>my-topic</i> with a custom max message size and flush rate:
<pre class="line-numbers"><code class="language-bash"> &gt; bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic my-topic --partitions 1 \
--replication-factor 1 --config max.message.bytes=64000 --config flush.messages=1</code></pre>
<pre class="line-numbers"><code class="language-bash">&gt; bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic my-topic --partitions 1 \
--replication-factor 1 --config max.message.bytes=64000 --config flush.messages=1</code></pre>
Overrides can also be changed or set later using the alter configs command. This example updates the max message size for <i>my-topic</i>:
<pre class="line-numbers"><code class="language-bash"> &gt; bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type topics --entity-name my-topic
--alter --add-config max.message.bytes=128000</code></pre>
<pre class="line-numbers"><code class="language-bash">&gt; bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type topics --entity-name my-topic
--alter --add-config max.message.bytes=128000</code></pre>
To check overrides set on the topic you can do
<pre class="line-numbers"><code class="language-bash"> &gt; bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type topics --entity-name my-topic --describe</code></pre>
<pre class="line-numbers"><code class="language-bash">&gt; bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type topics --entity-name my-topic --describe</code></pre>
To remove an override you can do
<pre class="line-numbers"><code class="language-bash"> &gt; bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type topics --entity-name my-topic
--alter --delete-config max.message.bytes</code></pre>
<pre class="line-numbers"><code class="language-bash">&gt; bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type topics --entity-name my-topic
--alter --delete-config max.message.bytes</code></pre>
The following are the topic-level configurations. The server's default configuration for this property is given under the Server Default Property heading. A given server default config value only applies to a topic if it does not have an explicit topic config override.

29
docs/design.html

@ -453,15 +453,15 @@ @@ -453,15 +453,15 @@
<p>
Let's discuss a concrete example of such a stream. Say we have a topic containing user email addresses; every time a user updates their email address we send a message to this topic using their user id as the
primary key. Now say we send the following messages over some time period for a user with id 123, each message corresponding to a change in email address (messages for other ids are omitted):
<pre class="line-numbers"><code class="language-text"> 123 => bill@microsoft.com
.
.
.
123 => bill@gatesfoundation.org
.
.
.
123 => bill@gmail.com</code></pre>
<pre class="line-numbers"><code class="language-text">123 => bill@microsoft.com
.
.
.
123 => bill@gatesfoundation.org
.
.
.
123 => bill@gmail.com</code></pre>
Log compaction gives us a more granular retention mechanism so that we are guaranteed to retain at least the last update for each primary key (e.g. <code>bill@gmail.com</code>). By doing this we guarantee that the
log contains a full snapshot of the final value for every key not just keys that changed recently. This means downstream consumers can restore their own state off this topic without us having to retain a complete
log of all changes.
@ -548,7 +548,7 @@ @@ -548,7 +548,7 @@
The log cleaner is enabled by default. This will start the pool of cleaner threads.
To enable log cleaning on a particular topic, add the log-specific property
<pre class="language-text"><code> log.cleanup.policy=compact</code></pre>
<pre class="language-text"><code>log.cleanup.policy=compact</code></pre>
The <code>log.cleanup.policy</code> property is a broker configuration setting defined
in the broker's <code>server.properties</code> file; it affects all of the topics
@ -556,13 +556,13 @@ @@ -556,13 +556,13 @@
<a href="/documentation.html#brokerconfigs">here</a>.
The log cleaner can be configured to retain a minimum amount of the uncompacted "head" of the log. This is enabled by setting the compaction time lag.
<pre class="language-text"><code> log.cleaner.min.compaction.lag.ms</code></pre>
<pre class="language-text"><code>log.cleaner.min.compaction.lag.ms</code></pre>
This can be used to prevent messages newer than a minimum message age from being subject to compaction. If not set, all log segments are eligible for compaction except for the last segment, i.e. the one currently
being written to. The active segment will not be compacted even if all of its messages are older than the minimum compaction time lag.
The log cleaner can be configured to ensure a maximum delay after which the uncompacted "head" of the log becomes eligible for log compaction.
<pre class="language-text"><code> log.cleaner.max.compaction.lag.ms</code></pre>
<pre class="language-text"><code>log.cleaner.max.compaction.lag.ms</code></pre>
This can be used to prevent log with low produce rate from remaining ineligible for compaction for an unbounded duration. If not set, logs that do not exceed min.cleanable.dirty.ratio are not compacted.
Note that this compaction deadline is not a hard guarantee since it is still subjected to the availability of log cleaner threads and the actual compaction time.
@ -575,11 +575,11 @@ @@ -575,11 +575,11 @@
<p>
Kafka cluster has the ability to enforce quotas on requests to control the broker resources used by clients. Two types
of client quotas can be enforced by Kafka brokers for each group of clients sharing a quota:
</p>
<ol>
<li>Network bandwidth quotas define byte-rate thresholds (since 0.9)</li>
<li>Request rate quotas define CPU utilization thresholds as a percentage of network and I/O threads (since 0.11)</li>
</ol>
</p>
<h4 class="anchor-heading">
<a class="anchor-link" id="design_quotasnecessary" href="#design_quotasnecessary"></a>
@ -610,6 +610,7 @@ @@ -610,6 +610,7 @@
</p>
<p>
The order of precedence for quota configuration is:
</p>
<ol>
<li>/config/users/&lt;user&gt;/clients/&lt;client-id&gt;</li>
<li>/config/users/&lt;user&gt;/clients/&lt;default&gt;</li>
@ -620,7 +621,7 @@ @@ -620,7 +621,7 @@
<li>/config/clients/&lt;client-id&gt;</li>
<li>/config/clients/&lt;default&gt;</li>
</ol>
<p>
Broker properties (quota.producer.default, quota.consumer.default) can also be used to set defaults of network bandwidth quotas for client-id groups. These properties are being deprecated and will be removed in a later release.
Default quotas for client-id can be set in Zookeeper similar to the other quota overrides and defaults.
</p>

176
docs/implementation.html

@ -32,29 +32,29 @@ @@ -32,29 +32,29 @@
<h4 class="anchor-heading"><a id="recordbatch" class="anchor-link"></a><a href="#recordbatch">5.3.1 Record Batch</a></h4>
<p> The following is the on-disk format of a RecordBatch. </p>
<p><pre class="line-numbers"><code class="language-java"> baseOffset: int64
batchLength: int32
partitionLeaderEpoch: int32
magic: int8 (current magic value is 2)
crc: int32
attributes: int16
bit 0~2:
0: no compression
1: gzip
2: snappy
3: lz4
4: zstd
bit 3: timestampType
bit 4: isTransactional (0 means not transactional)
bit 5: isControlBatch (0 means not a control batch)
bit 6~15: unused
lastOffsetDelta: int32
firstTimestamp: int64
maxTimestamp: int64
producerId: int64
producerEpoch: int16
baseSequence: int32
records: [Record]</code></pre></p>
<pre class="line-numbers"><code class="language-text">baseOffset: int64
batchLength: int32
partitionLeaderEpoch: int32
magic: int8 (current magic value is 2)
crc: int32
attributes: int16
bit 0~2:
0: no compression
1: gzip
2: snappy
3: lz4
4: zstd
bit 3: timestampType
bit 4: isTransactional (0 means not transactional)
bit 5: isControlBatch (0 means not a control batch)
bit 6~15: unused
lastOffsetDelta: int32
firstTimestamp: int64
maxTimestamp: int64
producerId: int64
producerEpoch: int16
baseSequence: int32
records: [Record]</code></pre>
<p> Note that when compression is enabled, the compressed record data is serialized directly following the count of the number of records. </p>
<p>The CRC covers the data from the attributes to the end of the batch (i.e. all the bytes that follow the CRC). It is located after the magic byte, which
@ -71,27 +71,27 @@ @@ -71,27 +71,27 @@
<h5 class="anchor-heading"><a id="controlbatch" class="anchor-link"></a><a href="#controlbatch">5.3.1.1 Control Batches</a></h5>
<p>A control batch contains a single record called the control record. Control records should not be passed on to applications. Instead, they are used by consumers to filter out aborted transactional messages.</p>
<p> The key of a control record conforms to the following schema: </p>
<p><pre class="line-numbers"><code class="language-java"> version: int16 (current version is 0)
type: int16 (0 indicates an abort marker, 1 indicates a commit)</code></pre></p>
<pre class="line-numbers"><code class="language-text">version: int16 (current version is 0)
type: int16 (0 indicates an abort marker, 1 indicates a commit)</code></pre>
<p>The schema for the value of a control record is dependent on the type. The value is opaque to clients.</p>
<h4 class="anchor-heading"><a id="record" class="anchor-link"></a><a href="#record">5.3.2 Record</a></h4>
<p>Record level headers were introduced in Kafka 0.11.0. The on-disk format of a record with Headers is delineated below. </p>
<p><pre class="line-numbers"><code class="language-java"> length: varint
attributes: int8
bit 0~7: unused
timestampDelta: varlong
offsetDelta: varint
keyLength: varint
key: byte[]
valueLen: varint
value: byte[]
Headers => [Header]</code></pre></p>
<pre class="line-numbers"><code class="language-text">length: varint
attributes: int8
bit 0~7: unused
timestampDelta: varlong
offsetDelta: varint
keyLength: varint
key: byte[]
valueLen: varint
value: byte[]
Headers => [Header]</code></pre>
<h5 class="anchor-heading"><a id="recordheader" class="anchor-link"></a><a href="#recordheader">5.3.2.1 Record Header</a></h5>
<p><pre class="line-numbers"><code class="language-java"> headerKeyLength: varint
headerKey: String
headerValueLength: varint
Value: byte[]</code></pre></p>
<pre class="line-numbers"><code class="language-text">headerKeyLength: varint
headerKey: String
headerValueLength: varint
Value: byte[]</code></pre>
<p>We use the same varint encoding as Protobuf. More information on the latter can be found <a href="https://developers.google.com/protocol-buffers/docs/encoding#varints">here</a>. The count of headers in a record
is also encoded as a varint.</p>
@ -102,41 +102,42 @@ @@ -102,41 +102,42 @@
</p>
<b>Message Set:</b><br>
<p><pre class="line-numbers"><code class="language-java"> MessageSet (Version: 0) => [offset message_size message]
offset => INT64
message_size => INT32
message => crc magic_byte attributes key value
crc => INT32
magic_byte => INT8
attributes => INT8
bit 0~2:
0: no compression
1: gzip
2: snappy
bit 3~7: unused
key => BYTES
value => BYTES</code></pre></p>
<p><pre class="line-numbers"><code class="language-java"> MessageSet (Version: 1) => [offset message_size message]
offset => INT64
message_size => INT32
message => crc magic_byte attributes timestamp key value
crc => INT32
magic_byte => INT8
attributes => INT8
bit 0~2:
0: no compression
1: gzip
2: snappy
3: lz4
bit 3: timestampType
0: create time
1: log append time
bit 4~7: unused
timestamp => INT64
key => BYTES
value => BYTES</code></pre></p>
<pre class="line-numbers"><code class="language-text">MessageSet (Version: 0) => [offset message_size message]
offset => INT64
message_size => INT32
message => crc magic_byte attributes key value
crc => INT32
magic_byte => INT8
attributes => INT8
bit 0~2:
0: no compression
1: gzip
2: snappy
bit 3~7: unused
key => BYTES
value => BYTES</code></pre>
<pre class="line-numbers"><code class="language-text">MessageSet (Version: 1) => [offset message_size message]
offset => INT64
message_size => INT32
message => crc magic_byte attributes timestamp key value
crc => INT32
magic_byte => INT8
attributes => INT8
bit 0~2:
0: no compression
1: gzip
2: snappy
3: lz4
bit 3: timestampType
0: create time
1: log append time
bit 4~7: unused
timestamp => INT64
key => BYTES
value => BYTES</code></pre>
<p>
In versions prior to Kafka 0.10, the only supported message format version (which is indicated in the magic value) was 0. Message format version 1 was introduced with timestamp support in version 0.10.
</p>
<ul>
<li>Similarly to version 2 above, the lowest bits of attributes represent the compression type.</li>
<li>In version 1, the producer should always set the timestamp type bit to 0. If the topic is configured to use log append time,
@ -144,7 +145,6 @@ @@ -144,7 +145,6 @@
the broker will overwrite the timestamp type and the timestamp in the message set.</li>
<li>The highest bits of attributes must be set to 0.</li>
</ul>
</p>
<p>In message format versions 0 and 1 Kafka supports recursive messages to enable compression. In this case the message's attributes must be set
to indicate one of the compression types and the value field will contain a message set compressed with that type. We often refer
to the nested messages as "inner messages" and the wrapping message as the "outer message." Note that the key should be null
@ -163,7 +163,7 @@ @@ -163,7 +163,7 @@
A log for a topic named "my_topic" with two partitions consists of two directories (namely <code>my_topic_0</code> and <code>my_topic_1</code>) populated with data files containing the messages for that topic. The format of the log files is a sequence of "log entries""; each log entry is a 4 byte integer <i>N</i> storing the message length which is followed by the <i>N</i> message bytes. Each message is uniquely identified by a 64-bit integer <i>offset</i> giving the byte position of the start of this message in the stream of all messages ever sent to that topic on that partition. The on-disk format of each message is given below. Each log file is named with the offset of the first message it contains. So the first file created will be 00000000000.kafka, and each additional file will have an integer name roughly <i>S</i> bytes from the previous file where <i>S</i> is the max log file size given in the configuration.
</p>
<p>
The exact binary format for records is versioned and maintained as a standard interface so record batches can be transferred between producer, broker, and client without recopying or conversion when desirable. The previous section included details about the on-disk format of records.</p>
The exact binary format for records is versioned and maintained as a standard interface so record batches can be transferred between producer, broker, and client without recopying or conversion when desirable. The previous section included details about the on-disk format of records.
</p>
<p>
The use of the message offset as the message id is unusual. Our original idea was to use a GUID generated by the producer, and maintain a mapping from GUID to offset on each broker. But since a consumer must maintain an ID for each server, the global uniqueness of the GUID provides no value. Furthermore, the complexity of maintaining the mapping from a random id to an offset requires a heavy weight index structure which must be synchronized with disk, essentially requiring a full persistent random-access data structure. Thus to simplify the lookup structure we decided to use a simple per-partition atomic counter which could be coupled with the partition id and node id to uniquely identify a message; this makes the lookup structure simpler, though multiple seeks per consumer request are still likely. However once we settled on a counter, the jump to directly using the offset seemed natural&mdash;both after all are monotonically increasing integers unique to a partition. Since the offset is hidden from the consumer API this decision is ultimately an implementation detail and we went with the more efficient approach.
@ -186,21 +186,21 @@ @@ -186,21 +186,21 @@
<p> The following is the format of the results sent to the consumer.
<pre class="line-numbers"><code class="language-text"> MessageSetSend (fetch result)
<pre class="line-numbers"><code class="language-text">MessageSetSend (fetch result)
total length : 4 bytes
error code : 2 bytes
message 1 : x bytes
...
message n : x bytes</code></pre>
total length : 4 bytes
error code : 2 bytes
message 1 : x bytes
...
message n : x bytes</code></pre>
<pre class="line-numbers"><code class="language-text"> MultiMessageSetSend (multiFetch result)
<pre class="line-numbers"><code class="language-text">MultiMessageSetSend (multiFetch result)
total length : 4 bytes
error code : 2 bytes
messageSetSend 1
...
messageSetSend n</code></pre>
total length : 4 bytes
error code : 2 bytes
messageSetSend 1
...
messageSetSend n</code></pre>
<h4 class="anchor-heading"><a id="impl_deletes" class="anchor-link"></a><a href="#impl_deletes">Deletes</a></h4>
<p>
Data is deleted one log segment at a time. The log manager applies two metrics to identify segments which are
@ -260,7 +260,7 @@ @@ -260,7 +260,7 @@
</p>
<h4 class="anchor-heading"><a id="impl_zkbroker" class="anchor-link"></a><a href="#impl_zkbroker">Broker Node Registry</a></h4>
<pre class="line-numbers"><code class="language-json"> /brokers/ids/[0...N] --> {"jmx_port":...,"timestamp":...,"endpoints":[...],"host":...,"version":...,"port":...} (ephemeral node)</code></pre>
<pre class="line-numbers"><code class="language-json">/brokers/ids/[0...N] --> {"jmx_port":...,"timestamp":...,"endpoints":[...],"host":...,"version":...,"port":...} (ephemeral node)</code></pre>
<p>
This is a list of all present broker nodes, each of which provides a unique logical broker id which identifies it to consumers (which must be given as part of its configuration). On startup, a broker node registers itself by creating a znode with the logical broker id under /brokers/ids. The purpose of the logical broker id is to allow a broker to be moved to a different physical machine without affecting consumers. An attempt to register a broker id that is already in use (say because two servers are configured with the same broker id) results in an error.
</p>
@ -268,7 +268,7 @@ @@ -268,7 +268,7 @@
Since the broker registers itself in ZooKeeper using ephemeral znodes, this registration is dynamic and will disappear if the broker is shutdown or dies (thus notifying consumers it is no longer available).
</p>
<h4 class="anchor-heading"><a id="impl_zktopic" class="anchor-link"></a><a href="#impl_zktopic">Broker Topic Registry</a></h4>
<pre class="line-numbers"><code class="language-json"> /brokers/topics/[topic]/partitions/[0...N]/state --> {"controller_epoch":...,"leader":...,"version":...,"leader_epoch":...,"isr":[...]} (ephemeral node)</code></pre>
<pre class="line-numbers"><code class="language-json">/brokers/topics/[topic]/partitions/[0...N]/state --> {"controller_epoch":...,"leader":...,"version":...,"leader_epoch":...,"isr":[...]} (ephemeral node)</code></pre>
<p>
Each broker registers itself under the topics it maintains and stores the number of partitions for that topic.

34
docs/toc.html

@ -86,26 +86,26 @@ @@ -86,26 +86,26 @@
<li><a href="#basic_ops_increase_replication_factor">Increasing replication factor</a>
</ul>
<li><a href="#datacenters">6.2 Datacenters</a>
<li><a href="#georeplication">6.3 Geo-Replication (Cross-Cluster Data Mirroring)</a></li>
<li><a href="#georeplication">6.3 Geo-Replication (Cross-Cluster Data Mirroring)</a></li>
<ul>
<li><a href="#georeplication-overview">Geo-Replication Overview</a></li>
<li><a href="#georeplication-flows">What Are Replication Flows</a></li>
<li><a href="#georeplication-mirrormaker">Configuring Geo-Replication</a></li>
<li><a href="#georeplication-starting">Starting Geo-Replication</a></li>
<li><a href="#georeplication-stopping">Stopping Geo-Replication</a></li>
<li><a href="#georeplication-apply-config-changes">Applying Configuration Changes</a></li>
<li><a href="#georeplication-monitoring">Monitoring Geo-Replication</a></li>
<li><a href="#georeplication-overview">Geo-Replication Overview</a></li>
<li><a href="#georeplication-flows">What Are Replication Flows</a></li>
<li><a href="#georeplication-mirrormaker">Configuring Geo-Replication</a></li>
<li><a href="#georeplication-starting">Starting Geo-Replication</a></li>
<li><a href="#georeplication-stopping">Stopping Geo-Replication</a></li>
<li><a href="#georeplication-apply-config-changes">Applying Configuration Changes</a></li>
<li><a href="#georeplication-monitoring">Monitoring Geo-Replication</a></li>
</ul>
<li><a href="#multitenancy">6.4 Multi-Tenancy</a></li>
<li><a href="#multitenancy">6.4 Multi-Tenancy</a></li>
<ul>
<li><a href="#multitenancy-overview">Multi-Tenancy Overview</a></li>
<li><a href="#multitenancy-topic-naming">Creating User Spaces (Namespaces)</a></li>
<li><a href="#multitenancy-topic-configs">Configuring Topics</a></li>
<li><a href="#multitenancy-security">Securing Clusters and Topics</a></li>
<li><a href="#multitenancy-isolation">Isolating Tenants</a></li>
<li><a href="#multitenancy-monitoring">Monitoring and Metering</a></li>
<li><a href="#multitenancy-georeplication">Multi-Tenancy and Geo-Replication</a></li>
<li><a href="#multitenancy-more">Further considerations</a></li>
<li><a href="#multitenancy-overview">Multi-Tenancy Overview</a></li>
<li><a href="#multitenancy-topic-naming">Creating User Spaces (Namespaces)</a></li>
<li><a href="#multitenancy-topic-configs">Configuring Topics</a></li>
<li><a href="#multitenancy-security">Securing Clusters and Topics</a></li>
<li><a href="#multitenancy-isolation">Isolating Tenants</a></li>
<li><a href="#multitenancy-monitoring">Monitoring and Metering</a></li>
<li><a href="#multitenancy-georeplication">Multi-Tenancy and Geo-Replication</a></li>
<li><a href="#multitenancy-more">Further considerations</a></li>
</ul>
<li><a href="#config">6.5 Important Configs</a>
<ul>

Loading…
Cancel
Save