Kafka Raft
=================
Kafka Raft is a submodule of Apache Kafka featuring a tailored version of the
[Raft Consensus Protocol](https://www.usenix.org/system/files/conference/atc14/atc14-paper-ongaro.pdf).

Eventually this module will be integrated into the Kafka server. For now,
we have a standalone test server which can be used for performance testing.
Below we describe how to set this up.
### Run Single Quorum ###

    bin/test-raft-server-start.sh config/raft.properties
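If `config/raft.properties` is not present in your checkout, a minimal
single-voter file along the lines of the multi-node configs below should work.
This is only a sketch; the port, log directory, and broker id are illustrative:

    # Sketch of a minimal single-voter quorum config (illustrative values only)
    cat << EOF >> config/raft.properties
    broker.id=1
    listeners=PLAINTEXT://localhost:9092
    quorum.voters=1@localhost:9092
    log.dirs=/tmp/raft-logs
    zookeeper.connect=localhost:2181
    EOF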
### Run Multi Node Quorum ###
Create 3 separate raft quorum property files as follows
(note that the `zookeeper.connect` config is required, but unused):

    cat << EOF >> config/raft-quorum-1.properties
    broker.id=1
    listeners=PLAINTEXT://localhost:9092
    quorum.voters=1@localhost:9092,2@localhost:9093,3@localhost:9094
    log.dirs=/tmp/raft-logs-1
    zookeeper.connect=localhost:2181
    EOF

    cat << EOF >> config/raft-quorum-2.properties
    broker.id=2
    listeners=PLAINTEXT://localhost:9093
    quorum.voters=1@localhost:9092,2@localhost:9093,3@localhost:9094
    log.dirs=/tmp/raft-logs-2
    zookeeper.connect=localhost:2181
    EOF

    cat << EOF >> config/raft-quorum-3.properties
    broker.id=3
    listeners=PLAINTEXT://localhost:9094
    quorum.voters=1@localhost:9092,2@localhost:9093,3@localhost:9094
    log.dirs=/tmp/raft-logs-3
    zookeeper.connect=localhost:2181
    EOF
Open up 3 separate terminals and run one of the following commands in each:

    bin/test-raft-server-start.sh --config config/raft-quorum-1.properties
    bin/test-raft-server-start.sh --config config/raft-quorum-2.properties
    bin/test-raft-server-start.sh --config config/raft-quorum-3.properties
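Alternatively, the three voters can be launched from a single shell as a
convenience. This is just a sketch and assumes the property files above exist:

    # Launch all three voters in the background and capture their output (sketch)
    for i in 1 2 3; do
      bin/test-raft-server-start.sh --config config/raft-quorum-$i.properties \
        > /tmp/raft-node-$i.log 2>&1 &
    done
    wait  # keep the shell attached until the servers exit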
Once a leader is elected, it will begin writing to an internal
`__cluster_metadata` topic with a steady workload of random data.
You can control the workload using the `--throughput` and `--record-size`
arguments passed to `test-raft-server-start.sh`.
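For example, an invocation like the following would target roughly 5000
records per second with 256-byte records (the values and flag ordering here
are illustrative):

    # Illustrative workload tuning; adjust the numbers to your test scenario
    bin/test-raft-server-start.sh --config config/raft-quorum-1.properties \
      --throughput 5000 --record-size 256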