KAFKA-5290; Docs need clarification on meaning of 'committed' to the log
based on conversations with vahidhashemian, rajinisivaram, apurvam
The docs didn't make clear that what gets committed and what does not may depend on the producer acks setting.
Author: Edoardo Comar <ecomar@uk.ibm.com>
Reviewers: Vahid Hashemian <vahidhashemian@us.ibm.com>, Apurva Mehta <apurva@confluent.io>, Jason Gustafson <jason@confluent.io>
Closes #3035 from edoardocomar/DOC-clarification-on-committed
can fail, cases where there are multiple consumer processes, or cases where data written to disk can be lost).
<p>
Kafka's semantics are straight-forward. When publishing a message we have a notion of the message being "committed" to the log. Once a published message is committed it will not be lost as long as one broker that
replicates the partition to which this message was written remains "alive". The definition of alive as well as a description of which types of failures we attempt to handle will be described in more detail in the
next section. For now let's assume a perfect, lossless broker and try to understand the guarantees to the producer and consumer. If a producer attempts to publish a message and experiences a network error it cannot
be sure if this error happened before or after the message was committed. This is similar to the semantics of inserting into a database table with an autogenerated key.
replicates the partition to which this message was written remains "alive". The definition of a committed message and of an alive partition, as well as a description of which types of failures we attempt to handle, will be
described in more detail in the next section. For now let's assume a perfect, lossless broker and try to understand the guarantees to the producer and consumer. If a producer attempts to publish a message and
experiences a network error it cannot be sure if this error happened before or after the message was committed. This is similar to the semantics of inserting into a database table with an autogenerated key.
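<p>
As an illustrative sketch (not part of this change), the ambiguity can be seen with the standard Java producer client; the broker address, topic name, and serializer choices below are assumptions made only for the example:
<pre>
import java.util.Properties;
import java.util.concurrent.ExecutionException;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class CommittedOrNotExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all"); // wait for the full set of in-sync replicas

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record =
                new ProducerRecord<>("my-topic", "key", "value"); // illustrative topic
            try {
                // Blocking on the returned future surfaces any error from this send.
                RecordMetadata metadata = producer.send(record).get();
                System.out.printf("Committed to %s-%d at offset %d%n",
                        metadata.topic(), metadata.partition(), metadata.offset());
            } catch (ExecutionException | InterruptedException e) {
                // On a network error or timeout the producer cannot tell whether the
                // message was committed before the failure or not.
                System.err.println("Send failed; commit status unknown: " + e.getCause());
            }
        }
    }
}
</pre>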
<p>
These are not the strongest possible semantics for publishers. Although we cannot be sure of what happened in the case of a network error, it is possible to allow the producer to generate a sort of "primary key" that
makes retrying the produce request idempotent. This feature is not trivial for a replicated system because of course it must work even (or especially) in the case of a server failure. With this feature it would
@@ -298,9 +298,13 @@
In distributed systems terminology we only attempt to handle a "fail/recover" model of failures where nodes suddenly cease working and then later recover (perhaps without knowing that they have died). Kafka does not
handle so-called "Byzantine" failures in which nodes produce arbitrary or malicious responses (perhaps due to bugs or foul play).
<p>
A message is considered "committed" when all in sync replicas for that partition have applied it to their log. Only committed messages are ever given out to the consumer. This means that the consumer need not worry
about potentially seeing a message that could be lost if the leader fails. Producers, on the other hand, have the option of either waiting for the message to be committed or not, depending on their preference for
tradeoff between latency and durability. This preference is controlled by the acks setting that the producer uses.
We can now more precisely define that a message is considered committed when all in-sync replicas for that partition have applied it to their log.
Only committed messages are ever given out to the consumer. This means that the consumer need not worry about potentially seeing a message that could be lost if the leader fails. Producers, on the other hand,
have the option of either waiting for the message to be committed or not, depending on their preference for the tradeoff between latency and durability. This preference is controlled by the acks setting that the
producer uses.
Note that topics have a setting for the "minimum number" of in-sync replicas that is checked when the producer requests acknowledgment that a message
has been written to the full set of in-sync replicas. If a less stringent acknowledgment is requested by the producer, then the message can be committed, and consumed,
even if the number of in-sync replicas is lower than the minimum (e.g. it can be as low as just the leader).
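<p>
To make that tradeoff concrete, here is a minimal producer-configuration sketch (an illustration, not something prescribed by this change); the bootstrap address and serializers are assumptions:
<pre>
import java.util.Properties;

public class AcksSettings {
    // Sketch of the producer-side acks options discussed above.
    public static Properties producerProps(String acks) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // "0"   - do not wait for any acknowledgment; lowest latency, weakest durability.
        // "1"   - wait only for the leader to write the record to its local log; the record
        //         can still be lost if the leader fails before the followers have copied it.
        // "all" - wait until the full set of in-sync replicas have applied the record,
        //         i.e. until it is committed in the sense defined above.
        props.put("acks", acks);
        return props;
    }
}
</pre>
With acks=all, the min.insync.replicas topic setting then determines how many in-sync replicas must be present for the acknowledgment to succeed.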
<p>
The guarantee that Kafka offers is that a committed message will not be lost, as long as there is at least one in sync replica alive, at all times.
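<p>
As a hedged illustration of how that guarantee is typically provisioned (the topic name, replica counts, and broker address below are assumptions, not recommendations made by this change), a topic can be created with a replication factor of 3 and min.insync.replicas of 2, to be used together with acks=all producers:
<pre>
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class DurableTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // Three replicas; require at least two to be in sync before an acks=all
            // producer receives an acknowledgment. Names and sizes are illustrative.
            NewTopic topic = new NewTopic("durable-topic", 3, (short) 3)
                    .configs(Collections.singletonMap("min.insync.replicas", "2"));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
</pre>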