Include group.instance.id in the describe group result for better visibility.
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
The purpose here is to leverage static membership information during round robin consumer assignment, because persistent member id could help make the assignment remain the same during rebalance.
The comparison logic is changed to:
1. If member A and member B both have group.instance.id, then compare their group.instance.id
2. If member A has group.instance.id, while member B doesn't, then A < B
3. If both member A and B don't have group.instance.id, compare their member.id
In round robin assignor, we use ephemeral member.id to sort the members in order for assignment. This semantic is not stable and could trigger unnecessary shuffle of tasks. By leveraging group.instance.id the static member assignment shall be persist when satisfying following conditions:
1. number of members remain the same across generation
2. static members' identities persist across generation
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
This PR intendent to address some typos in https://kafka.apache.org/documentation/streams/developer-guide/processor-api.html page.
Invalid configuration option specified in the example. I've replaced with closest constant TopicConfig.MIN_IN_SYNC_REPLICAS_CONFIG, since LogConfig.MinInSyncReplicasProp() requires Scala stuff
Reference to LogConfig seems to be obsolete, I believe I've moved it to correct line
Apostrophe displayed incorrectly
Reviewers: John Roesler <john@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
We recently observed a transient failure of this test case:
```
java.lang.AssertionError: Contents of the map shouldn't change expected:<Map(0 -> (340,340), 5 -> (345,345), 10 -> (350,350), 14 -> (354,354), 1 -> (341,341), 6 -> (346,346), 9 -> (349,349), 13 -> (353,353), 2 -> (342,342), 17 -> (357,357), 12 -> (352,352), 7 -> (347,347), 3 -> (343,343), 18 -> (358,358), 16 -> (356,356), 11 -> (351,351), 8 -> (348,348), 19 -> (359,359), 4 -> (344,344), 15 -> (355,355))> but was:<Map(0 -> (340,340), 5 -> (345,345), 10 -> (350,350), 14 -> (354,354), 1 -> (341,341), 6 -> (346,346), 97 -> (297,297), 9 -> (349,349), 96 -> (296,296), 13 -> (353,353), 2 -> (342,342), 17 -> (357,357), 12 -> (352,352), 7 -> (347,347), 98 -> (298,298), 3 -> (343,343), 18 -> (358,358), 95 -> (295,295), 16 -> (356,356), 11 -> (351,351), 99 -> (299,299), 8 -> (348,348), 19 -> (359,359), 4 -> (344,344), 15 -> (355,355))>
```
The presence of old keys implies that not all old segments had been deleted. We believe the retention check (which runs asynchronously) is executing after the last modified time of all but one of the segments has been changed. This leaves one old segment behind which ultimately causes the above assertion error.
The fix here is to improve the wait condition.Rather than waiting for the number of segments to be 1, we wait for the log start offset to reach the end offset.
Reviewers: David Arthur <mumrah@gmail.com>
1. In ConsumerCoordinator, select the protocol as the common protocol from all configured assignor instances' supported protocols with the highest number.
1.b. In onJoinPrepare: only call onPartitionRevoked with EAGER.
1.a. In onJoinComplete: call onPartitionAssigned with EAGER; call onPartitionRevoked following onPartitionAssigned with COOPERATIVE, and then request re-join if the error indicates so.
1.c. In performAssignment: update the user's assignor returned assignments by excluding all partitions that are still owned by some other members.
2. I've refactored the Subscription / Assignment such that: assigned partitions, error codes, and group instance id are not-final anymore, instead they can be updated. For the last one, it is directly related to the logic of this PR but I felt it is more convienent to go with other fields.
3. Testing: primarily in ConsumerCoordinatorTest, make it parameterized with protocol, and add necessary scenarios for COOPERATIVE protocol.
I intentionally omitted the documentation change since there are some behavioral updates that needs to be finalized in later PRs, and hence I will also only add the docs in later PRs.
Reviewers: Bill Bejeck <bbejeck@gmail.com>, Boyang Chen <boyang@confluent.io>, Sophie Blee-Goldman <sophie@confluent.io>
`ZkUtils` is not used by the broker, has been deprecated since
2.0.0 and it was never intended as a public API. We should
remove it along with `AdminUtils` methods that rely on it.
Reviewers: David Arthur <mumrah@gmail.com>
- include Scala 2.13 in gradle build
- handle future milestone and RC versions of Scala in a better way
- if no Scala version is specified, default to scala 2.12 (bump from 2.11)
- include certain Xlint options (removed by Scala 2.13) for Scala 2.11/2.12 build only
- upgrade versions for dependencies:
- scalaLogging: 3.9.0 -->> 3.9.2
- scalatest: 3.0.7 -->> 3.0.8
- scoverage: 1.3.1 -->> 1.4.0
Reviewers: Ewen Cheslack-Postava <me@ewencp.org>, Ismael Juma <ismael@juma.me.uk>
The existing implementation triggers warnings in Java 9+ and relies
on internal classes that vary depending on the JDK provider. The proposed
implementation fixes these issues and it's more concise.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Guozhang Wang <wangguoz@gmail.com>
* KAFKA-8106:Reducing the allocation and copying of ByteBuffer when logValidator do validation.
* KAFKA-8106:Reducing the allocation and copying of ByteBuffer when logValidator do validation.
* github comments
* use batch.skipKeyValueIterator
* cleanups
* no need to skip kv for uncompressed iterator
* checkstyle fixes
* fix findbugs
* adding unit tests
* reuse decompression buffer; and using streaming iterator
* checkstyle
* add unit tests
* remove reusing buffer supplier
* fix unit tests
* add unit tests
* use streaming iterator
* minor refactoring
* rename
* github comments
* github comments
* reuse buffer at DefaultRecord caller
* some further optimization
* major refactoring
* further refactoring
* update comment
* github comments
* minor fix
* add jmh benchmarks
* update jmh
* github comments
* minor fix
* github comments
Mocking of WorkerCoordinator was not precise after adding an argument (reason) to AbstractCoordinator#maybeLeaveGroup in KAFKA-8569:
Unit test case for DistributedHerderTest is now precise with respect to the expected argument and succeeds
Reviewers: Boyang Chen <boyang@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
When the log contains out of order message formats (for example v2 message followed by v1 message) and consists of compressed batches typically greater than 1kB in size, it is possible for down-conversion to fail. With compressed batches, we estimate the size of down-converted batches using:
```
private static int estimateCompressedSizeInBytes(int size, CompressionType compressionType) {
return compressionType == CompressionType.NONE ? size : Math.min(Math.max(size / 2, 1024), 1 << 16);
}
```
This almost always underestimates size of down-converted records if the batch is between 1kB-64kB in size. In general, this means we may under estimate the total size required for compressed batches.
Because of an implicit assumption in the code that messages with a lower message format appear before any with a higher message format, we do not grow the buffer we copy the down converted records into when we see a message <= the target message format. This assumption becomes incorrect when the log contains out of order message formats, for example because of leaders flapping while upgrading the message format.
Reviewers: Jason Gustafson <jason@confluent.io>
This adds a new Trogdor fault spec for inducing network latency on a network device for system testing. It operates very similarly to the existing network partition spec by executing the `tc` linux utility.
It has been deprecated since 0.11.0, it was never meant as a publicly
supported API and people should use
`org.apache.kafka.clients.admin.AdminClient` instead. Its presence
causes confusion and people still use them accidentally at times.
`BrokerApiVersionsCommand` uses one method that is not available
in `org.apache.kafka.clients.admin.AdminClient`, we inline it for now.
Reviewers: David Arthur <mumrah@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Static members never leave the group, so potentially we could log a flooding number of warning messages in the hb thread. The solution is to only log as warning when we are on dynamic membership.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
The AllReplicas was only printing remote replica ids. This change prints
all ids, including local one.
Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>, Jason Gustafson <jason@confluent.io>
Kafka should not NPE while loading a deleted partition dir with no log segments. This patch ensures that there will always be at least one segment after initialization.
Co-authored-by: Edoardo Comar <ecomar@uk.ibm.com>
Co-authored-by: Mickael Maison <mickael.maison@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Cancel PeriodicProducerExpirationCheck when closing a Log instance, to avoid a memory leak. Add a method to KafkaScheduler to make this possible.
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Ismael Juma <ismael@juma.me.uk>
Sub-task required to allow to define custom processor names with KStreams DSL(KIP-307). This is the 4th PR for KIP-307.
Reviewers: John Roesler <john@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
This patch simplifies the controller election API. We were passing `LeaderIsrAndControllerEpoch` into the election utilities even though we just needed `LeaderAndIsr`. We also remove some unneeded collection copies `doElectLeaderForPartitions`.
Reviewers: Ismael Juma <ismael@juma.me.uk>
This was caused by back-to-back merging of #6854 (which removed the Optional import) and #6936 (which needed the import).
Reviewers: Jason Gustafson <jason@confluent.io>
An attempt to refactor current coordinator logic.
Reviewers: Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Konstantine Karantasis <konstantine@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
A `Partition` object contain one or many `Replica` objects. These replica
objects in turn can have the "log" if the replica corresponds to the
local node. All the code in Partition or ReplicaManager peek into
replica object to fetch the log if they need to operate on that. As
replica object can represent a local replica or a remote one, this
lead to a bunch of "if-else" code in log fetch and offset update code.
NOTE: In addition to a "log" that is in use during normal operation, if
an alter log directory command is issued, we also create a future log
object. This object catches up with local log and then we switch the log
directory. So temporarily a Partition can have two local logs. Before
this change both logs are inside replica objects.
This change is an attempt to untangle this relationship. In particular
it moves `Log` from `Replica` into `Partition`. So a partition contains
a local log to which all writes go and possibly a "future log" if the
partition is being moved between directories. Additionally, it maintains
a list of remote replicas for offset and "caught up time" data that it uses
for replication protocol.
Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>, Jason Gustafson <jason@confluent.io>
Include the topic config `segment.bytes`.
Author: Vahid Hashemian <vahidhashemian@us.ibm.com>
Reviewers: Gwen Shapira
Closes#6945 from vahidhashemian/minor/update_streams_quickstart_doc
1. Add new fields of subscription / assignment and bump up consumer protocol to v2.
2. Update tests to make sure old versioned protocol can be successfully deserialized, and new versioned protocol can be deserialized by old byte code.
Reviewers: Boyang Chen <boyang@confluent.io>, Sophie Blee-Goldman <sophie@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
The OffsetFetch requires Topic Describe permission. If a client does not have this, we return TOPIC_AUTHORIZATION_FAILED at the partition level. Currently the consumer does not handle this error explicitly, but raises it as a generic `KafkaException`. For consistency with other APIs and to fix transient test failures in `PlaintextEndToEndAuthorizationTest`, we should raise `TopicAuthorizationFailedException` instead.
Reviewers: Ismael Juma <ismael@juma.me.uk>
De-duplicate the common case in which the prior value is the same as the old value.
Reviewers: Sophie Blee-Goldman <sophie@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
- Timeout occurred due to initial slow rebalancing.
- Added code to wait until `KafkaStreams` instance is in state RUNNING to check registration of metrics and in state NOT_RUNNING to check deregistration of metrics.
- I removed all other wait conditions, because they are not needed if `KafkaStreams` instance is in the right state.
Reviewers: Guozhang Wang <wangguoz@gmail.com>