When an `OutOfOrderSequenceException` error is received by an idempotent producer for a partition, the producer bumps its epoch and adjusts the sequence number and the epoch of the in-flight batches of the partitions affected by the error. This happens in `TransactionManager#bumpIdempotentProducerEpoch`.
The remaining partitions are treated separately. When the last in-flight batch of a given partition is completed, the sequence number is reset. This happens in `TransactionManager#handleCompletedBatch`.
However, when a given partition does not have in-flight batches when the producer epoch is bumped, its sequence number is not reset. Similarly, when an in-flight batch eventually fails after the producer epoch is bumped, the batch is discarded and the sequence number for its partition is not reset. In both cases, subsequent producer requests use the new producer epoch with the old sequence number and are rejected by the broker.
With this patch, the producer id/epoch is now stored in the partition state. This ensures that the producer id/epoch of a given partition remains consistent with its sequence number. When the producer epoch is bumped, the producer epoch of the partition is lazily updated and its sequence number is reset accordingly. This happens only when all the in-flight batches have been resolved.
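The lazy update described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual `TransactionManager` code: the class `PartitionStateSketch` and all of its fields and methods are hypothetical names introduced for this example.

```java
// Hypothetical, simplified model of per-partition producer state.
// It is not the real Kafka producer implementation.
final class PartitionStateSketch {
    private long producerId;
    private short producerEpoch;
    private int nextSequence = 0;
    private int inflightBatches = 0;

    PartitionStateSketch(long producerId, short producerEpoch) {
        this.producerId = producerId;
        this.producerEpoch = producerEpoch;
    }

    // Called before a sequence number is assigned to a new batch. If the
    // producer id/epoch has changed and no in-flight batches remain, the
    // partition adopts the new id/epoch and restarts its sequence at 0.
    void maybeUpdate(long currentProducerId, short currentEpoch) {
        boolean changed = currentProducerId != producerId || currentEpoch != producerEpoch;
        if (changed && inflightBatches == 0) {
            producerId = currentProducerId;
            producerEpoch = currentEpoch;
            nextSequence = 0;
        }
    }

    // Returns the base sequence for a batch and marks the batch as in flight.
    int assignSequence(int recordCount) {
        int baseSequence = nextSequence;
        nextSequence += recordCount;
        inflightBatches++;
        return baseSequence;
    }

    // Called when an in-flight batch completes or is discarded.
    void completeBatch() {
        inflightBatches--;
    }

    short producerEpoch() {
        return producerEpoch;
    }

    int nextSequence() {
        return nextSequence;
    }
}
```

Because the epoch and the sequence number live in the same object and change together, a partition in this sketch never pairs a new epoch with a stale sequence number.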
Reviewers: Bob Barrett <bob.barrett@confluent.io>, Jason Gustafson <jason@confluent.io>
Topics processed by the controller and newly created topics will only be given topic IDs if the inter-broker protocol version (IBP) on the controller is greater than or equal to 2.8. This PR also adds a Kafka config to specify whether the IBP is greater than or equal to 2.8. System tests have been modified to include topic ID checks for upgrade/downgrade tests. This PR also adds a new integration test file for requests/responses that are not gated by IBP (e.g., metadata).
Reviewers: dengziming <dengziming1993@gmail.com>, Lucas Bradstreet <lucas@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
If a user doesn't have persistent stores, the base dir and state dir are not created, so we should not try to set permissions on them.
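A minimal sketch of the guard described above, assuming owner-only access is the desired outcome. The class and method names are hypothetical, and the actual Streams code may apply different permissions through a different API.

```java
import java.io.File;

// Hypothetical helper; not the actual Kafka Streams state directory code.
final class StateDirPermissionsSketch {
    // Restricts the directory to its owner, but only if the directory exists.
    // Returns false when the directory is absent and nothing was attempted,
    // or when one of the owner permissions could not be granted.
    static boolean restrictToOwnerIfPresent(File dir) {
        if (dir == null || !dir.exists()) {
            return false; // no persistent stores, so no directory to secure
        }
        // Remove access for everyone first, then grant it back to the owner only.
        dir.setReadable(false, false);
        dir.setWritable(false, false);
        dir.setExecutable(false, false);
        boolean readable = dir.setReadable(true, true);
        boolean writable = dir.setWritable(true, true);
        boolean executable = dir.setExecutable(true, true);
        return readable && writable && executable;
    }
}
```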
Reviewers: Bruno Cadonna <cadonna@confluent.io>, Guozhang Wang <guozhang@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
This PR updates the AdminClient to use the DescribeCluster API when available. The client falls back to the Metadata API otherwise.
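The fallback pattern can be illustrated with a small generic sketch. Everything here is hypothetical: `UnsupportedApiException` stands in for whatever signal the client uses to detect that the broker does not support DescribeCluster, and the two suppliers stand in for the DescribeCluster and Metadata calls.

```java
import java.util.function.Supplier;

// Hypothetical sketch of "prefer the new API, fall back to the old one".
// None of these names come from the AdminClient itself.
final class DescribeClusterFallbackSketch {
    // Stand-in for the error raised when the preferred API is unavailable.
    static final class UnsupportedApiException extends RuntimeException {
        UnsupportedApiException(String message) {
            super(message);
        }
    }

    // Tries the preferred call first and uses the fallback only when the
    // preferred API is reported as unsupported. Other errors propagate.
    static <T> T withFallback(Supplier<T> preferred, Supplier<T> fallback) {
        try {
            return preferred.get();
        } catch (UnsupportedApiException e) {
            return fallback.get();
        }
    }
}
```

Catching only the "unsupported" signal keeps genuine failures of the preferred call visible instead of silently masking them with the fallback.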
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Kafka Streams' TaskManager is a central class that grew quite big. This
PR breaks out a new 'task container' class to descope what TaskManager
does. In follow-up PRs, we plan to move more methods from TaskManager
to the new 'Tasks.java' class and also improve task-type type safety.
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>
Document the new properties-based metrics for RocksDB
Reviewers: Leah Thomas <lthomas@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
We have seen recent system test timeouts associated with this test.
Analysis revealed an excessive amount of time spent searching
for test conditions in the logs.
This change addresses the issue by dropping some unnecessary
checks and using a more efficient log search mechanism.
Reviewers: Bill Bejeck <bbejeck@apache.org>, Guozhang Wang <guozhang@apache.org>
This was causing IntelliJ to choose the Vintage runner when running `core` tests
which would then fail during the discovery phase.
Verified that `core` tests work in IntelliJ again after this change.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Ensure that runtime-only dependencies don't cause a different version to be used.
More specifically:
> The relationship is directed, which means that if the runtimeClasspath configuration
has to be resolved, Gradle will first resolve the compileClasspath and then "inject" the
result of resolution as strict constraints into the runtimeClasspath.
For more details, see:
https://docs.gradle.org/6.8/userguide/resolution_strategy_tuning.html#sec:configuration_consistency
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Add a runtime dependency on the jupiter engine to avoid failures
during `./gradlew unitTest/integrationTest/test` for `upgrade-system-tests-*`.
Also remove `connect` and `examples` from the JUnit 5 blocklist since they
contain no tests.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Set up the controller API version intersection for client queries. When the unsupported version exception is hit in the EnvelopeResponse, close the client connection so that the client rediscovers the API versions.
Reviewers: Jason Gustafson <jason@confluent.io>
This patch attempts to generalize server initialization for KIP-500. It adds a `Server` trait which `KafkaServer` extends for the legacy ZooKeeper server, and a new `KafkaRaftServer` for the new server. I have also added stubs for `KafkaRaftBroker` and `KafkaRaftController` to give a clearer idea of how this will be used.
Note that this patch removes `KafkaServerStartable`, which was intended to enable custom startup logic, but was not codified into an official API and is not planned to be supported after KIP-500.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Colin P. McCabe <cmccabe@apache.org>
We would like to be able to use `KafkaRaftClient` for tooling/debugging use cases. For this, we need the localId to be optional so that the client can be used more like a consumer. This is already supported in the `Fetch` protocol by setting `replicaId=-1`, which the Raft implementation checks for. We just need to alter `QuorumState` so that the `localId` is optional. The main benefit of doing this is that it saves tools the need to generate an arbitrary id (which might cause conflicts given limited Int32 space) and it lets the leader avoid any local state for these observers (such as `ReplicaState` inside `LeaderState`).
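As a rough illustration of an optional local id, the sketch below maps an absent id to the `replicaId=-1` convention mentioned above. The class `LocalIdSketch` and its methods are hypothetical and do not reflect the actual `QuorumState` API.

```java
import java.util.OptionalInt;

// Hypothetical sketch; not the actual QuorumState implementation.
final class LocalIdSketch {
    private final OptionalInt localId;

    LocalIdSketch(OptionalInt localId) {
        this.localId = localId;
    }

    // Replica id to put in a Fetch request: the local id if present,
    // otherwise -1, which marks the client as a consumer-like observer.
    int fetchReplicaId() {
        return localId.orElse(-1);
    }

    // True when the client has no id and therefore cannot be a voter.
    boolean isObserverWithoutId() {
        return !localId.isPresent();
    }
}
```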
Reviewers: Ismael Juma <ismael@juma.me.uk>, Boyang Chen <boyang@confluent.io>
Rename AdminManager to ZkAdminManager to emphasize the fact that it is not used by the KIP-500 code paths.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Boyang Chen <boyang@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
In the [KIP-500 development branch](https://github.com/confluentinc/kafka/tree/kip-500),
we have a separate ControllerApis that shares a lot of functionality with KafkaApis. We
introduced a utility class ApisUtils to pull out the common code. Some things were moved
to RequestChannel as well.
We'd like to upstream this work now so we don't continue to diverge (since KafkaApis is
a frequently modified class). There should be no logical changes in this PR, only shuffling
code around.
Reviewers: Jason Gustafson <jason@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>, Jose Sancio <jsancio@users.noreply.github.com>, Ismael Juma <ismael@juma.me.uk>
It is helpful to delay initialization of the `RaftClient` configuration including the voter string until after construction. This helps in integration test cases where the voter ports may not be known until sockets are bound.
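The idea of binding the voter configuration after construction can be sketched as below. The class, its methods, and the `id@host:port` voter string format are assumptions made for illustration, not the actual `RaftClient` API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of late-bound voter configuration.
final class LateBoundVotersSketch {
    // Stays null until initialize() is called, which happens after construction.
    private Map<Integer, String> voters;

    // Parses a voter string assumed to look like "1@host:port,2@host:port".
    void initialize(String voterString) {
        Map<Integer, String> parsed = new LinkedHashMap<>();
        for (String entry : voterString.split(",")) {
            String[] parts = entry.trim().split("@", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("Bad voter entry: " + entry);
            }
            parsed.put(Integer.parseInt(parts[0]), parts[1]);
        }
        this.voters = parsed;
    }

    // Looks up a voter address; fails fast if the configuration is not bound yet.
    String addressOf(int voterId) {
        if (voters == null) {
            throw new IllegalStateException("Voters not initialized yet");
        }
        return voters.get(voterId);
    }
}
```

In a test, the object could be constructed first, the sockets bound to ephemeral ports, and `initialize` called once the real ports are known.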
Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>, Jason Gustafson <jason@confluent.io>
Also:
* Remove unused powermock dependency
* Remove "examples" from the JUnit 4 list since one module was already
converted and the other has no tests
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>