Rework UserScramCredentialRecord to store serverKey and StoredKey rather than saltedPassword. This
is necessary to support migration from ZK, since those are the fields we stored in ZK. Update
latest MetadataVersion to IBP_3_5_IV2 and make SCRAM support conditional on this version. Moved
ScramCredentialData.java from org.apache.kafka.image to org.apache.kafka.metadata, which seems more
appropriate.
Reviewers: Colin P. McCabe <cmccabe@apache.org>
This patch implemented the second part of KIP-915. It bumps the versions of the value records used by the group coordinator and the transaction coordinator to make them flexible versions. The new versions are not used when writing to the partitions but only when reading from the partitions. This allows downgrades from future versions that will include tagged fields.
Reviewers: David Jacot <djacot@confluent.io>
This patch implemented the first part of KIP-915. It updates the group coordinator and the transaction coordinator to ignores unknown record types while loading their respective state from the partitions. This allows downgrades from future versions that will include new record types.
Reviewers: Alexandre Dupriez <alexandre.dupriez@gmail.com>, David Jacot <djacot@confluent.io>
KafkaStatusBackingStore uses an infinite retry logic on producer send, which can lead to a stack overflow.
To avoid the problem, a background thread was added, and the callback submits the retry onto the background thread.
topic counts.
Introduces the use of persistent data structures in the KRaft metadata image to avoid copying the entire TopicsImage upon every change. Performance that was O(<number of topics in the cluster>) is now O(<number of topics changing>), which has dramatic time and GC improvements for the most common topic-related metadata events. We abstract away the chosen underlying persistent collection library via ImmutableMap<> and ImmutableSet<> interfaces and static factory methods.
Reviewers: Luke Chen <showuon@gmail.com>, Colin P. McCabe <cmccabe@apache.org>, Ismael Juma <ismael@juma.me.uk>, Purshotam Chauhan <pchauhan@confluent.io>
Fix a case where KRaftMetadataCache.getPartitionInfo was not setting all the PartitionInfo fields it
should have been. Add a regression test.
Reviewers: Colin P. McCabe <cmccabe@apache.org>
The CRC field of Record Batch was incorrectly documented as int32 while in reality it's an unsigned uint32 field.
Reviewers: Luke Chen <showuon@gmail.com>
The GRACEFUL_SHUTDOWN_TIMEOUT_MS for the Trogdor JsonRestServer is 100ms.
In heavily loaded CI environments, this timeout can be exceeded. When this happens,
it causes the jettyServer.stop() and jettyServer.destroy() calls to throw exceptions, which
prevents shutdownExecutor.shutdown() from running. This has the effect of causing the JsonRestServer::waitForShutdown method to block for 1 day, which exceeds the 120s
timeout on the CoordinatorTest (and any other test relying on MiniTrogdorCluster).
This change makes it such that the graceful shutdown timeout is less likely to be exceeded,
and when it is, the timeout does not cause the waitForShutdown method to block for much
longer than the graceful shutdown timeout.
Reviewers: Ismael Juma <ismael@juma.me.uk>
This reverts commit d9b139220e.
KIP-878 implementation did not make any progress, so we need to revert
the public API changes which are not functional right now.
Reviewers: Bill Bejeck <bill@confluent.io>
In SharedServer, fix some cases where a volatile variable could change to null while we were using
it, during shutdown. This is mainly a junit test issue, although it could also cause ugly error
messages during shutdown when running the server in a production context.
Fix a race in KafkaEventQueueTest.testSize.
Reviewers: David Arthur <mumrah@gmail.com>
We incorrectly assumed, that `consumer.position()` should always be
served by the consumer locally set position.
However, within `commitNeeded()` we check if first `if(commitNeeded)`
and thus go into the else only if we have not processed data (otherwise,
`commitNeeded` would be true). For this reason, we actually don't know
if the consumer has a valid position or not.
We should just swallow a timeout if the consumer cannot get the position
from the broker, and try the next partition. If any position advances, we
can return true, and if we timeout for all partitions we can return
false.
Reviewers: Michal Cabak (@miccab), John Roesler <john@confluent.io>, Guozhang Wang <guozhand@confluent.io>
This patch updates the `PartitionAssignor` server-side interface used in the new group coordinator for the new consumer group protocol as follow:
- It switches subscription from topic names to topic ids in order to be closer to the server side implementation.
- It switches assignment from Set to Map<Integer, Set> to be closer to the server side implementation.
- It adds getters for all attributes.
- It makes all attributes final private.
Reviewers: Jeff Kim <jeff.kim@confluent.io>, Alexandre Dupriez <alexandre.dupriez@gmail.com>, David Jacot <djacot@confluent.io>
Using versioned-stores for global-KTables is not allowed, because a
global-table is bootstrapped on startup, and a stream-globalTable join
does not support temporal semantics.
Furthermore, `suppress()` does not support temporal semantics and thus
cannot be applied to an versioned-KTable.
This PR disallows both use-cases explicitely.
Part of KIP-914.
Reviewers: Bill Bejeck <bbejeck@gmail.com>, Victoria Xia <victoria.xia@confluent.io>
Implements KIP-399.
Extends ProductionExceptionHandler to handle serialization errors, and to allow users to continue processing and dropping the corresponding record on the floor.
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
KIP-914 introduced a new boolean isLatest into Change to indicate whether a change update represents the latest for the key. Even though Change is serialized into the table repartition topic, the new boolean does not need to be serialized in, because the table repartition map processor performs an optimization to drop records for which isLatest = false. If not for this optimization, the downstream table aggregate would have to drop such records instead, and isLatest would need to be serialized into the repartition topic.
In light of the possibility that isLatest may need to be serialized into the repartition topic in the future, e.g., if other downstream processors are added which need to distinguish between records for which isLatest = true vs isLatest = false, this PR reserves repartition topic formats which include isLatest. Reserving these formats now comes at no additional cost to users since a rolling bounce is already required for the upcoming release due to KIP-904. If we don't reserve them now and instead have to add them later, then another bounce would be required at that time. Reserving formats is cheap, so we choose to do it now.
Reviewers: Matthias J. Sax <matthias@confluent.io>
Part of KIP-914.
This PR adds an additional boolean isLatest into Change which specifies whether the new value is the latest for its key. For un-versioned stores, isLatest is always true. For versioned stores, isLatest is true if the value has the latest timestamp seen for the key, else false. This boolean will be used by processors such as the table repartition map processor to determine when a record is out-of-order and should be dropped (when processing a versioned table). This PR updates the table repartition map processor accordingly, and also adds test coverage for table filter.
Reviewers: Matthias J. Sax <matthias@confluent.io>
Part of KIP-914.
This PR updates the return type of VersionedKeyValueStore#put(...) from void to long, where the long is the validTo timestamp of the newly put record, with two special values to indicate either that no such timestamp exists (because the record is the latest for its key) or that the put did not take place (because grace period has elapsed).
As part of making this change, VersionedBytesStore introduces its own put(key, value, timestamp) method to avoid method signature conflicts with the existing put(key, value) method from KeyValueStore<Bytes, byte[]> which has void return type. As a result, the previously added NullableValueAndTimestampSerde class is no longer needed so it's also been removed in this PR as cleanup.
Reviewers: Matthias J. Sax <matthias@confluent.io>
This PR adds a method into GraphNode to assist in tracking whether tables are materialized as versioned or unversioned stores. This is needed in order to allow processors which have different behavior on versioned vs unversioned tables to use the correct semantics. Part of KIP-914.
Reviewers: Matthias J. Sax <matthias@confluent.io>
Moved the information about the BooleanSerde addition from general upgrade notes to notes in Streams.
Reviewers: Walker Carlson <wcarlson@confluent.io>, Divij Vaidya <diviv@amazon.com>
This patch adds the `EventAccumulator` which will be used in the runtime of the new group coordinator. The aim of this accumulator is to basically have a queue per __consumer_group partitions and to ensure that events addressed to the same partitions are not processed concurrently. The accumulator is generic so we could reuse it in different context.
Reviewers: Alexandre Dupriez <alexandre.dupriez@gmail.com>, Justine Olshan <jolshan@confluent.io>
Currently, the current-state KRaft related metric reports follower state for a broker while technically it should be reported as an observer as the kafka-metadata-quorum tool does.
Reviewers: Luke Chen <showuon@gmail.com>, dengziming <dengziming1993@gmail.com>
This PR updates foreign-key table-table join processors to ignore out-of-order records from versioned tables, as specified in KIP-914.
Reviewers: Matthias J. Sax <matthias@confluent.io>
Some system tests from kafkatest.tests.core.network_degrade_test are failing due to missing utility iputils-ping.
[DEBUG - 2023-02-04 01:34:56,322 - network_degrade_test - test_latency -
lineno:67]: Ping output: bash: line 1: ping: command not found
Reviewers: Luke Chen <showuon@gmail.com>
This PR updates primary-key table-table join processors to ignore out-of-order records from versioned tables, as specified in KIP-914.
Reviewers: Matthias J. Sax <matthias@confluent.io>
Added check for ongoing transaction
Thread to send and receive verify only add partition to txn requests
Code to send on request thread courtesy of @artemlivshits
Reviewers: Artem Livshits <alivshits@confluent.io>, Jun Rao <junrao@gmail.com>
This PR updates the stream-table join processors, including both KStream-KTable and KStream-GlobalKTable joins, to perform timestamped lookups when the (global) table is versioned, as specified in KIP-914.
Reviewers: Matthias J. Sax <matthias@confluent.io>
1. I've verified and made sure the only case that task would be null and not stream task would be in testing code only, with pausing / resuming topologies; I've revamped the restoration recording func, mainly to make just one method on the Task interface, to make sure we would never get task == null and do not need to cast to StreamTask.
2. Use numRecords directly to avoid calling records.size() that triggers concurrent modifications.
3. Rewrite the TaskMetricsTest to not use the removed impl functions.
4. Found an issue while fixing 1) above, turns out it's related to pausing tasks: if the tasks are paused due to instance / named-topologies are paused while they need restoration, the restoration would never finish, and hence the instance's state would not transit to RUNNING. Similarly, if user paused just one of the named-topology right at the beginning, since the state would not transit to RUNNING, every tasks across all named-topologies would not make progress. We keep the behavior as is to be consistent with and without state-updater.
Reviewers: Matthias J. Sax <mjsax@apache.org>, Lucas Brutschy <lbrutschy@confluent.io>
Although KAFKA-14808 did not affect KRaft mode, it is important to ensure that we have regression
tests in KRaft mode to prevent a similar bug from appearing there in the future. This PR adds two
tests. First, it adds a test that makes sure we handle what happens when a reassignment completes
and none of the new replicas can be made leader. It's important that we dont keep an old replica as
leader. Second, it adds a test that makes sure we handle new reassignments that don't include a
previous assignment replica that was leader.
Reviewers: Colin P. McCabe <cmccabe@apache.org>
This patch adds Group, Record and Result.
Reviewers: Jason Gustafson <jason@confluent.io>, Jeff Kim <jeff.kim@confluent.io>, Justine Olshan <jolshan@confluent.io>