First pass at an in-memory session store implementation.
Reviewers: Simon Geisler, Damian Guy <damian@confluent.io>, John Roesler <john@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
While going through the review of InMemorySessionStore I realized there is also some minor cleanup to be done for the other in-memory stores. This includes trivial changes such as removing unnecessary references to 'this' and moving collection initialization to the declaration. It also fixes some unsafe behavior (registering an iterator from inside its own constructor). In-memory window store iterator classes were made static and some instances of KeyValueIterator missing types were fixed across a handful of tests.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Bruno Cadonna <bruno@confluent.io>
This is the first diff for the implementation of JoinGroup logic for static membership. The goal of this diff contains:
* Add group.instance.id to be unique identifier for consumer instances, provided by end user;
Modify group coordinator to accept JoinGroupRequest with/without static membership, refactor the logic for readability and code reusability.
* Add client side support for incorporating static membership changes, including new config for group.instance.id, apply stream thread client id by default, and new join group exception handling.
* Increase max session timeout to 30 min for more user flexibility if they are inclined to tolerate partial unavailability than burdening rebalance.
* Unit tests for each module changes, especially on the group coordinator logic. Crossing the possibilities like:
6.1 Dynamic/Static member
6.2 Known/Unknown member id
6.3 Group stable/unstable
6.4 Leader/Follower
The rest of the 345 change will be broken down to 4 separate diffs:
* Avoid kicking out members through rebalance.timeout, only do the kick out through session timeout.
* Changes around LeaveGroup logic, including version bumping, broker logic, client logic, etc.
* Admin client changes to add ability to batch remove static members
* Deprecate group.initial.rebalance.delay
Reviewers: Liquan Pei <liquanpei@gmail.com>, Stanislav Kozlovski <familyguyuser192@windowslive.com>, Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
As titled, this PR changed the default reset policy to latest accidentally for system tests, which in fact was earliest.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
The controller maintains state across `ControllerContext`, `PartitionStateMachine`, `ReplicaStateMachine`, and `TopicDeletionManager`. None of this state is actually isolated from the rest. For example, topics undergoing deletion are intertwined with the partition and replica states. As a consequence of this, each of these components tends to be dependent on all the rest, which makes testing and reasoning about the system difficult. This is a first step toward untangling all the state. This patch moves it all into `ControllerContext` and removes many of the circular dependencies. So far, this is mostly a direct translation, but in the future we can add additional validation in `ControllerContext` to make sure that state is maintained consistently.
Additionally, this patch adds several mock objects to enable easier testing: `MockReplicaStateMachine` and `MockPartitionStateMachine`. These have simplified logic for updating the current state. This is used to create some new test cases for `TopicDeletionManager`.
Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>, Jun Rao <junrao@gmail.com>
Add support for Standalone Connect configs in Rest Server extensions
A bug was introduced in 7a42750d that was caught in system tests:
The rest extensions fail if a Standalone worker config is passed,
since it does not have a definition for rebalance timeout.
A new method was introduced on WorkerConfig that by default returns
null for the rebalance timeout, and DistributedConfig overloads this
to return the configured value.
Author: Cyrus Vafadari <cyrus@confluent.io>
Reviewers: Arjun Satish <arjunconfluent.io>, Randall Hauch <rhauch@gmail.com>
In the KafkaProducer#close method we have a debug log statement Kafka producer has been closed then a few lines later a KafkaException can occur.
This could be confusing to users, so this PR simply moves the log statement to after the possible exception to avoid confusing information in the logs.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Streams previously flushed stores in the order of their registration, which is arbitrary. Because stores may forward values upon flush (as in cached state stores), we must flush stores in topological order.
Reviewers: Bill Bejeck <bbejeck@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
Replace `headers.isEmpty()` by calls to `isEmpty()` as the latter does a null check on heathers (that is lazily created).
Author: Sebastián Ortega <sebastian.ortega@letgo.com>
Reviewers: Konstantine Karantasis <konstantine@confluent.io>, Arjun Satish <arjunconfluent.io>, Randall Hauch <rhauch@gmail.com>
Verified that the https links work.
I didn't update the license header in this PR since that touches
so many files. Will file a separate one for that.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Several issues have come to light since the 2.2.0 release:
upon restore, suppress incorrectly set the record metadata using the changelog record, instead of preserving the original metadata
restoring a tombstone incorrectly didn't update the buffer size and min-timestamp
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <mjsax@apache.org>, Bruno Cadonna <bruno@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
We suspect the problem might be a race condition after broker startup where the consumer has yet to find the coordinator and rebalance. The fix here rolls all the brokers first and then waits for the expected exception.
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Gwen Shapira
Closes#6608 from hachikuji/KAFKA-7965
This patch fixes a bug in the append logic which can cause duplicate offsets to be appended to the log when the append to the transaction index fails. Rather than incrementing the log end offset after the index append, we do it immediately after the records are written to the log. If the index append later fails, we do two things:
1) We ensure that the last stable offset cannot advance. This guarantees that the aborted data will not be returned to the user until the transaction index contains the corresponding entry.
2) We skip updating the end offset of the producer state. When recovering the log, we will have to reprocess the log and write the index entries.
Reviewers: Jun Rao <junrao@gmail.com>
We should include partition/offset information when we raise exceptions during producer state validation. This saves a lot of the discovery work to figure out where the problem occurred. This patch also includes a new test case to verify additional coordinator fencing cases.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Reduce the total key space cache iterators have to search for segmented byte stores by wrapping several single-segment iterators.
Summary of Benchmarking Results (# records processed as primary indicator)
Session Store:
Only single-key findSessions seems to benefit (~4x improvement) due to conservative scanning of potentially variable-sized keys in key-range findSessions. Could get improvement from key-range findSessions as well if we can tell when/if keys are a fixed size, or pending an efficient custom comparator API from RocksDB
Window Store:
Both single and multi-key fetch saw some improvement; this depended on the size of the time-range in the fetch (in the DSL this would be window size) relative to the retention period. Performance benefits from this patch when the fetch spans multiple segments; hence the larger the time range being searched, the better this will do.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bbejeck@gmail.com>
The fetchSession() method of SessionStore searches for a (single) specific session and returns null if none are found. This is analogous to fetch(key, time) in WindowStore or get(key) in KeyValueStore. MeteredWindowStore and MeteredKeyValueStore both check for a null result before attempting to deserialize, however MeteredSessionStore just blindly deserializes and as a result NPE is thrown when we search for a record that does not exist.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bbejeck@gmail.com>, Bruno Cadonna <bruno@confluent.io>
Most of the time, the group coordinator runs on broker 1. Occasionally the group coordinator will be placed on broker 2. If that's the case, the loop starting at line 320 have no chance to check and update `kickedOutConsumerIdx`. A quick fix is to safely do another round of loop to ensure `kickedOutConsumerIdx` always be checked after the last broker restart.
Reviewers: Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Jason Gustafson <jason@confluent.io>
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Viktor Somogyi <viktorsomogyi@gmail.com>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Rajini Sivaram <rajinisivaram@googlemail.com>, Ismael Juma <ismael@juma.me.uk>
Adds flatTrasformValues methods in KStream
Adds processor supplier and processor for flatTransformValues
Improves API documentation of transformValues
Reviewers: Matthias J. Sax <mjsax@apache.org>, John Roesler <john@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
* Describe/Delete/Reset offsets on multiple consumer groups at a time (including each group by repeating `--group` parameter)
* Describe/Delete/Reset offsets on ALL consumer groups at a time (add new `--all-groups` option similar to `--all-topics`)
* Reset plan CSV file generation reworked: structure updated to support multiple consumer groups and make sure that CSV file generation is done properly since there are no restrictions on consumer group names and symbols like commas and quotes are allowed.
* Extending data output table format by adding `GROUP` column for all `--describe` queries
Detailed description
* Adds KTable.suppress to the Scala API.
* Fixed count in KGroupedStream, SessionWindowedKStream, and TimeWindowedKStream so that the value serde gets passed down to the KTable returned by the internal mapValues method.
* Suppress API support for Java 1.8 + Scala 2.11
Testing strategy
I added unit tests covering:
* Windowed KTable.count.suppress w/ Suppressed.untilTimeLimit
* Windowed KTable.count.suppress w/ Suppressed.untilWindowCloses
* Non-windowed KTable.count.suppress w/ Suppressed.untilTimeLimit
* Session-windowed KTable.count.suppress w/ Suppressed.untilWindowCloses
Reviewers: John Roesler <john@confluent.io>, Guozhang Wang <guozhang@confluent.io>
Currently close() only awaits completion of pending produce requests. If there is a transaction ongoing, it may be dropped. For example, if one thread is calling commitTransaction() and another calls close(), then the commit may never happen even if the caller is willing to wait for it (by using a long timeout). What's more, the thread blocking in commitTransaction() will be stuck since the result will not be completed once the producer has shutdown.
This patch ensures that 1) completing transactions are awaited, 2) ongoing transactions are aborted, and 3) pending callbacks are completed before close() returns.
Reviewers: Jason Gustafson <jason@confluent.io>