Several issues have come to light since the 2.2.0 release:
upon restore, suppress incorrectly set the record metadata using the changelog record, instead of preserving the original metadata
restoring a tombstone incorrectly didn't update the buffer size and min-timestamp
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <mjsax@apache.org>, Bruno Cadonna <bruno@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
We suspect the problem might be a race condition after broker startup where the consumer has yet to find the coordinator and rebalance. The fix here rolls all the brokers first and then waits for the expected exception.
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Gwen Shapira
Closes#6608 from hachikuji/KAFKA-7965
This patch fixes a bug in the append logic which can cause duplicate offsets to be appended to the log when the append to the transaction index fails. Rather than incrementing the log end offset after the index append, we do it immediately after the records are written to the log. If the index append later fails, we do two things:
1) We ensure that the last stable offset cannot advance. This guarantees that the aborted data will not be returned to the user until the transaction index contains the corresponding entry.
2) We skip updating the end offset of the producer state. When recovering the log, we will have to reprocess the log and write the index entries.
Reviewers: Jun Rao <junrao@gmail.com>
We should include partition/offset information when we raise exceptions during producer state validation. This saves a lot of the discovery work to figure out where the problem occurred. This patch also includes a new test case to verify additional coordinator fencing cases.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Reduce the total key space cache iterators have to search for segmented byte stores by wrapping several single-segment iterators.
Summary of Benchmarking Results (# records processed as primary indicator)
Session Store:
Only single-key findSessions seems to benefit (~4x improvement) due to conservative scanning of potentially variable-sized keys in key-range findSessions. Could get improvement from key-range findSessions as well if we can tell when/if keys are a fixed size, or pending an efficient custom comparator API from RocksDB
Window Store:
Both single and multi-key fetch saw some improvement; this depended on the size of the time-range in the fetch (in the DSL this would be window size) relative to the retention period. Performance benefits from this patch when the fetch spans multiple segments; hence the larger the time range being searched, the better this will do.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bbejeck@gmail.com>
The fetchSession() method of SessionStore searches for a (single) specific session and returns null if none are found. This is analogous to fetch(key, time) in WindowStore or get(key) in KeyValueStore. MeteredWindowStore and MeteredKeyValueStore both check for a null result before attempting to deserialize, however MeteredSessionStore just blindly deserializes and as a result NPE is thrown when we search for a record that does not exist.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bbejeck@gmail.com>, Bruno Cadonna <bruno@confluent.io>
Most of the time, the group coordinator runs on broker 1. Occasionally the group coordinator will be placed on broker 2. If that's the case, the loop starting at line 320 have no chance to check and update `kickedOutConsumerIdx`. A quick fix is to safely do another round of loop to ensure `kickedOutConsumerIdx` always be checked after the last broker restart.
Reviewers: Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Jason Gustafson <jason@confluent.io>
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Viktor Somogyi <viktorsomogyi@gmail.com>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Rajini Sivaram <rajinisivaram@googlemail.com>, Ismael Juma <ismael@juma.me.uk>
Adds flatTrasformValues methods in KStream
Adds processor supplier and processor for flatTransformValues
Improves API documentation of transformValues
Reviewers: Matthias J. Sax <mjsax@apache.org>, John Roesler <john@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
* Describe/Delete/Reset offsets on multiple consumer groups at a time (including each group by repeating `--group` parameter)
* Describe/Delete/Reset offsets on ALL consumer groups at a time (add new `--all-groups` option similar to `--all-topics`)
* Reset plan CSV file generation reworked: structure updated to support multiple consumer groups and make sure that CSV file generation is done properly since there are no restrictions on consumer group names and symbols like commas and quotes are allowed.
* Extending data output table format by adding `GROUP` column for all `--describe` queries
Detailed description
* Adds KTable.suppress to the Scala API.
* Fixed count in KGroupedStream, SessionWindowedKStream, and TimeWindowedKStream so that the value serde gets passed down to the KTable returned by the internal mapValues method.
* Suppress API support for Java 1.8 + Scala 2.11
Testing strategy
I added unit tests covering:
* Windowed KTable.count.suppress w/ Suppressed.untilTimeLimit
* Windowed KTable.count.suppress w/ Suppressed.untilWindowCloses
* Non-windowed KTable.count.suppress w/ Suppressed.untilTimeLimit
* Session-windowed KTable.count.suppress w/ Suppressed.untilWindowCloses
Reviewers: John Roesler <john@confluent.io>, Guozhang Wang <guozhang@confluent.io>
Currently close() only awaits completion of pending produce requests. If there is a transaction ongoing, it may be dropped. For example, if one thread is calling commitTransaction() and another calls close(), then the commit may never happen even if the caller is willing to wait for it (by using a long timeout). What's more, the thread blocking in commitTransaction() will be stuck since the result will not be completed once the producer has shutdown.
This patch ensures that 1) completing transactions are awaited, 2) ongoing transactions are aborted, and 3) pending callbacks are completed before close() returns.
Reviewers: Jason Gustafson <jason@confluent.io>
We have not had great experience with listeners. They make the code harder to understand because they result in indirectly maintained circular dependencies. Often this leads to tricky deadlocks when we try to introduce locking. We were able to remove the Metadata listener in KAFKA-7831. Here we do the same for the listener in SubscriptionState.
Reviewers: Viktor Somogyi-Vass <viktorsomogyi@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
This patch updates the InitProducerId request API to use the generated sources. It also fixes a small bug in the DescribeAclsRequest class where we were using the wrong api key.
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Colin McCabe <cmccabe@apache.org>
Due to KAFKA-8159, Streams will throw an unchecked exception when a caching layer or in-memory underlying store is queried over a range of keys from negative to positive. We should add a check for this and log it then return an empty iterator (as the RocksDB stores happen to do) rather than crash
Reviewers: Bruno Cadonna <bruno@confluent.io> Bill Bejeck <bbejeck@gmail.com>
Protocol compatibility can be facilitated if a Struct, that has been defined as an extension of a previous Struct by adding fields at the end of the older version, can read a message of an older version by ignoring the absence of the missing new fields. Reading the missing fields should be allowed by the definition of these fields (they have to be nullable) when supported by the schema.
Reviewers: David Arthur <mumrah@gmail.com>, Randall Hauch <rhauch@gmail.com>, Jason Gustafson <jason@confluent.io>
Fixed the ConnectClusterStateImpl.connectors() method and throw an exception on timeout. Added unit test.
Author: Chris Egerton <chrise@confluent.io>
Reviewers: Magesh Nandakumar <magesh.n.kumar@gmail.com>, Robert Yokota <rayokota@gmail.com>, Arjun Satish <wicknicks@users.noreply.github.com>, Konstantine Karantasis <konstantine@confluent.io>, Randall Hauch <rhauch@gmail.com>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#6384 from C0urante:kafka-8058
ConsumerBounceTest redundantly executes a couple test cases which were included in the abstract class `BaseConsumerTest`. We should try to keep a cleaner separation of testing logic and utility logic so that this does not happen (the build time is long enough without doing unnecessary work). This PR moves the cluster initialization and consumer utilities out of BaseConsumerTest and into a new class AbstractConsumerTest. We then let ConsumerBounceTest extend AbstractConsumerTest.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
This PR should help address the flakiness in the ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup test (https://issues.apache.org/jira/browse/KAFKA-7965). I tested this locally and have verified it significantly reduces flakiness - 25/25 tests now pass. Running the test 25 times in trunk, I'd get `18/25` passes.
It does so by reusing the less-flaky consumer integration testing functionality inside `BaseConsumerTest`. Most notably, the test now makes use of the `ConsumerAssignmentPoller` class - each consumer now polls non-stop rather than the more batch-oriented polling we had in `ConsumerBounceTest#waitForRebalance()`.
Reviewers: Jason Gustafson <jason@confluent.io>
Ensure that modification time is checked against the file used to create the SSLContext that is in-use so that SSLContext is updated whenever file is modified and a config update request is received.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Each separate thread should have its own throttle, so that it can sleep
for an appropriate amount of time when needed.
ConnectionStressWorker should avoid recalculating the status after
shutting down the runnables. Otherwise, if one runnable is slow to
stop, it will skew the average down in a way that doesn't reflect
reality. This change moves the status calculation into a separate
periodic runnable that gets shut down cleanly before the other ones.
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Gwen Shapira, Stanislav Kozlovski
Closes#6533 from cmccabe/fix_connection_stress_worker