If a node is currently throttled, we should take it out of the running for `leastLoadedNode`. Additionally, current logic seems to favor connecting to new nodes rather than using existing connections which have one or more in flight requests. The javadoc is slightly vague about whether this is expected, but it seems not.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
This patch refactors the implementation of the --version option and moves it into the default command options. This has the benefit of automatically including it in the usage output of the command line tools. Several tools had to be manually updated because they did not use the common options.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>
This patch ensures that the log end offset of each partition is initialized consistently with the checkpointed log start offset.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
While working on consolidating the various store unit tests I uncovered some minor "bugs" in the in-memory stores (inconsistencies with the behavior as established by the RocksDB stores).
open iterators should be properly closed in the case the store is closed
fetch/findSessions should always throw NPE if key is null
window end time should be truncated at Long.MAX_VALUE rather than throw exception
(Verified in-memory stores pass all applicable rocksDB tests now, unified unit tests coming in another PR)
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bbejeck@gmail.com>
Any RocksJava object that inherits from org.rocksdb.AbstractNativeReference must be closed explicitly in order to free up the memory of the backing C++ object. The BloomFilter extends RocksObject (which implements AbstractNativeReference) and should be also be closed in RocksDBStore#close to avoid leaking memory.
Reviewers: Bill Bejeck <bbejeck@gmail.com>
During a partial message format upgrade, it is possible for the message format to flap between new and old versions. If we detect that data appended to the log is on an old format, we can clear the leader epoch cache so that we revert to truncation by high watermark. Once the upgrade completes and all replicas are on the same format, we will append to the epoch cache as usual. Note this is related to KAFKA-7897, which handles message format downgrades through configuration.
Reviewers: Jun Rao <junrao@gmail.com>
Fix two problems in Streams:
* Session windows expired prematurely (off-by-one error), since the window end is inclusive, unlike other windows
* Suppress duration for sessions incorrectly waited only the grace period, but session windows aren't closed until gracePeriod + sessionGap
Update the tests accordingly
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Boyang Chen <boyang@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Details in the JIRA: https://issues.apache.org/jira/browse/KAFKA-8285
Basically we want to avoid sharing of atomic updates for thread id with multiple stream instances on one JVM.
Reviewers: Raoul de Haard, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
This patch fixes a bug in the sending of transactional requests. We need to call `KafkaClient.send` with an updated current time. Failing to do so can result in an `IllegalStateExcepton` which leaves the producer effectively dead since the in-flight correlation id has been set, but no request has been sent. To avoid the same problem in the future, we update the in flight correlationId only after sending the request.
Reviewers: Matthias J. Sax <matthias@confluent.io>, Apurva Mehta <apurva@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
When shutting down the ReplicaFetcher thread, we may fail to unregister sensors in selector.close(). When that happened, we will fail to start up the ReplicaFetcherThread with the same fetch id again because of the IllegalArgumentException in sensor registration. This issue will cause constant URPs in the cluster because the ReplicaFetchterThread is gone.
This patch addresses this issue by introducing a try-finally block in selector.close() so that we will always unregister the sensors in shutting down ReplicaFetcherThreads.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Jason Gustafson <jason@confluent.io>
When processing multiple key-changing operations during the optimization phase a ConcurrentModificationException is possible.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
* Revert "MINOR: Add unit test for SerDe auto-configuration (#6610)"
This reverts commit 172fbb2dd5.
* Revert "[KAFKA-3729] Auto-configure non-default SerDes passed alongside the topology builder (#6461)"
This reverts commit e56ebbffca.
The two merged PRs introduce a breaking change. Reverting to preserve backward compatibility. Jira ticket reopened.
Reviewers: Ted Yu <yuzhihong@gmail.com>, Guozhang Wang <guozhang@confluent.io>
`ConfigDef.embeddedValidator` should return an Anonymous Object instead of lambda so that we can have a useful `toString()` for methods such as `toRst`.
Reviewers: Jason Gustafson <jason@confluent.io>
In NetworkClient.leastLoadedNode, we invoke `isReady` to check if an established connection exists for the given node. `isReady` checks whether metadata needs to be updated also which wants to make metadata request first priority. However, if the to-be-sent request is metadata request, then we do not have to check this otherwise the loop in `leastLoadedNode` will do a complete iteration until the final node is selected.
Reviewers: Jason Gustafson <jason@confluent.io>
First pass at an in-memory session store implementation.
Reviewers: Simon Geisler, Damian Guy <damian@confluent.io>, John Roesler <john@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
While going through the review of InMemorySessionStore I realized there is also some minor cleanup to be done for the other in-memory stores. This includes trivial changes such as removing unnecessary references to 'this' and moving collection initialization to the declaration. It also fixes some unsafe behavior (registering an iterator from inside its own constructor). In-memory window store iterator classes were made static and some instances of KeyValueIterator missing types were fixed across a handful of tests.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Bruno Cadonna <bruno@confluent.io>
This is the first diff for the implementation of JoinGroup logic for static membership. The goal of this diff contains:
* Add group.instance.id to be unique identifier for consumer instances, provided by end user;
Modify group coordinator to accept JoinGroupRequest with/without static membership, refactor the logic for readability and code reusability.
* Add client side support for incorporating static membership changes, including new config for group.instance.id, apply stream thread client id by default, and new join group exception handling.
* Increase max session timeout to 30 min for more user flexibility if they are inclined to tolerate partial unavailability than burdening rebalance.
* Unit tests for each module changes, especially on the group coordinator logic. Crossing the possibilities like:
6.1 Dynamic/Static member
6.2 Known/Unknown member id
6.3 Group stable/unstable
6.4 Leader/Follower
The rest of the 345 change will be broken down to 4 separate diffs:
* Avoid kicking out members through rebalance.timeout, only do the kick out through session timeout.
* Changes around LeaveGroup logic, including version bumping, broker logic, client logic, etc.
* Admin client changes to add ability to batch remove static members
* Deprecate group.initial.rebalance.delay
Reviewers: Liquan Pei <liquanpei@gmail.com>, Stanislav Kozlovski <familyguyuser192@windowslive.com>, Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
As titled, this PR changed the default reset policy to latest accidentally for system tests, which in fact was earliest.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
The controller maintains state across `ControllerContext`, `PartitionStateMachine`, `ReplicaStateMachine`, and `TopicDeletionManager`. None of this state is actually isolated from the rest. For example, topics undergoing deletion are intertwined with the partition and replica states. As a consequence of this, each of these components tends to be dependent on all the rest, which makes testing and reasoning about the system difficult. This is a first step toward untangling all the state. This patch moves it all into `ControllerContext` and removes many of the circular dependencies. So far, this is mostly a direct translation, but in the future we can add additional validation in `ControllerContext` to make sure that state is maintained consistently.
Additionally, this patch adds several mock objects to enable easier testing: `MockReplicaStateMachine` and `MockPartitionStateMachine`. These have simplified logic for updating the current state. This is used to create some new test cases for `TopicDeletionManager`.
Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>, Jun Rao <junrao@gmail.com>
Add support for Standalone Connect configs in Rest Server extensions
A bug was introduced in 7a42750d that was caught in system tests:
The rest extensions fail if a Standalone worker config is passed,
since it does not have a definition for rebalance timeout.
A new method was introduced on WorkerConfig that by default returns
null for the rebalance timeout, and DistributedConfig overloads this
to return the configured value.
Author: Cyrus Vafadari <cyrus@confluent.io>
Reviewers: Arjun Satish <arjunconfluent.io>, Randall Hauch <rhauch@gmail.com>
In the KafkaProducer#close method we have a debug log statement Kafka producer has been closed then a few lines later a KafkaException can occur.
This could be confusing to users, so this PR simply moves the log statement to after the possible exception to avoid confusing information in the logs.
Reviewers: Guozhang Wang <wangguoz@gmail.com>