Add a new exception, NoReassignmentInProgressException. Modify LeaderAndIsrRequest to include the AddingReplicas and RemovingReplicas fields. Add the ListPartitionReassignments and AlterPartitionReassignments RPCs.
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Viktor Somogyi <viktorsomogyi@gmail.com>
And remove redundant call. Closing ZKDatabase is necessary to allow the data
directory to be deleted when running on Windows.
Reviewers: Ismael Juma <ismael@juma.me.uk>
This patch is part of KIP-345. We aim to support batch leave group requests issued from the admin client. This diff is the first step: it bumps the leave group request version.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>
The bug is that we accidentally used the current state timestamp for the group instead of the real current time. When a group is first loaded, this timestamp is not initialized, so this resulted in a `NoSuchElementException`. Additionally this violated the intended uniqueness of the memberId, which could have broken the group instance fencing. This patch fixes the issue and adds a unit test to make sure the timestamp is properly encoded within the returned member.id.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>
Main changes of this PR:
* Deprecate old consumer.internal.PartitionAssignor and add public consumer.ConsumerPartitionAssignor with all OOTB assignors migrated to the new interface
* Refactor the assignor's assignment- and subscription-related classes to make the API easier to evolve
* Remove version number from classes as it is only needed for serialization/deserialization
* Other previously-discussed cleanup included in this PR:
  * Remove Assignment.error added in pt 1
  * Remove ConsumerCoordinator#adjustAssignment added in pt 2
Reviewers: Boyang Chen <boyang@confluent.io>, Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Minor refactoring of the places where we delete log segments, to ensure we always remove the in-memory metadata of the segment before performing physical deletion.
Reviewers: Ismael Juma <ismael@juma.me.uk>, NIkhil Bhatia <rite2nikhil@gmail.com>, Jun Rao <junrao@gmail.com>
* Clean up one redundant and one misplaced metric
* Clarify the relationship among these metrics to avoid future confusion
Reviewers: Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
LogManager#getOrCreateLog() selects a log dir for the new replica from
_liveLogDirs. If a disk failure is discovered at this point, before
LogDirFailureHandler finds out, try the other log dirs before failing
the operation.
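The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the LogManager code: `LogDirFallbackSketch`, `LogCreator` and `createInFirstUsableDir` are hypothetical names, and the real patch may order or filter the candidate dirs differently.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names are part of Kafka's LogManager.
final class LogDirFallbackSketch {

    // Stand-in for "create the log in this directory"; may hit a disk failure.
    interface LogCreator {
        String createLogIn(String dir) throws IOException;
    }

    // Try each candidate dir in order and return the first log that is created.
    // Fail the operation only if every candidate dir fails.
    static String createInFirstUsableDir(List<String> candidateDirs, LogCreator creator) {
        List<IOException> failures = new ArrayList<>();
        for (String dir : candidateDirs) {
            try {
                return creator.createLogIn(dir);
            } catch (IOException e) {
                // The dir may have gone bad before the failure handler noticed.
                failures.add(e);
            }
        }
        IllegalStateException all =
            new IllegalStateException("No usable log dir among " + candidateDirs);
        failures.forEach(all::addSuppressed);
        throw all;
    }

    public static void main(String[] args) {
        String created = createInFirstUsableDir(List.of("/data/a", "/data/b"), dir -> {
            if (dir.equals("/data/a")) {
                throw new IOException("simulated disk failure on " + dir);
            }
            return dir + "/topic-0";
        });
        System.out.println(created);
    }
}
```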
Reviewers: Anna Povzner <anna@confluent.io>, Jason Gustafson <jason@confluent.io>
This is a bug fix PR to resolve errors introduced in https://github.com/apache/kafka/pull/6188. The PR fixes 2 things:
1. throttle time should be set on version >= 1 instead of version >= 2
2. `getErrorResponse` should set throwable exception within LeaveGroupResponseData
The patch also adds more unit tests to guarantee correctness for leave group protocol.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>
* Added removeOldLeaderMetrics in BrokerTopicStats to remove MessagesInPerSec, BytesInPerSec, BytesOutPerSec for any broker that is no longer a leader of any partition for a particular topic
* Modified ReplicaManager to remove the metrics of any topic for which the current broker has no leadership (meaning the broker either becomes a follower for all of the partitions in that topic or stops being a replica)
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jun Rao <junrao@gmail.com>
As part of #5425 the streams default override for producer retries was removed. The documentation was not updated to reflect that change.
Reviewers: Matthias J. Sax <mjsax@apache.org>, Sophie Blee-Goldman <sophie@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
This patch changes maybeSendTransactionalRequest to handle both sending and polling transactional requests (and renames it to maybeSendAndPollTransactionalRequest), and skips the call to poll if no request is actually sent. It also removes the inner loop inside maybeSendAndPollTransactionalRequest and relies on the main Sender loop for retries.
Reviewers: Jason Gustafson <jason@confluent.io>
The timestamp extractor takes a previousTimestamp parameter which should be the partition time. This PR adds back in partition time tracking for the extractor, and renames previousTimestamp --> partitionTime
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bbejeck@gmail.com>, Matthias J. Sax <mjsax@apache.org>
If there are **no topics** in a cluster, kafka-topics.sh --describe without a --topic option should return an empty list, not throw an exception.
Reviewers: Jason Gustafson <jason@confluent.io>
API exception types usually have a single-argument constructor which accepts the exception message. However, some types actually use this constructor to initialize a field. This inconsistency has led to some cases where exception messages were being passed to these constructors and interpreted incorrectly. For example, this leads to confusing messages like the following in the log when we hit a GROUP_MAX_SIZE_REACHED error:
```
Attempt to join group failed due to fatal error: Consumer group The consumer group has reached its max size. already has the configured ...
```
This patch fixes the problem by changing these constructors so that the exception message is provided consistently. This affected `GroupAuthorizationException`, `TopicAuthorizationException`, and `GroupMaxSizeReachedException`.
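To illustrate the inconsistency, here is a minimal sketch with made-up classes. `GroupFullBefore`, `GroupFullAfter` and `forGroupId` are hypothetical, and the message template is invented for the example; the real Kafka exception types may expose the field differently.

```java
// Hypothetical sketch; these are not the actual Kafka exception classes.
final class ExceptionConstructorSketch {

    // Before: the single String argument is a group id, not a message.
    static final class GroupFullBefore extends RuntimeException {
        GroupFullBefore(String groupId) {
            super("Consumer group " + groupId + " is full.");
        }
    }

    // After: the single String argument is always the message. The group id
    // goes through an explicitly named factory method instead.
    static final class GroupFullAfter extends RuntimeException {
        GroupFullAfter(String message) {
            super(message);
        }

        static GroupFullAfter forGroupId(String groupId) {
            return new GroupFullAfter("Consumer group " + groupId + " is full.");
        }
    }

    public static void main(String[] args) {
        String serverMessage = "The consumer group has reached its max size.";
        // A caller following the usual convention passes a message, which the
        // "before" class splices into its own template.
        System.out.println(new GroupFullBefore(serverMessage).getMessage());
        System.out.println(new GroupFullAfter(serverMessage).getMessage());
        System.out.println(GroupFullAfter.forGroupId("my-group").getMessage());
    }
}
```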
Reviewers: Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Ismael Juma <ismael@juma.me.uk>
The two-digit formatting we use in `Utils.formatBytes` depends on the English locale. If run from a different locale (e.g. German), the test case fails. This patch uses English explicitly.
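A minimal sketch of the locale dependence, assuming a `DecimalFormat`-based formatter. `LocaleFormatSketch` and its methods are hypothetical and only mirror the idea, not the actual `Utils.formatBytes` code.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical sketch; not the actual Utils.formatBytes implementation.
final class LocaleFormatSketch {

    // Uses the JVM's default locale, so the decimal separator can vary.
    static String twoDigitsDefaultLocale(double value) {
        return new DecimalFormat("#.##").format(value);
    }

    // Pins the symbols to English, so the separator is always '.'.
    static String twoDigitsEnglish(double value) {
        DecimalFormatSymbols symbols = DecimalFormatSymbols.getInstance(Locale.ENGLISH);
        return new DecimalFormat("#.##", symbols).format(value);
    }

    public static void main(String[] args) {
        Locale previous = Locale.getDefault();
        try {
            Locale.setDefault(Locale.GERMANY);
            // With German locale data available, the first call typically uses ','.
            System.out.println(twoDigitsDefaultLocale(1.5));
            System.out.println(twoDigitsEnglish(1.5));
        } finally {
            Locale.setDefault(previous);
        }
    }
}
```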
Reviewers: Lee Dongjin <dongjin@apache.org>, Jason Gustafson <jason@confluent.io>
The transient failures are usually caused by a timeout in `awaitCommits`. This patch increases the timeout from 15s to 30s.
Reviewers: Konstantine Karantasis <konstantine@confluent.io>, Matthias J. Sax <mjsax@apache.org>
This PR uses KeyValueTimeStamp objects instead of Strings in the MockProcessor test file and updates all dependent files whose test cases broke as a result.
Reviewers: Kamal Chandraprakash, Matthias J. Sax <mjsax@apache.org>, Boyang Chen <boyang@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
Ensure that Producer#send() throws topic metadata exceptions only for the topic being sent to and not for other topics in the producer's metadata instance. Also removes topics from consumer's metadata when a topic is removed using manual assignment.
Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Should be cherry-picked back to 2.3 (picked from 2.2 to 2.1 in 7077)
Reviewers: pkleindl <44436474+pkleindl@users.noreply.github.com>, Matthias J. Sax <mjsax@apache.org>, Bill Bejeck <bbejeck@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
This patch changes the name of the `Resources` field of AlterConfigsResponse to `Responses`. This makes it consistent with IncrementalAlterConfigsResponse, which has a differently-named but structurally-identical field. Tested with unit tests.
Reviewers: Jason Gustafson <jason@confluent.io>
We have seen some failures recently in `RestServerTest`. It's the usual problem with reliance on static ports.
```
Caused by: java.io.IOException: Failed to bind to 0.0.0.0/0.0.0.0:8083
at org.eclipse.jetty.server.ServerConnector.openAcceptChannel(ServerConnector.java:346)
at org.eclipse.jetty.server.ServerConnector.open(ServerConnector.java:308)
at org.eclipse.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:80)
at org.eclipse.jetty.server.ServerConnector.doStart(ServerConnector.java:236)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.server.Server.doStart(Server.java:396)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.apache.kafka.connect.runtime.rest.RestServer.initializeServer(RestServer.java:178)
... 56 more
Caused by: java.net.BindException: Address already in use
```
This patch makes the chosen port dynamic.
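One common way to avoid static ports in tests is to bind to port 0 and let the OS pick a free one. The sketch below shows that technique with plain `java.net`; it is not necessarily how this patch configures `RestServer`, and `DynamicPortSketch` is a hypothetical name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Hypothetical sketch of the "bind to port 0" technique.
final class DynamicPortSketch {

    // Port 0 asks the OS for any free port. The socket is returned open, so
    // the port stays reserved while the caller uses it.
    static ServerSocket openOnFreePort() {
        try {
            return new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = openOnFreePort()) {
            // The actual port is only known after binding.
            System.out.println("Listening on port " + server.getLocalPort());
        }
    }
}
```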
Reviewers: Ismael Juma <ismael@juma.me.uk>
The RegexSourceIntegrationTest has some flakiness as it deletes and re-creates the same output topic before each test. This PR reduces the chance for errors by creating a unique output topic for each test.
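A unique topic per test can be derived from a shared counter, for example. This is only a sketch of the idea; `UniqueTopicSketch`, `nextOutputTopic` and the topic prefix are hypothetical and the test may name its topics differently.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; the helper name and topic prefix are illustrative.
final class UniqueTopicSketch {
    private static final AtomicInteger TOPIC_SUFFIX = new AtomicInteger();

    // Each call returns a name not handed out before in this JVM, so no test
    // has to delete and re-create a topic used by an earlier test.
    static String nextOutputTopic(String prefix) {
        return prefix + "-" + TOPIC_SUFFIX.incrementAndGet();
    }

    public static void main(String[] args) {
        System.out.println(nextOutputTopic("outputTopic"));
        System.out.println(nextOutputTopic("outputTopic"));
    }
}
```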
Reviewers: Matthias J. Sax <mjsax@apache.org>, Boyang Chen <boyang@confluent.io>
Correct the Flatten SMT to properly handle null key or value `Struct` instances.
Author: Michal Borowiecki <michal.borowiecki@openbet.com>
Reviewers: Arjun Satish <arjun@confluent.io>, Robert Yokota <rayokota@gmail.com>, Randall Hauch <rhauch@gmail.com>
Fix checkpoint file warning by filtering checkpointable offsets per task.
Clean up state manager hierarchy to prevent similar bugs.
Reviewers: Bruno Cadonna <bruno@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
This patch fixes a compatibility-breaking `MemberDescription` constructor change in #6957. It also updates `equals` and `hashCode` for the new `groupInstanceId` field that was added in the same patch.
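The general pattern looks roughly like the sketch below: keep the old constructor signature and delegate to the new one, and include the new field in `equals` and `hashCode`. `MemberSketch` is a hypothetical stand-in with a reduced set of fields, not the real `MemberDescription`.

```java
import java.util.Objects;
import java.util.Optional;

// Hypothetical stand-in for a public value class; not the real MemberDescription.
final class MemberSketch {
    private final String memberId;
    private final Optional<String> groupInstanceId;
    private final String clientId;

    // New constructor, introduced together with the new field.
    MemberSketch(String memberId, Optional<String> groupInstanceId, String clientId) {
        this.memberId = Objects.requireNonNull(memberId);
        this.groupInstanceId = Objects.requireNonNull(groupInstanceId);
        this.clientId = Objects.requireNonNull(clientId);
    }

    // Old signature kept so that existing callers still compile and link.
    MemberSketch(String memberId, String clientId) {
        this(memberId, Optional.empty(), clientId);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof MemberSketch)) return false;
        MemberSketch that = (MemberSketch) o;
        // The new field takes part in equality alongside the old ones.
        return memberId.equals(that.memberId)
            && groupInstanceId.equals(that.groupInstanceId)
            && clientId.equals(that.clientId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(memberId, groupInstanceId, clientId);
    }
}
```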
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
The purpose of this patch is to restrict the paths for updating and accessing the high watermark and the last stable offset. By doing so, we can validate that these offsets always remain within the range of the log. We also ensure that we never expose `LogOffsetMetadata` unless it is fully materialized. Finally, this patch makes a few naming changes. In particular, we remove the `highWatermark_=` and `highWatermarkMetadata_=` setters, which are both misleading and cumbersome to test.
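As an illustration of a single restricted update path, the sketch below clamps a proposed high watermark into the log's offset range. Clamping is an assumption made for the example; the description above only says the offsets are validated, and `HighWatermarkSketch` is a hypothetical class, not Kafka's `Log`.

```java
// Hypothetical sketch; clamping is an assumption, not confirmed by the patch.
final class HighWatermarkSketch {
    private final long logStartOffset;
    private final long logEndOffset;
    private long highWatermark;

    HighWatermarkSketch(long logStartOffset, long logEndOffset) {
        if (logStartOffset > logEndOffset) {
            throw new IllegalArgumentException("log start offset is past the log end offset");
        }
        this.logStartOffset = logStartOffset;
        this.logEndOffset = logEndOffset;
        this.highWatermark = logStartOffset;
    }

    // The only way to change the high watermark: the stored value is always
    // within [logStartOffset, logEndOffset].
    long updateHighWatermark(long proposed) {
        highWatermark = Math.max(logStartOffset, Math.min(proposed, logEndOffset));
        return highWatermark;
    }

    long highWatermark() {
        return highWatermark;
    }
}
```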
Reviewers: David Arthur <mumrah@gmail.com>
The rebalance timeout was added to the JoinGroup protocol in version 1. Prior to 2.3,
we handled version 0 JoinGroup requests by setting the rebalance timeout to be equal
to the session timeout. We lost this logic when we converted the API to use the
generated schema definition (#6419) which uses the default value of -1. The impact
of this is that the group rebalance timeout becomes 0, so rebalances finish immediately
after we enter the PrepareRebalance state and kick out all old members. This causes
consumer groups to enter an endless rebalance loop. This patch restores the old
behavior.
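The restored version 0 handling amounts to the following rule. This is a minimal sketch; `RebalanceTimeoutSketch` and `effectiveRebalanceTimeoutMs` are hypothetical names, not the actual request-handling code.

```java
// Hypothetical sketch of the version 0 fallback described above.
final class RebalanceTimeoutSketch {

    // Version 0 JoinGroup requests carry no rebalance timeout, so the session
    // timeout is used in its place. Later versions use the value they sent.
    static int effectiveRebalanceTimeoutMs(short apiVersion,
                                           int requestRebalanceTimeoutMs,
                                           int sessionTimeoutMs) {
        if (apiVersion == 0) {
            return sessionTimeoutMs;
        }
        return requestRebalanceTimeoutMs;
    }

    public static void main(String[] args) {
        // Version 0 request: the schema default of -1 is ignored.
        System.out.println(effectiveRebalanceTimeoutMs((short) 0, -1, 10_000));
        // Version 1 request: the explicit rebalance timeout is kept.
        System.out.println(effectiveRebalanceTimeoutMs((short) 1, 60_000, 10_000));
    }
}
```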
Reviewers: Ismael Juma <ismael@juma.me.uk>
ZooKeeper 3.5.5 is the first stable release in the 3.5.x series. The key new feature
is TLS support, but there are a few more noteworthy features:
* Dynamic reconfiguration
* Local sessions
* New node types: Container, TTL
* Ability to remove watchers
* Multi-threaded commit processor
* Upgraded to Netty 4.1
See the release notes for more detail:
https://zookeeper.apache.org/doc/r3.5.5/releasenotes.html
In addition to the version bump, we:
* Add `commons-cli` dependency as it's required by `ZooKeeperMain`, but specified as
`provided` in their pom.
* Remove unnecessary `ZooKeeperMainWrapper`, the bug it worked around was fixed
upstream a long time ago.
* Ignore non-zero exit code in one system test invocation of `ZooKeeperMain`.
`ZooKeeperMainWrapper` always returned `0` and `ZooKeeperService.query` relies
on that for correct behavior.
Reviewers: Jason Gustafson <jason@confluent.io>
We noticed a lot of messages like the following in the broker logs recently:
```
[2019-07-10 02:01:23,946] WARN [ReplicaManager broker=0] While updating the HW for follower -1 for partition connect-storage-topic-connect-cluster-0, the replica could not be found. (kafka.server.ReplicaManager:70)
```
In the KIP-392 PR, we added logic to track the high watermark of followers, but it is invoked even for consumer fetches. This doesn't cause any harm other than all the log noise.
This patch just adds the missing follower check.
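The missing check can be sketched as below. It assumes that fetches from consumers carry a replica id of -1 (as in the log line above) and that followers use non-negative broker ids; `FollowerCheckSketch` and its methods are hypothetical, not the ReplicaManager code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; assumes follower replica ids are non-negative broker ids.
final class FollowerCheckSketch {
    private final Map<Integer, Long> followerHighWatermarks = new HashMap<>();

    // Consumer fetches use a negative replica id (-1 in the log line above).
    static boolean isFromFollower(int replicaId) {
        return replicaId >= 0;
    }

    // Track the high watermark only for follower fetches. Returns true if the
    // value was recorded, false if the fetch was skipped.
    boolean maybeUpdateFollowerHighWatermark(int replicaId, long highWatermark) {
        if (!isFromFollower(replicaId)) {
            return false;
        }
        followerHighWatermarks.put(replicaId, highWatermark);
        return true;
    }

    int trackedFollowers() {
        return followerHighWatermarks.size();
    }
}
```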
Reviewers: David Arthur <mumrah@gmail.com>
Clarify the behavior of `max.poll.interval.ms` for static consumers since it is slightly different from dynamic members.
Reviewers: Bill Bejeck <bbejeck@gmail.com>, Matthias J. Sax <mjsax@apache.org>, Jason Gustafson <jason@confluent.io>
We don't allow changing the number of partitions for internal topics. To do
so, we check whether the topic name belongs to the set of internal topics
directly instead of using the "isInternalTopic" method. This breaks
encapsulation by making the client aware of the fact that internal topics
have special names.
This is a simple change to use the `Topic::isInternalTopic` method
instead of checking it directly in the "alterTopic" command. We
also reduce the visibility of `Topic::INTERNAL_TOPICS` to avoid
unnecessary reliance on it in the future.
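The encapsulation point can be sketched like this. `TopicSketch` is a hypothetical stand-in for the real `Topic` class, and both the set contents and `validatePartitionChange` are illustrative.

```java
import java.util.Set;

// Hypothetical stand-in for the Topic class; contents are illustrative.
final class TopicSketch {
    // Private: callers cannot depend on how internal topics are identified.
    private static final Set<String> INTERNAL_TOPICS =
        Set.of("__consumer_offsets", "__transaction_state");

    static boolean isInternalTopic(String topic) {
        return INTERNAL_TOPICS.contains(topic);
    }

    // A caller such as an alter-topic command goes through the method rather
    // than inspecting the set of names itself.
    static void validatePartitionChange(String topic) {
        if (isInternalTopic(topic)) {
            throw new IllegalArgumentException(
                "The number of partitions for internal topic " + topic + " cannot be changed.");
        }
    }
}
```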
Reviewers: Jason Gustafson <jason@confluent.io>