This is a bug fix PR to resolve errors introduced in https://github.com/apache/kafka/pull/6188. The PR fixes 2 things:
1. throttle time should be set on version >= 1 instead of version >= 2
2. `getErrorResponse` should set throwable exception within LeaveGroupResponseData
The patch also adds more unit tests to guarantee correctness for leave group protocol.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>
* Added removeOldLeaderMetrics in BrokerTopicStats to remove MessagesInPerSec, BytesInPerSec, BytesOutPerSec for any broker that is no longer a leader of any partition for a particular topic
* Modified ReplicaManager to remove the metrics of any topic that the current broker has no leadership (meaning the broker either becomes a follower for all of the partitions in that topic or stops being a replica)
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jun Rao <junrao@gmail.com>
As part of #5425 the streams default override for producer retries was removed. The documentation was not updated to reflect that change.
Reviewers: Matthias J. Sax <mjsax@apache.org>, Sophie Blee-Goldman <sophie@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
This patch changes maybeSendTransactionalRequest to handle both sending and polling transactional requests (and renames it to maybeSendAndPollTransactionalRequest), and skips the call to poll if no request is actually sent. It also removes the inner loop inside maybeSendAndPollTransactionalRequest and relies on the main Sender loop for retries.
Reviewers: Jason Gustafson <jason@confluent.io>
The timestamp extractor takes a previousTimestamp parameter which should be the partition time. This PR adds back in partition time tracking for the extractor, and renames previousTimestamp --> partitionTime
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bbejeck@gmail.com>, Matthias J. Sax <mjsax@apache.org>
If there are **no topics** in a cluster, kafka-topics.sh --describe without a --topic option should return empty list, not throw an exception.
Reviewers: Jason Gustafson <jason@confluent.io>
Api exception types usually have a single argument constructor which accepts the exception message. However, some types actually use this constructor to initialize a field. This inconsistency has led to some cases where exception messages were being incorrectly passed to these constructors and interpreted incorrectly. For example, this leads to confusing messages like the following in the log when we hit a GROUP_MAX_SIZE_REACHED error:
```
Attempt to join group failed due to fatal error: Consumer group The consumer group has reached its max size. already has the configured ...
```
This patch fixes the problem by changing these constructors so that the exception message is provided consistently. This affected `GroupAuthorizationException`, `TopicAuthorizationException`, and `GroupMaxSizeReachedException`.
Reviewers: Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Ismael Juma <ismael@juma.me.uk>
The two digit formatting we use in `Utils.formatBytes` depends on the english locale. If run from a different locale (e.g. German), the test case fails. This patch uses english explicitly.
Reviewers: Lee Dongjin <dongjin@apache.org>, Jason Gustafson <jason@confluent.io>
The transient failures are usually caused by a timeout in `awaitCommits`. This patch increases the timeout from 15s to 30s.
Reviewers: Konstantine Karantasis <konstantine@confluent.io>, Matthias J. Sax <mjsax@apache.org>
This PR is to use KeyValueTimeStamp Object in MockProcessor Test file instead of String and change all the dependency files with broken test cases.
Reviewers: Kamal Chandraprakash, Matthias J. Sax <mjsax@apache.org>, Boyang Chen <boyang@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
Ensure that Producer#send() throws topic metadata exceptions only for the topic being sent to and not for other topics in the producer's metadata instance. Also removes topics from consumer's metadata when a topic is removed using manual assignment.
Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Should be cherry-picked back to 2.3 (picked from 2.2 to 2.1 in 7077 )
Reviewers: pkleindl <44436474+pkleindl@users.noreply.github.com>, Matthias J. Sax <mjsax@apache.org>, Bill Bejeck <bbejeck@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
This patch changes the name of the `Resources` field of AlterConfigsResponse to `Responses`. This makes it consistent with AlterConfigsResponse, which has a differently-named but structurally-identical field. Tested with unit tests.
Reviewers: Jason Gustafson <jason@confluent.io>
We have seen some failures recently in `RestServerTest`. It's the usual problem with reliance on static ports.
```
Caused by: java.io.IOException: Failed to bind to 0.0.0.0/0.0.0.0:8083
at org.eclipse.jetty.server.ServerConnector.openAcceptChannel(ServerConnector.java:346)
at org.eclipse.jetty.server.ServerConnector.open(ServerConnector.java:308)
at org.eclipse.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:80)
at org.eclipse.jetty.server.ServerConnector.doStart(ServerConnector.java:236)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.server.Server.doStart(Server.java:396)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.apache.kafka.connect.runtime.rest.RestServer.initializeServer(RestServer.java:178)
... 56 more
Caused by: java.net.BindException: Address already in use
```
This patch makes the chosen port dynamic.
Reviewers: Ismael Juma <ismael@juma.me.uk>
The RegexSourceIntegrationTest has some flakiness as it deletes and re-creates the same output topic before each test. This PR reduces the chance for errors by creating a unique output topic for each test.
Reviewers: Matthias J. Sax <mjsax@apache.org>, Boyang Chen <boyang@confluent.io>
Correct the Flatten SMT to properly handle null key or value `Struct` instances.
Author: Michal Borowiecki <michal.borowiecki@openbet.com>
Reviewers: Arjun Satish <arjun@confluent.io>, Robert Yokota <rayokota@gmail.com>, Randall Hauch <rhauch@gmail.com>
fix checkpoint file warning by filtering checkpointable offsets per task
clean up state manager hierarchy to prevent similar bugs
Reviewers: Bruno Cadonna <bruno@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
This patch fixes a compatibility breaking `MemberDescription` constructor change in #6957. It also updates `equals` and `hashCode` for the new `groupInstanceId` field that was added in the same patch.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
The purpose of this patch is to restrict the paths for updating and accessing the high watermark and the last stable offset. By doing so, we can validate that these offsets always remain within the range of the log. We also ensure that we never expose `LogOffsetMetadata` unless it is fully materialized. Finally, this patch makes a few naming changes. In particular, we remove the `highWatermark_=` and `highWatermarkMetadata_=` which are both misleading and cumbersome to test.
Reviewers: David Arthur <mumrah@gmail.com>
The rebalance timeout was added to the JoinGroup protocol in version 1. Prior to 2.3,
we handled version 0 JoinGroup requests by setting the rebalance timeout to be equal
to the session timeout. We lost this logic when we converted the API to use the
generated schema definition (#6419) which uses the default value of -1. The impact
of this is that the group rebalance timeout becomes 0, so rebalances finish immediately
after we enter the PrepareRebalance state and kick out all old members. This causes
consumer groups to enter an endless rebalance loop. This patch restores the old
behavior.
Reviewers: Ismael Juma <ismael@juma.me.uk>
ZooKeeper 3.5.5 is the first stable release in the 3.5.x series. The key new feature
in is TLS support, but there are a few more noteworthy features:
* Dynamic reconfiguration
* Local sessions
* New node types: Container, TTL
* Ability to remove watchers
* Multi-threaded commit processor
* Upgraded to Netty 4.1
See the release notes for more detail:
https://zookeeper.apache.org/doc/r3.5.5/releasenotes.html
In addition to the version bump, we:
* Add `commons-cli` dependency as it's required by `ZooKeeperMain`, but specified as
`provided` in their pom.
* Remove unnecessary `ZooKeeperMainWrapper`, the bug it worked around was fixed
upstream a long time ago.
* Ignore non zero exit in one system test invocation of `ZooKeeperMain`.
`ZooKeeperMainWrapper` always returned `0` and `ZooKeeperService.query` relies
on that for correct behavior.
Reviewers: Jason Gustafson <jason@confluent.io>
We noticed a lot of messages like the following in the broker logs recently:
```
[2019-07-10 02:01:23,946] WARN [ReplicaManager broker=0] While updating the HW for follower -1 for partition connect-storage-topic-connect-cluster-0, the replica could not be found. (kafka.server.ReplicaManager:70)
```
In the KIP-392 PR, we added logic to track the high watermark of followers, but it is invoked even for consumer fetches. This doesn't cause any harm other than all the log noise.
This patch just adds the missing follower check.
Reviewers: David Arthur <mumrah@gmail.com>
Clarify the behavior of `max.poll.interval.ms` for static consumers since it is slightly different from dynamic members.
Reviewers: Bill Bejeck <bbejeck@gmail.com>, Matthias J. Sax <mjsax@apache.org>, Jason Gustafson <jason@confluent.io>
We don't allow changing number of partitions for internal topics. To do
so we check if the topic name belongs to the set of internal topics
directly instead of using the "isInternalTopic" method. This breaks the
encapsulation by making client aware of the fact that internal topics
have special names.
This is a simple change to use the method `Topic::isInternalTopic`
method instead of checking it directly in "alterTopic" command. We
also reduce visibility to `Topic::INTERNAL_TOPICS` to avoid
unnecessary reliance on it in the future.
Reviewers: Jason Gustafson <jason@confluent.io>
A bug in `WorkerConfigTransformer` prevents the connector configuration reload when the ConfigData TTL expires.
The issue boils down to the fact that `worker.herder().restartConnector` is receiving a null callback.
```
[2019-06-17 14:34:12,320] INFO Scheduling a restart of connector workshop-incremental in 60000 ms (org.apache.kafka.connect.runtime.WorkerConfigTransformer:88)
[2019-06-17 14:34:12,321] ERROR Uncaught exception in herder work thread, exiting: (org.apache.kafka.connect.runtime.distributed.DistributedHerder:227)
java.lang.NullPointerException
at org.apache.kafka.connect.runtime.distributed.DistributedHerder$19.onCompletion(DistributedHerder.java:1187)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder$19.onCompletion(DistributedHerder.java:1183)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.tick(DistributedHerder.java:273)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.run(DistributedHerder.java:219)
```
This patch adds a callback which just logs the error.
Reviewers: Robert Yokota <rayokota@gmail.com>, Jason Gustafson <jason@confluent.io>
Users often use the RocksDBConfigSetter to modify parameters such as cache or block size, which must be set through the BlockBasedTableConfig object. Rather than creating a new object in the config setter, however, users should most likely retrieve a reference to the existing one so as to not lose the other defaults (eg the BloomFilter)
There have been notes from the community that it is not obvious this should be done, nor is it immediately clear how to do so. This PR updates the RocksDBConfigSetter docs to hopefully improve things.
I also piggybacked a few minor cleanups in the docs
Reviewers: Kamal Chandraprakash, Jim Galasyn <jim.galasyn@confluent.io>, Bruno Cadonna <bruno@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
ZkUtils was removed so we don't need this anymore.
Also:
* Fix ZkSecurityMigrator and ReplicaManagerTest not to
reference ZkClient classes.
* Remove references to zkclient in various `log4j.properties`
and `import-control.xml`.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>
We recently saw a few failing tests recently due to the static reliance on port 5000. For example:
```
org.apache.kafka.common.KafkaException: Socket server failed to bind to localhost:5000: Address already in use.
at kafka.network.Acceptor.openServerSocket(SocketServer.scala:605)
at kafka.network.Acceptor.<init>(SocketServer.scala:481)
at kafka.network.SocketServer.createAcceptor(SocketServer.scala:253)
at kafka.network.SocketServer.$anonfun$createControlPlaneAcceptorAndProcessor$1(SocketServer.scala:234)
at kafka.network.SocketServer.$anonfun$createControlPlaneAcceptorAndProcessor$1$adapted(SocketServer.scala:232)
at scala.Option.foreach(Option.scala:438)
at kafka.network.SocketServer.createControlPlaneAcceptorAndProcessor(SocketServer.scala:232)
at kafka.network.SocketServer.startup(SocketServer.scala:119)
at kafka.network.SocketServerTest.withTestableServer(SocketServerTest.scala:1139)
at kafka.network.SocketServerTest.testControlPlaneRequest(SocketServerTest.scala:198)
```
This patch fixes the failing tests to dynamically select the port.
Reviewers: Ismael Juma <ismael@juma.me.uk>
Follow on to #6731, this PR adds broker-side support for [KIP-392](https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica) (fetch from followers).
Changes:
* All brokers will handle FetchRequest regardless of leadership
* Leaders can compute a preferred replica to return to the client
* New ReplicaSelector interface for determining the preferred replica
* Incremental fetches will include partitions with no records if the preferred replica has been computed
* Adds new JMX to expose the current preferred read replica of a partition in the consumer
Two new conditions were added for completing a delayed fetch. They both relate to communicating the high watermark to followers without waiting for a timeout:
* For regular fetches, if the high watermark changes within a single fetch request
* For incremental fetch sessions, if the follower's high watermark is lower than the leader
A new JMX attribute `preferred-read-replica` was added to the `kafka.consumer:type=consumer-fetch-manager-metrics,client-id=some-consumer,topic=my-topic,partition=0` object. This was added to support the new system test which verifies that the fetch from follower behavior works end-to-end. This attribute could also be useful in the future when debugging problems with the consumer.
Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>, Jun Rao <junrao@gmail.com>, Jason Gustafson <jason@confluent.io>
This PR fixes a wrong input stream name in PipeDemo's javadoc.
Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Jason Gustafson <jason@confluent.io>
Add a "useConfiguredPartitioner" boolean to specify testing with the configured partitioner, rather than overriding the partitioner in the test.
Add a "skipFlush" boolean to specify skipping the flush operation when producing. This is helpful when testing some scenarios where linger.ms is greater than 0.
Reviewers: Colin P. McCabe <cmccabe@apache.org>
The Pipe.java file should exist within the myapps package directory.
Reviewers: Boyang Chen <boyang@confluent.io>, Jason Gustafson <jason@confluent.io>