This test sometimes fails with
```
kafka.tools.MirrorMaker$NoRecordsException
at kafka.tools.MirrorMaker$ConsumerWrapper.receive(MirrorMaker.scala:483)
at kafka.tools.MirrorMakerIntegrationTest$$anonfun$testCommaSeparatedRegex$1.apply$mcZ$sp(MirrorMakerIntegrationTest.scala:92)
at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:738)
```
The test should catch `NoRecordsException` instead of `TimeoutException`.
Reviewers: Ismael Juma <ismael@juma.me.uk>
On the follower side, for an empty `LogAppendInfo` retrieved from the leader, `fetcherLagStats` records the wrong lag because `nextOffset` is zero.
There have been several reported incidents where the log is rolled to a new segment with the same base offset as an active segment, causing `KafkaException: Trying to roll a new log segment for topic partition X-N with start offset M while it already exists`. In the cases we have seen, this happens to an empty log segment where there is a long idle time before the next append and we somehow get into a state where `offsetIndex.isFull()` returns true because `_maxEntries == 0`. This PR recovers from this state by deleting and recreating the segment and all of its associated index files.
Reviewers: Jason Gustafson <jason@confluent.io>
Delayed fetch operations acquire the leaderIsrUpdate read lock of one or more Partitions from the fetch request when attempting to complete the fetch operation. While appending new records, complete delayed fetch requests only after releasing the leaderIsrUpdate lock of the Partition to which the records were appended, to avoid deadlocks in the request handler threads.
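A minimal sketch of the pattern, not the actual Partition code; the lock usage and names are illustrative:
```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Run the delayed operation's completion callback only after the lock is
// released, so a completion that re-acquires a Partition lock cannot
// deadlock the request handler thread.
public class AppendThenCompleteExample {
    private final ReentrantReadWriteLock leaderIsrUpdate = new ReentrantReadWriteLock();

    public void appendRecords(Runnable delayedFetchCompletion) {
        leaderIsrUpdate.readLock().lock();
        try {
            // ... append the new records to the log ...
        } finally {
            leaderIsrUpdate.readLock().unlock();
        }
        // Complete the delayed fetch only after leaderIsrUpdate has been released.
        delayedFetchCompletion.run();
    }
}
```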
Reviewers: Jason Gustafson <jason@confluent.io>, Jun Rao <junrao@gmail.com>
The EndToEndLatency tool produces a dummy record if the topic does not exist. This behavior was introduced in https://github.com/apache/kafka/pull/5319 as part of updating the tool to use the latest consumer API. However, if we run the tool with producer `acks=1`, the high watermark may not be updated before we reset consumer offsets to latest. In the rare cases when this happens, the tool throws an exception in the for loop when the consumer unexpectedly consumes the dummy record. As a result, we occasionally see `Benchmark.test_end_to_end_latency` system test failures.
This PR checks whether the topic exists and, if it does not, creates it using AdminClient.
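A hedged sketch of the check-then-create step using the public AdminClient API; the topic name, partition count and replication factor here are illustrative:
```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class EnsureTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            String topic = "end-to-end-latency";
            // Check for the topic and create it up front instead of
            // producing a dummy record.
            if (!admin.listTopics().names().get().contains(topic)) {
                admin.createTopics(
                        Collections.singleton(new NewTopic(topic, 1, (short) 1)))
                     .all().get();
            }
        }
    }
}
```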
Author: Anna Povzner <anna@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#5950 from apovzner/fix-EndToEndLatency
- Remove ZKUtils usage from various tests
Author: Manikumar Reddy <manikumar.reddy@gmail.com>
Reviewers: Sriharsha Chintalapani <sriharsha@apache.org>, Ismael Juma <ismael@juma.me.uk>, Satish Duggana <satishd@apache.org>, Jun Rao <junrao@gmail.com>, Ryanne Dolan <ryannedolan@gmail.com>
Closes#5480 from omkreddy/zkutils
Some deserializers need the topic name to correctly deserialize the message payload.
The console consumer works fine with `Deserializer<String>`, but it invokes the deserializer with the topic set to null.
This breaks the API, even though the topic information is available in the `ConsumerRecord`.
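A hedged illustration of why the topic argument matters; this deserializer is hypothetical, not part of Kafka:
```java
import java.nio.charset.StandardCharsets;
import org.apache.kafka.common.serialization.Deserializer;

// A deserializer that varies its behavior per topic cannot work correctly
// if the console consumer passes a null topic.
public class TopicAwareDeserializer implements Deserializer<String> {
    @Override
    public String deserialize(String topic, byte[] data) {
        if (data == null)
            return null;
        // A schema-registry-style deserializer would look up the schema by topic here.
        String prefix = (topic == null) ? "<unknown topic>" : topic;
        return prefix + ": " + new String(data, StandardCharsets.UTF_8);
    }
}
```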
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>, Gardner Vickers <gardner@vickers.me>, Jun Rao <junrao@gmail.com>
Changes made as part of [KIP-374](https://cwiki.apache.org/confluence/x/FgSQBQ) and [KAFKA-7418](https://issues.apache.org/jira/browse/KAFKA-7418):
- Check for empty args or the help option in each command and print usage
- Added a new class to enforce the help option for all commands
- Refactored a few lines (e.g. `PreferredReplicaLeaderElectionCommand`) to make use of `CommandDefaultOptions` attributes
- Adjusted the help text wording
Ran the unit tests locally (on Windows); a few Linux-specific tests fail, but no functionality is affected. Verified the `--help` and no-option responses by running the Scala classes directly, since they all have a `main` method.
Author: Srinivas Reddy <srinivas96alluri@gmail.com>
Author: Srinivas Reddy <mrsrinivas@users.noreply.github.com>
Author: Srinivas <srinivas96alluri@gmail.com>
Reviewers: Colin Patrick McCabe <colin@cmccabe.xyz>, Manikumar Reddy <manikumar.reddy@gmail.com>, Guozhang Wang <wangguoz@gmail.com>, Mickael Maison <mickael.maison@gmail.com>
Closes#5910 from mrsrinivas/KIP-374
While it makes sense for metrics like Min, Avg and Max to use `Double.MAX_VALUE`, `0.0` and `Double.MIN_VALUE` respectively as default values to ease the computation logic, exposing those values makes reading them misleading. For instance, how would you differentiate whether your `-avg` metric has a value of 0 because it was given samples of 0, or because no samples were fed to it?
It makes sense to standardize the output of these metrics on something that clearly denotes that no values have been recorded.
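A minimal sketch of the idea, assuming `Double.NaN` is used as the "no samples" sentinel; the class and names are illustrative, not the Kafka metrics code:
```java
public class MinMetricExample {
    private double min = Double.MAX_VALUE; // initializer only; never exposed as-is
    private long count = 0;

    public synchronized void record(double value) {
        min = Math.min(min, value);
        count++;
    }

    public synchronized double measure() {
        // NaN clearly denotes "no samples", unlike the initializer constant.
        return count == 0 ? Double.NaN : min;
    }
}
```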
Reviewers: Bill Bejeck <bill@confluent.io>, John Roesler <john@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Improve the default group id behavior by:
* changing the default consumer group id to null, where no offset commit or fetch, or group management operations, are allowed (see the sketch after this list)
* deprecating the use of the empty (`""`) consumer group id on the client
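A hedged sketch of the resulting client behavior; the topic and bootstrap address are illustrative:
```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class NullGroupIdExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // Note: no "group.id" set.
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(Collections.singleton(new TopicPartition("my-topic", 0)));
            consumer.poll(Duration.ofSeconds(1)); // fine: no group management involved
            // consumer.commitSync();             // expected: InvalidGroupIdException
        }
    }
}
```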
Reviewers: Jason Gustafson <jason@confluent.io>
ReplicaFetcherThread.shutdown attempts to close the fetcher's Selector while the thread is running. This is unsafe and can result in `Selector.close()` failing with an exception. The exception is caught and logged at debug level, but this can lead to a socket leak if the shutdown is due to a dynamic config update rather than broker shutdown. This PR changes the shutdown logic to close the Selector after the replica fetcher thread has shut down, using a wakeup() and a flag to terminate blocking sends first.
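A hedged sketch of the safe ordering, using java.nio's Selector as a stand-in for the fetcher's network selector; the class and names are illustrative:
```java
import java.io.IOException;
import java.nio.channels.Selector;

public class FetcherShutdownExample extends Thread {
    private final Selector selector;
    private volatile boolean running = true;

    FetcherShutdownExample(Selector selector) {
        this.selector = selector;
    }

    @Override
    public void run() {
        try {
            while (running) {
                selector.select(500); // blocking poll; interrupted by wakeup()
                // ... process ready keys / in-flight fetches ...
            }
        } catch (IOException e) {
            // log and exit the loop; shutdown() still owns closing the selector
        }
    }

    public void shutdown() throws InterruptedException, IOException {
        running = false;   // flag checked by the run loop to terminate blocking work
        selector.wakeup(); // unblock a pending select() so the flag is observed
        join();            // wait for the fetcher thread to fully exit ...
        selector.close();  // ... and close the selector only afterwards
    }
}
```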
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
We are seeing some timeouts in tests which depend on the awaitCommitCallback (e.g.
SaslMultiMechanismConsumerTest.testCoordinatorFailover). After some investigation,
we found that it is caused by a disconnect when attempting the async commit.
To fix the problem, we have added simple retry logic to the test utility.
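A hedged sketch of such retry logic around an async commit; the method, retry bound and error handling are illustrative, not the actual TestUtils code:
```java
import java.time.Duration;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.RetriableCommitFailedException;

public class CommitRetryExample {
    public static void awaitCommitWithRetries(Consumer<?, ?> consumer, int maxAttempts) {
        for (int attempt = 1; ; attempt++) {
            AtomicReference<Exception> error = new AtomicReference<>();
            CountDownLatch done = new CountDownLatch(1);
            consumer.commitAsync((offsets, exception) -> {
                error.set(exception);
                done.countDown();
            });
            // commitAsync callbacks fire from poll(), so poll until the callback runs.
            while (done.getCount() > 0)
                consumer.poll(Duration.ofMillis(100));
            if (error.get() == null)
                return; // commit succeeded
            // A disconnect surfaces as a retriable failure; retry a bounded number of times.
            if (!(error.get() instanceof RetriableCommitFailedException) || attempt >= maxAttempts)
                throw new RuntimeException(error.get());
        }
    }
}
```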
Reviewers: Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Ismael Juma <ismael@juma.me.uk>
- Use Xlint:all with 3 exclusions (filed KAFKA-7613 to remove the exclusions)
- Use the same javac options when compiling tests (seems accidental that
we didn't do this before)
- Replaced several deprecated method calls with non-deprecated ones:
  - `KafkaConsumer.poll(long)` and `KafkaConsumer.close(long)`
  - `Class.newInstance` and `new Integer/Long` (deprecated since Java 9)
  - `scala.Console` (deprecated in Scala 2.11)
  - `PartitionData` taking a timestamp (one of them seemingly a bug)
  - `JsonMappingException` single parameter constructor
- Fix unnecessary usage of raw types in several places.
- Add @SuppressWarnings for deprecations, unchecked and switch fallthrough in
several places.
- Scala clean-ups (var -> val, ETA expansion warnings, avoid reflective calls)
- Use lambdas to simplify code in a few places
- Add @SafeVarargs, fix varargs usage and remove unnecessary `Utils.mkList` method
Reviewers: Matthias J. Sax <mjsax@apache.org>, Manikumar Reddy <manikumar.reddy@gmail.com>, Randall Hauch <rhauch@gmail.com>, Bill Bejeck <bill@confluent.io>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>
There seems to be no reason to keep this around since it is not used outside
of testing and AbstractIterator is basically the same thing.
Reviewers: Ismael Juma <ismael@juma.me.uk>
Instead of calling `deleteSnapshotsAfterRecoveryPointCheckpoint` for all logs, invoke it only for the logs being truncated.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
EasyMock 4.0.x includes a change that relies on the caller for inferring
the return type of mock creator methods. Updated a number of Scala
tests so that compilation and execution succeed.
The versions of EasyMock and PowerMock in this PR include full support
for Java 11.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
* Add logic to retry the BrokerInfo registration into ZooKeeper (see the sketch below)
In case the ZooKeeper session has been regenerated and the broker tries to register its BrokerInfo in ZooKeeper, this code deletes the current BrokerInfo znode and creates it again, but only if the znode's ephemeral owner belongs to the broker that is trying to re-register itself.
* Add a test to validate the BrokerInfo re-registration in ZooKeeper
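A hedged sketch of the check described above, using the plain ZooKeeper client rather than the actual KafkaZkClient code; the path, payload and ACLs are illustrative:
```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class BrokerRegistrationExample {
    public static void register(ZooKeeper zk, String path, byte[] brokerInfo)
            throws Exception {
        Stat stat = zk.exists(path, false);
        if (stat != null && stat.getEphemeralOwner() == zk.getSessionId()) {
            // Stale znode left behind by this broker's own session: delete it
            // so it can be recreated below.
            zk.delete(path, stat.getVersion());
            stat = null;
        }
        if (stat == null)
            zk.create(path, brokerInfo, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
    }
}
```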
This PR avoids sending out full UpdateMetadataRequests in the following scenarios:
1. On broker startup, send a full UpdateMetadataRequest to newly added brokers and only send UpdateMetadataRequests with empty partition states to existing brokers.
2. On broker failure, if it doesn't require leader election, only include the states of partitions that are hosted by the dead broker(s) in the UpdateMetadataRequest instead of including all partition states.
This PR also introduces a minor optimization in the MetadataCache update to avoid copying the previous partition states upon receiving UpdateMetadataRequest with no partition states.
Reviewers: Jun Rao <junrao@gmail.com>
This patch fixes two issues:
1) Currently, if a broker receives two StopReplicaRequests with delete=true for the same offline replica, the first StopReplicaRequest will return KafkaStorageException and the second StopReplicaRequest will return ReplicaNotAvailableException. This is because the first StopReplicaRequest removes the mapping (tp -> ReplicaManager.OfflinePartition) from ReplicaManager.allPartitions before returning KafkaStorageException, so the second StopReplicaRequest no longer finds this partition as offline.
This result appears to be inconsistent. Since the replica is already offline and the broker will not be able to delete files for this replica, the StopReplicaRequest should fail without making any change, and the broker should still remember that this replica is offline.
2) Currently, if a broker receives a StopReplicaRequest with delete=true, it will attempt to remove the future replica for the partition, which will cause a KafkaStorageException in the StopReplicaResponse if the partition does not have a future replica. It is problematic to always return KafkaStorageException in the response when the future replica does not exist.
Author: Dong Lin <lindong28@gmail.com>
Reviewers: Jun Rao <junrao@gmail.com>
Closes#5533 from lindong28/KAFKA-7313
As part of KIP-320, the ListOffsets API should return the leader epoch of any fetched offset. We either get this epoch from the log itself for a timestamp query, or from the epoch cache if we are searching for the earliest or latest offset in the log. When handling queries for the latest offset, we have elected to use the current leader epoch, which is consistent with other handling (e.g. OffsetsForTimes).
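A hedged sketch of how the leader epoch surfaces on the client side via `OffsetAndTimestamp`; the topic name is illustrative:
```java
import java.util.Collections;
import java.util.Map;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;

public class OffsetEpochExample {
    public static void printEarliestWithEpoch(Consumer<?, ?> consumer) {
        TopicPartition tp = new TopicPartition("my-topic", 0);
        Map<TopicPartition, OffsetAndTimestamp> result =
                consumer.offsetsForTimes(Collections.singletonMap(tp, 0L));
        OffsetAndTimestamp entry = result.get(tp);
        if (entry != null) // leaderEpoch() is an Optional<Integer>
            System.out.println("offset=" + entry.offset()
                    + ", leaderEpoch=" + entry.leaderEpoch());
    }
}
```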
Reviewers: Jun Rao <junrao@gmail.com>
We've been seeing some hanging builds recently (see KAFKA-7553). Consistently the culprit seems to be a test case in PlaintextConsumerTest. This patch doesn't fix the underlying issue, but it eliminates a few places where these test cases could block:
1. It replaces several calls to the deprecated `poll(long)`, which in the worst case can block indefinitely in order to join the group, with `poll(Duration)`, which respects the timeout (see the sketch after this list).
2. It also fixes a consume utility in `TestUtils` which can block for a long time depending on the number of records that are expected to be consumed.
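A hedged sketch of the migration described in item 1; the timeout value and names are illustrative:
```java
import java.time.Duration;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecords;

public class BoundedPollExample {
    public static <K, V> ConsumerRecords<K, V> boundedPoll(Consumer<K, V> consumer) {
        // consumer.poll(5000);                      // deprecated; may block indefinitely
        return consumer.poll(Duration.ofSeconds(5)); // respects the timeout during rebalance
    }
}
```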
Reviewers: Ismael Juma <ismael@juma.me.uk>, Colin Patrick McCabe <colin@cmccabe.xyz>
I've seen this test fail with
```
java.lang.AssertionError: Throttled replication of 6352ms should be < 6000ms
```
A contributing factor is that it starts counting the time it took for replication
before the replication itself has started. `createServer()` initializes ZK and
other systems before it starts up the replication thread.
I ran the test 25 times locally both ways.
Average `throttledTook` before the change: 5341.75
Average `throttledTook` after the change: 5256.92
Note that those are the results from `./gradlew core:test --tests kafka.server.ReplicationQuotasTest.shouldThrottleOldSegments`. I've noticed that if
I run the whole test class `ReplicationQuotasTest`, the `throttledTook` is closer
to ~4100.
Reviewers: Ismael Juma <ismael@juma.me.uk>
This patch makes two improvements to internal metadata handling logic and testing:
1. It reduces dependence on the public object `Cluster` for internal metadata propagation, since it is not easy to evolve. As an example, we need to propagate leader epochs from the metadata response to `Metadata`, but it is not straightforward to do this without exposing it in `PartitionInfo`, since that is what `Cluster` uses internally. With this change, we are able to remove some redundant `Cluster` building logic.
2. We want to make the metadata handling in `MockClient` simpler and more consistent. Currently we have a mix of metadata update mechanisms which are internally inconsistent with each other and do not match the implementation in `NetworkClient`.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
In the `consume` method, the consumer subscribes to the topic, so there is no need to do
the same thing before the method call. Also includes minor clean-up in `consume`.
Reviewers: Ismael Juma <ismael@juma.me.uk>
`testMarksPartitionsAsOfflineAndPopulatesUncleanableMetrics` sometimes fails
because the 15 second timeout expires. Inspecting the error message from the build
failure, we see that this timeout happens in the writeDups() calls which call roll().
```text
[2018-10-23 15:18:51,018] ERROR Error while flushing log for log-1 in dir /tmp/kafka-8190355063195903574 with offset 74 (kafka.server.LogDirFailureChannel:76)
java.nio.channels.ClosedByInterruptException
...
at kafka.log.Log.roll(Log.scala:1550)
...
at kafka.log.AbstractLogCleanerIntegrationTest.writeDups(AbstractLogCleanerIntegrationTest.scala:132)
...
```
After investigating, I saw that this test would call Log#roll() around 60 times every run.
Increasing the segmentSize config to `2048` reduces the number of Log#roll() calls
while ensuring that there are multiple rolls still.
I saw that most other LogCleaner tests also call roll() ~90 times, so I've changed the
default to be `2048`. I've also made the one test which requires a smaller segmentSize
to set it via the args.
Reviewers: Ismael Juma <ismael@juma.me.uk>
KIP-368 implementation to enable periodic re-authentication of SASL clients. Also adds a broker configuration option to terminate client connections that do not re-authenticate within the configured interval.
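A hedged sketch of the broker option added by KIP-368, shown here as a Java `Properties` entry; the one-hour value is illustrative:
```java
import java.util.Properties;

public class ReauthConfigExample {
    public static void main(String[] args) {
        Properties brokerProps = new Properties();
        // Terminate connections that have not re-authenticated within the
        // configured interval; the one-hour value here is illustrative.
        brokerProps.put("connections.max.reauth.ms", "3600000");
        System.out.println(brokerProps);
    }
}
```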
This line prints the following when the list is empty:
```
[2018-10-23 12:19:59,977] INFO [Controller id=0] Removed ArrayBuffer() from list of shutting down brokers. (kafka.controller.KafkaController)
```
Use `mkString` to eliminate `ArrayBuffer` and only log if not empty.
Reviewers: Ismael Juma <ismael@juma.me.uk>
FetchResponse should return the partition data's lastStableOffset.
Author: lambdaliu <lambdaliu@tencent.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Dhruvil Shah <dhruvil@confluent.io>, Dong Lin <lindong28@gmail.com>
Closes#5835 from lambdaliu/KAFKA-7535
`ControllerIntegrationTest#waitUntilControllerEpoch` sometimes fails with the following error:
```
java.util.NoSuchElementException: None.get
at scala.None$.get(Option.scala:347)
at scala.None$.get(Option.scala:345)
at kafka.controller.ControllerIntegrationTest$$anonfun$waitUntilControllerEpoch$1.apply$mcZ$sp(ControllerIntegrationTest.scala:312)
at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:779)
at kafka.controller.ControllerIntegrationTest.waitUntilControllerEpoch(ControllerIntegrationTest.scala:312)
at kafka.controller.ControllerIntegrationTest.testEmptyCluster(ControllerIntegrationTest.scala:51)
```
We should retry until the value is defined or it times out.
Reviewers: Ismael Juma <ismael@juma.me.uk>
Make sure that the transaction state is properly cleared when the
`transactionalId-expiration` task fails. Operations on that transactional
id would otherwise return a `CONCURRENT_TRANSACTIONS` error
and appear "untouchable" to transaction state changes, preventing
transactional producers from operating until a broker restart or
transaction coordinator change.
Unit tested by verifying that the `transactionalId-expiration` task
won't leave the transaction metadata in a pending state if the replica
manager returns an error.
Reviewers: Jason Gustafson <jason@confluent.io>
Just a doc change
Author: John Eismeier <john.eismeier@gmail.com>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes#4573 from jeis2497052/trunk
After KAFKA-6051, we close leaderEndpoint in the replica fetcher thread's initiateShutdown to try to preempt an in-progress fetch request and accelerate replica fetcher thread shutdown. However, leaderEndpoint can throw an exception when the replica fetcher thread is still actively fetching, which can cause ReplicaManager to fail to shut down cleanly. This PR catches the exceptions thrown in `leaderEndpoint.close()` instead of letting them propagate up the call stack.
Author: Zhanxiang (Patrick) Huang <hzxa21@hotmail.com>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Dong Lin <lindong28@gmail.com>
Closes#5808 from hzxa21/KAFKA-7464
This patch adds logging of topic config overrides during creation or during the handling of alter config requests. Also did some minor cleanup to avoid redundant validation logic when adding partitions.
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Closes#5812 from hachikuji/minor-log-topic-creation-configs
Add CreatePartitionsRequest.PartitionDetails similar to CreateTopicsRequest.TopicDetails to avoid references from `common.requests` package to `clients`.
Reviewers: Ismael Juma <ismael@juma.me.uk>
Adds the `client.dns.lookup=resolve_canonical_bootstrap_servers_only` option to perform full DNS resolution of bootstrap addresses.
Reviewers: Colin Patrick McCabe <colin@cmccabe.xyz>, Sriharsha Chintalapani <sriharsha@apache.org>, Edoardo Comar <ecomar@uk.ibm.com>, Mickael Maison <mickael.maison@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
Implementation of KIP-302: based on the new client configuration `client.dns.lookup`, a NetworkClient can use InetAddress.getAllByName to find all IPs and iterate over them when they fail to connect. Only IPv4 or IPv6 addresses are used, similar to the default mode (see the sketch below).
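A hedged sketch of the lookup idea using the standard java.net API; the hostname is illustrative, and this is not the actual NetworkClient code:
```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.util.ArrayList;
import java.util.List;

public class DnsLookupExample {
    public static void main(String[] args) throws Exception {
        // Resolve every address behind a bootstrap hostname.
        InetAddress[] all = InetAddress.getAllByName("kafka.example.com");
        List<InetAddress> candidates = new ArrayList<>();
        for (InetAddress addr : all) {
            // Keep only the family of the first resolved address, mirroring the
            // "only IPv4 or IPv6" behavior described above.
            if ((addr instanceof Inet4Address) == (all[0] instanceof Inet4Address))
                candidates.add(addr);
        }
        // Connection attempts can iterate over these alternatives on failure.
        System.out.println("will try in order: " + candidates);
    }
}
```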
Co-authored-by: Edoardo Comar <ecomar@uk.ibm.com>
Co-authored-by: Mickael Maison <mickael.maison@gmail.com>
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>