EasyMock 4.0.x includes a change that relies on the caller for inferring
the return type of mock creator methods. Updated a number of Scala
tests for compilation and execution to succeed.
The versions of EasyMock and PowerMock in this PR include full support
for Java 11.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
* Add logic to retry the BrokerInfo registration into ZooKeeper
If the ZooKeeper session has been regenerated and the broker
tries to register its BrokerInfo in ZooKeeper, this code deletes
the current BrokerInfo from ZooKeeper and creates it again, but only if
the znode's ephemeral owner belongs to the broker that is trying to
register itself again in ZooKeeper
* Add test to validate the BrokerInfo re-registration into ZooKeeper
The problem is the concurrent metadata updates in the foreground and in the heartbeat thread. Changed the code to use ConsumerNetworkClient.poll, which enforces mutual exclusion when accessing the underlying client.
[KAFKA-7431](https://issues.apache.org/jira/browse/KAFKA-7431)
Changes made to improve the code readability:
- Removed `throws Exception` from places where no exception can be
thrown
- Removed type arguments where the compiler can infer them
- Rewrote anonymous classes as Java 8 lambdas
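As an illustration of the last two kinds of cleanup, here is a minimal before/after sketch. The class and method names are hypothetical and are not taken from the Connect tests touched by this change.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical example class; not part of the Kafka code base.
class ReadabilityCleanupSketch {
    // Before: explicit type arguments and an anonymous Comparator class.
    static List<String> sortedBefore(List<String> names) {
        List<String> copy = new ArrayList<String>(names);
        copy.sort(new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return a.compareTo(b);
            }
        });
        return copy;
    }

    // After: diamond operator and a lambda; behaviour is unchanged.
    static List<String> sortedAfter(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort((a, b) -> a.compareTo(b));
        return copy;
    }
}
```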
Author: Srinivas Reddy <mrsrinivas@users.noreply.github.com>
Author: Srinivas Reddy <srinivas96alluri@gmail.com>
Reviewers: Randall Hauch <rhauch@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Ryanne Dolan <ryannedolan@gmail.com>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes #5681 from mrsrinivas/cleanup-connect-uts
Currently PushHttpMetricsReporter will convert value from KafkaMetric.metricValue() to double. This will not work for non-numerical metrics such as version in AppInfoParser whose value can be string. This has caused issue for PushHttpMetricsReporter which in turn caused system test kafkatest.tests.client.quota_test.QuotaTest.test_quota to fail.
Since we allow metric value to be object, PushHttpMetricsReporter should also read metric value as object and pass it to the http server.
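A minimal sketch of the difference, assuming a hypothetical `render` helper that turns a metric value into the text sent to the HTTP server. The names below are illustrative only and do not reflect the actual `PushHttpMetricsReporter` code.

```java
// Hypothetical example class; not part of the Kafka code base.
class MetricValueSketch {
    // Numeric-only handling: every value is forced to a double, so a
    // String-valued metric (such as a version) fails with ClassCastException.
    static double asDouble(Object metricValue) {
        return (Double) metricValue;
    }

    // Object-based handling (assumed output format): numbers are rendered
    // bare, anything else is rendered as a quoted string.
    static String render(Object metricValue) {
        if (metricValue instanceof Number) {
            return String.valueOf(metricValue);
        }
        return "\"" + metricValue + "\"";
    }
}
```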
Author: Dong Lin <lindong28@gmail.com>
Reviewers: Manikumar Reddy O <manikumar.reddy@gmail.com>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes #5886 from lindong28/KAFKA-7560
This PR avoids sending out a full UpdateMetadataRequest in the following scenarios:
1. On broker startup, send out a full UpdateMetadataRequest to newly added brokers and only send out an UpdateMetadataRequest with empty partition states to existing brokers.
2. On broker failure, if it doesn't require leader election, only include the states of partitions that are hosted by the dead broker(s) in the UpdateMetadataRequest instead of including all partition states.
This PR also introduces a minor optimization in the MetadataCache update to avoid copying the previous partition states upon receiving UpdateMetadataRequest with no partition states.
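The copy-avoidance idea can be sketched as follows, assuming the cache holds an immutable snapshot that is replaced on each update. The map-based representation and all names here are assumptions for illustration and are simpler than the real `MetadataCache`.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical example class; not part of the Kafka code base.
class PartitionStateSnapshotSketch {
    // Returns the snapshot to use after applying an update. When the update
    // carries no partition states, the existing snapshot is reused as-is
    // instead of being copied.
    static Map<String, Integer> applyUpdate(Map<String, Integer> current,
                                            Map<String, Integer> updates) {
        if (updates.isEmpty()) {
            return current;
        }
        Map<String, Integer> next = new HashMap<>(current);
        next.putAll(updates);
        return Collections.unmodifiableMap(next);
    }
}
```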
Reviewers: Jun Rao <junrao@gmail.com>
We seemed to be missing the usual rolling upgrade instructions so I've added them and emphasized the impact for downgrades after bumping the inter-broker protocol version.
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Dong Lin <lindong28@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes #5857 from hachikuji/KAFKA-7481
Currently, the startup timeout in Connect's test service is hardcoded to 60 seconds. This change allows it to be passed in via init. This can safely be backported as well.
Reviewers: Randall Hauch <rhauch@gmail.com>, Jason Gustafson <jason@confluent.io>
This patch fixes two issues:
1) Currently, if a broker receives StopReplicaRequest with delete=true twice for the same offline replica, the first StopReplicaRequest will show KafkaStorageException and the second StopReplicaRequest will show ReplicaNotAvailableException. This is because the first StopReplicaRequest removes the mapping (tp -> ReplicaManager.OfflinePartition) from ReplicaManager.allPartitions before returning KafkaStorageException, so the second StopReplicaRequest no longer finds this partition as offline.
This result is inconsistent. Since the replica is already offline and the broker will not be able to delete the files for this replica, the StopReplicaRequest should fail without making any change, and the broker should still remember that this replica is offline.
2) Currently, if a broker receives StopReplicaRequest with delete=true, the broker will attempt to remove the future replica for the partition, which will cause KafkaStorageException in the StopReplicaResponse if the partition does not have a future replica. It is problematic to always return KafkaStorageException in the response if the future replica does not exist.
Author: Dong Lin <lindong28@gmail.com>
Reviewers: Jun Rao <junrao@gmail.com>
Closes #5533 from lindong28/KAFKA-7313
This fixes the Connect standalone system tests. See branch builder: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/2021/
This should be backported to the 2.0 branch, since that's when the tests were first
modified to use the external property file.
Reviewers: Magesh Nandakumar <magesh.n.kumar@gmail.com>, Ismael Juma <ismael@juma.me.uk>
As part of KIP-320, the ListOffsets API should return the leader epoch of any fetched offset. We either get this epoch from the log itself for a timestamp query or from the epoch cache if we are searching the earliest or latest offset in the log. When handling queries for the latest offset, we have elected to choose the current leader epoch, which is consistent with other handling (e.g. OffsetsForTimes).
Reviewers: Jun Rao <junrao@gmail.com>
We've been seeing some hanging builds recently (see KAFKA-7553). Consistently the culprit seems to be a test case in PlaintextConsumerTest. This patch doesn't fix the underlying issue, but it eliminates a few places where these test cases could block:
1. It replaces several calls to the deprecated `poll(long)`, which in the worst case can block indefinitely while joining the group, with `poll(Duration)`, which respects the timeout.
2. It also fixes a consume utility in `TestUtils` which can block for a long time depending on the number of records that are expected to be consumed.
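One way to keep such a consume utility bounded is to enforce a single overall deadline rather than a timeout per record. The sketch below assumes a hypothetical `pollOnce` function standing in for `consumer.poll(Duration)`; it is not the actual `TestUtils` code.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical example class; not part of the Kafka code base.
class BoundedConsumeSketch {
    // pollOnce is a stand-in for consumer.poll(Duration): it receives the
    // maximum time it may wait and returns whatever records are available.
    static <T> List<T> consume(Function<Duration, List<T>> pollOnce,
                               int expected,
                               Duration overallTimeout) {
        long deadline = System.nanoTime() + overallTimeout.toNanos();
        long maxPollNanos = Duration.ofMillis(100).toNanos();
        List<T> records = new ArrayList<>();
        while (records.size() < expected) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                throw new IllegalStateException("Timed out after consuming "
                        + records.size() + " of " + expected + " records");
            }
            // Never wait longer than the time left before the overall deadline.
            records.addAll(pollOnce.apply(Duration.ofNanos(Math.min(remaining, maxPollNanos))));
        }
        return records;
    }
}
```

With this shape, the total wait is capped by `overallTimeout` regardless of how many records are expected.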
Reviewers: Ismael Juma <ismael@juma.me.uk>, Colin Patrick McCabe <colin@cmccabe.xyz>
I've seen this test fail with
```
java.lang.AssertionError: Throttled replication of 6352ms should be < 6000ms
```
A contributing factor is that it starts counting the time it took for replication
before the replication itself has started. `createServer()` initializes ZK and
other systems before it starts up the replication thread.
I ran the test 25 times locally both ways.
Average `throttledTook` before the change: 5341.75
Average `throttledTook` after the change: 5256.92
Note that those are the results from `./gradlew core:test --tests kafka.server.ReplicationQuotasTest.shouldThrottleOldSegments`. I've noticed that if
I run the whole test class `ReplicationQuotasTest`, the `throttledTook` is close
to 4100.
Reviewers: Ismael Juma <ismael@juma.me.uk>
This patch makes two improvements to internal metadata handling logic and testing:
1. It reduces dependence on the public object `Cluster` for internal metadata propagation since it is not easy to evolve. As an example, we need to propagate leader epochs from the metadata response to `Metadata`, but it is not straightforward to do this without exposing it in `PartitionInfo` since that is what `Cluster` uses internally. By doing this change, we are able to remove some redundant `Cluster` building logic.
2. We want to make the metadata handling in `MockClient` simpler and more consistent. Currently we have a mix of metadata update mechanisms which are internally inconsistent with each other and do not match the implementation in `NetworkClient`.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Scala 2.12 has better support for newer Java versions and includes additional
compiler warnings that are helpful during development. In addition, Scala 2.11
hasn't been supported by the Scala community for a long time, the soon-to-be
released Spark 2.4.0 will finally support Scala 2.12 (this was the main reason
preventing many from upgrading to Scala 2.12), and Scala 2.13 is at the RC stage.
It's time to start recommending the Scala 2.12 build as we prepare support for
Scala 2.13 and start thinking about removing support for Scala 2.11.
In the meantime, Jenkins will continue to build all supported Scala versions (including
Scala 2.11) so the PR and trunk jobs will fail if people accidentally use methods
introduced in Scala 2.12.
Reviewers: Ewen Cheslack-Postava <me@ewencp.org>
It seems a typo was made in a variable for the log dir with commit 81e789ae3dc6ea8369db181c5aef440491d74f19. As a result, Windows tries to access the /log directory by default, which of course does not exist.
klesta490 Is this OK? Or was the tilde ~ intentional?
To test:
On Windows with a freshly downloaded Kafka, adapt the properties:
* dataDir in config/zookeeper.properties with what you want, windows-compatible
* log.dirs in config/server.properties with what you want, windows-compatible
and execute:
bin\windows\zookeeper-server-start.bat config\zookeeper.properties
Author: u214578 <florian.hof@sbb.ch>
Reviewers: Vladimír Kleštinec <klestinec@gmail.com>, Vahid Hashemian <vahid.hashemian@gmail.com>
Closes #5837 from florianhof/feature/fix_windows_log_param
The `consume` method already subscribes the consumer to the topic, so there is no
need to do the same thing before the method call. Also includes minor clean-up in `consume`.
Reviewers: Ismael Juma <ismael@juma.me.uk>
Sets StreamsConfig.STATE_DIR_CONFIG to a temp directory in
SuppressionIntegrationTest, to match StreamsTestUtils.
This is a similar fix to #5826.
Reviewers: Ismael Juma <ismael@juma.me.uk>
`testMarksPartitionsAsOfflineAndPopulatesUncleanableMetrics` sometimes fails
because the 15 second timeout expires. Inspecting the error message from the build
failure, we see that this timeout happens in the writeDups() calls which call roll().
```text
[2018-10-23 15:18:51,018] ERROR Error while flushing log for log-1 in dir /tmp/kafka-8190355063195903574 with offset 74 (kafka.server.LogDirFailureChannel:76)
java.nio.channels.ClosedByInterruptException
...
at kafka.log.Log.roll(Log.scala:1550)
...
at kafka.log.AbstractLogCleanerIntegrationTest.writeDups(AbstractLogCleanerIntegrationTest.scala:132)
...
```
After investigating, I saw that this test would call Log#roll() around 60 times every run.
Increasing the segmentSize config to `2048` reduces the number of Log#roll() calls
while ensuring that there are multiple rolls still.
I saw that most other LogCleaner tests also call roll() ~90 times, so I've changed the
default to be `2048`. I've also made the one test which requires a smaller segmentSize
to set it via the args.
Reviewers: Ismael Juma <ismael@juma.me.uk>
Set `StreamsConfig.STATE_DIR_CONFIG` in `SuppressScenarioTest`, as
with `StreamsTestUtils`. I have deliberately avoided using `StreamsTestUtils` as
this test sets bogus config parameters, but still fails if the default
`STATE_DIR_CONFIG` does not exist.
Reviewers: Colin Patrick McCabe <colin@cmccabe.xyz>, John Roesler <john@confluent.io>, Ismael Juma <ismael@juma.me.uk>
KIP-368 implementation to enable periodic re-authentication of SASL clients. Also adds a broker configuration option to terminate client connections that do not re-authenticate within the configured interval.
This line prints out (when empty):
```
[2018-10-23 12:19:59,977] INFO [Controller id=0] Removed ArrayBuffer() from list of shutting down brokers. (kafka.controller.KafkaController)
```
Use `mkString` to eliminate `ArrayBuffer` and only log if not empty.
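The same idea, expressed in Java for illustration (the actual change is in Scala and uses `mkString`): join the ids into a plain string and produce a message only when there is something to report. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical example class; not part of the Kafka code base.
class ShutdownLogSketch {
    // Returns the message to log, or empty when no brokers were removed.
    static Optional<String> removalMessage(List<Integer> removedBrokerIds) {
        if (removedBrokerIds.isEmpty()) {
            return Optional.empty();
        }
        // Joining the elements avoids printing the collection's own
        // toString(), which is what produced "ArrayBuffer()" in the log.
        String ids = removedBrokerIds.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(","));
        return Optional.of("Removed " + ids + " from list of shutting down brokers.");
    }
}
```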
Reviewers: Ismael Juma <ismael@juma.me.uk>
Author: Viktor Somogyi <viktorsomogyi@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Andras Katona <41361962+akatona84@users.noreply.github.com>, Dong Lin <lindong28@gmail.com>
Closes #5685 from viktorsomogyi/upgrade-notes-for-serializer-consolidation
FetchResponse should return the partitionData's lastStableOffset
Author: lambdaliu <lambdaliu@tencent.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Dhruvil Shah <dhruvil@confluent.io>, Dong Lin <lindong28@gmail.com>
Closes #5835 from lambdaliu/KAFKA-7535
`ControllerIntegrationTest#waitUntilControllerEpoch` sometimes fails with the following error:
```
java.util.NoSuchElementException: None.get
at scala.None$.get(Option.scala:347)
at scala.None$.get(Option.scala:345)
at kafka.controller.ControllerIntegrationTest$$anonfun$waitUntilControllerEpoch$1.apply$mcZ$sp(ControllerIntegrationTest.scala:312)
at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:779)
at kafka.controller.ControllerIntegrationTest.waitUntilControllerEpoch(ControllerIntegrationTest.scala:312)
at kafka.controller.ControllerIntegrationTest.testEmptyCluster(ControllerIntegrationTest.scala:51)
```
We should retry until the value is defined or it times out.
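A minimal sketch of that retry pattern, written in Java with `Optional` standing in for Scala's `Option`. The helper name and its parameters are assumptions and do not mirror `TestUtils.waitUntilTrue`.

```java
import java.util.Optional;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;
import java.util.function.Supplier;

// Hypothetical example class; not part of the Kafka code base.
class WaitUntilDefinedSketch {
    // Polls the supplier until it yields a value, instead of calling get()
    // on a value that may not be defined yet. Fails once the timeout expires.
    static <T> T awaitDefined(Supplier<Optional<T>> read, long timeoutMs, long pauseMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            Optional<T> value = read.get();
            if (value.isPresent()) {
                return value.get();
            }
            if (System.nanoTime() >= deadline) {
                throw new IllegalStateException("Value not defined within " + timeoutMs + " ms");
            }
            LockSupport.parkNanos(TimeUnit.MILLISECONDS.toNanos(pauseMs));
        }
    }
}
```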
Reviewers: Ismael Juma <ismael@juma.me.uk>
Corrects an error in the system tests:
```
07:55:45 [ERROR:2018-10-23 07:55:45,738]: Failed to import kafkatest.tests.connect.connect_test, which may indicate a broken test that cannot be loaded: NameError: name 'EXTERNAL_CONFIGS_FILE' is not defined
```
The constant is defined in the [services/connect.py](https://github.com/apache/kafka/blob/trunk/tests/kafkatest/services/connect.py#L43) file in the `ConnectServiceBase` class, but the problem is in the [tests/connect/connect_test.py](https://github.com/apache/kafka/blob/trunk/tests/kafkatest/tests/connect/connect_test.py#L50) `ConnectStandaloneFileTest`, which does *not* extend the `ConnectServiceBase` class. Suggestions welcome to be able to reuse that variable without duplicating the literal (as in this PR).
System test run with this PR: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/2004/
If approved, this should be merged as far back as the `2.0` branch.
Author: Randall Hauch <rhauch@gmail.com>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes #5832 from rhauch/fix-connect-externals-tests
Make sure that the transaction state is properly cleared when the
`transactionalId-expiration` task fails. Operations on that transactional
id would otherwise return a `CONCURRENT_TRANSACTIONS` error
and appear "untouchable" to transaction state changes, preventing
transactional producers from operating until a broker restart or
transaction coordinator change.
Unit tested by verifying that the `transactionalId-expiration` task
does not leave the transaction metadata in a pending state if the replica
manager returns an error.
Reviewers: Jason Gustafson <jason@confluent.io>
Just a doc change
Author: John Eismeier <john.eismeier@gmail.com>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes #4573 from jeis2497052/trunk
Decrease the lower bound for expected available memory, as thread
scheduling entails that a variable amount of deallocation happens by
the point of assertion.
Also make minor clarifications to test logic and comments.
The passing rate improved from 98% to 100% locally after these
changes (100+ runs).
Reviewers: Ismael Juma <ismael@juma.me.uk>
After KAFKA-6051, we close leaderEndpoint in the replica fetcher thread's initiateShutdown to try to preempt the in-progress fetch request and accelerate replica fetcher thread shutdown. However, leaderEndpoint can throw an exception when the replica fetcher thread is still actively fetching, which can cause ReplicaManager to fail to shut down cleanly. This PR catches the exceptions thrown in "leaderEndpoint.close()" instead of letting them propagate up the call stack.
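The general pattern can be sketched as below, assuming a hypothetical `closeQuietly` helper. It is shown in Java for illustration and is not the code added to the replica fetcher thread.

```java
// Hypothetical example class; not part of the Kafka code base.
class QuietCloseSketch {
    // Closes the endpoint and reports whether it closed cleanly. Any exception
    // is logged and swallowed so the caller's shutdown sequence can continue.
    static boolean closeQuietly(AutoCloseable endpoint, String name) {
        try {
            endpoint.close();
            return true;
        } catch (Exception e) {
            System.err.println("Failed to close " + name + " during shutdown: " + e);
            return false;
        }
    }
}
```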
Author: Zhanxiang (Patrick) Huang <hzxa21@hotmail.com>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Dong Lin <lindong28@gmail.com>
Closes #5808 from hzxa21/KAFKA-7464