Implementation of KIP-302: Based on the new client configuration `client.dns.lookup`, a NetworkClient can use InetAddress.getAllByName to find all IPs and iterate over them when they fail to connect. Only uses either IPv4 or IPv6 addresses similar to the default mode.
Co-authored-by: Edoardo Comar <ecomar@uk.ibm.com>
Co-authored-by: Mickael Maison <mickael.maison@gmail.com>
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <guozhang@confluent.io>, Nikolay Izhikov <nizhikov@apache.org>
PR #2267 Introduced support for Zstandard compression. The relevant test expects values for `num_nodes` and `num_producers` based on the (now-incremented) count of compression types.
Passed the affected, previously-failing test:
`ducker-ak test tests/kafkatest/tests/client/compression_test.py`
Reviewers: Jason Gustafson <jason@confluent.io>
- SslFactoryTest should use SslFactory to create SSLEngine
- Use Mockito instead of EasyMock in `ConsoleConsumerTest` as one of
the tests mocks a standard library class and the latest released EasyMock
version can't do that when Java 11 is used.
- Avoid mocking `ConcurrentMap` in `SourceTaskOffsetCommitterTest`
for similar reasons. As it happens, mocking is not actually needed here.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
* Fix `ZkUtils.getReassignmentJson` to pass Java map to `Json.encodeAsString`
* Allow new file creation in ReplicationQuotasTestRig test
Reviewers: Ismael Juma <ismael@juma.me.uk>
Development of EasyMock and PowerMock has stagnated while Mockito
continues to be actively developed. With the new Java release cadence,
it's a problem to depend on libraries that do bytecode manipulation
and are not actively maintained. In addition, Mockito is also
easier to use.
While updating the tests, I attempted to go from failing test to
passing test. In cases where the updated test passed on the first
attempt, I artificially broke it to ensure the test was still doing its
job.
I included a few improvements that were helpful while making these
changes:
1. Better exception if there are no nodes in `leastLoadedNodes`
2. Always close the producer in `KafkaProducerTest`
3. requestsInFlight producer metric should not hold a reference to
`Sender`
Finally, `Metadata` is no longer final so that we don't need
`PowerMock` to mock it. It's an internal class, so it's OK.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Viktor Somogyi <viktorsomogyi@gmail.com>, Dong Lin <lindong28@gmail.com>
Closes#5691 from ijuma/kafka-7438-mockito
Satish Duggana <sduggana@hortonworks.com>, Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <guozhang@confluent.io>
Reviewers: Matthias J. Sax <matthias@confluent.io>, John Roesler <john@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <guozhang@confluent.io>
Ensure that `TestUtils.waitUntilTrue(..)` is blocked on both send completed and a new leader being assigned
Author: Gardner Vickers <gardner@vickers.me>
Reviewers: Dhruvil Shah <dhruvil@confluent.io>, Dong Lin <lindong28@gmail.com>
Closes#5695 from gardnervickers/log-dir-failure-test-fix
The thread no longer dies. When encountering an unexpected error, it marks the partition as "uncleanable" which means it will not try to clean its logs in subsequent runs.
Reviewers: Dhruvil Shah <dhruvil@confluent.io>, Jun Rao <junrao@gmail.com>
OAuthBearerLoginModule is used both on the server-side and client-side (similar to login modules for other mechanisms). OAUTHBEARER tokens are client credentials used only on the client-side to authenticate with servers, but the current implementation requires tokens to be provided on the server-side even if OAUTHBEARER is not used for inter-broker communication. This commit makes tokens optional for server-side login context to allow brokers to be configured without a token when OAUTHBEARER is not used for inter-broker communication.
Reviewers: Ron Dagostino <rndgstn@gmail.com>, Jun Rao <junrao@gmail.com>
This patch contains the broker-side support for the fencing improvements from KIP-320. This includes the leader epoch validation in the ListOffsets, OffsetsForLeaderEpoch, and Fetch APIs as well as the changes needed in the fetcher threads to maintain and use the current leader epoch. The client changes from KIP-320 will be left for a follow-up.
One notable change worth mentioning is that we now require the read lock in `Partition` in order to read from the log or to query offsets. This is necessary to ensure the safety of the leader epoch validation. Additionally, we forward all leader epoch changes to the replica fetcher thread and go through the truncation phase. This is needed to ensure the fetcher always has the latest epoch and to guarantee that we cannot miss needed truncation if we missed an epoch change.
Reviewers: Jun Rao <junrao@gmail.com>
The parent classloader of the DelegatingClassLoader and therefore the classloading scheme used by Connect does not have to be fixed to the System classloader.
Setting it the same as the one that was used to load the DelegatingClassLoader class itself is more flexible and, while in most cases will result in the System classloader to be used, it will also work in othr managed environments that control classloading differently (OSGi, and others).
The fix is minimal and the mainstream use is tested via system tests.
Reviewers: Randall Hauch <rhauch@gmail.com>, Jason Gustafson <jason@confluent.io>
This patch adds checks before reading the first record of a control batch. If the batch is empty, it is treated as having already been cleaned. In the case of LogCleaner this means it is safe to discard. In the case of ProducerStateManager it means it shouldn't cause state to be stored because the relevant transaction has already been cleaned. In the case of Fetcher, it just preempts the check for an abort. In the case of GroupMetadataManager, it doesn't process the offset as a commit. The patch also adds isControl to the output of DumpLogSegments. Changes were tested with unit tests, except the DumpLogSegments change which was tested manually.
In `ConsumerBuilder.build`, if `awaitInitialPositions` raises an exception, the consumer will not be closed properly. We should add the consumer instance to the `consumers` collection immediately after construction.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Pricing for m3.xlarge: On-Demand is at $0.266. Reserved is at about $0.16 (40% discount). And Spot is at $0.0627 (76% discount relative to On-Demand, or 60% discount relative to Reserved). Insignificant fluctuation in the past 3 months.
Ran on branch builder and works as expected -- each worker is created using spot instances (https://jenkins.confluent.io/job/system-test-kafka-branch-builder/1982/console)
This can be safely backported to 0.10.2 (tested using https://jenkins.confluent.io/job/system-test-kafka-branch-builder/1983/)
Author: Max Zheng <maxzheng.os@gmail.com>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes#5707 from maxzheng/minor-switch@trunk
In recent PRs, we have been confused about the proper usage of
StatefulProcessorNode (#5731 , #5737 )
This change disambiguates it.
Reviewers: Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <guozhang@confluent.io>
Various converters (AvroConverter and JsonConverter) produce a
SchemaAndValue consisting of a logical schema type and a java.util.Date.
This is a fix for SchemaProjector to properly handle the Date.
Author: Robert Yokota <rayokota@gmail.com>
Reviewers: Konstantine Karantasis <konstantine@confluent.io>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#5736 from rayokota/KAFKA-7476
In unrelated recent work, I noticed some warnings about the missing type parameters on ProcessorParameters.
While investigating it, it seems like there was a bug in the creation of repartition topics.
Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <guozhang@confluent.io>
This patch ensures that the leader epoch cache is updated when a broker becomes leader with the latest epoch and the log end offset as its starting offset. This guarantees that the leader will be able to provide the right truncation point even if the follower has data from leader epochs which the leader itself does not have. This situation can occur when there are back to back leader elections.
Additionally, we have made the following changes:
1. The leader epoch cache enforces monotonically increase epochs and starting offsets among its entry. Whenever a new entry is appended which violates requirement, we remove the conflicting entries from the cache.
2. Previously we returned an unknown epoch and offset if an epoch is queried which comes before the first entry in the cache. Now we return the smallest . For example, if the earliest entry in the cache is (epoch=5, startOffset=10), then a query for epoch 4 will return (epoch=4, endOffset=10). This ensures that followers (and consumers in KIP-320) can always determine where the correct starting point is for the active log range on the leader.
Reviewers: Jun Rao <junrao@gmail.com>
Reviewers: Johne Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <guozhang@confluent.io>
During the consumer group rebalance, when the joining group phase finishes, the heartbeat delayed operation of the consumer that fails to rejoin the group should be removed from the purgatory. Otherwise, even though the member ID of the consumer has been removed from the group, its heartbeat delayed operation is still registered in the purgatory and the heartbeat delayed operation is going to timeout and then another unnecessary rebalance is triggered because of it.
Author: Lincong Li <lcli@linkedin.com>
Reviewers: Dong Lin <lindong28@gmail.com>
Closes#5556 from Lincong/remove_heartbeat_delayedOperation
KIP-372 (allow naming all internal topics) was designed and developed concurrently with suppression.
Since suppression introduces a new internal topic, it also needs to be nameable.
Reviewers: Guozhang Wang <guozhang@confluent.io>, Matthias J. Sax <matthias@confluent.io>
This is Part 4 of suppression (durability)
Part 1 was #5567 (the API)
Part 2 was #5687 (the tests)
Part 3 was #5693 (in-memory buffering)
Implement a changelog for the suppression buffer so that the buffer state may be recovered on restart or recovery.
As of this PR, suppression is suitable for general usage.
Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <guozhang@confluent.io>, Matthias J. Sax <matthias@confluent.io>
In some tests, the check monitoring the JMX tool log output doesn’t quite wait long enough before failing. Increasing the timeout from 10 to 20 seconds.
`ConsumerCoordinator.onJoinPrepare()` currently makes multiple copies of the set of assigned partitions. We can let `subscriptions.assignedPartitions()` return a view of the underlying partition set, copy it only once and re-use the copied value.
Author: radai-rosenblatt <radai.rosenblatt@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Dong Lin <lindong28@gmail.com>
Closes#5124 from radai-rosenblatt/copy-all-the-things