Why
Current set of integration tests leak files in the /tmp directory which makes it cumbersome if you don't restart the machine often.
Fix
Replace the usage of File.createTempFile with existing TestUtils.tempFile method across the test files. TestUtils.tempFile automatically performs a clean up of the temp files generated in /tmp/ folder.
Reviewers: Luke Chen <showuon@gmail.com>, Ismael Juma <mlists@juma.me.uk>
Also includes a minor quality-of-life improvement to clarify why some internal REST requests to workers may fail while that worker is still starting up.
Reviewers: Tom Bentley <tbentley@redhat.com>, Luke Chen <showuon@gmail.com>, José Armando García Sancio <jsancio@gmail.com>, Mickael Maison <mickael.maison@gmail.com>
This implements KIP-830: https://cwiki.apache.org/confluence/display/KAFKA/KIP-830%3A+Allow+disabling+JMX+Reporter
It adds a new configuration `auto.include.jmx.reporter` that can be set to false to disable the JMX Reporter. This configuration is deprecated and will be removed in the next major version.
Reviewers: Tom Bentley <tbentley@redhat.com>, Christo Lolov <christo_lolov@yahoo.com>
Cache the Kafka cluster Id once it has been retrieved to avoid creating many Admin clients at startup.
Reviewers: Chris Egerton <fearthecellos@gmail.com>
This PR is created on top of #10904 and includes commits from original author for attribution.
## Testing
1. `./gradlew connect:runtime:unitTest --tests WorkerGroupMemberTest` is successful.
2. Verified that test is run as part of `./gradlew connect:runtime:unitTest` (see report in the PR)
Reviewers: Ismael Juma <ismael@juma.me.uk>
Co-authored-by: Chun-Hao Tang <tang7526@gmail.com>
Catches valid 404 exceptions, triggered by any HTTP request to a nonexistent path on the Connect REST API, higher in the code to not to log an ERROR log which can be seen as a false alarm
Reviewers: Chris Egerton <fearthecellos@gmail.com>
We should not treat UNKNOWN_MEMBER_ID as an unexpected error in the Admin client. In MirrorMaker, check the result of committing offsets and log an useful error message in case that failed with UNKNOWN_MEMBER_ID.
Reviewers: Chris Egerton <fearthecellos@gmail.com>
The HeaderConverter interface extends Closeable, but we weren't closing them anywhere before. This change causes header converters to be closed as part of task shutdown.
Reviewers: Kvicii <42023367+Kvicii@users.noreply.github.com>, Chris Egerton <fearthecellos@gmail.com>
In KAFKA-13310, we tried to fix a issue that consumer#poll(duration) will be returned after the provided duration. It's because if rebalance needed, we'll try to commit current offset first before rebalance synchronously. And if the offset committing takes too long, the consumer#poll will spend more time than provided duration. To fix that, we change commit sync with commit async before rebalance (i.e. onPrepareJoin).
However, in this ticket, we found the async commit will keep sending a new commit request during each Consumer#poll, because the offset commit never completes in time. The impact is that the existing consumer will be kicked out of the group after rebalance timeout without joining the group. That is, suppose we have consumer A in group G, and now consumer B joined the group, after the rebalance, only consumer B in the group.
Besides, there's also another bug found during fixing this bug. Before KAFKA-13310, we commitOffset sync with rebalanceTimeout, which will retry when retriable error until timeout. After KAFKA-13310, we thought we have retry, but we'll retry after partitions revoking. That is, even though the retried offset commit successfully, it still causes some partitions offsets un-committed, and after rebalance, other consumers will consume overlapping records.
Reviewers: RivenSun <riven.sun@zoom.us>, Luke Chen <showuon@gmail.com>
Make sure to ack all records where produce failed, when a connector's `errors.tolerance` config property is set to `all`. Acking is essential so that the task will continue to commit future record offsets properly and remove the records from internal tracking, preventing a memory leak.
(cherry picked and slightly modified from commit 63e06aafd0)
Reviewers: Chris Egerton <fearthecellos@gmail.com>, Randall Hauch <rhauch@gmail.com>
Implements embedded end-to-end integration tests for KIP-618, and brings together previously-decoupled logic from upstream PRs.
Reviewers: Luke Chen <showuon@gmail.com>, Tom Bentley <tbentley@redhat.com>, Mickael Maison <mickael.maison@gmail.com>
This causes the artificial reductions in the Connect REST request timeout to be more isolated. Specifically, they now only take place in the tests that need them (instead of any tests that happen to be running after the reduction has taken place and before it has been reset), and they are only performed for the requests that are expected to time out, before being immediately reset. This should help reduce spurious test failures (especially in slow environments like Jenkins) for all Connect integration tests that interact with the REST API, not just the BlockingConnectorTest test suite.
Reviewers: Bruno Cadonna <cadonna@apache.org>
Implements support for per-connector offsets topics as described in KIP-618.
Reviewers: Luke Chen <showuon@gmail.com>, Tom Bentley <tbentley@redhat.com>
Use the newly added function to replace the old addMetric function that may throw illegal argument exceptions.
Although in some cases concurrency should not be possible they do not necessarily remain always true in the future, so it's better to use the new API just to be less error-prone.
Reviewers: Bruno Cadonna <cadonna@apache.org>
New gradle task `connect:runtime:genConnectOpenAPIDocs` that generates `connect_rest.yaml` under `docs/generated`.
This task is executed when `siteDocsTar` runs.
Implements the behavior described in KIP-618: using a transactional producer for writes to the config topic that should only be performed by the leader of the cluster.
Reviewers: Luke Chen <showuon@gmail.com>, Tom Bentley <tbentley@redhat.com>
As an initial step to address the notoriously flaky BlockingConnectorTest test suite, we can try increasing test timeouts.
This approach may not be sufficient, and even if it is, it's still suboptimal. Although it may address flakiness on Jenkins, it will make genuine failures harder to detect when testing local changes. Additionally, if the workload on Jenkins continues to increase, we'll probably have to bump these timeouts in the future again at some point.
Potential next steps, for this PR and beyond:
Stop leaking threads that block during test runs
Instead of artificially reducing the REST request timeout at the beginning of every test, reduce it selectively right before issuing a REST request that is expected to time out, and then immediately reset it.
Eliminate artificial reduction of the REST request timeout entirely, as it may be negatively impacting other Connect integration tests that are being run concurrently.
Test repeatedly on Jenkins, ideally at least 50 times
Gather information on the number of CPU cores available to each Jenkins node and the distribution of how many threads are allocated over a given time period (maybe a day?); this is especially relevant since local testing indicates that these tests all do much better when parallelism is reduced, which shouldn't be too surprising considering that each Connect integration test spins up separate threads for at least one Zookeeper node, one Kafka broker, one Connect worker, and usually at least one connector and one task.
I'd like to test these changes as a first step before investigating any of the above (except maybe items 1 and 2, which should be fairly straightforward). To trigger new runs I plan on pushing empty commits or, if those do not trigger new Jenkins runs, dummy commits. If this is objectionable let me know and hopefully we can find a suitable alternative.
Reviewers: Kvicii <Karonazaba@gmail.com>, Bruno Cadonna <cadonna@apache.org>
Minor change to use ' and not LEFT SINGLE QUOTATION MARK in this log message, as it's the only place we are using such a quote and it can break ingestion pipelines
Reviewers: Kvicii <Karonazaba@gmail.com>, Divij Vaidya <diviv@amazon.com>, Konstantine Karantasis <k.karantasis@gmail.com>
The goals here include:
1. Create an overloaded variant of the IncrementalCooperativeAssignor::performTaskAssignment method that is more testing friendly
2. Simplify the parameter list for the IncrementalCooperativeAssignor::handleLostAssignments method, which in turn simplifies the logic for testing this class
3. Capture repeated Java 8 streams logic in simple, reusable, easily-verifiable utility methods added to the ConnectUtils class
Reviewers: Luke Chen <showuon@gmail.com>