https://issues.apache.org/jira/browse/KAFKA-9069
Make `getTopicMetadata` in AdminClientIntegrationTest always read metadata from controller to get a consistent view.
Author: huxihx <huxi_2b@hotmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, José Armando García Sancio <jsancio@gmail.com>
Closes #7619 from huxihx/KAFKA-9069
This PR is a follow-up of #7087, fixing typos, styles, etc.
cc/ big-andy-coates ijuma
Author: Lee Dongjin <dongjin@apache.org>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Closes #7217 from dongjinleekr/feature/trivial-admin-javadoc
This ends up spamming the logs and doesn't seem to be providing much useful information. Rather than logging the full task (which includes the entire topology description), we should just log the task id.
Reviewers: Matthias J. Sax <mjsax@apache.org>, Bruno Cadonna <bruno@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
`assignmentSnapshot` may take a moment to get initialized in some cases, especially when
Kafka Connect is started from scratch. While `assignmentSnapshot` is not initialized, return 0 in both `measure()` methods.
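A minimal sketch of the guard described above; `RebalanceMetrics` and `AssignmentSnapshot` here are illustrative names, not Kafka Connect's real types:

```java
// Hypothetical sketch of the fix: report 0 from measure() until the first
// assignment snapshot exists, instead of dereferencing a null field.
public class RebalanceMetrics {
    public static class AssignmentSnapshot {
        private final int connectorCount;
        public AssignmentSnapshot(int connectorCount) { this.connectorCount = connectorCount; }
        public int connectorCount() { return connectorCount; }
    }

    // May stay null for a while after a from-scratch startup.
    private volatile AssignmentSnapshot assignmentSnapshot;

    public void onAssignment(AssignmentSnapshot snapshot) {
        this.assignmentSnapshot = snapshot;
    }

    // measure() must not dereference the snapshot before it exists.
    public double measureConnectorCount() {
        AssignmentSnapshot snapshot = assignmentSnapshot;
        return snapshot == null ? 0.0 : snapshot.connectorCount();
    }
}
```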
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Edoardo Comar <ecomar@uk.ibm.com>
Newer versions of Gradle handle this automatically. Tested with Gradle 5.6.
Credit to @granthenke for the tip.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
The --enable-autocommit argument is a flag. It does not take a parameter. This was broken in #7724.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Manikumar Reddy <manikumar.reddy@gmail.com>
If a field is not marked as ignorable, we should raise an exception if it has been set to a non-default value. This check already exists in `Message.write`, so this patch adds it to `Message.toStruct`. Additionally, we fix several fields which should have been marked ignorable and we fix some related test assertions.
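The rule being enforced can be sketched as follows; the method shape is hypothetical and not the actual generated `Message` code:

```java
import java.util.Objects;

// Sketch of the check: writing a non-default value into a field that the
// target version does not support must fail unless the field is ignorable.
public class IgnorableFieldCheck {
    public static void check(String fieldName, boolean ignorable,
                             Object value, Object defaultValue,
                             boolean supportedInTargetVersion) {
        if (!supportedInTargetVersion
                && !ignorable
                && !Objects.equals(value, defaultValue)) {
            throw new UnsupportedOperationException("Attempted to write a non-default value for "
                + fieldName + " at a version that does not support it");
        }
    }
}
```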
Reviewers: Ismael Juma <ismael@juma.me.uk>, Manikumar Reddy <manikumar.reddy@gmail.com>, Colin Patrick McCabe <cmccabe@apache.org>
Two tests using 50k replicas on 8 brokers:
* Do a rolling restart with clean shutdown, delete topics
* Run produce bench and consumer bench on a subset of topics
Reviewed-By: David Jacot <djacot@confluent.io>, Vikas Singh <vikas@confluent.io>, Jason Gustafson <jason@confluent.io>
Convert StreamTableJoinIntegrationTest to use the TopologyTestDriver to eliminate flakiness and speed up the build.
Reviewers: John Roesler <john@confluent.io>
This test was taking more than 5 minutes because the producer writes were timing out. The problem was that broker 100 was being shutdown before broker 101, which meant that the partition was still offline after broker 100 was restarted. The producer timeouts were not detected because the produce future was not checked. After the fix, test time drops to about 15s.
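The key idea of the fix, sketched with a plain `Future` standing in for the one `KafkaProducer.send()` returns: block on the result so a produce timeout fails fast instead of being silently swallowed.

```java
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Waiting on the produce future propagates TimeoutException /
// ExecutionException rather than ignoring them.
public class CheckedSend {
    public static <T> T awaitSend(Future<T> sendResult, long timeoutMs) throws Exception {
        return sendResult.get(timeoutMs, TimeUnit.MILLISECONDS);
    }
}
```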
Reviewers: Ismael Juma <ismael@juma.me.uk>
This patch adds a basic downgrade system test. It verifies that producing and consuming continues to work before and after the downgrade.
Reviewers: Ismael Juma <ismael@juma.me.uk>, David Arthur <mumrah@gmail.com>
Fixes #7717, which did not actually achieve its intended effect. The event manager failed to process the new event because we disabled the rate metric, which it expected to be present.
Reviewers: Ismael Juma <ismael@juma.me.uk>
For some reason, PR builds are failing due to the `rat` license
check even though it should ignore files included in `.gitignore`.
Reviewers: Jason Gustafson <jason@confluent.io>
Given we need to follow the Apache rule of not checking
any binaries into the source code, Kafka has always had
a bit of a tricky Gradle bootstrap.
Using ./gradlew as users expect didn't work; a local,
compatible version of Gradle was required to
generate the wrapper first.
This patch changes the behavior of the wrapper task to
instead generate a gradlew script that can bootstrap the
jar itself. Additionally it adds a license, removes the bat
script, and handles retries.
The documentation in the readme was also updated.
Going forward, patches that upgrade Gradle should run
`gradle wrapper` before checking in the change.
With this change users using ./gradlew can be sure they
are always building with the correct version of Gradle.
Reviewers: Viktor Somogyi <viktorsomogyi@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Create a controller event for handling UpdateMetadata responses and log a message when a response contains an error.
Reviewers: Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Ismael Juma <ismael@juma.me.uk>
Recently, system tests test_rebalance_[simple|complex] failed
repeatedly with a verification error. The cause was most probably
the missing clean-up of a state directory of one of the processors.
A node is cleaned up when a service on that node is started and when
a test is torn down.
If the clean-up flag clean_node_enabled of an EOS Streams service is
unset, the clean-up of the node is skipped.
The clean-up flag of processor1 in the EOS tests should stay set before
its first start, so that the node is cleaned before the service is started.
Afterwards, for the multiple restarts of processor1, the clean-up flag should
be unset to re-use the local state.
After the multiple restarts are done, the clean-up flag of processor1 should
again be set to trigger node clean-up during the test teardown.
A dirty node can lead to test failures when tests from Streams EOS tests are
scheduled on the same node, because the state store would not start empty
since it reads the local state that was not cleaned up.
Reviewers: Matthias J. Sax <mjsax@apache.org>, Andrew Choi <andchoi@linkedin.com>, Bill Bejeck <bbejeck@gmail.com>
Allow null as a valid default for tagged fields. Fix a bunch of cases where this would previously result in null pointer dereferences.
Also allow inferring FieldSpec#versions based on FieldSpec#taggedVersions. Prefix 'key' with an underscore when it is used in the generated code, to avoid potential name collisions if someone names an RPC field "key".
Allow setting hexadecimal constants and 64-bit constants.
Add a lot more test cases to SimpleExampleMessage.json.
Reviewers: Jason Gustafson <jason@confluent.io>
The latter was previously hardcoding logDirCount instead of using the
method defined in the superclass since it was unnecessarily duplicating
logic.
Also tweak IntegrationTestHarness and remove unnecessary method
override from SaslPlainPlaintextConsumerTest.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
After a number of last minute bugs were found stemming from the incremental closing of lost tasks in StreamsRebalanceListener#onPartitionsLost, a safer approach to this edge case seems warranted. We initially wanted to be as "future-proof" as possible, and avoid baking further protocol assumptions into the code that may be broken as the protocol evolves. This meant that rather than simply closing all active tasks and clearing all associated state in #onPartitionsLost(lostPartitions) we would loop through the lostPartitions/lost tasks and remove them one by one from the various data structures/assignments, then verify that everything was empty in the end. This verification in particular has caused us significant trouble, as it turns out to be nontrivial to determine what should in fact be empty, and if so whether it is also being correctly updated.
Therefore, before worrying about it being "future-proof" it seems we should make sure it is "present-day-proof" and implement this callback in the safest possible way, by blindly clearing and closing all active task state. We log all the relevant state (at debug level) before clearing it, so we can at least tell from the logs whether/which emptiness checks were being violated.
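The "close everything" approach can be sketched as below; the names are illustrative, not the real Streams task manager API:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: log the state about to be dropped, then close and clear all
// active tasks unconditionally, with no per-task removal or emptiness checks.
public class LostTasksHandler {
    public interface Task { void closeDirty(); }

    private final Map<Integer, Task> activeTasks = new HashMap<>();

    public void addActiveTask(int id, Task task) {
        activeTasks.put(id, task);
    }

    public void onPartitionsLost() {
        // In the real code this would be a debug-level log of all relevant
        // state before it is cleared, so violations remain diagnosable.
        System.out.println("Clearing lost active tasks: " + activeTasks.keySet());
        for (Task task : activeTasks.values())
            task.closeDirty();
        activeTasks.clear();
    }

    public int activeTaskCount() { return activeTasks.size(); }
}
```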
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bbejeck@gmail.com>, Andrew Choi <andchoi@linkedin.com>
Refactors metrics according to KIP-444
Introduces ProcessorNodeMetrics as a central provider for processor node metrics
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bbejeck@gmail.com>
* Add rate limiting to tc
* Feedback from PR
* Add a sanity test for tc
* Add iperf to vagrant scripts
* Dynamically determine the network interface
* Add some temp code for testing on AWS
* Temp: use hostname instead of external IP
* Temp: more AWS debugging
* More AWS WIP
* More AWS temp
* Lower latency some
* AWS wip
* Trying this again now that ping should work
* Add cluster decorator to tests
* Fix broken import
* Fix device name
* Fix decorator arg
* Remove errant import
* Increase timeouts
* Fix tbf command, relax assertion on latency test
* Fix log line
* Final bit of cleanup
* Newline
* Revert Trogdor retry count
* PR feedback
* More PR feedback
* Feedback from PR
* Remove unused argument
This patch removes the explicit version check pattern we used in `getErrorResponse`, which is a pain to maintain (as seen by KAFKA-9200). We already check that requests have a valid version range in the `AbstractRequest` constructor.
Reviewers: Andrew Choi <andrewchoi5@users.noreply.github.com>, Ismael Juma <ismael@juma.me.uk>
Force completion of delayed operations when receiving a StopReplica request. In the case of a partition reassignment, we cannot rely on receiving a LeaderAndIsr request in order to complete these operations because the leader may no longer be a replica. Previously when this happened, the delayed operations were left to eventually timeout.
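A toy model of the change; the names are hypothetical, not the real purgatory API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: a StopReplica for a partition force-completes any delayed
// operations waiting on it, instead of letting them sit until timeout.
public class DelayedOpsModel {
    public interface DelayedOp { void forceComplete(); }

    private final Map<String, List<DelayedOp>> pendingByPartition = new HashMap<>();

    public void watch(String partition, DelayedOp op) {
        pendingByPartition.computeIfAbsent(partition, p -> new ArrayList<>()).add(op);
    }

    public void onStopReplica(String partition) {
        List<DelayedOp> ops = pendingByPartition.remove(partition);
        if (ops != null)
            ops.forEach(DelayedOp::forceComplete);
    }

    public int pendingCount(String partition) {
        return pendingByPartition.getOrDefault(partition, new ArrayList<>()).size();
    }
}
```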
Reviewers: Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Ismael Juma <ismael@juma.me.uk>
Co-Authored-By: Kun Du <kidkun@users.noreply.github.com>
While investigating KAFKA-9180, I noticed that we had no
unit test coverage. It turns out that the behavior was
correct, so we just fix the test coverage issue.
Also updated .gitignore with jmh-benchmarks/generated.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
ListOffsetRequest.getErrorResponse is missing a case for version 5, introduced
by 152292994e and released in 2.3.0.
```
java.lang.IllegalArgumentException: Version 5 is not valid. Valid versions for ListOffsetRequest are 0 to 5
at org.apache.kafka.common.requests.ListOffsetRequest.getErrorResponse(ListOffsetRequest.java:282)
at kafka.server.KafkaApis.sendErrorOrCloseConnection(KafkaApis.scala:3062)
at kafka.server.KafkaApis.sendErrorResponseMaybeThrottle(KafkaApis.scala:3045)
at kafka.server.KafkaApis.handleError(KafkaApis.scala:3027)
at kafka.server.KafkaApis.handle(KafkaApis.scala:209)
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:78)
at java.lang.Thread.run(Thread.java:748)
```
Reviewers: Ismael Juma <ismael@juma.me.uk>
When we roll a new segment, the log offset metadata tied to the high watermark may
need to be updated. This is needed when the high watermark is equal to the log end
offset at the time of the roll. Otherwise, we risk exposing uncommitted data early.
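A toy model of the condition; field names are hypothetical, not the real log layer:

```java
// Sketch: the high watermark carries segment-relative metadata, and when it
// equals the log end offset at roll time that metadata must be re-anchored
// to the new active segment.
public class HighWatermarkModel {
    long logEndOffset;
    long highWatermark;
    long hwSegmentBaseOffset;   // segment the high watermark metadata points into

    public void roll() {
        long newSegmentBase = logEndOffset;
        if (highWatermark == logEndOffset) {
            // Without this update, readers using the stale metadata could
            // fetch past the high watermark and see uncommitted data.
            hwSegmentBaseOffset = newSegmentBase;
        }
    }
}
```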
Reviewers: Dhruvil Shah <dhruvil@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Author: Arvind Thirunarayanan <athirunar@confluent.io>
Reviewers: Colin Patrick McCabe <colin@cmccabe.xyz>
Closes #7694 from arvindth/ath-debugdeps-task
This patch creates a `BaseAdminIntegrationTest` to be the root for all integration test extensions. Most of the existing tests will only be tested in `PlaintextAdminIntegrationTest`, which extends from `BaseAdminIntegrationTest`. This should cut off about 30 minutes from the overall build time.
Reviewers: David Arthur <mumrah@gmail.com>, Ismael Juma <ismael@juma.me.uk>
When Trogdor wants to clear all the faults injected to Kibosh, it sends the empty JSON object `{}`. However, Kibosh expects `{"faults":[]}` instead. Kibosh should handle the empty JSON object, since that's consistent with how Trogdor handles empty JSON fields in general (if they're empty, they can be omitted). We should also have a test for this.
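The tolerant-parsing rule can be sketched as below; the types are hypothetical, operating on an already-decoded JSON object:

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Sketch: after JSON decoding, a missing "faults" key is treated exactly
// like an explicit empty list, matching the omit-if-empty convention.
public class FaultsSpec {
    @SuppressWarnings("unchecked")
    public static List<Object> faults(Map<String, Object> decodedJson) {
        Object raw = decodedJson.get("faults");
        if (raw == null)
            return Collections.emptyList();   // "{}" means: clear all faults
        return (List<Object>) raw;
    }
}
```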
Reviewers: David Arthur <mumrah@gmail.com>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>
The current Utils::readFileAsString method creates a FileChannel,
memory-maps the file, and copies its content to a String. But that
means we need to know the size of the file in advance, which
precludes reading files whose size is not known upfront, i.e.
any file opened with the flag S_IFIFO.
This change updates the method to use stream to read the content of the file.
It has a couple of practical advantages:
* Allows bash process substitution to pass in strings as a file, e.g.
`./bin/kafka-reassign-partitions.sh --reassignment-json-file <(echo "reassignment json")`
* When adding system tests for commands that take a file, we don't have to
create a real physical file; we can just dump the content of the file on
the command line.
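A stream-based read along these lines (the real `Utils.readFileAsString` signature may differ): no upfront file size is needed, so FIFOs created by process substitution can be read too.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Sketch: read the stream to exhaustion instead of memory-mapping, so the
// content length never has to be known before the read starts.
public class StreamRead {
    public static String readFileAsString(String path) throws IOException {
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1)
                out.write(buf, 0, n);
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```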
Reviewers: Ismael Juma <ismael@juma.me.uk>
This is a follow-up PR for #7520 to address the issue of multiple calls to get(), as pointed out by @bbejeck in #7520 (comment)
Reviewers: Bill Bejeck <bbejeck@gmail.com>
Explicitly mention that max.request.size validates uncompressed record sizes and max.message.bytes validates compressed record sizes.
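For illustration, the two settings live on different sides of the pipeline (producer config vs. topic/broker config); the values below are examples, not recommendations:

```
# Producer configuration: checked against the *uncompressed* record size
max.request.size=1048576

# Topic-level configuration (message.max.bytes at the broker level):
# checked against the record size *after* compression
max.message.bytes=1048576
```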
Reviewers: Ismael Juma <ismael@juma.me.uk>
Sometimes, to stay backwards compatible with respect to metrics, the
simplest solution is to create an empty sensor. Recording an empty sensor on
the hot path may negatively impact performance. With `hasMetrics()`,
recordings of empty sensors on the hot path can be avoided without being
too invasive.
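A sketch of the guard, modeled on (not copied from) the Streams Sensor API:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: callers on the hot path skip record() entirely when the sensor
// was created empty for backward compatibility.
public class GuardedSensor {
    private final List<String> metricNames = new ArrayList<>();
    private int recordings = 0;

    public void addMetric(String name) { metricNames.add(name); }

    public boolean hasMetrics() { return !metricNames.isEmpty(); }

    public void record(double value) { recordings++; }

    public int recordings() { return recordings; }

    // Hot-path call site: the cheap hasMetrics() check avoids the cost of
    // recording into a sensor that has nothing to record.
    public void maybeRecord(double value) {
        if (hasMetrics())
            record(value);
    }
}
```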
Reviewers: Bill Bejeck <bbejeck@gmail.com>
Fixed a small typo on the Processor API page of the Kafka Streams developer guide docs. ("buildeer" changed to "builder")
Reviewers: Bill Bejeck <bbejeck@gmail.com>