Add logic in ConsumerBounceTest to check the error code in FindCoordinator responses and retry if needed. This should help with transient failures or at least get us closer to the actual problem.
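The retry idea can be sketched as follows. This is only an illustration, not the code in ConsumerBounceTest: `send_request` is a hypothetical callable that returns an `(error_code, coordinator)` pair, and the sketch assumes an error code of 0 means success.

```python
import time


def find_coordinator_with_retry(send_request, max_attempts=5, backoff_s=0.1,
                                sleep=time.sleep):
    """Call send_request() until it reports success or attempts run out.

    send_request is a hypothetical callable returning (error_code, coordinator);
    an error_code of 0 is assumed to mean success.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        error_code, coordinator = send_request()
        if error_code == 0:
            return coordinator
        last_error = error_code
        # Back off before the next attempt, but not after the final one.
        if attempt < max_attempts:
            sleep(backoff_s)
    raise RuntimeError(
        "FindCoordinator failed after %d attempts, last error code %s"
        % (max_attempts, last_error))
```

Raising with the last error code once the attempts are exhausted keeps the underlying failure visible instead of hiding it behind a timeout.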
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Allow the Trogdor agent to execute external commands. The agent communicates with the external commands via stdin, stdout, and stderr.
Based on a patch by Xi Yang <xi@confluent.io>
Reviewers: David Arthur <mumrah@gmail.com>
* Enable heap dumps on OOM with -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=<file.bin> in the major services in system tests
* Collect the heap dump from the predefined location as part of the result logs for each service
* Change Connect service to delete the whole root directory instead of individual expected files
* Tested by running the full suite of system tests
Author: Konstantine Karantasis <konstantine@confluent.io>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes #6158 from kkonstantine/KAFKA-7834
Rather than using sed, use built-in Python regular expressions to strip
the SNAPSHOT expression from the pom.xml files. Sed has different flags
on different platforms, such as Linux. Using Python directly here is
more compatible, as well as being more efficient, and not requiring an
rm command afterwards.
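A minimal sketch of the regex approach is shown below. The pattern and the helper names are assumptions for illustration; the exact expression used by the release script is not given here.

```python
import re


def strip_snapshot(pom_text):
    """Return pom_text with every '-SNAPSHOT' suffix removed."""
    return re.sub(r"-SNAPSHOT", "", pom_text)


def strip_snapshot_in_file(path):
    """Rewrite the file at path in place, with no temporary file to clean up."""
    with open(path, "r") as f:
        original = f.read()
    with open(path, "w") as f:
        f.write(strip_snapshot(original))
```

Because the file is read and rewritten from Python, no backup file is left behind that would need an `rm` afterwards.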
When running release_notes.py, use the current Python interpreter.
This is needed to prevent attempting to run release_notes.py with
Python 3 on some systems. release_notes.py will not (yet) work with
Python 3.
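One way to run a script with the current interpreter is to launch it through `sys.executable`. The helper name below is hypothetical and the sketch is not taken from the release script itself.

```python
import subprocess
import sys


def run_with_current_interpreter(script_path, *args):
    """Run script_path with the same interpreter that runs this process."""
    # sys.executable is the absolute path of the running Python binary, so the
    # child does not depend on whatever "python" resolves to on the PATH.
    cmd = [sys.executable, script_path] + list(args)
    return subprocess.check_output(cmd).decode("utf-8")
```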
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Magnus Edenhill <magnus@edenhill.se>, David Arthur <mumrah@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
Closes #6198 from cmccabe/release_py
This PR adds upgrade notes and changes the examples to use the --bootstrap-server option.
Author: Viktor Somogyi-Vass <viktorsomogyi@gmail.com>
Reviewers: Srinivas <srinivas96alluri@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
Closes #6118 from viktorsomogyi/topiccommand-adminclient-doc
The default backoff of 1000ms when there are no partitions to fetch can cause `shouldExecuteThrottledReassignment` to take too long and fail, so we reduce
it to 100ms.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Ismael Juma <ismael@juma.me.uk>
Split the Gradle invocation in the jenkins.sh script into two commands so
we can fail fast for validation checks such as compile errors and checkstyle
errors.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Reviewers: Damian Guy <damian@confluent.io>, John Roesler <john@confluent.io>, Boyang Chen <bchen11@outlook.com>, Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>
This patch introduces a new config - "group.max.size", which caps the maximum size any group can reach. It has a default value of Int.MAX_VALUE. Once a group is of the maximum size, subsequent JoinGroup requests receive a MAX_SIZE_REACHED error.
In the case where the config is changed and a Coordinator broker with the new config loads an old group that is over the threshold, members are kicked out of the group and a rebalance is forced.
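The two behaviors described above can be modeled roughly as follows. The `Group` class and its methods are hypothetical, and which members get evicted from an oversized group is an assumption of this sketch (it simply keeps the earliest joiners).

```python
MAX_SIZE_REACHED = "MAX_SIZE_REACHED"


class Group:
    """Toy model of a group capped by a maximum size (hypothetical class)."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.members = []

    def join(self, member_id):
        """Admit member_id; return None on success or an error name."""
        if member_id in self.members:
            return None  # an existing member rejoining is always allowed
        if len(self.members) >= self.max_size:
            return MAX_SIZE_REACHED
        self.members.append(member_id)
        return None

    def shrink_to_max(self):
        """Evict members beyond max_size, e.g. after the config was lowered.

        Assumption: the earliest joiners are kept. Returns the evicted
        members; a real coordinator would also force a rebalance.
        """
        evicted = self.members[self.max_size:]
        self.members = self.members[:self.max_size]
        return evicted
```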
Reviewers: Vahid Hashemian <vahid.hashemian@gmail.com>, Boyang Chen <bchen11@outlook.com>, Gwen Shapira <cshapi@gmail.com>, Jason Gustafson <jason@confluent.io>
The PR adds --bootstrap-server and --admin.config options to TopicCommand and implements an alternative, AdminClient based way of topic management.
For testing, I duplicated the existing tests and made them work with the AdminClient options.
Author: Viktor Somogyi-Vass <viktorsomogyi@gmail.com>
Reviewers: Andras Katona <41361962+akatona84@users.noreply.github.com>, Sandor Murakozi <smurakozi@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>, Jason Gustafson <jason@confluent.io>
Closes #5683 from viktorsomogyi/topiccommand-adminclient
Explicitly seek KafkaBasedLog’s consumer to the beginning of the topic partitions, rather than potentially use committed offsets (which would be unexpected) if group.id is set or rely upon `auto.offset.reset=earliest` if the group.id is null.
This should not change existing behavior but should remove some potential issues introduced with KIP-287 if `group.id` is not set in the consumer configurations. Note that even if `group.id` is set, we still always want to consume from the beginning.
Reviewers: Jason Gustafson <jason@confluent.io>
Limit the number of new connections processed in each iteration of each
Processor. Block Acceptor if the connection queue is full on all Processors.
Added a metric to track accept blocked time percent. See KIP-402 for details.
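The queueing behavior can be illustrated with a small single-threaded model. All class and function names here are hypothetical, and the sketch does not reproduce the details of KIP-402: where `try_assign` returns `None`, a real Acceptor would block until some Processor has room.

```python
from collections import deque


class Processor:
    """Toy processor with a bounded queue of new connections."""

    def __init__(self, queue_capacity, max_per_iteration):
        self.queue_capacity = queue_capacity
        self.max_per_iteration = max_per_iteration
        self.pending = deque()
        self.registered = []

    def offer(self, connection):
        """Queue a new connection; return False if the queue is full."""
        if len(self.pending) >= self.queue_capacity:
            return False
        self.pending.append(connection)
        return True

    def run_once(self):
        """Register at most max_per_iteration queued connections."""
        count = min(self.max_per_iteration, len(self.pending))
        for _ in range(count):
            self.registered.append(self.pending.popleft())
        return count


def try_assign(processors, connection, start_index=0):
    """Offer the connection to each processor in round-robin order.

    Returns the index of the accepting processor, or None when every queue
    is full (the point at which an acceptor would have to block).
    """
    n = len(processors)
    for offset in range(n):
        index = (start_index + offset) % n
        if processors[index].offer(connection):
            return index
    return None
```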
Reviewers: Ismael Juma <ismael@juma.me.uk>
#2972 tried to fix a bug in the flushing operation, but the fix was incomplete: findSessions(key, earliestEnd, latestStart) is not guaranteed to return a single entry, since its semantics are to return any sessions whose end > earliestEnd and whose start < latestStart.
I tried various ways to fix it completely and ended up adding a single-point query to the public ReadOnlySessionStore API for the exact semantics needed. It is used during flushing to read the old values (otherwise the wrong old values would be sent downstream, which makes this a correctness issue) and also to get the value for value-getters (this is for performance only).
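The difference between the range query and a point query can be seen in a toy model. The dictionary-backed store and the function names are hypothetical, and the bounds follow the description above rather than the actual store implementation.

```python
def find_sessions(store, key, earliest_end, latest_start):
    """Range query: every session for key that overlaps the given bounds."""
    return [
        ((k, start, end), value)
        for (k, start, end), value in sorted(store.items())
        if k == key and end > earliest_end and start < latest_start
    ]


def fetch_session(store, key, start, end):
    """Point query: the value of exactly one session, or None if absent."""
    return store.get((key, start, end))


# Two overlapping sessions for the same key, keyed by (key, start, end).
store = {("k", 0, 10): "v1", ("k", 5, 15): "v2"}
matches = find_sessions(store, "k", 4, 11)   # both sessions match
exact = fetch_session(store, "k", 5, 15)     # only the requested session
```

With a range query the caller has to pick among several matches, which is where the wrong old value could come from; the point query removes that ambiguity.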
Reviewers: Guozhang Wang <guozhang@confluent.io>, Ismael Juma <ismael@confluent.io>, Jorge Quilcate Otoya <quilcate.jorge@gmail.com>, John Roesler <john@confluent.io>, Bill Bejeck <bill@confluent.io>
See also KIP-183.
This implements the following algorithm:
1. AdminClient sends ElectPreferredLeadersRequest.
2. KafkaApis receives ElectPreferredLeadersRequest and delegates to
   ReplicaManager.electPreferredLeaders().
3. ReplicaManager delegates to KafkaController.electPreferredLeaders().
4. KafkaController adds a PreferredReplicaLeaderElection to the EventManager.
5. ReplicaManager.electPreferredLeaders()'s callback uses the
   delayedElectPreferredReplicasPurgatory to wait for the results of the
   election to appear in the metadata cache. If there are no results
   because of errors, or because the preferred leaders are already leading
   the partitions, then a response is returned immediately.
In the EventManager work thread the preferred leader is elected as follows:
1. The EventManager runs PreferredReplicaLeaderElection.process().
2. process() calls KafkaController.onPreferredReplicaElectionWithResults().
3. KafkaController.onPreferredReplicaElectionWithResults()
   calls PartitionStateMachine.handleStateChangesWithResults() to
   perform the election (asynchronously, the PSM will send LeaderAndIsrRequest
   to the new and old leaders and UpdateMetadataRequest to all brokers),
   then invokes the callback.
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Jun Rao <junrao@gmail.com>
Upgrade from 171 to 202. Unpack and install directly from a cached tgz rather than going via the installer deb from webupd8. The installer is still on 8u191 while we want 202.
Testing via kafka branch builder job
https://jenkins.confluent.io/job/system-test-kafka-branch-builder/2305/
Author: Jarek Rudzinski <jarek@confluent.io>
Author: Ewen Cheslack-Postava <me@ewencp.org>
Reviewers: Alex Diachenko <sansanichfb@gmail.com>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes #6165 from jarekr/trunk-jdk8-from-tgz
The sequence number is an Int and should wrap around to 0 once it reaches Int.MaxValue. The bug is that it does not wrap around; instead it overflows, becomes negative, and raises an error.
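The intended wrap-around can be written out explicitly. This is a sketch with a hypothetical function name, not the broker code; Python integers do not overflow, so the 32-bit limit is modeled by hand.

```python
INT_MAX = 2**31 - 1  # largest value of a signed 32-bit Int


def next_sequence(sequence, increment=1):
    """Advance a sequence number, wrapping past INT_MAX back to 0.

    Valid sequence numbers stay in the range [0, INT_MAX]; the result is
    never negative.
    """
    return (sequence + increment) % (INT_MAX + 1)
```

For example, `next_sequence(INT_MAX)` yields 0, whereas a plain 32-bit addition would overflow to a negative value.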
Reviewers: Jason Gustafson <jason@confluent.io>
This patch fixes a few overflow issues with wrapping sequence numbers in the broker's producer state tracking.
Reviewers: Jason Gustafson <jason@confluent.io>
* Allow the Trogdor agent to be started in "exec mode", where it simply
runs a single task and exits after it is complete.
* For AgentClient and CoordinatorClient, allow the user to pass the path
to a file containing JSON, instead of specifying the JSON object in the
command-line text itself. This means that we can get rid of the bash
scripts whose only function was to load task specs into a bash string
and run a Trogdor command.
* Print dates and times in a human-readable way, rather than as numbers
of milliseconds.
* When listing tasks or workers, output human-readable tables of
information.
* Allow the user to filter on task ID name, task ID pattern, or task
state.
* Support a --json flag to provide raw JSON output if desired.
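Two of the items above, accepting either inline JSON or a path to a JSON file and printing timestamps readably, might look like the sketch below. The function names are hypothetical, and treating a leading `{` as the sign of inline JSON is an assumption of this sketch, not a statement about how Trogdor decides.

```python
import json
from datetime import datetime, timezone


def load_json_argument(argument):
    """Parse a command-line argument that is inline JSON or a file path.

    Assumption: an argument whose first non-blank character is '{' is
    inline JSON; anything else is treated as a path to a JSON file.
    """
    if argument.lstrip().startswith("{"):
        return json.loads(argument)
    with open(argument, "r") as f:
        return json.load(f)


def format_millis(epoch_millis):
    """Render milliseconds since the epoch as a UTC timestamp string."""
    moment = datetime.fromtimestamp(epoch_millis / 1000.0, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")
```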
Reviewed-by: David Arthur <mumrah@gmail.com>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>
This PR enables BloomFilters for RocksDB to speed up point lookups.
The request for this has been around for some time - https://issues.apache.org/jira/browse/KAFKA-4850
For testing, I've done the following:
* Ran the standard streams suite of unit and integration tests
* Kicked off the simple benchmark test with bloom filters enabled
* Kicked off the simple benchmark test with bloom filters not enabled
* Kicked off streams system tests
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>, John Roesler <john@confluent.io>
When using ./ducker-ak test on Jenkins, the script complains that there is no TTY. To fix this, we should skip passing -t to docker exec. We do not need a pseudo-TTY to run the tests. Similarly, we should skip passing -i, since we do not need to keep stdin open.
The down command should have a force option, specified as -f or --force.
Reviewed-by: Colin P. McCabe <cmccabe@apache.org>
[KIP-297](https://cwiki.apache.org/confluence/display/KAFKA/KIP-297%3A+Externalizing+Secrets+for+Connect+Configurations#KIP-297:ExternalizingSecretsforConnectConfigurations-PublicInterfaces) introduced the `ConfigProvider` mechanism, which was primarily intended for externalizing secrets provided in connector configurations. However, when querying the Connect REST API for the configuration of a connector or its tasks, those secrets are still exposed. The changes here prevent the Connect REST API from ever exposing resolved configurations in order to address that. rhauch has given a more thorough writeup of the thinking behind this in [KAFKA-5117](https://issues.apache.org/jira/browse/KAFKA-5117)
Tested and verified manually. If these changes are approved unit tests can be added to prevent a regression.
Author: Chris Egerton <chrise@confluent.io>
Reviewers: Robert Yokota <rayokota@gmail.com>, Randall Hauch <rhauch@gmail.com>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes #6129 from C0urante/hide-provided-connect-configs
The presence of the buildSrc subproject is causing problems when we try
to run installAll, jarAll, and the other "all" targets. It's easier
just to make the generator code a regular subproject and use the
JavaExec gradle task to run the code. This also makes it more
straightforward to run the generator unit tests.
Reviewers: David Arthur <mumrah@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Co-authored-by: Colin P. Mccabe <cmccabe@confluent.io>
Co-authored-by: Stanislav Kozlovski <stanislav_kozlovski@outlook.com>
* Join ssl principal mapping rules correctly before evaluating.
Java properties splits the configuration array on commas, and that leads to rules containing commas being split before being evaluated. This commit adds a code change to re-join those strings into full rules before evaluating them. The function assumes every rule is either DEFAULT or begins with the prefix RULE:
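The re-joining step can be sketched as follows. The function name is hypothetical and this is not the committed code; it relies on the same assumption stated above, that every rule is either DEFAULT or starts with the prefix RULE:.

```python
def join_mapping_rules(fragments):
    """Re-join comma-split fragments into complete mapping rules.

    A fragment that is 'DEFAULT' or starts with 'RULE:' begins a new rule;
    any other fragment is a continuation of the previous rule and is glued
    back on with the comma that the split removed.
    """
    rules = []
    for fragment in fragments:
        if fragment == "DEFAULT" or fragment.startswith("RULE:"):
            rules.append(fragment)
        elif rules:
            rules[-1] += "," + fragment
        else:
            raise ValueError("first fragment is not a rule: %r" % fragment)
    return rules


# "RULE:^CN=(.*?),OU=ServiceUsers.*$/$1/,DEFAULT" arrives split on commas.
split = ["RULE:^CN=(.*?)", "OU=ServiceUsers.*$/$1/", "DEFAULT"]
rules = join_mapping_rules(split)
```

A continuation fragment that itself starts with RULE: or equals DEFAULT would be treated as a new rule, which is the limit of this prefix-based assumption.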