This change updates ConsoleProducer, ConsumerPerformance, VerifiableProducer, and VerifiableConsumer classes to add and prefer the --bootstrap-server flag for defining the connection point of the Kafka cluster. This change is part of KIP-499: https://cwiki.apache.org/confluence/display/KAFKA/KIP-499+-+Unify+connection+name+flag+for+command+line+tool.
Reviewers: Ron Dagostino <rdagostino@confluent.io>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Chia-Ping Tsai <chia7712@gmail.com>, Jason Gustafson <jason@confluent.io>
This change mainly have 2 components:
1. extend the existing transactions_test.py to also try out new sendTxnOffsets(groupMetadata) API to make sure we are not introducing any regression or compatibility issue
a. We shrink the time window to 10 seconds for the txn timeout scheduler on broker so that we could trigger expiration earlier than later
2. create a completely new system test class called group_mode_transactions_test which is more complicated than the existing system test, as we are taking rebalance into consideration and using multiple partitions instead of one. For further breakdown:
a. The message count was done on partition level, instead of global as we need to visualize
the per partition order throughout the test. For this sake, we extend ConsoleConsumer to print out the data partition as well to help message copier interpret the per partition data.
b. The progress count includes the time for completing the pending txn offset expiration
c. More visibility and feature improvements on TransactionMessageCopier to better work under either standalone or group mode.
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
The log context is useful when debugging applications which have multiple clients. This patch propagates the context to the channel builders and the SASL authenticator.
Reviewers: Ron Dagostino <rndgstn@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
* Adjust build and documentation.
* Use lambda syntax for SAM types in `core`, `streams-scala` and
`connect-runtime` modules.
* Remove `runnable` and `newThread` from `CoreUtils` as lambda
syntax for SAM types make them unnecessary.
* Remove stale comment in `FunctionsCompatConversions`,
`KGroupedStream`, `KGroupedTable' and `KStream` about Scala 2.11,
the conversions are needed for Scala 2.12 too.
* Deprecate `org.apache.kafka.streams.scala.kstream.Suppressed`
and use `org.apache.kafka.streams.kstream.Suppressed` instead.
* Use `Admin.create` instead of `AdminClient.create`. Static methods
in Java interfaces can be invoked since Scala 2.12. I noticed that
MirrorMaker 2 uses `AdminClient.create`, but I did not change them
as Connectors have restrictions on newer client APIs.
* Improve efficiency in a few `Gauge` implementations by avoiding
unnecessary intermediate collections.
* Remove pointless `Option.apply` in `ZookeeperClient`
`SessionState` metric.
* Fix unused import/variable and other compiler warnings.
* Reduce visibility of some vals/defs.
Reviewers: Manikumar Reddy <manikumar@confluent.io>, Guozhang Wang <wangguoz@gmail.com>, Gwen Shapira <gwen@confluent.io>
This patch adds a basic downgrade system test. It verifies that producing and consuming continues to work before and after the downgrade.
Reviewers: Ismael Juma <ismael@juma.me.uk>, David Arthur <mumrah@gmail.com>
* Add rate limiting to tc
* Feedback from PR
* Add a sanity test for tc
* Add iperf to vagrant scripts
* Dynamically determine the network interface
* Add some temp code for testing on AWS
* Temp: use hostname instead of external IP
* Temp: more AWS debugging
* More AWS WIP
* More AWS temp
* Lower latency some
* AWS wip
* Trying this again now that ping should work
* Add cluster decorator to tests
* Fix broken import
* Fix device name
* Fix decorator arg
* Remove errant import
* Increase timeouts
* Fix tbf command, relax assertion on latency test
* Fix log line
* Final bit of cleanup
* Newline
* Revert Trogdor retry count
* PR feedback
* More PR feedback
* Feedback from PR
* Remove unused argument
The consumer's `committed` API does not return an entry in the response map for a requested partition if there is no committed offset. The transactional message copier, which is used in the transaction system test, did not account for this. If the first transaction attempted by the copier was randomly aborted, then we would not seek to the beginning as expected, which means we would fail to copy some of the records.
This patch fixes the problem by iterating over the assignment rather than the result of `committed` when resetting offsets. It also adds enables additional logging in the transaction message copier service to make finding problems easier in the future.
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Closes#7653 from hachikuji/fix-transaction-system-test
1. Add the overloaded functions.
2. Update the code in Streams to use the batch API for better latency (this applies to both active StreamsTask for initialize the offsets, as well as the StandbyTasks for updating offset limits).
3. Also update all unit test to replace the deprecated APIs.
Reviewers: Christopher Pettitt <cpettitt@confluent.io>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Bill Bejeck <bill@confluent.io>
This creates a test that generates sustained connections against Kafka. There
are three different components we can stress with this, KafkaConsumer,
KafkaProducer, and AdminClient. This test tries use minimal bandwidth per
connection to reduce overhead impacts.
This test works by creating a threadpool that creates connections and then
maintains a central pool of connections at a specified keepalive rate. The
keepalive action varies by which component is being stressed:
* KafkaProducer: Sends one single produce record. The configuration for
the produce request uses the same key/value generator as the ProduceBench
test.
* KafkaConsumer: Subscribes to a single partition, seeks to the end, and
then polls a minimal number of records. Each consumer connection is its
own consumer group, and defaults to 1024 bytes as FETCH_MAX_BYTES to keep
traffic to a minimum.
* AdminClient: Makes an API call to get the nodes in the cluster.
NOTE: This test is designed to be run alongside a ProduceBench test for a
specific topic, due to the way the Consumer test polls a single partition.
There may be no data returned by the consumer test if this is run on its own.
The connection should still be kept alive, but with no data returned.
Author: Scott Hendricks <scott.hendricks@confluent.io>
Reviewers: Stanislav Kozlovski, Gwen Shapira
Closes#7289 from scott-hendricks/trunk
When sending bad records, the Trogdor task will fail if the final record produced is bad. Instead we should catch the exception to allow the task to finish since sending bad records is a valid use case.
Reviewers: Tu V. Tran <tuvtran97@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
Add a new RandomComponentPayloadGenerator that gives a payload based on random selection of another PayloadGenerator. Additionally, add an example that uses a non-default PayloadGenerator configuration to TROGDOR.md.
Reviewers: Colin P. McCabe <cmccabe@apache.org>
Add a "useConfiguredPartitioner" boolean to specify testing with the configured partitioner, rather than overriding the partitioner in the test.
Add a "skipFlush" boolean to specify skipping the flush operation when producing. This is helpful when testing some scenarios where linger.ms is greater than 0.
Reviewers: Colin P. McCabe <cmccabe@apache.org>
This adds a new Trogdor fault spec for inducing network latency on a network device for system testing. It operates very similarly to the existing network partition spec by executing the `tc` linux utility.
* Remove deprecated class Slf4jRequestLog: use Slf4jRequestLogWriter, CustomRequestLog instread.
1. Remove '@SuppressWarnings("deprecation")' from RestServer#initializeResources, JsonRestServer#start.
2. Remove unused JsonRestServer#httpRequest.
* Fix deprecated class usage: SslContextFactory -> SslContextFactory.[Server, Client]
1. Split SSLUtils#createSslContextFactory into SSLUtils#create[Server, Client]SideSslContextFactory: each method instantiates SslContextFactory.[Server, Client], respectively.
2. SSLUtils#configureSslContextFactoryAuthentication is called from SSLUtils#createServerSideSslContextFactory only.
3. Update SSLUtilsTest following splittion; for client-side SSL Context Factory, SslContextFactory#get[Need, Want]ClientAuth is always false. (SSLUtilsTest#testCreateClientSideSslContextFactory)
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
For static members join/rejoin, we encode the current timestamp in the new member.id. The format looks like group.instance.id-timestamp.
During consumer/broker interaction logic (Join, Sync, Heartbeat, Commit), we shall check the whether group.instance.id is known on group. If yes, we shall match the member.id stored on static membership map with the request member.id. If mismatching, this indicates a conflict consumer has used same group.instance.id, and it will receive a fatal exception to shut down.
Right now the only missing part is the system test. Will work on it offline while getting the major logic changes reviewed.
Reviewers: Ryanne Dolan <ryannedolan@gmail.com>, Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
This is the first diff for the implementation of JoinGroup logic for static membership. The goal of this diff contains:
* Add group.instance.id to be unique identifier for consumer instances, provided by end user;
Modify group coordinator to accept JoinGroupRequest with/without static membership, refactor the logic for readability and code reusability.
* Add client side support for incorporating static membership changes, including new config for group.instance.id, apply stream thread client id by default, and new join group exception handling.
* Increase max session timeout to 30 min for more user flexibility if they are inclined to tolerate partial unavailability than burdening rebalance.
* Unit tests for each module changes, especially on the group coordinator logic. Crossing the possibilities like:
6.1 Dynamic/Static member
6.2 Known/Unknown member id
6.3 Group stable/unstable
6.4 Leader/Follower
The rest of the 345 change will be broken down to 4 separate diffs:
* Avoid kicking out members through rebalance.timeout, only do the kick out through session timeout.
* Changes around LeaveGroup logic, including version bumping, broker logic, client logic, etc.
* Admin client changes to add ability to batch remove static members
* Deprecate group.initial.rebalance.delay
Reviewers: Liquan Pei <liquanpei@gmail.com>, Stanislav Kozlovski <familyguyuser192@windowslive.com>, Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Each separate thread should have its own throttle, so that it can sleep
for an appropriate amount of time when needed.
ConnectionStressWorker should avoid recalculating the status after
shutting down the runnables. Otherwise, if one runnable is slow to
stop, it will skew the average down in a way that doesn't reflect
reality. This change moves the status calculation into a separate
periodic runnable that gets shut down cleanly before the other ones.
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Gwen Shapira, Stanislav Kozlovski
Closes#6533 from cmccabe/fix_connection_stress_worker
Optimize ConnectionStressWorker by avoiding creating a new
ChannelBuilder each time we want to open a new connection.
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Gwen Shapira
Closes#6518 from cmccabe/optimize-connection-stress-worker
doneFuture is supposed to be completed with an empty string (meaning success) or a non-empty string which is the error message. Currently, due to exception.getMessage sometimes returning null or an empty string, this is not working correctly. This patch fixes that.
Reviewers: David Arthur <mumrah@gmail.com>
This patch adds a TimeIntervalTransactionsGenerator class which enables the Trogdor ProduceBench worker to commit transactions based on a configurable millisecond time interval.
Also, we now handle 409 create task responses in the coordinator command-line client by printing a more informative message
Reviewers: Colin P. McCabe <cmccabe@apache.org>
RoundTripWorker to should use a long field for maxMessages rather than an int. The consumer group used should unique as well.
Reviewers: Colin P. McCabe <cmccabe@apache.org>
TopicDescription and ConsumerGroupDescription in org.apache.kafka.clients.admin. are part of the public API, so we should retain the existing public constructor. Changed the new constructor with authorized operations to be package-private to avoid maintaining more public constructors since we only expect admin client to use this.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
It is best to use a growing thread pool for worker cleanups. This lets us ensure that we close workers as fast as possible and not get slowed down on blocking cleanups.
Reviewers: Colin McCabe <cmccabe@apache.org>, Jason Gustafson <jason@confluent.io>
SecurityTest.test_client_ssl_endpoint_validation_failure is failing because it greps for 'SSLHandshakeException in the consumer and producer log files. With the fix for KAKFA-7773, the test uses the VerifiableConsumer instead of the ConsoleConsumer, which does not log the exception stack trace to the service log. This patch catches exceptions in the VerifiableConsumer and logs them in order to fix the test. Tested by running the test locally.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Allow the Trogdor agent to execute external commands. The agent communicates with the external commands via stdin, stdout, and stderr.
Based on a patch by Xi Yang <xi@confluent.io>
Reviewers: David Arthur <mumrah@gmail.com>
* Allow the Trogdor agent to be started in "exec mode", where it simply
runs a single task and exits after it is complete.
* For AgentClient and CoordinatorClient, allow the user to pass the path
to a file containing JSON, instead of specifying the JSON object in the
command-line text itself. This means that we can get rid of the bash
scripts whose only function was to load task specs into a bash string
and run a Trogdor command.
* Print dates and times in a human-readable way, rather than as numbers
of milliseconds.
* When listing tasks or workers, output human-readable tables of
information.
* Allow the user to filter on task ID name, task ID pattern, or task
state.
* Support a --json flag to provide raw JSON output if desired.
Reviewed-by: David Arthur <mumrah@gmail.com>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>
Update the Trogdor StringExpander regex to handle an epilogue. Previously the regex would use a lazy quantifier at the end, which meant it would not catch anything after the range expression. Add a unit test.
Reviewed-by: Colin P. McCabe <cmccabe@apache.org>
* Update KafkaAdminClient#describeTopics to throw UnknownTopicOrPartitionException.
* Remove unused method: WorkerUtils#getMatchingTopicPartitions.
* Add some JavaDoc.
Reviewed-by: Colin P. McCabe <cmccabe@apache.org>, Ryanne Dolan <ryannedolan@gmail.com>
The Trogdor Coordinator now overwrites a task's startMs to the time it received it if startMs is in the past.
The Trogdor Agent now correctly expires a task after the expiry time (startMs + durationMs) passes. Previously, it would ignore startMs and expire after durationMs milliseconds of local start of the task.
Reviewed-by: Colin P. McCabe <cmccabe@apache.org>
Switch to lambda when ever possible instead of old anonymous way
in tools module
Author: Srinivas Reddy <srinivas96alluri@gmail.com>
Author: Srinivas Reddy <mrsrinivas@users.noreply.github.com>
Reviewers: Ryanne Dolan <ryannedolan@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Manikumar Reddy <manikumar.reddy@gmail.com>
Closes#6013 from mrsrinivas/tools-switch-to-java8
Current code will fall into non-stop loop and send more message to broker, and Stat in PerfCallback method record will throw ArrayIndexOutOfBoundsException
KAFKA-7597: Add configurable transaction support to ProduceBenchWorker. In order to get support for serializing Optional<> types to JSON, add a new library: jackson-datatype-jdk8. Once Jackson 3 comes out, this library will not be needed.
Reviewers: Colin McCabe <cmccabe@apache.org>, Ismael Juma <ismael@juma.me.uk>