In system tests, it is useful to have the thread dumps if a broker cannot be stopped using SIGTERM.
Reviewers: Xavier Léauté <xl+github@xvrl.net>, Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
- Removed Scala consumers (`SimpleConsumer` and `ZooKeeperConsumerConnector`)
and their tests.
- Removed Scala request/response/message classes.
- Removed any mention of new consumer or new producer in the code
with the exception of MirrorMaker where the new.consumer option was
never deprecated so we have to keep it for now. The non-code
documentation has not been updated either, that will be done
separately.
- Removed a number of tools that only made sense in the context
of the Scala consumers (see upgrade notes).
- Updated some tools that worked with both Scala and Java consumers
so that they only support the latter (see upgrade notes).
- Removed `BaseConsumer` and related classes apart from `BaseRecord`
which is used in `MirrorMakerMessageHandler`. The latter is a pluggable
interface so effectively public API.
- Removed `ZkUtils` methods that were only used by the old consumers.
- Removed `ZkUtils.registerBroker` and `ZKCheckedEphemeral` since
the broker now uses the methods in `KafkaZkClient` and no-one else
should be using that method.
- Updated system tests so that they don't use the Scala consumers except
for multi-version tests.
- Updated LogDirFailureTest so that the consumer offsets topic would
continue to be available after all the failures. This was necessary for it
to work with the Java consumer.
- Some multi-version system tests had not been updated to include
recently released Kafka versions, fixed it.
- Updated findBugs and checkstyle configs not to refer to deleted
classes and packages.
Reviewers: Dong Lin <lindong28@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
This could be backported to older branches to reduce the extra log warning messages there, too.
Running Connect system tests in this branch builder job: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/1773/
Author: Randall Hauch <rhauch@gmail.com>
Reviewers: Konstantine Karantasis <konstantine@confluent.io>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#5151 from rhauch/kafka-7009
Reviewers: Viktor Somogyi <viktorsomogyi@gmail.com>, Vahid Hashemian <vahidhashemian@us.ibm.com>, Manikumar Reddy <manikumar.reddy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
* Removed Scala producers, request classes, kafka.tools.ProducerPerformance, encoders,
tests.
* Updated ConsoleProducer to remove Scala producer support (removed `BaseProducer`
and several options that are not used by the Java producer).
* Updated a few Scala consumer tests to use the new producer (including a minor
refactor of `produceMessages` methods in `TestUtils`).
* Updated `ClientUtils.fetchTopicMetadata` to use `SimpleConsumer` instead of
`SyncProducer`.
* Removed `TestKafkaAppender` as it looks useless and it defined an `Encoder`.
* Minor import clean-ups
No new tests added since behaviour should remain the same after these changes.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Manikumar Reddy O <manikumar.reddy@gmail.com>, Dong Lin <lindong28@gmail.com>
Closes#5045 from ijuma/kafka-6921-remove-old-producer
This PR does the following:
* Remove the StreamsRepeatingIntegerKeyProducerService and the associated Java class
* Add a parameter to VerifiableProducer.java to enable sending keys when specified
* Update the corresponding Python file verifiable_producer.py to support the new parameter.
Reviewers: Matthias J Sax <matthias@confluentio>, Guozhang Wang <wangguoz@gmail.com>
This PR fixes some regressions introduced into streams system tests and sets the upgrade tests to ignore until PR #4636 is merged as it has the fixes for the upgrade tests.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
- adds Streams upgrade tests for 1.1 release
- introduces metadata version 3
Reviewers: John Roesler <john@confluent.io>, Guozhang Wang <guozhang@confluent.io>
SimpleBenchmark:
1.a Do not rely on manual num.records / bytes collection on atomic integers.
1.b Rely on config files for num.threads, bootstrap.servers, etc.
1.c Add parameters for key skewness and value size.
1.d Refactor the tests for loading phase, adding tumbling-windowed count.
1.e For consumer / consumeproduce, collect metrics on consumer instead.
1.f Force stop the test after 3 minutes, this is based on empirical numbers of 10M records.
Other tests: use config for kafka bootstrap servers.
streams_simple_benchmark.py: only use scale 1 for system test, remove yahoo from benchmark tests.
Note that the JMX based metrics is more accurate than the manually collected metrics.
Reviewers: John Roesler <john@confluent.io>, Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
1. Use JmxMixin for SimpleBenchmark (will remove the self reporting in #4744), only when loading phase is false (i.e. we are in fact starting the streams app).
2. Reported the full jmx reported metrics in log files, and in the returned data only return the max values: this is because we want to skip the warming up and cooling down periods that will have lower rate numbers, while max represents the actual rate at full speed.
3. Incorporates two other improves to JMXTool: #1241 and #2950
Reviewers: John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Rohan Desai <desai.p.rohan@gmail.com>
This is a general change and is re-requisite to allow streams benchmark test with different streams tests. For the streams benchmark itself I will have a separate PR for switching configs. Details:
1. Create a "streams.properties" file under PERSISTENT_ROOT before all the streams test. For now it will only contain a single config of state.dir pointing to PERSISTENT_ROOT.
2. For all the system test related code, replace the main function parameter of state.dir with propsFilename, then inside the function load the props from the file and apply overrides if necessary.
3. Minor fixes.
Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>
Ensures Kafka chroot exists in ZK when starting KafkaService so commands that use ZK and are executed before the first Kafka broker starts do not fail due to the missing chroot.
Also uses chroot with one test that also has security parameterizations so Kafka's test suite exercises these combinations. Previously no tests were exercising chroots.
Changes were validated using sanity_checks which include the chroot-ed test as well as some non-chroot-ed tests.
Some changes required to get the Streams system tests working via Docker
To test:
TC_PATHS="tests/kafkatest/tests/streams" bash tests/docker/run_tests.sh
That command will take about 3.5 hours, and should pass. Note there are a couple of ignored tests.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bill@confluent.io>
Author: Bill Bejeck <bill@confluent.io>
Reviewers: Damian Guy <damian@confluent.io>, Guozhang Wang <guozhang@confluent.io>, Matthias J. Sax <matthias@confluent.io>
ensure that brokers are registered at ZK before start() returns
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>, Damian Guy <damian@confluent.io>, Guozhang Wang <guozhang@confluent.io>
Currently port forwarding is setup for HttpMetricsCollector when the Service's start_node method is called, but not canceled during stop. This hasn't presented a problem so far because we don't have tests that use this *and* restart the service. However, if a test/service does that, it will throw an exception since the port is already bound.
This just does the cleanup when stopping so a subsequent attempt to start again will succeed.
https://jenkins.confluent.io/job/system-test-kafka-branch-builder/1320 is a test run for a Test that uses ProducerPerformanceService, which in turn uses HttpMetricsCollector to validate the change.
Author: Ewen Cheslack-Postava <me@ewencp.org>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Apurva Mehta <apurva@confluent.io>
Closes#4604 from ewencp/cleanup-reverse-port-forward
Avoid loosing log/stdout/stderr files on restart
Reenables tests
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Guozhang Wang <guozhang@confluent.io>, Bill Bejeck <bill@confluent.io>
System test for two cases:
* Starting a multi-node streams application with the broker down initially, broker starts and confirm rebalance completes and streams application still able to process records.
* Multi-node streams app running, broker goes down, stop stream instance(s) confirm after broker comes back remaining streams instance(s) still function.
Reviewers: Guozhang Wang <guozhang@confluent.io>, Matthias J. Sax <matthias@confluent.io>
Allow Connect Service in system tests to start asynchronously.
Specifically, allow for three startup conditions:
1. No condition - start async and return immediately.
2. Semi-async - start immediately after plugins have been discovered successfully.
3. Sync - start returns after the worker has completed startup. This is the current mode, but its condition is improved by checking that the port of Connect's REST interface is open, rather than that a log line has appeared in the logs.
An associated system test run has been started here:
https://jenkins.confluent.io/job/system-test-confluent-platform-branch-builder/586/
ewencp rhauch, I'd appreciate your review.
Author: Konstantine Karantasis <konstantine@confluent.io>
Reviewers: Randall Hauch <rhauch@gmail.com>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#4423 from kkonstantine/MINOR-Add-async-and-different-sync-startup-modes-in-ConnectService-test-class
* Implement process stop faults via SIGSTOP / SIGCONT
* Implement RoundTripWorkload, which both sends messages, and confirms that they are received at least once.
* Allow Trogdor tasks to block until other Trogdor tasks are complete.
* Add CreateTopicsWorker, which can be a building block for a lot of tests.
* Simplify how TaskSpec subclasses in ducktape serialize themselves to JSON.
* Implement some fault injection tests in round_trip_workload_test.py
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Rajini Sivaram <rajinisivaram@googlemail.com>
Closes#4323 from cmccabe/KAFKA-5849
Fix an omission where Kibosh was not getting installed on Vagrant
instances running in AWS.
Fix an issue where the Dockerfile was unable to download old Apache
Kafka releases. See the discussion on KAFKA-6233.
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#4240 from cmccabe/KAFKA-6247
For ducktape: add Kibosh to the testing Dockerfile.
Create files_unreadable_fault_spec.py.
For trogdor: create FilesUnreadableFaultSpec.java.
Add a unit test of using the Kibosh service.
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Closes#4195 from cmccabe/KAFKA-5811
Previously, Trogdor only handled "Faults." Now, Trogdor can handle
"Tasks" which may be either faults, or workloads to execute in the
background.
The Agent and Coordinator have been refactored from a
mutexes-and-condition-variables paradigm into a message passing
paradigm. No locks are necessary, because only one thread can access
the task state or worker state. This makes them a lot easier to reason
about.
The MockTime class can now handle mocking deferred message passing
(adding a message to an ExecutorService with a delay). I added a
MockTimeTest.
MiniTrogdorCluster now starts up Agent and Coordinator classes in
paralle in order to minimize junit test time.
RPC messages now inherit from a common Message.java class. This class
handles implementing serialization, equals, hashCode, etc.
Remove FaultSet, since it is no longer necessary.
Previously, if CoordinatorClient or AgentClient hit a networking
problem, they would throw an exception. They now retry several times
before giving up. Additionally, the REST RPCs to the Coordinator and
Agent have been changed to be idempotent. If a response is lost, and
the request is resent, no harm will be done.
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#4073 from cmccabe/KAFKA-6060
With these changes, we are ensuring that the partitions being reassigned are from non-zero offsets. We also ensure that every message in the log has producerId and sequence number.
This means that it successfully reproduces https://issues.apache.org/jira/browse/KAFKA-6003.
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#4029 from apurvam/KAFKA-6016-add-idempotent-producer-to-reassign-partitions
- improve tests to get rid of calls to `sleep` in Python
- fixed some flaky test conditions
- improve debugging
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Damian Guy <damian.guy@gmail.com>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3542 from mjsax/failing-eos-system-tests
The string representation of TopicPartition was changed to be
{topic}-{partitition} consistently in the following commit:
f6f56a645b
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Damian Guy <damian.guy@gmail.com>
Closes#3890 from ijuma/fix-replica-verification-test
Also throw an exception if a null keystore type is seen
in `SecurityStore`. This should never happen.
The default keystore type has changed in Java 9 (
http://openjdk.java.net/jeps/229), so we need to
be explicit to have consistent behaviour across
Java versions.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Closes#3808 from ijuma/set-jks-explicitly-in-system-tests
The previous timeout was 10 seconds, but system test failures have occurred when Zookeeper has started after about 11 seconds. Increasing the timeout to 30 seconds, since most of the time this extra time will not be required, and when it is it will prevent a failed system test.
In addition to merging to `trunk`, please backport to the `0.11.x` and `0.10.2.x` branches.
Author: Randall Hauch <rhauch@gmail.com>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes#3774 from rhauch/MINOR-Increase-timeout-of-zookeeper-service-in-system-tests
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Apurva Mehta <apurva@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#3715 from cmccabe/kafka_service_print_node_hostname_on_failure
ewencp would be great to cherry-pick this back into 0.11.x if possible
Author: Xavier Léauté <xavier@confluent.io>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes#3645 from xvrl/system-test-cluster-id
Author: Dong Lin <lindong28@gmail.com>
Reviewers: Jiangjie Qin <becket.qin@gmail.com>, Jun Rao <junrao@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Onur Karaman <okaraman@linkedin.com>
Closes#2929 from lindong28/KAFKA-4763