This is a system test for doing a rolling upgrade of a topology with a named repartition topic.
1. An initial Kafka Streams application is started on 3 nodes. The topology has one operation forcing a repartition and the repartition topic is explicitly named.
2. Each node is started and processing of data is validated
3. Then one node is stopped (full stop is verified)
4. A property is set signaling the node to add operations to the topology before the repartition node which forces a renumbering of all operators (except repartition node)
5. Restart the node and confirm processing records
6. Repeat the steps for the other 2 nodes completing the rolling upgrade
I ran two runs of the system test with 25 repeats in each run for a total of 50 test runs.
All test runs passed
Reviewers: John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>
This change adds some basic system tests for delegation token based authentication:
- basic delegation token creation
- producing with a delegation token
- consuming with a delegation token
- expiring a delegation token
- producing with an expired delegation token
New files:
- delegation_tokens.py: a wrapper around kafka-delegation-tokens.sh - executed in container where a secure Broker is running (taking advantage of automatic cleanup)
- delegation_tokens_test.py: basic test to validate the lifecycle of a delegation token
Changes were made in the following file to extend their functionality:
- config_property was updated to be able to configure Kafka brokers with delegation token related settings
- jaas.conf template because a broker needs to support multiple login modules when delegation tokens are used
- consule-consumer and verifiable_producer to override KAFKA_OPTS (to specify custom jaas.conf) and the client properties (to authenticate with delegation token).
Author: Attila Sasvari <asasvari@apache.org>
Reviewers: Reviewers: Viktor Somogyi <viktorsomogyi@gmail.com>, Andras Katona <41361962+akatona84@users.noreply.github.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
Closes#5660 from asasvari/KAFKA-4544
This is a new system test testing for optimizing an existing topology. This test takes the following steps
1. Start a Kafka Streams application that uses a selectKey then performs 3 groupByKey() operations and 1 join creating four repartition topics
2. Verify all instances start and process data
3. Stop all instances and verify stopped
4. For each stopped instance update the config for TOPOLOGY_OPTIMIZATION to all then restart the instance and verify the instance has started successfully also verifying Kafka Streams reduced the number of repartition topics from 4 to 1
5. Verify that each instance is processing data from the aggregation, reduce, and join operation
Stop all instances and verify the shut down is complete.
6. For testing I ran two passes of the system test with 25 repeats for a total of 50 test runs.
All test runs passed
Reviewers: Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
KAFKA-7597: Add configurable transaction support to ProduceBenchWorker. In order to get support for serializing Optional<> types to JSON, add a new library: jackson-datatype-jdk8. Once Jackson 3 comes out, this library will not be needed.
Reviewers: Colin McCabe <cmccabe@apache.org>, Ismael Juma <ismael@juma.me.uk>
The Kafka Streams system tests fail with some regularity due to a timeout starting the broker.
The initial start is quite quick, but many of our tests involve stopping and restarting nodes with data already loaded, and also while processing is ongoing.
Under these conditions, it seems to be normal for the broker to take about 25 seconds to start, which makes the 30 second timeout pretty close for comfort.
I have seen many test failures in which the broker successfully started within a couple of seconds after the tests timed out and already initiated the failure/shut-down sequence.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
The function `setup_producer_and_consumer` is unused in the system test
framework, which incorrectly suggests subclasses should implement
it. It is not required or even referenced by the framework, so
the requirement should be removed.
Reviewers: Viktor Somogyi <viktorsomogyi@gmail.com>, Jason Gustafson <jason@confluent.io>
Add threads with separate consumers to ConsumeBenchWorker. Update the Trogdor test scripts and documentation with the new functionality.
Reviewers: Colin McCabe <cmccabe@apache.org>
Currently, the startup timeout is hardcoded to be 60 seconds in Connect's test service. Modifying it to be passable via init. This can safely be backported as well.
Reviewers: Randall Hauch <rhauch@gmail.com>, Jason Gustafson <jason@confluent.io>
This fixes the Connect standalone system tests. See branch builder: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/2021/
This should be backported to the 2.0 branch, since that's when the tests were first
modified to use the external property file.
Reviewers: Magesh Nandakumar <magesh.n.kumar@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Corrects an error in the system tests:
```
07:55:45 [ERROR:2018-10-23 07:55:45,738]: Failed to import kafkatest.tests.connect.connect_test, which may indicate a broken test that cannot be loaded: NameError: name 'EXTERNAL_CONFIGS_FILE' is not defined
```
The constant is defined in the [services/connect.py](https://github.com/apache/kafka/blob/trunk/tests/kafkatest/services/connect.py#L43) file in the `ConnectServiceBase` class, but the problem is in the [tests/connect/connect_test.py](https://github.com/apache/kafka/blob/trunk/tests/kafkatest/tests/connect/connect_test.py#L50) `ConnectStandaloneFileTest`, which does *not* extend the `ConnectServiceBase class`. Suggestions welcome to be able to reuse that variable without duplicating the literal (as in this PR).
System test run with this PR: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/2004/
If approved, this should be merged as far back as the `2.0` branch.
Author: Randall Hauch <rhauch@gmail.com>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes#5832 from rhauch/fix-connect-externals-tests
1. In test_upgrade_downgrade_brokers, allow duplicates to happen.
2. In test_version_probing_upgrade, grep the generation numbers from brokers at the end, and check if they can ever be synchronized.
Reviewers: John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>
PR #2267 Introduced support for Zstandard compression. The relevant test expects values for `num_nodes` and `num_producers` based on the (now-incremented) count of compression types.
Passed the affected, previously-failing test:
`ducker-ak test tests/kafkatest/tests/client/compression_test.py`
Reviewers: Jason Gustafson <jason@confluent.io>
In some tests, the check monitoring the JMX tool log output doesn’t quite wait long enough before failing. Increasing the timeout from 10 to 20 seconds.
Removed ignore annotations from the upgrade tests. This PR includes the following changes for updating the upgrade tests:
* Uploaded new versions 0.10.2.2, 0.11.0.3, 1.0.2, 1.1.1, and 2.0.0 (in the associated scala versions) to kafka-packages
* Update versions in version.py, Dockerfile, base.sh
* Added new versions to StreamsUpgradeTest.test_upgrade_downgrade_brokers including version 2.0.0
* Added new versions StreamsUpgradeTest.test_simple_upgrade_downgrade test excluding version 2.0.0
* Version 2.0.0 is excluded from the streams upgrade/downgrade test as StreamsConfig needs an update for the new version, requiring a KIP. Once the community votes the KIP in, a minor follow-up PR can be pushed to add the 2.0.0 version to the upgrade test.
* Fixed minor bug in kafka-run-class.sh for classpath in upgrade/downgrade tests across versions.
* Follow on PRs for 0.10.2x, 0.11.0x, 1.0.x, 1.1.x, and 2.0.x will be pushed soon with the same updates required for the specific version.
Reviewers: Eno Thereska <eno.thereska@gmail.com>, John Roesler <vvcephei@users.noreply.github.com>, Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <matthias@confluent.io>
Currently, the only way in system tests to add a new variable to the `jaas.conf` template file is to directly edit the path the config is constructed by adding new keyword arguments.
This wasn't necessarily a big problem, since you'd only need edit the `security_config.py` file as JAAS settings should come from the security settings.
Now, with the addition of [KIP-342](https://cwiki.apache.org/confluence/display/KAFKA/KIP-342%3A+Add+support+for+Custom+SASL+extensions+in+OAuthBearer+authentication), the OAuthBearer JAAS config supports arbitrary values in the form of SASL extensions. This patch exposes a more convenient API to overrides these values in system tests.
Reviewers: Jason Gustafson <jason@confluent.io>
Fix system tests from earlier #5445 by moving to the `ConnectSystemBase` class the creation & cleanup of a file that can be used as externalized secrets in connector configs.
Reviewers: Arjun Satish <arjun@confluent.io>, Robert Yokota <rayokota@gmail.com>, Konstantine Karantasis <konstantine@confluent.io>, Jason Gustafson <jason@confluent.io>
This is a fix to #5226 to account for config properties that have an
equal char in the value. Otherwise if there is one
equal char in the value the following error occurs:
dictionary update sequence element #XX has length 3; 2 is required
Reviewers: Colin Patrick McCabe <colin@cmccabe.xyz>, Ismael Juma <ismael@juma.me.uk>
* Updated TestLogCleaning tool to use Java consumer and rename as LogCompactionTester.
* Enabled the log cleaner in every system test.
* Removed configs from "kafka.properties" with default values and `socket.receive.buffer.bytes`
as the override did not seem necessary.
* Updated `kafka.py` logic to handle duplicates between `kafka.properties` and `server_prop_overrides`.
* Updated Gradle build so that classes from `kafka-clients` test jar can be used in
system tests.
Reviewers: Colin Patrick McCabe <colin@cmccabe.xyz>, Ismael Juma <ismael@juma.me.uk>
Added a system test which creates a file sink with json converter and attempts to feed it bad records. The bad records should land in the DLQ if it is enabled, and the task should be killed or bad records skipped based on test parameters.
Signed-off-by: Arjun Satish <arjunconfluent.io>
*More detailed description of your change,
if necessary. The PR title and PR message become
the squashed commit message, so use a separate
comment to ping reviewers.*
*Summary of testing strategy (including rationale)
for the feature or bug fix. Unit and/or integration
tests are expected for any behaviour change and
system tests should be considered for larger changes.*
Author: Arjun Satish <arjun@confluent.io>
Reviewers: Konstantine Karantasis <konstantine@confluent.io>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#5456 from wicknicks/error-handling-sys-test
If a property requires validation, it should be pretransformed if it is a variable reference, in order to have a value that will properly pass the validation.
Author: Robert Yokota <rayokota@gmail.com>
Reviewers: Randall Hauch <rhauch@gmail.com>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#5445 from rayokota/KAFKA-7225-pretransform-validated-props
The original way of stopping the minikdc process sometimes misfires because the process arg string is very long, and `ps` is not able to find the correct process. Using the `kill_java_processes` method is more reliable for finding and killing java processes.
In system tests, it is useful to have the thread dumps if a broker cannot be stopped using SIGTERM.
Reviewers: Xavier Léauté <xl+github@xvrl.net>, Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
1. At the beginning of assign, we first check that all the non-repartition source topics are included in the metadata. If not, we log an error at the leader and set an error in the Assignment userData bytes, indicating that leader cannot complete assignment and the error code would indicate the root cause of it.
2. Upon receiving the assignment, if the error is not NONE the streams will shutdown itself with a log entry re-stating the root cause interpreted from the error code.
Author: tedyu <yuzhihong@gmail.com>
Reviewers: Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>
Closes#5322 from tedyu/trunk
#5253 broke standby restoration for windowed stores.
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Added two additional test cases to quota_test.py, which run between brokers and clients with different throttling behaviors. More specifically,
1. clients with new throttling behavior (i.e., post-KIP-219) and brokers with old throttling behavior (i.e., pre-KIP-219)
2. clients with old throttling behavior and brokers with new throttling behavior
Author: Jon Lee <jonlee@linkedin.com>
Author: Dong Lin <lindong28@gmail.com>
Reviewers: Dong Lin <lindong28@gmail.com>
Closes#5294 from jonlee2/kafka-6944
- Removed Scala consumers (`SimpleConsumer` and `ZooKeeperConsumerConnector`)
and their tests.
- Removed Scala request/response/message classes.
- Removed any mention of new consumer or new producer in the code
with the exception of MirrorMaker where the new.consumer option was
never deprecated so we have to keep it for now. The non-code
documentation has not been updated either, that will be done
separately.
- Removed a number of tools that only made sense in the context
of the Scala consumers (see upgrade notes).
- Updated some tools that worked with both Scala and Java consumers
so that they only support the latter (see upgrade notes).
- Removed `BaseConsumer` and related classes apart from `BaseRecord`
which is used in `MirrorMakerMessageHandler`. The latter is a pluggable
interface so effectively public API.
- Removed `ZkUtils` methods that were only used by the old consumers.
- Removed `ZkUtils.registerBroker` and `ZKCheckedEphemeral` since
the broker now uses the methods in `KafkaZkClient` and no-one else
should be using that method.
- Updated system tests so that they don't use the Scala consumers except
for multi-version tests.
- Updated LogDirFailureTest so that the consumer offsets topic would
continue to be available after all the failures. This was necessary for it
to work with the Java consumer.
- Some multi-version system tests had not been updated to include
recently released Kafka versions, fixed it.
- Updated findBugs and checkstyle configs not to refer to deleted
classes and packages.
Reviewers: Dong Lin <lindong28@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
This could be backported to older branches to reduce the extra log warning messages there, too.
Running Connect system tests in this branch builder job: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/1773/
Author: Randall Hauch <rhauch@gmail.com>
Reviewers: Konstantine Karantasis <konstantine@confluent.io>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#5151 from rhauch/kafka-7009
In BrokerCompatibilityTest.java, when older versioned broker is used (0.10.1, 0.10.2), LIST_OFFSET is not supported as well. Hence in the verification phase, there is a possibility that consumer hit the UnsupportedVersionException earlier than Streams actually hits it:
org.apache.kafka.common.errors.UnsupportedVersionException: The broker does not support LIST_OFFSETS with version in range [2,3]. The supported range is [0,1].
While the test is waiting for
org.apache.kafka.common.errors.UnsupportedVersionException: Cannot create a v0 FindCoordinator request because we require features supported only in 1 or later.
Both are valid errors to expect (the former is from consumer while the latter is from producer of the streams app).
Reviewers: John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>
Reviewers: Viktor Somogyi <viktorsomogyi@gmail.com>, Vahid Hashemian <vahidhashemian@us.ibm.com>, Manikumar Reddy <manikumar.reddy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Implementation of [KIP-174](https://cwiki.apache.org/confluence/display/KAFKA/KIP-174+-+Deprecate+and+remove+internal+converter+configs+in+WorkerConfig)
Configuration properties 'internal.key.converter' and 'internal.value.converter'
are deprecated, and default to org.apache.kafka.connect.json.JsonConverter.
Warnings are logged if values are specified for either, or if properties that
appear to configure instances of internal converters (i.e., ones prefixed with
either 'internal.key.converter.' or 'internal.value.converter.') are given.
The property 'schemas.enable' is also defaulted to false for internal
JsonConverter instances (both for keys and values) if it isn't specified.
Documentation and code have also been updated with deprecation notices and
annotations, respectively.
Unit tests have been updated in `PluginsTest` to account for the new defaults for `schemas.enable` for internal key/value converters, and to ensure that (for the time being), internal key/value converters are still configurable despite being deprecated.
Author: Chris Egerton <chrise@confluent.io>
Author: Ewen Cheslack-Postava <me@ewencp.org>
Reviewers: Randall Hauch <rhauch@gmail.com>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#4693 from C0urante/kafka-5540
* Removed Scala producers, request classes, kafka.tools.ProducerPerformance, encoders,
tests.
* Updated ConsoleProducer to remove Scala producer support (removed `BaseProducer`
and several options that are not used by the Java producer).
* Updated a few Scala consumer tests to use the new producer (including a minor
refactor of `produceMessages` methods in `TestUtils`).
* Updated `ClientUtils.fetchTopicMetadata` to use `SimpleConsumer` instead of
`SyncProducer`.
* Removed `TestKafkaAppender` as it looks useless and it defined an `Encoder`.
* Minor import clean-ups
No new tests added since behaviour should remain the same after these changes.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Manikumar Reddy O <manikumar.reddy@gmail.com>, Dong Lin <lindong28@gmail.com>
Closes#5045 from ijuma/kafka-6921-remove-old-producer
test_broker_type_bounce_at_start tries to validate that when the controller is down, the streams client will always fail trying to create the topic; with the current behavior of admin client it is actually not always true: the actual behavior depends on the admin client internals as well as when the controller becomes unavailable during the leader assign partitions phase. I'd suggest at least ignore this test for now until the admin client has more stable (personally I'd even suggest removing this test as its coverage benefits is smaller than its introduced issues to me).
Also adding a few more log4j entries as a result of investigating this issue.
Reviewers: Matthias J. Sax <matthias@confluent.io>