In system tests, it is useful to have the thread dumps if a broker cannot be stopped using SIGTERM.
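For illustration, a minimal stdlib-only sketch of the intended flow (not the actual kafkatest helper):

```python
import os
import signal
import subprocess
import time

def stop_broker(pid, grace_sec=30):
    # Ask the broker to shut down cleanly first.
    os.kill(pid, signal.SIGTERM)
    deadline = time.time() + grace_sec
    while time.time() < deadline:
        try:
            os.kill(pid, 0)          # probe only; raises OSError once the process is gone
        except OSError:
            return                   # clean shutdown
        time.sleep(1)
    # SIGTERM did not work: capture a thread dump so the hang can be diagnosed later,
    # then escalate to SIGKILL.
    dump = subprocess.run(["jstack", str(pid)], capture_output=True, text=True)
    print(dump.stdout)
    os.kill(pid, signal.SIGKILL)
```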
Reviewers: Xavier Léauté <xl+github@xvrl.net>, Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
1. At the beginning of assign, we first check that all the non-repartition source topics are included in the metadata. If not, we log an error on the leader and set an error in the Assignment userData bytes, indicating that the leader cannot complete the assignment; the error code indicates the root cause.
2. Upon receiving the assignment, if the error is not NONE the Streams client shuts itself down, logging an entry that restates the root cause interpreted from the error code.
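Roughly, the flow looks like this (a Python sketch for brevity; the real logic lives in the Streams partition assignor, and the error code name is illustrative):

```python
NONE = 0
INCOMPLETE_SOURCE_TOPIC_METADATA = 1   # illustrative error code

def assign(source_topics, metadata_topics, log):
    # Leader side: if any non-repartition source topic is missing from the metadata,
    # log the problem and encode the error in the Assignment userData instead of
    # handing out a real assignment.
    missing = set(source_topics) - set(metadata_topics)
    if missing:
        log.error("Missing source topics %s; cannot complete assignment", sorted(missing))
        return {"error_code": INCOMPLETE_SOURCE_TOPIC_METADATA}
    return {"error_code": NONE, "assignment": "..."}

def on_assignment(user_data, log):
    # Member side: shut down and restate the root cause if the leader signalled an error.
    if user_data["error_code"] != NONE:
        log.error("Leader reported error code %d during assignment; shutting down",
                  user_data["error_code"])
        raise SystemExit(1)
```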
Author: tedyu <yuzhihong@gmail.com>
Reviewers: Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>
Closes #5322 from tedyu/trunk
#5253 broke standby restoration for windowed stores.
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Added two additional test cases to quota_test.py, which exercise combinations of brokers and clients with different throttling behaviors. More specifically:
1. clients with new throttling behavior (i.e., post-KIP-219) and brokers with old throttling behavior (i.e., pre-KIP-219)
2. clients with old throttling behavior and brokers with new throttling behavior
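As a rough sketch (ducktape-style; parameter names and version handles are illustrative, not the exact quota_test.py code), each new case pins one side to a pre-KIP-219 version:

```python
from ducktape.mark import parametrize
from ducktape.tests.test import Test

DEV_BRANCH = "dev"              # placeholder version handles; the real test uses
PRE_KIP_219_VERSION = "1.0.2"   # kafkatest.version objects instead of strings

class QuotaCompatibilityTest(Test):
    @parametrize(old_broker_throttling_behavior=True)
    @parametrize(old_client_throttling_behavior=True)
    def test_quota(self, old_broker_throttling_behavior=False, old_client_throttling_behavior=False):
        # Pin either the broker or the client to a pre-KIP-219 release so both
        # old/new throttling combinations are exercised against each other.
        broker_version = PRE_KIP_219_VERSION if old_broker_throttling_behavior else DEV_BRANCH
        client_version = PRE_KIP_219_VERSION if old_client_throttling_behavior else DEV_BRANCH
        self.logger.info("broker=%s, client=%s", broker_version, client_version)
        # ... start brokers/clients at those versions and run the quota workload ...
```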
Author: Jon Lee <jonlee@linkedin.com>
Author: Dong Lin <lindong28@gmail.com>
Reviewers: Dong Lin <lindong28@gmail.com>
Closes #5294 from jonlee2/kafka-6944
- Removed Scala consumers (`SimpleConsumer` and `ZooKeeperConsumerConnector`)
and their tests.
- Removed Scala request/response/message classes.
- Removed any mention of the new consumer or new producer in the code,
with the exception of MirrorMaker, where the new.consumer option was
never deprecated so we have to keep it for now. The non-code
documentation has not been updated either; that will be done
separately.
- Removed a number of tools that only made sense in the context
of the Scala consumers (see upgrade notes).
- Updated some tools that worked with both Scala and Java consumers
so that they only support the latter (see upgrade notes).
- Removed `BaseConsumer` and related classes apart from `BaseRecord`,
which is used in `MirrorMakerMessageHandler`. The latter is a pluggable
interface, so it is effectively public API.
- Removed `ZkUtils` methods that were only used by the old consumers.
- Removed `ZkUtils.registerBroker` and `ZKCheckedEphemeral` since
the broker now uses the methods in `KafkaZkClient` and no one else
should be using those methods.
- Updated system tests so that they don't use the Scala consumers except
for multi-version tests.
- Updated LogDirFailureTest so that the consumer offsets topic would
continue to be available after all the failures. This was necessary for it
to work with the Java consumer.
- Some multi-version system tests had not been updated to include
recently released Kafka versions; this has been fixed.
- Updated findBugs and checkstyle configs not to refer to deleted
classes and packages.
Reviewers: Dong Lin <lindong28@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
This could be backported to older branches to reduce the extra log warning messages there, too.
Running Connect system tests in this branch builder job: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/1773/
Author: Randall Hauch <rhauch@gmail.com>
Reviewers: Konstantine Karantasis <konstantine@confluent.io>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes #5151 from rhauch/kafka-7009
In BrokerCompatibilityTest.java, when an older broker version is used (0.10.1, 0.10.2), LIST_OFFSETS is not supported either. Hence in the verification phase, the consumer may hit the UnsupportedVersionException before Streams actually hits it:
org.apache.kafka.common.errors.UnsupportedVersionException: The broker does not support LIST_OFFSETS with version in range [2,3]. The supported range is [0,1].
While the test is waiting for
org.apache.kafka.common.errors.UnsupportedVersionException: Cannot create a v0 FindCoordinator request because we require features supported only in 1 or later.
Both are valid errors to expect (the former is from the consumer while the latter is from the producer of the Streams app).
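On the system-test side, one way to express the expected outcome is to accept either message when scanning the client logs; roughly (illustrative, not the exact test code):

```python
import re

# Either UnsupportedVersionException message counts as the expected failure.
EXPECTED_ERRORS = [
    re.compile(r"The broker does not support LIST_OFFSETS"),
    re.compile(r"Cannot create a v0 FindCoordinator request"),
]

def saw_expected_error(log_lines):
    return any(p.search(line) for p in EXPECTED_ERRORS for line in log_lines)
```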
Reviewers: John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>
Reviewers: Viktor Somogyi <viktorsomogyi@gmail.com>, Vahid Hashemian <vahidhashemian@us.ibm.com>, Manikumar Reddy <manikumar.reddy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Implementation of [KIP-174](https://cwiki.apache.org/confluence/display/KAFKA/KIP-174+-+Deprecate+and+remove+internal+converter+configs+in+WorkerConfig)
Configuration properties 'internal.key.converter' and 'internal.value.converter'
are deprecated, and default to org.apache.kafka.connect.json.JsonConverter.
Warnings are logged if values are specified for either, or if properties that
appear to configure instances of internal converters (i.e., ones prefixed with
either 'internal.key.converter.' or 'internal.value.converter.') are given.
The property 'schemas.enable' also defaults to false for internal
JsonConverter instances (both for keys and values) if it isn't specified.
Documentation and code have also been updated with deprecation notices and
annotations, respectively.
Unit tests have been updated in `PluginsTest` to account for the new defaults for `schemas.enable` for internal key/value converters, and to ensure that (for the time being), internal key/value converters are still configurable despite being deprecated.
Author: Chris Egerton <chrise@confluent.io>
Author: Ewen Cheslack-Postava <me@ewencp.org>
Reviewers: Randall Hauch <rhauch@gmail.com>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes #4693 from C0urante/kafka-5540
* Removed Scala producers, request classes, kafka.tools.ProducerPerformance, encoders,
tests.
* Updated ConsoleProducer to remove Scala producer support (removed `BaseProducer`
and several options that are not used by the Java producer).
* Updated a few Scala consumer tests to use the new producer (including a minor
refactor of `produceMessages` methods in `TestUtils`).
* Updated `ClientUtils.fetchTopicMetadata` to use `SimpleConsumer` instead of
`SyncProducer`.
* Removed `TestKafkaAppender` as it looks useless and it defined an `Encoder`.
* Minor import clean-ups
No new tests added since behaviour should remain the same after these changes.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Manikumar Reddy O <manikumar.reddy@gmail.com>, Dong Lin <lindong28@gmail.com>
Closes #5045 from ijuma/kafka-6921-remove-old-producer
test_broker_type_bounce_at_start tries to validate that when the controller is down, the streams client will always fail trying to create the topic; with the current behavior of the admin client this is actually not always true: the actual behavior depends on the admin client internals as well as on when the controller becomes unavailable during the leader-assign-partitions phase. I'd suggest at least ignoring this test for now until the admin client behavior is more stable (personally I'd even suggest removing this test, as its coverage benefit seems smaller than the issues it introduces).
Also adding a few more log4j entries as a result of investigating this issue.
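A minimal sketch of ignoring the test in ducktape (class name assumed):

```python
from ducktape.mark import ignore
from ducktape.tests.test import Test

class StreamsBrokerBounceTest(Test):
    @ignore  # skip until the admin client behavior during controller unavailability is more stable
    def test_broker_type_bounce_at_start(self):
        ...
```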
Reviewers: Matthias J. Sax <matthias@confluent.io>
Due to #4644 the consumer connector logs will be much cleaner, with fewer "broker may not be available" entries. We need to reduce the required frequency from 100 to a smaller number.
I've thought about reducing it to just 1, but the message may still be transient (i.e., even while the broker is starting up you may see a few entries), so I reduced it to 10.
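Conceptually, the check becomes (the message text is the one quoted above; helper name illustrative):

```python
REQUIRED_OCCURRENCES = 10   # was 100 before the connector logs became quieter

def broker_unavailable_logged_enough(log_text):
    # Count how many times the client logged the connection warning; a handful of
    # entries can still show up while the broker is starting, hence 10 rather than 1.
    return log_text.lower().count("broker may not be available") >= REQUIRED_OCCURRENCES
```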
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
This PR does the following:
* Remove the StreamsRepeatingIntegerKeyProducerService and the associated Java class
* Add a parameter to VerifiableProducer.java to enable sending keys when specified
* Update the corresponding Python file verifiable_producer.py to support the new parameter.
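A rough sketch of the service-side change (the `--repeating-keys` flag name is an assumption, not necessarily the exact option added to VerifiableProducer.java):

```python
def start_cmd(bootstrap_servers, topic, repeating_keys=None):
    # Build the VerifiableProducer command line; only append the key option when the
    # test asks for keys, so existing tests keep their previous behavior.
    cmd = ("kafka-run-class.sh org.apache.kafka.tools.VerifiableProducer"
           " --topic %s --broker-list %s" % (topic, bootstrap_servers))
    if repeating_keys is not None:
        cmd += " --repeating-keys %d" % repeating_keys
    return cmd
```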
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
If the pip version is not pinned, the following error occurs:
```
Traceback (most recent call last):
  File "/usr/bin/pip", line 9, in <module>
    from pip import main
ImportError: cannot import name main
```
Reviewers: Guozhang Wang <wangguoz@gmail.com>
This PR fixes some regressions introduced into streams system tests and sets the upgrade tests to ignore until PR #4636 is merged as it has the fixes for the upgrade tests.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
There are a couple of minor additions in this PR:
1. Add a new test for the window store that issues a range query upon receiving each record.
2. In the non-windowed state store case, add a get call before the put call.
3. Enable caching by default, to be consistent with the other Join / Aggregate cases where caching is enabled by default.
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
- adds Streams upgrade tests for 1.1 release
- introduces metadata version 3
Reviewers: John Roesler <john@confluent.io>, Guozhang Wang <guozhang@confluent.io>
SimpleBenchmark:
1.a Do not rely on manual num.records / bytes collection on atomic integers.
1.b Rely on config files for num.threads, bootstrap.servers, etc.
1.c Add parameters for key skewness and value size.
1.d Refactor the tests for loading phase, adding tumbling-windowed count.
1.e For the consumer / consume-produce scenarios, collect metrics on the consumer instead.
1.f Force-stop the test after 3 minutes; this is based on empirical numbers for 10M records.
Other tests: use config for kafka bootstrap servers.
streams_simple_benchmark.py: only use scale 1 for system test, remove yahoo from benchmark tests.
Note that the JMX-based metrics are more accurate than the manually collected metrics.
Reviewers: John Roesler <john@confluent.io>, Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
The standby-task test failed due to the standby task distribution not being exactly equal. I think this will be the case from time to time, so I've updated the test to make sure the standby task assignment count is not zero.
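In other words, the assertion is relaxed to something like (helper name illustrative):

```python
def check_standby_assignment(standby_counts_per_instance):
    # Don't require a perfectly even split of standby tasks across instances;
    # only require that every instance was assigned at least one standby task.
    assert all(count > 0 for count in standby_counts_per_instance.values()), \
        "expected a non-zero standby task count on every instance: %s" % standby_counts_per_instance
```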
Reviewers: Guozhang Wang <wangguoz@gmail.com>, John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>
1. Use JmxMixin for SimpleBenchmark (the self-reporting will be removed in #4744), only when the loading phase is false (i.e., we are in fact starting the streams app).
2. Report the full JMX metrics in the log files, and in the returned data only return the max values (see the sketch after this list): this is because we want to skip the warming-up and cooling-down periods that will have lower rate numbers, while the max represents the actual rate at full speed.
3. Incorporates two other improvements to JmxTool: #1241 and #2950
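A small sketch of the reporting change in point 2 (helper and logger names are illustrative):

```python
def summarize_rate_samples(samples, logger):
    # Log every JMX sample for later inspection, but only return the maximum:
    # samples taken while warming up or cooling down would drag an average down,
    # whereas the max reflects the rate at full speed.
    for timestamp, value in samples:
        logger.info("jmx sample at %s: %f", timestamp, value)
    return max(value for _, value in samples)
```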
Reviewers: John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Rohan Desai <desai.p.rohan@gmail.com>
This is a general change and is a prerequisite for running the streams benchmark with different streams tests. For the streams benchmark itself I will have a separate PR for switching configs. Details:
1. Create a "streams.properties" file under PERSISTENT_ROOT before all the streams tests (see the sketch after this list). For now it will only contain a single config, state.dir, pointing to PERSISTENT_ROOT.
2. For all the system-test-related code, replace the main function parameter state.dir with propsFilename, then inside the function load the props from the file and apply overrides if necessary.
3. Minor fixes.
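A minimal sketch of step 1 (assuming PERSISTENT_ROOT is the service's scratch directory on the worker node, e.g. /mnt/streams, and a ducktape-style node handle):

```python
PERSISTENT_ROOT = "/mnt/streams"                     # assumed scratch dir on the test node
CONFIG_FILE = PERSISTENT_ROOT + "/streams.properties"

def render_streams_config(node, extra_props=None):
    # Write the properties file before the streams app starts; state.dir is the only
    # required entry for now, but other configs can be appended the same way.
    props = {"state.dir": PERSISTENT_ROOT}
    props.update(extra_props or {})
    contents = "".join("%s=%s\n" % (k, v) for k, v in props.items())
    node.account.create_file(CONFIG_FILE, contents)
```

The Java test's main() then takes this filename instead of a raw state.dir argument and loads the properties from the file.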
Reviewers: Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>
Ensures Kafka chroot exists in ZK when starting KafkaService so commands that use ZK and are executed before the first Kafka broker starts do not fail due to the missing chroot.
Also uses chroot with one test that also has security parameterizations so Kafka's test suite exercises these combinations. Previously no tests were exercising chroots.
Changes were validated using sanity_checks which include the chroot-ed test as well as some non-chroot-ed tests.
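As a rough sketch of the idea (the kafka_home path and the allow_fail handling are assumptions; the real change lives in KafkaService), the chroot can be created with zookeeper-shell before any broker starts:

```python
def ensure_chroot(zk_node, zk_connect, chroot, kafka_home="/opt/kafka-dev"):
    # Create the chroot znode up front so that ZK-dependent commands issued before the
    # first broker starts do not fail because the path is missing.
    if chroot:
        cmd = "%s/bin/zookeeper-shell.sh %s create %s ''" % (kafka_home, zk_connect, chroot)
        zk_node.account.ssh(cmd, allow_fail=True)  # "node already exists" is harmless
```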
Some changes required to get the Streams system tests working via Docker
To test:
TC_PATHS="tests/kafkatest/tests/streams" bash tests/docker/run_tests.sh
That command will take about 3.5 hours, and should pass. Note there are a couple of ignored tests.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bill@confluent.io>
Author: Bill Bejeck <bill@confluent.io>
Reviewers: Damian Guy <damian@confluent.io>, Guozhang Wang <guozhang@confluent.io>, Matthias J. Sax <matthias@confluent.io>
ensure that brokers are registered at ZK before start() returns
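Roughly, the idea (the `query` helper on the ZooKeeper service is an assumption; /brokers/ids/<id> is the standard registration znode):

```python
from ducktape.utils.util import wait_until

def wait_for_registration(zk, broker_id, timeout_sec=30):
    # Block until the broker's registration znode shows up, so start() only returns
    # once the broker is actually visible in ZooKeeper.
    wait_until(lambda: zk.query("/brokers/ids/%d" % broker_id) is not None,
               timeout_sec=timeout_sec,
               err_msg="Broker %d did not register with ZooKeeper in time" % broker_id)
```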
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>, Damian Guy <damian@confluent.io>, Guozhang Wang <guozhang@confluent.io>
**NOTE: This should be backported to the `1.1` branch, and is currently a blocker for 1.1.**
The `connect_test.py::ConnectStandaloneFileTest.test_file_source_and_sink` system test is failing with the SASL configuration without a sufficient explanation. During the test, the Connect worker fails to start, but the Connect log contains no useful information. There are actually several things compounding to cause the failure and make it difficult to understand the problem.
First, the `tests/kafkatest/tests/connect/templates/connect_standalone.properties` is only adding in the broker's security configuration with the `producer.` and `consumer.` prefixes, but is not adding them with no prefix. The worker uses the AdminClient to connect to the broker to get the Kafka cluster ID and to manage the three internal topics, and the AdminClient is configured via top-level properties. Because the SASL test requires the clients all connect using SASL, the lack of broker security configs means the AdminClient was attempting and failing to connect to the broker. This is corrected by adding the broker's security configuration to the Connect worker configuration file at the top-level. (This was already being done in the `connect_distributed.properties` file.)
Second, the default `request.timeout.ms` for the AdminClient (and the other clients) is 120 seconds, so the AdminClient was retrying for 120 seconds before it would give up and throw an error. However, the test was only waiting for 60 seconds before determining that the service failed to start. This can be corrected by setting `request.timeout.ms=10000` in the Connect distributed and standalone worker configurations.
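For illustration, a rough sketch of how the worker's client configuration ends up after the first two fixes (helper name and the exact set of security keys are assumptions; property names follow the standard Kafka client configs):

```python
def connect_worker_props(security_props):
    # The broker's security settings must appear un-prefixed at the top level (that is
    # where the AdminClient reads them) as well as with the producer./consumer. prefixes.
    props = dict(security_props)
    props.update({"producer." + k: v for k, v in security_props.items()})
    props.update({"consumer." + k: v for k, v in security_props.items()})
    # Fail fast instead of retrying for the 120-second default when the connection is broken.
    props["request.timeout.ms"] = "10000"
    return props
```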
Third, the Connect workers were recently changed to look up the Kafka cluster ID before starting the herder. This is unlike the older uses of the AdminClient to find and manage the internal topics, where failure to connect was not necessarily logged correctly but nevertheless still skipped over, relying upon broker auto-topic creation to create the internal topics. (This may be why the test did not fail prior to the recent change to always require a successful AdminClient connection.) Although the worker never got this far in its startup process, the fact that we missed such an error in prior releases means that failure to connect with the AdminClient was not being properly reported.
The `ConnectStandaloneFileTest.test_file_source_and_sink` system tests were run locally prior to this fix, and they failed as with the nightlies. Once these fixes were made, the locally run system tests passed.
Author: Randall Hauch <rhauch@gmail.com>
Reviewers: Konstantine Karantasis <konstantine@confluent.io>, Ewen Cheslack-Postava <me@ewencp.org>
Closes #4610 from rhauch/kafka-6577-trunk
fixes error message handling for test consumer client and KafkaStreams instance
updates expected error message
fixes race condition in system test code and avoids starting Streams processor twice
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <guozhang@confluent.io>
Currently port forwarding is set up for HttpMetricsCollector when the service's start_node method is called, but it is not canceled during stop. This hasn't presented a problem so far because we don't have tests that use this *and* restart the service. However, if a test/service does that, it will throw an exception since the port is already bound.
This just does the cleanup when stopping so a subsequent attempt to start again will succeed.
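A minimal sketch of the pattern (class and helper names are illustrative, not the actual mixin):

```python
class HttpMetricsCollectorSketch(object):
    def __init__(self):
        self._forwarders = {}

    def start_node(self, node):
        # Remember the forwarding handle so it can be torn down later.
        self._forwarders[node] = self._setup_port_forwarding(node)

    def stop_node(self, node):
        # Cancel the reverse port forward; this frees the bound port so that a
        # subsequent start_node on the same service succeeds.
        forwarder = self._forwarders.pop(node, None)
        if forwarder is not None:
            forwarder.stop()

    def _setup_port_forwarding(self, node):
        raise NotImplementedError("the real mixin wires up an ssh reverse tunnel here")
```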
https://jenkins.confluent.io/job/system-test-kafka-branch-builder/1320 is a run of a test that uses ProducerPerformanceService, which in turn uses HttpMetricsCollector, to validate the change.
Author: Ewen Cheslack-Postava <me@ewencp.org>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Apurva Mehta <apurva@confluent.io>
Closes #4604 from ewencp/cleanup-reverse-port-forward
Change the string in the test condition to the one that is logged
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes #4599 from dguy/broker-compatibility
Another fall-through of the `headers.converter` and `batch.size` properties, this time in the `FileStreamSourceConnector` system tests.
Author: Konstantine Karantasis <konstantine@confluent.io>
Reviewers: Randall Hauch <rhauch@gmail.com>, Damian Guy <damian.guy@gmail.com>
Closes #4590 from kkonstantine/MINOR-Fix-file-source-task-config-in-system-tests
Avoid losing log/stdout/stderr files on restart
Re-enables tests
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Guozhang Wang <guozhang@confluent.io>, Bill Bejeck <bill@confluent.io>