fixes lgmt.com warnings
cleanup PrintForeachAction and Printed
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Sebastian Bauersfeld <sebastianbauersfeld@gmx.de>, Damian Guy <damian@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <guozhang@confluent.io>
ensure that brokers are registered at ZK before start() returns
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>, Damian Guy <damian@confluent.io>, Guozhang Wang <guozhang@confluent.io>
This patch reverts the removal of the --execute option in the offset reset tool and the change to the default behavior when no options were present. For consistency, this patch adds the --execute flag to the streams reset tool, but keeps its current default behavior. A note has been added to both of these commands to warn the user that future default behavior will be to prompt before acting.
Test cases were not actually validating that offsets were committed when the --execute option was present, so I have fixed that and added basic assertions for the dry-run behavior. I also removed some duplicated test boilerplate.
Reviewers: Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>
Any exception thrown by calls within a `main()` method are not logged unless explicitly done so. This change simply adds a try-catch block around most of the content of the distributed and standalone `main()` methods.
**NOTE: This should be backported to the `1.1` branch, and is currently a blocker for 1.1.**
The `connect_test.py::ConnectStandaloneFileTest.test_file_source_and_sink` system test is failing with the SASL configuration without a sufficient explanation. During the test, the Connect worker fails to start, but the Connect log contains no useful information. There are actual several things compounding to cause the failure and make it difficult to understand the problem.
First, the `tests/kafkatest/tests/connect/templates/connect_standalone.properties` is only adding in the broker's security configuration with the `producer.` and `consumer.` prefixes, but is not adding them with no prefix. The worker uses the AdminClient to connect to the broker to get the Kafka cluster ID and to manage the three internal topics, and the AdminClient is configured via top-level properties. Because the SASL test requires the clients all connect using SASL, the lack of broker security configs means the AdminClient was attempting and failing to connect to the broker. This is corrected by adding the broker's security configuration to the Connect worker configuration file at the top-level. (This was already being done in the `connect_distributed.properties` file.)
Second, the default `request.timeout.ms` for the AdminClient (and the other clients) is 120 seconds, so the AdminClient was retrying for 120 seconds before it would give up and thrown an error. However, the test was only waiting for 60 seconds before determining that the service failed to start. This can be corrected by setting `request.timeout.ms=10000` in the Connect distributed and standalone worker configurations.
Third, the Connect workers were recently changed to lookup the Kafka cluster ID before it started the herder. This is unlike the older uses of the AdminClient to find and manage the internal topics, where failure to connect was not necessarily logged correctly but nevertheless still skipped over, relying upon broker auto-topic creation to create the internal topics. (This may be why the test did not fail prior to the recent change to always require a successful AdminClient connection.) Although the worker never got this far in its startup process, the fact that we missed such an error since the prior releases means that failure to connect with the AdminClient was not being properly reported.
The `ConnectStandaloneFileTest.test_file_source_and_sink` system tests were run locally prior to this fix, and they failed as with the nightlies. Once these fixes were made, the locally run system tests passed.
Author: Randall Hauch <rhauch@gmail.com>
Reviewers: Konstantine Karantasis <konstantine@confluent.io>, Ewen Cheslack-Postava <me@ewencp.org>
Closes#4610 from rhauch/kafka-6577-trunk
fixes error message handling for test consumer client and KafkaStreams instance
updates expected error message
fixes race condition in system test code and avoids starting Streams processor twice
Author: Matthias J. Sax <matthias@confluent.io.>
Reviewer: Bill Bejeck <bill@confluent.io>, Guozhang Wang <guozhang@confluent.io>
Currently port forwarding is setup for HttpMetricsCollector when the Service's start_node method is called, but not canceled during stop. This hasn't presented a problem so far because we don't have tests that use this *and* restart the service. However, if a test/service does that, it will throw an exception since the port is already bound.
This just does the cleanup when stopping so a subsequent attempt to start again will succeed.
https://jenkins.confluent.io/job/system-test-kafka-branch-builder/1320 is a test run for a Test that uses ProducerPerformanceService, which in turn uses HttpMetricsCollector to validate the change.
Author: Ewen Cheslack-Postava <me@ewencp.org>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Apurva Mehta <apurva@confluent.io>
Closes#4604 from ewencp/cleanup-reverse-port-forward
Reading the configuration field names from ProducerConfig class and taking the key and value serializer names from class name directly instead of hardcoding.
Update `KafkaController.brokerInfo` when listeners are updated since this value is used to register the broker in ZooKeeper if there ZK session expires. Also added test to verify values in ZK after session expiry.
This patch fixes a bug in the validation of the inter-broker protocol and the message format version. We should allow the configured message format api version to be greater than the inter-broker protocol api version as long as the actual message format versions are equal. For example, if the message format version is set to 1.0, it is fine for the inter-broker protocol version to be 0.11.0 because they both use message format v2.
I have added a unit test which checks compatibility for all combinations of the message format version and the inter-broker protocol version.
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#4583 from hachikuji/KAFKA-6328-REOPENED
Add docs for the ReplicationBytesInPerSec and ReplicationBytesOutPerSec metrics. These metrics were introduced in KIP-153 (KAFKA-5194) but the docs were not updated.
Built site-docs, and viewed them in a web browser to confirm.
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Enabling scans for super types in reflections is required in order to discover Connect plugins.
Reviewers: Randall Hauch <rhauch@gmail.com>, Jason Gustafson <jason@confluent.io>
Working on increasing the coverage of stores in unit tests.
Started with `InMemoryKeyValueLoggedStore`
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Change the string in the test condition to the one that is logged
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#4599 from dguy/broker-compatibility
Sending the response code of an http request issued via `RestClient` in Connect to stdout seems like a unconventional choice.
This PR redirects the responds code with a message in the logs at DEBUG level (usually the same level as the one that the caller of `RestClient.httpRequest` uses.
This fix will also fix system tests that broke by outputting this response code to stdout.
Author: Konstantine Karantasis <konstantine@confluent.io>
Reviewers: Randall Hauch <rhauch@gmail.com>, Damian Guy <damian.guy@gmail.com>
Closes#4591 from kkonstantine/MINOR-Redirect-response-code-in-Connect-RestClient-to-logs-instead-of-stdout
Another fall-through of `headers.converter` and `batch.size` properties. Here in `FileStreamSourceConnector` tests
Author: Konstantine Karantasis <konstantine@confluent.io>
Reviewers: Randall Hauch <rhauch@gmail.com>, Damian Guy <damian.guy@gmail.com>
Closes#4590 from kkonstantine/MINOR-Fix-file-source-task-config-in-system-tests
Add validation checks that the offset range is valid and aligned with the batch count prior to appending to the log. Several unit tests have been added to verify the various invalid cases.
https://github.com/apache/kafka/pull/4356 added `batch.size` config property to `FileStreamSourceConnector` but the property was added as required without a default in config definition (`ConfigDef`). This results in validation error during connector startup.
Unit tests were added for both `FileStreamSourceConnector` and `FileStreamSinkConnector` to avoid such issues in the future.
Reviewers: Randall Hauch <rhauch@gmail.com>, Jason Gustafson <jason@confluent.io>
The log cleaner should not naively remove the partition from in progress map without checking the partition state. This may cause the other thread calling `LogCleanerManager.abortAndPauseCleaning()` to hang indefinitely.
Enable deep-iteration option when print-data-log is enabled in DumpLogSegments. Otherwise data is not printed.
Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Avoid loosing log/stdout/stderr files on restart
Reenables tests
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Guozhang Wang <guozhang@confluent.io>, Bill Bejeck <bill@confluent.io>
Allows input data to be read from a file and removes .toLowerCase in word count stream
Author: Filipe Agapito <filipe.agapito@gmail.com>
Reviewers: Ted Yu <yuzhihong@gmail.com>, Matthias J. Sax <matthias@confluent.io>
Make it clear in the docs that the rebalance listener is only invoked during an active call to `poll()`. Plus a few additional doc cleanups.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Currently we hold onto all Records references in a multi-partition fetch response until the full response has completed. This can be a problem when the records have been down-converted since they will be occupying a (potentially large) chunk of memory. This patch changes the behavior in MultiSend so that once a Send is completed, we no longer keep a reference to it, which will allow the Records objects to be freed sooner.
I have added a simple unit test to verify that sends are removed as the MultiSend progresses.
Reviewers: Ismael Juma <ismael@juma.me.uk>
`Node` is immutable so this is safe.
With 100 brokers, 150 topics and 350 partitions, `HashSet.contains` in `RecordAccumulator.ready` took about 40% of the application time. It
is caused by re-calculating a hash code of a leader (Node instance) for
every batch entry. Caching the hashCode reduced the time of
`HashSet.contains` in `RecordAccumulator.ready` to ~2%. The
measurements were taken with Flight Recorder.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Ted Yu <yuzhihong@gmail.com>, Ismael Juma <ismael@juma.me.uk>
ZooKeeperClient acquires initializationLock#writeLock to establish a new connection while processing session expiry WatchEvent. ZooKeeperClient#handleRequests acquires initializationLock#readLock, allowing multiple batches of requests to be processed concurrently, but preventing reconnections while processing requests. At the moment, handleRequests holds onto the readLock throughout the method, even while waiting for responses and inflight requests to complete. But responses cannot be delivered if event thread is blocked on the writeLock to process session expiry event. This results in a deadlock. During broker shutdown, the shutdown thread is also blocked since it needs the readLock to perform ZooKeeperClient#unregisterStateChangeHandler, which cannot be acquired if a session expiry had occurred earlier since this thread gets queued behind the event handler thread waiting for writeLock.
This commit reduces locking in ZooKeeperClient#handleRequests to just the non-blocking send, so that session expiry handling doesn't get blocked when a send is blocked waiting for responses. Also moves session expiry handling to a separate thread so that Kafka controller doesn't block the event handler thread when processing session expiry.
Avoids double initialization of resuming tasks
Removes race condition in StreamThreadTest plus code cleanup
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <guozhang@confluent.io>
ProducerBatch retains references to MemoryRecordsBuilder and cannot be freed until acks are received. Removing references to buffers used for compression after records are built will enable these to be garbage collected sooner, reducing the risk of OOM.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>, Lothsahn <Lothsahn@gmail.com>
…ateMetadataRequestForBrokers
*More detailed description of your change,
if necessary. The PR title and PR message become
the squashed commit message, so use a separate
comment to ping reviewers.*
*Summary of testing strategy (including rationale)
for the feature or bug fix. Unit and/or integration
tests are expected for any behaviour change and
system tests should be considered for larger changes.*
Author: Lucas Wang <luwang@linkedin.com>
Reviewers: Jun Rao <junrao@gmail.com>
Closes#4472 from gitlw/improving_addUpdateMetadataRequestForBrokers
This fixes two alerts flagged on lgtm.com for Apache Kafka.
This dead code alert where InvalidTypeIdException indirectly extends JsonMappingException. The flagged condition with the type test appears after the type test for the latter and thus makes its body dead. I opted to change the order of the tests. Please let me know if this is the intended behavior.
The second commit addresses this out-of-bounds alert.
More alerts can be found here. Note that my colleague Aditya Sharad addressed some of those in the now outdated #2939.
Reviewers: Matthias J. Sax <matthias@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
This is a small change to parallelize plugin scanning. This may help in some environments where otherwise plugin scanning is slow.
Author: Robert Yokota <rayokota@gmail.com>
Reviewers: Konstantine Karantasis <konstantine@confluent.io>, Randall Hauch <rhauch@gmail.com>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#4561 from rayokota/K6503-improve-plugin-scanning
Added a second check for race condition where store changelog topic updated during restore, but not if a KTable changelog topic. This will be tricky to test, but I wanted to push the PR to get feedback on the approach.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <matthias@confluent.io>
Prior to this patch, the consumer always blocks in poll() if there are any partitions which are awaiting their initial positions. This behavior was inconsistent with normal fetch behavior since we allow fetching on available partitions even if one or more of the assigned partitions becomes unavailable _after_ initial offset lookup. With this patch, the consumer will do offset resets asynchronously, which allows other partitions to make progress even if the initial positions for some partitions cannot be found.
I have added several new unit tests in `FetcherTest` and `KafkaConsumerTest` to verify the new behavior. One minor compatibility implication worth mentioning is apparent from the change I made in `DynamicBrokerReconfigurationTest`. Previously it was possible to assume that all partitions had a fetch position after `poll()` completed with a non-empty assignment. This assumption is no longer generally true, but you can force the positions to be updated using the `position()` API which still blocks indefinitely until a position is available.
Note that this this patch also removes the logic to cache committed offsets in `SubscriptionState` since it was no longer needed (the consumer's `committed()` API always does an offset lookup anyway). In addition to avoiding the complexity of maintaining the cache, this avoids wasteful offset lookups to refresh the cache when `commitAsync()` is used.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
* do not use static properties
* use new object to take appID
* capture timeout exception inside condition
Reviewers: Matthias J. Sax <matthias@confluent.io>