I've observed several reports of sudden unexpected streamthread shutdown with the log entry like:
State transition from PENDING_SHUTDOWN to DEAD
but there is no related error logs before this line at all. I suspect this is because we intentionally do not log for KafkaException and there's some edge cases where we miss internally and hence caused this. I'm adding the ERROR level log entry here in order to reveal more information in case I saw this again in the future.
Reviewers: Matthias J. Sax <matthias@confluent.io>
Update the Trogdor StringExpander regex to handle an epilogue. Previously the regex would use a lazy quantifier at the end, which meant it would not catch anything after the range expression. Add a unit test.
Reviewed-by: Colin P. McCabe <cmccabe@apache.org>
Added testing of logical types for Kafka Connect in support of KIP-145 features.
Added tests for Boolean, Time, Date and Timestamp, including the valid conversions.
The area of ISO8601 strings is a bit of a mess because the tokenizer is not compatible with
that format, and a subsequent JIRA will be needed to fix that.
A few small fixes as well as creating test cases, but they're clearly just corrections such as
using 0 to mean January (java.util.Calendar uses zero-based month numbers).
Author: Andrew Schofield <andrew_schofield@uk.ibm.com>
Reviewers: Mickael Maison <mimaison@users.noreply.github.com>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#6077 from AndrewJSchofield/KAFKA-7461-ConverterValuesLogicalTypesTest
Expose a programmatic way to bring up a Kafka and Zk cluster through Java API to facilitate integration tests for framework level changes in Kafka Connect. The Kafka classes would be similar to KafkaEmbedded in streams. The new classes would reuse the kafka.server.KafkaServer classes from :core, and provide a simple interface to bring up brokers in integration tests.
Signed-off-by: Arjun Satish <arjunconfluent.io>
Author: Arjun Satish <arjun@confluent.io>
Author: Arjun Satish <wicknicks@users.noreply.github.com>
Reviewers: Randall Hauch <rhauch@gmail.com>, Konstantine Karantasis <konstantine@confluent.io>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#5516 from wicknicks/connect-integration-test
Reviewers: Bill Bejeck <bill@confluent.io>, John Roesler <john@confluent.io>, Guozhang Wang <guozhang@confluent.io>, Ryanne Dolan <ryannedolan@gmail.com>, Ismael Juma <ismael@confuent.io>
The existing javadoc for PartitionGroup is a little confusing.
It's relatively important for these concepts to be clear, since
they form the basis for stream-time in Kafka Streams.
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
KIP-291 Implementation : Added code to separate controller connections and requests from the data plane.
Tested with local deployment that the controller request are handled by the control plane and other requests are handled by the data plane.
Also added unit tests in order to test the functionality.
Author: Lucas Wang <luwang@linkedin.com>,
Author: Mayuresh Gharat <gharatmayuresh15@gmail.com>
Reviewers: Joel Koshy <jjkoshy@gmail.com>, Jun Rao <junrao@gmail.com>
Using AdminClient#alterConfigs, topic `retention.ms` property can be assigned to a value lesser than -1. This leads to inconsistency while describing the topic configuration. We should not allow values lesser than -1.
Author: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>,Matthias J. Sax <matthias@confluent.io>
Closes#6082 from kamalcph/KAFKA-7781
This patch adds a framework to automatically generate the request/response classes for Kafka's protocol. The code will be updated to use the generated classes in follow-up patches. Below is a brief summary of the included components:
**buildSrc/src**
The message generator code is here. This code is automatically re-run by gradle when one of the schema files changes. The entire directory is processed at once to minimize the number of times we have to start a new JVM. We use Jackson to translate the JSON files into Java objects.
**clients/src/main/java/org/apache/kafka/common/protocol/Message.java**
This is the interface implemented by all automatically generated messages.
**clients/src/main/java/org/apache/kafka/common/protocol/MessageUtil.java**
Some utility functions used by the generated message code.
**clients/src/main/java/org/apache/kafka/common/protocol/Readable.java, Writable.java, ByteBufferAccessor.java**
The generated message code uses these classes for writing to a buffer.
**clients/src/main/message/README.md**
This README file explains how the JSON schemas work.
**clients/src/main/message/\*.json**
The JSON files in this directory implement every supported version of every Kafka API. The unit tests automatically validate that the generated schemas match the hand-written schemas in our code. Additionally, there are some things like request and response headers that have schemas here.
**clients/src/main/java/org/apache/kafka/common/utils/ImplicitLinkedHashSet.java**
I added an optimization here for empty sets. This is useful here because I want all messages to start with empty sets by default prior to being loaded with data. This is similar to the "empty list" optimizations in the `java.util.ArrayList` class.
Reviewers: Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Ismael Juma <ismael@juma.me.uk>, Bob Barrett <bob.barrett@outlook.com>, Jason Gustafson <jason@confluent.io>
+ Add a parameter to the ducktap-ak to control the OpenJDK base image.
+ Fix a few issues of using OpenJDK:11 as the base image.
*More detailed description of your change,
if necessary. The PR title and PR message become
the squashed commit message, so use a separate
comment to ping reviewers.*
*Summary of testing strategy (including rationale)
for the feature or bug fix. Unit and/or integration
tests are expected for any behaviour change and
system tests should be considered for larger changes.*
Author: Xi Yang <xi@confluent.io>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes#6071 from yangxi/ducktape-jdk
* Update KafkaAdminClient#describeTopics to throw UnknownTopicOrPartitionException.
* Remove unused method: WorkerUtils#getMatchingTopicPartitions.
* Add some JavaDoc.
Reviewed-by: Colin P. McCabe <cmccabe@apache.org>, Ryanne Dolan <ryannedolan@gmail.com>
The Trogdor Coordinator now overwrites a task's startMs to the time it received it if startMs is in the past.
The Trogdor Agent now correctly expires a task after the expiry time (startMs + durationMs) passes. Previously, it would ignore startMs and expire after durationMs milliseconds of local start of the task.
Reviewed-by: Colin P. McCabe <cmccabe@apache.org>
When I originally refactored the streams_upgrade_test#upgrade_downgrade_brokers test I removed the wait call which would wait for up 24 minutes for the SmokeTestDriver class to publish and verify all of its records.
Since most of the tests run in two minutes or less I set the monitor_log timeout to three minutes. However, the SmokeTestDriver#verify method allows up to six minutes to consume all records before verifying the monitor_log timeout needs to be greater than 6 minutes. I've set the timeout to 8 minutes.
Additionally, the steps needed to update the streams_upgrade_test should be documented as there are several components that need to get updated. So I've documented those steps here on the test as a giant comment.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
* KAFKA-6627: Prevent config default values overriding ones specified through --producer-property on command line.
In Console{Producer,Consumer}, extraProducerProps (options specified in
--producer-property) is applied first, then overriden unconditionally,
even if the value is not specified explicitly (and default value is
used). This patch fixes it so that it doesn't override the existing
value set by --producer-property if it is not explicitly specified.
The contribution is my original work and I license the work to the
project under the project's open source license.
Reviewers: Sriharsha Chintalapani <sriharsha@apache.org>
As documented in https://issues.apache.org/jira/browse/KAFKA-7741,
the javax dependency we receive transitively from connect is incompatible
with SBT builds.
Streams doesn't use the portion of Connect that needs the dependency,
so we can fix the builds by simply excluding it.
Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <guozhang@confluent.io>
We recently improved the handling of the InternalTopicManager retries with #6085. The AdminClient will throw an InvalidTopicException if the topic is not found. We need to ignore that exception as when calling AdminClient#describe we may not have had a chance to create the topic yet, especially with the case of internal topics
I've created a new test asserting that when an InvalidTopicException is thrown when the topic is not found we continue on.
Reviewers: John Roesler <john@confluent.io>, Guozhang Wang <guozhang@confluent.io>
There is a race condition in ReplicaFetcherThread, where we can update PartitionFetchState with the new leader epoch (same leader) before handling the OffsetsForLeaderEpoch response with FENCED_LEADER_EPOCH error which causes removing partition from partitionStates, which in turn causes no fetching until the next LeaderAndIsr.
This patch adds logic to ensure that the leader epoch doesn't change while an OffsetsForLeaderEpoch request is in flight (which could happen with back-to-back leader elections). If it has changed, we ignore the response.
Also added toString() implementation to PartitionData, because some log messages did not show useful info which I found while investigating the above system test failure.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Use `info` for failed authentications and `debug` for successful ones.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Ismael Juma <ismael@juma.me.uk>
Reviewers: Bill Bejeck <bill@confluent.io>, John Roesler <john@confluent.io>, Ryanne Dolan <ryannedolan@gmail.com>, Guozhang Wang <guozhang@confluent.io>
The test always fails if testOptionsDoesNotIncludeWadlOutput is executed before testCORSEnabled. It seems the problem is the use of the system property. Perhaps there is some static caching somewhere.
Reviewers: Randall Hauch <rhauch@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
This patch changes the behavior of KafkaProducer.waitOnMetadata to wait up to max.block.ms when the partition specified in the produce request is out of the range of partitions present in the metadata. This improves the user experience in the case when partitions are added to a topic and a client attempts to produce to one of the new partitions before the metadata has propagated to the brokers. Tested with unit tests.
Reviewers: Arjun Satish <arjun@confluent.io>, Jason Gustafson <jason@confluent.io>
The null map returned from the current snapshot causes the null type in response. The connector class name can be taken from the config of request instead since we require the config should contain the connector class name.
Reviewers: Jason Gustafson <jason@confluent.io>
This commit creates an EndToEndTest base class which relies on the verifiable consumer. This will ultimately replace ProduceConsumeValidateTest which depends on the console consumer. The advantage is that the verifiable consumer exposes more information to use for validation. It also allows for a nicer shutdown pattern. Rather than relying on the console consumer idle timeout, which requires a minimum wait time, we can halt consumption after we have reached the last acked offsets. This should be more reliable and faster. The downside is that the verifiable consumer only works with the new consumer, so we cannot yet convert the upgrade tests. This commit converts only the replication tests and a flaky security test to use EndToEndTest.
Lookup client host name after every full iteration through the addresses returned.
Reviewers: Loïc Monney <loicmonney@github.com>, Edoardo Comar <ecomar@uk.ibm.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
The StreamsUpgradeTest::test_upgrade_downgrade_brokers used sleep calls in the test which led to flaky test performance and as a result, we placed an @ignore annotation on the test. This PR uses log events instead of the sleep calls hence we can now remove the @ignore setting.
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
While looking into KAFKA-7657, I found there are a few loopholes in this logic:
We kept a map of thread-name to thread-state and a global-thread state at the KafkaStreams instance-level, in addition to the instance state itself. stateLock is used when accessing the instance state, however when we are in the thread state change callback, we are accessing both the thread-states as well as the instance state at the same time in the callers of setState without a lock, which is vulnerable to concurrent multi-stream threads. The fix is a) introduce a threadStatesLock in addition to the stateLock, which should always be grabbed to modify the thread-states map before the stateLock for modifying the instance level; and we also defer the checking of the instance-level state inside the setState call.
When transiting to state.RUNNING, we check if all threads are either in RUNNING or DEAD state, this is because some threads maybe dead at the rebalance period but we should still proceed to RUNNING if the rest of threads are still transiting to RUNNING.
Added unit test for 2) above. Also simplified another test as a nit change.
Reviewers: John Roesler <vvcephei@users.noreply.github.com>, Matthias J. Sax <mjsax@apache.org>
While looking into KAFKA-7657, I found there are a few loopholes in this logic:
1. We kept a map of thread-name to thread-state and a global-thread state at the KafkaStreams instance-level, in addition to the instance state itself. stateLock is used when accessing the instance state, however when we are in the thread state change callback, we are accessing both the thread-states as well as the instance state at the same time in the callers of setState without a lock, which is vulnerable to concurrent multi-stream threads. The fix is a) introduce a threadStatesLock in addition to the stateLock, which should always be grabbed to modify the thread-states map before the stateLock for modifying the instance level; and we also defer the checking of the instance-level state inside the setState call.
2. When transiting to state.RUNNING, we check if all threads are either in RUNNING or DEAD state, this is because some threads maybe dead at the rebalance period but we should still proceed to RUNNING if the rest of threads are still transiting to RUNNING.
Added unit test for 2) above. Also simplified another test as a nit change.
Reviewers: John Roesler <vvcephei@users.noreply.github.com>, Matthias J. Sax <mjsax@apache.org>