Author: radai-rosenblatt <radai.rosenblatt@gmail.com>
Reviewers: Dhruvil Shah <dhruvil@confluent.io>, Ismael Juma <github@juma.me.uk>, Dong Lin <lindong28@gmail.com>
Closes #5221 from radai-rosenblatt/metadata-adventures
Pre-initialization of clients in IntegrationTestHarness is a cause of significant confusion and has resulted in a bunch of inconsistent client creation patterns. This patch requires test cases to create needed clients explicitly and makes the creation logic more consistent.
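For example, a test case now builds exactly the clients it needs. A minimal sketch (the actual helper methods and configs in IntegrationTestHarness may differ):

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

// Inside a test: explicitly create the one consumer this test needs,
// instead of relying on clients pre-initialized by the harness.
KafkaConsumer<byte[], byte[]> createConsumer(String bootstrapServers) {
    Properties props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "test-group");
    return new KafkaConsumer<>(props, new ByteArrayDeserializer(), new ByteArrayDeserializer());
}
```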
Reviewers: Manikumar Reddy O <manikumar.reddy@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
Part I of KIP-328:
* add grace period to Windows
* deprecate retention/maintainMs and segmentInterval from Windows
* record expired records in the store with a new metric
* record late record drops as a new metric instead of as a "skipped record"
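For illustration, the grace-period API might be used roughly like this (a sketch assuming the later Duration-based signatures; the methods added in this part may have taken raw milliseconds instead):

```java
import java.time.Duration;
import org.apache.kafka.streams.kstream.TimeWindows;

// Windowed aggregation that keeps accepting out-of-order records for up to one
// minute after a window ends; records arriving later are dropped and counted by
// the new late-record-drop metric instead of as generic "skipped records".
TimeWindows windows = TimeWindows
    .of(Duration.ofMinutes(5))     // window size; retention/maintainMs is deprecated by this change
    .grace(Duration.ofMinutes(1)); // sketch of the new grace-period setting
```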
Reviewers: Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Increase record size and use compression for downconversion metrics test to ensure that conversion time is above 1ms to avoid transient test failures.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Rajini Sivaram <rajinisivaram@googlemail.com>
Since ConsumerFetcherThread has been removed, we have
an opportunity to simplify the *FetcherThread classes. This
is an unambitious first step which removes the now unneeded
`PartitionData` indirection.
Currently, we skip the steps to make a replica a follower if the leader does not change, including truncating the follower log if necessary. This can cause problems if the follower has missed one or more leader updates. Change the logic to only skip the steps if the new epoch is the same or one greater than the old epoch. Tested with unit tests that verify the behavior of `Partition` and that show log truncation when the follower's log is ahead of the leader's, the follower has missed an epoch update, and the follower receives a `LeaderAndIsrRequest` making it a follower.
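A minimal sketch of the intended check (hypothetical names; the real logic lives in the broker's `Partition` code, in Scala):

```java
// Skip the become-follower steps (including truncation) only when we know no
// leader update was missed: the leader is unchanged and the epoch is the same
// or advanced by exactly one.
boolean canSkipBecomeFollowerSteps(int newLeaderId, int currentLeaderId,
                                   int newEpoch, int currentEpoch) {
    return newLeaderId == currentLeaderId
        && (newEpoch == currentEpoch || newEpoch == currentEpoch + 1);
}
```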
Reviewers: Stanislav Kozlovski <familyguyuser192@windowslive.com>, Jason Gustafson <jason@confluent.io>
#5468 introduced a breaking API change that was actually avoidable. This PR re-introduces the old API as deprecated and alters the API introduced by #5468 to be consistent with the other methods.
Also:
- fixed miscellaneous syntax problems
- fixed a log statement in Topology Builder
- addressed some warnings shown by IntelliJ
Reviewers: Viktor Somogyi <viktorsomogyi@gmail.com>, Satish Duggana <satishd@apache.org>, Matthias J. Sax <matthias@confluent.io>
Use a volatile field to track the size of the set of assigned partitions to avoid concurrent access to the underlying linked hash map.
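A minimal sketch of the pattern (a hypothetical, simplified version of the bookkeeping; the real field lives in the consumer's subscription state):

```java
import java.util.LinkedHashMap;
import java.util.Map;

class AssignedPartitions {
    // Mutated only from the consumer thread; not safe for concurrent reads.
    private final Map<String, Long> assignment = new LinkedHashMap<>();
    // Published for other threads (e.g. metrics) to read without touching the map.
    private volatile int assignedPartitionCount = 0;

    void assign(String partition, long position) { // consumer thread only
        assignment.put(partition, position);
        assignedPartitionCount = assignment.size(); // publish the new size
    }

    int size() { // safe to call from any thread
        return assignedPartitionCount;
    }
}
```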
Reviewers: Vahid Hashemian <vahidhashemian@us.ibm.com>, Rajini Sivaram <rajinisivaram@googlemail.com>, Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
While working on the 4th PR, I noticed that I had missed adding stores via the graph; they were being added directly via the InternalStreamsBuilder instead. That is probably OK, but we should be consistent.
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
While debugging the reported issue, I found that our current unit test lacks coverage to actually expose the underlying root cause.
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
ACL updates currently get `(currentAcls, currentVersion)` for the resource from ZK and do a conditional update using `(currentAcls+newAcl, currentVersion)`. This supports concurrent atomic updates if the resource path already exists in ZK. If the path doesn't exist, we currently do a conditional createOrUpdate using `(newAcl, -1)`. But `-1` has a special meaning in ZooKeeper for update operations - it means match any version. So two brokers adding acls using `(newAcl1, -1)` and `(newAcl2, -1)` will result in one broker creating the path and setting newAcl1, while the other broker can potentially update the path with `(newAcl2, -1)`, losing newAcl1. The timing window is very small, but we have seen intermittent failures in `SimpleAclAuthorizerTest.testHighConcurrencyModificationOfResourceAcls` as a result of this window.
This commit fixes the version used for conditional updates in ZooKeeper. It also replaces the confusing `ZkVersion.NoVersion=-1` used for `set(any-version)` and `get(return not-found)` with `ZkVersion.MatchAnyVersion` for `set(any-version)` and `ZkVersion.UnknownVersion` for `get(return not-found)` to avoid the return value from `get` matching arbitrary values in `set`.
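For reference, this is how ZooKeeper's conditional update behaves, using the plain ZooKeeper client API (a sketch; the fix itself goes through Kafka's ZK client layer):

```java
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

// Succeeds only if the node is still at expectedVersion; a concurrent writer
// causes a BadVersionException and the caller retries with fresh state.
void conditionalUpdate(ZooKeeper zk, String path, byte[] newAcls, int expectedVersion)
        throws KeeperException, InterruptedException {
    zk.setData(path, newAcls, expectedVersion);
}

// By contrast, version -1 matches ANY version, so a concurrent update can be
// silently overwritten: exactly the race this commit removes.
void unconditionalUpdate(ZooKeeper zk, String path, byte[] newAcls)
        throws KeeperException, InterruptedException {
    zk.setData(path, newAcls, -1);
}
```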
Added a system test which creates a file sink with json converter and attempts to feed it bad records. The bad records should land in the DLQ if it is enabled, and the task should be killed or bad records skipped based on test parameters.
Signed-off-by: Arjun Satish <arjun@confluent.io>
Author: Arjun Satish <arjun@confluent.io>
Reviewers: Konstantine Karantasis <konstantine@confluent.io>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes #5456 from wicknicks/error-handling-sys-test
In Apache Kafka's documentation, the config props for connectors are not listed (https://kafka.apache.org/documentation/#connectconfigs). This PR adds the sink and source connector configs to the HTML site-docs.
Signed-off-by: Arjun Satish <arjun@confluent.io>
Author: Arjun Satish <arjun@confluent.io>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes #5469 from wicknicks/add-connector-configs-to-docs
If a property requires validation, it should be pre-transformed when it is a variable reference, so that the resolved value (rather than the raw placeholder) is what passes through validation.
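A hedged sketch of the idea (the helper below is hypothetical; Connect's actual pre-transform machinery differs):

```java
import org.apache.kafka.common.config.ConfigDef;

// Resolve a KIP-297 variable reference before validating, so the validator sees
// the real value instead of the "${provider:path:key}" placeholder, which could
// otherwise fail a pattern or length check.
void validateProperty(String name, String rawValue, ConfigDef.Validator validator) {
    String resolved = rawValue.startsWith("${")
        ? resolveVariableReference(rawValue) // hypothetical helper standing in for the pre-transform
        : rawValue;
    validator.ensureValid(name, resolved);
}
```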
Author: Robert Yokota <rayokota@gmail.com>
Reviewers: Randall Hauch <rhauch@gmail.com>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes #5445 from rayokota/KAFKA-7225-pretransform-validated-props
The original way of stopping the minikdc process sometimes misfires because the process arg string is very long, and `ps` is not able to find the correct process. Using the `kill_java_processes` method is more reliable for finding and killing java processes.
Some sections of the Monitoring metrics documentation list out the -total metrics, and some sections do not list them out. Make them consistent and list out the missing -total metrics.
Summary:
1. Revert GroupMetadata.members to private
2. Add back a wrongly removed comment
3. In GroupMetadata.remove(), update supportedProtocols and awaitingJoinCallbackMembers only when the removal succeeds
Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>, Sriharsha Chintalapani <sriharsha@apache.org>
We currently do a lot of bookkeeping for timeouts which is both error-prone and distracting. This patch adds a new `Timer` class to simplify this logic and reduce unnecessary calls to system time. In particular, this helps with nested timeout operations. The consumer has been updated to use the new class.
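Typical usage looks roughly like this (a sketch; `pollOnce` is a hypothetical operation, and `org.apache.kafka.common.utils.Timer` is the actual class to consult):

```java
import org.apache.kafka.common.utils.Time;
import org.apache.kafka.common.utils.Timer;

Time time = Time.SYSTEM;
Timer timer = time.timer(30_000L); // one deadline for the whole (possibly nested) operation

while (!timer.isExpired()) {
    pollOnce(timer.remainingMs()); // nested calls share the same deadline
    timer.update();                // refresh the cached time once per iteration
}
```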
Reviewers: Ismael Juma <ismael@juma.me.uk>, Guozhang Wang <wangguoz@gmail.com>
Currently, Fetcher.getTopicMetadata() does not include offline partitions. Thus
KafkaConsumer.partitionsFor(topic) will not return all partitions of a topic if
any partition of the topic is offline. This causes problems if the user
tries to query the total number of partitions of the given topic.
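For example, with the fix this common pattern sees every partition, whether or not it currently has a leader (sketch):

```java
import java.util.List;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.common.PartitionInfo;

int countPartitions(Consumer<?, ?> consumer, String topic) {
    List<PartitionInfo> partitions = consumer.partitionsFor(topic);
    return partitions.size(); // previously undercounted while any partition was offline
}
```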
Author: radai-rosenblatt <radai.rosenblatt@gmail.com>
Reviewers: Jason Gustafson <jason@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
Closes #4679 from radai-rosenblatt/partition_shenanigans
1. In each iteration, decide whether a task is processable by checking that all of its partitions contain data, so that we can decide which record to process next (see the sketch after this list).
1.a One exception: if the task has data on some but not all of its partitions, we only consider it not processable for a finite number of iterations.
1.b Add a task-level metric to record whenever we are forced to process a task whose data is only partially available, since this may lead to non-determinism.
2. Break the main loop into put-raw-data and process-them phases, since not all data put into the queue will now be processed completely within a single iteration.
3. NOTE that within an iteration, if a task has exhausted one of its queues it will still be processed, since we only update the processable list once per iteration; I'm improving on this in the follow-up part III PR.
4. Found and fixed a bug in metrics recording: the taskName and sensorName parameters were swapped.
5. Optimized task stream time computation again, since our partition stream time reasoning has been simplified.
6. Added unit tests.
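A minimal sketch of the processability check from item 1 (hypothetical names; the real logic lives in the Streams task-processing loop):

```java
// A task is processable when every input partition has buffered data. A task
// with only partial data is held back for a bounded number of rounds before we
// force-process it and record the task-level enforced-processing metric.
boolean isProcessable(Task task, int round, int maxWaitRounds) {
    if (task.allPartitionsBuffered()) {
        return true;
    }
    boolean forced = task.hasSomeBufferedData() && round >= maxWaitRounds;
    if (forced) {
        task.recordEnforcedProcessing(); // hypothetical hook for the new metric
    }
    return forced;
}
```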
Reviewers: Matthias J. Sax <matthias@confluent.io>, John Roesler <vvcephei@users.noreply.github.com>, Bill Bejeck <bbejeck@gmail.com>