Key improvements with this PR:
* tasks will remain available for IQ during a rebalance (but not during restore)
* continue restoring and processing standby tasks during a rebalance
* continue processing active tasks during rebalance until the RecordQueue is empty*
* only revoked tasks must suspended/closed
* StreamsPartitionAssignor tries to return tasks to their previous consumers within a client
* but do not try to commit, for now (pending KAFKA-7312)
Reviewers: John Roesler <john@confluent.io>, Boyang Chen <boyang@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
This patch adds flexible version support for the following inter-broker APIs: ControlledShutdown, LeaderAndIsr, UpdateMetadata, and StopReplica. Version checks have been removed from `getErrorResponse` methods since they were redundant given the checks in `AbstractRequest` and the respective`*Data` types.
Reviewers: Ismael Juma <ismael@juma.me.uk>
Implementation of [KIP-382 "MirrorMaker 2.0"](https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0)
Author: Ryanne Dolan <ryannedolan@gmail.com>
Author: Arun Mathew <arunmathew88@gmail.com>
Author: In Park <inpark@cloudera.com>
Author: Andre Price <obsoleted@users.noreply.github.com>
Author: christian.hagel@rio.cloud <christian.hagel@rio.cloud>
Reviewers: Eno Thereska <eno.thereska@gmail.com>, William Hammond <william.t.hammond@gmail.com>, Viktor Somogyi <viktorsomogyi@gmail.com>, Jakub Korzeniowski, Tim Carey-Smith, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Arun Mathew, Jeremy-l-ford, vpernin, Oleg Kasian <oleg.kasian@gmail.com>, Mickael Maison <mickael.maison@gmail.com>, Qihong Chen, Sriharsha Chintalapani <sriharsha@apache.org>, Jun Rao <junrao@gmail.com>, Randall Hauch <rhauch@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#6295 from ryannedolan/KIP-382
Minor tweak to gracefully handle a possible IllegalStateException while checking a metric value.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>
1. Moves StreamsMetricsImpl from StreamThread to KafkaStreams
2. Adds instance-level metrics as specified in KIP-444, i.e.:
-- version
-- commit-id
-- application-id
-- topology-description
-- state
Reviewers: Guozhang Wang <wangguoz@gmail.com>, John Roesler <john@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
Small follow-up to trunk PR #7423
While debugging the 2.3 VP PR we realized we should remove the leader-tracking from the VP system test altogether. We'd already merged the corresponding trunk PR so I made a quick new PR for trunk (also fixes a missed version bump in one of the log messages)
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Trim whitespaces in topic names specified in sink connector configs before subscribing to the consumer. Topic names don't allow whitespace characters, so trimming only will eliminate potential problems and will not place additional limits on topics specified in sink connectors.
Author: Magesh Nandakumar <magesh.n.kumar@gmail.com>
Reviewers: Arjun Satish <arjun@confluent.io>, Randall Hauch <rhauch@gmail.com>
https://issues.apache.org/jira/browse/KAFKA-3705
Allows for a KTable to map its value to a given foreign key and join on another KTable keyed on that foreign key. Applies the joiner, then returns the tuples keyed on the original key. This supports updates from both sides of the join.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <mjsax@apache.org>, John Roesler <john@confluent.io>, Boyang Chen <boyang@confluent.io>, Christopher Pettitt <cpettitt@confluent.io>, Bill Bejeck <bbejeck@gmail.com>, Jan Filipiak <Jan.Filipiak@trivago.com>, pgwhalen, Alexei Daniline
Adds support for the Connect Cast transforms to cast from Connect logical types, such as DATE, TIME, TIMESTAMP, and DECIMAL. Casting to numeric types will produce the underlying numeric value represented in the desired type. For logical types represented by underlying Java Date class, this means the milliseconds since EPOCH. For Decimal, this means the underlying value. If the value does not fit in the desired target type, it may overflow.
Casting to String from Date, Time, and Timestamp types will produce their ISO 8601 representation. Casting to String from Decimal will result in the value represented as a string. e.g. 1234 -> "1234".
Author: Nigel Liang <nigel@nigelliang.com>
Reviewer: Randall Hauch <rhauch@gmail.com>
I tried to add a test for this, but it's actually pretty hard to verify what we want to
verify. I could add a test that checks the correlation field after the connection
has been established, but it would not catch this kind of bug where the issue is not
the value we store, but the value we create the request header with.
I have another PR that avoids intermediate structs during serialization/deserialization,
which has a test that fails without this change. So we'll get coverage that way.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Implemented KIP-475 to add new metrics for each worker to expose the number of current tasks per connector and per status.
Author: Cyrus Vafadari <cyrus@confluent.io>
Reviewer: Randall Hauch <rhauch@gmail.com>, Boyang Chen <Boyang Chen <boyang@confluent.io>
Implemented KIP-507 to secure the internal Connect REST endpoints that are only for intra-cluster communication. A new V2 of the Connect subprotocol enables this feature, where the leader generates a new session key, shares it with the other workers via the configuration topic, and workers send and validate requests to these internal endpoints using the shared key.
Currently the internal `POST /connectors/<connector>/tasks` endpoint is the only one that is secured.
This change adds unit tests and makes some small alterations to system tests to target the new `sessioned` Connect subprotocol. A new integration test ensures that the endpoint is actually secured (i.e., requests with missing/invalid signatures are rejected with a 400 BAD RESPONSE status).
Author: Chris Egerton <chrise@confluent.io>
Reviewed: Konstantine Karantasis <konstantine@confluent.io>, Randall Hauch <rhauch@gmail.com>
Implemented KIP-521 to change the provided Connect Log4J configuration file to also write logs to files and to rotate them daily.
Author: Konstantine Karantasis <konstantine@confluent.io>
Reviewer: Randall Hauch <rhauch@gmail.com>
Implemented KIP-495 to expose a new `admin/loggers` endpoint for the Connect REST API that lists the current log levels and allows the caller to change log levels.
Author: Arjun Satish <arjun@confluent.io>
Reviewer: Randall Hauch <rhauch@gmail.com>
Instead of sending the leader's version and having older members try to blindly upgrade.
The only other real change here is that we will also set the VERSION_PROBING error code and return early from onAssignment when we are upgrading our used subscription version (not just downgrading it) since this implies the whole group has finished the rolling upgrade and all members should rejoin with the new subscription version.
Also piggy-backing on a fix for a potentially dangerous edge case, where every thread of an instance is assigned the same set of active tasks.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Previously we would log the following on each controller startup:
```
[2019-09-13 22:40:10,272] INFO [Controller id=2] DEPRECATED: Partitions being reassigned through ZooKeeper: Map()
```
This patch only logs the message if the map is non-empty.
Reviewers: Jason Gustafson <jason@confluent.io>
It add support to delete offsets in the `kafka-consumer-group`.
*More detailed description of your change,
if necessary. The PR title and PR message become
the squashed commit message, so use a separate
comment to ping reviewers.*
*Summary of testing strategy (including rationale)
for the feature or bug fix. Unit and/or integration
tests are expected for any behaviour change and
system tests should be considered for larger changes.*
Author: David Jacot <djacot@confluent.io>
Reviewers: Gwen Shapira
Closes#7362 from dajac/KAFKA-8901-delete-offsets-command
added shutdown for thread that triggers recording of RocksDBMetrics
added unit tests to verify the start and shutdown of the thread
refactored a bit of code
Reviewers: Christopher Pettitt <cpettitt@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
Implemented KIP-481 by adding support for deserializing Connect DECIMAL values encoded in JSON as numbers, in addition to raw byte array (base64) format used previously.
Author: Almog Gavra <almog@confluent.io>
Reviewers: Chris Egerton <chrise@confluent.io>, Konstantine Karantasis <konstantine@confluent.io>, Randall Hauch <rhauch@gmail.com>
A minor refactor to explicitly verify that Processor#close is only called once.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Sophie Blee-Goldman <sophie@confluent.io>, Bruno Cadonna <bruno@confluent.io>,
shouldNotThrowExceptionOnRestoreWhenThereIsPreExistingRocksDbFiles takes
1m30s, which is too long for a unit test.
`RocksDBTimestampedStoreTest` inherits from `RocksDBStoreTest` and it's
implicitly considered an integration test too.
Reviewers: Guozhang Wang <guozhang@confluent.io>
As noted in the KIP-467, the updated ProduceResponse is
```
Produce Response (Version: 8) => [responses] throttle_time_ms
responses => topic [partition_responses]
topic => STRING
partition_responses => partition error_code base_offset log_append_time log_start_offset
partition => INT32
error_code => INT16
base_offset => INT64
log_append_time => INT64
log_start_offset => INT64
error_records => [INT32] // new field, encodes the relative offset of the records that caused error
error_message => STRING // new field, encodes the error message that client can use to log itself
throttle_time_ms => INT32
with a new error code:
```
INVALID_RECORD(86, "Some record has failed the validation on broker and hence be rejected.", InvalidRecordException::new);
Reviewers: Jason Gustafson <jason@confluent.io>, Magnus Edenhill <magnus@edenhill.se>, Guozhang Wang <wangguoz@gmail.com>
In addition to the existing metrics added in KAFKA-8609, add the total (cumulative) time spent during rebalances.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
A minor change in logic to account for repartition topics where we might not have the num partitions yet in the metadata.
Ran all existing tests plus all streams system tests.
Reviewers: John Roesler <vvcephei@users.noreply.github.com>, Guozhang Wang <wangguoz@gmail.com>
Detailed info is available in the ticket: https://issues.apache.org/jira/browse/KAFKA-8911
Briefly, implicit defs are calling empty constructors, which exists only for reflection object creation.
Therefore, while using the implicit definitons, a NPE occurs when Serde is called.
Reviewers: John Roesler <john@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
Replaced UpdateMetadata{Request, Response}, LeaderAndIsr{Request, Response}
and StopReplica{Request, Response} with the automated protocol classes.
Updated the JSON schema for the 3 request types to be more consistent and
less strict (if needed to avoid duplication).
The general approach is to avoid generating new collections in the request
classes. Normalization happens in the constructor to make this possible. Builders
still have to group by topic to maintain the external ungrouped view.
Introduced new tests for LeaderAndIsrRequest and UpdateMetadataRequest to
verify that the new logic is correct.
A few other clean-ups/fixes in code that was touched due to these changes:
* KAFKA-8956: Refactor DelayedCreatePartitions#updateWaiting to avoid modifying
collection in foreach.
* Avoid unnecessary allocation for state change trace logging if trace logging is not enabled
* Use `toBuffer` instead of `toList`, `toIndexedSeq` or `toSeq` as it generally performs
better and it matches the performance characteristics of `java.util.ArrayList`. This is
particularly important when passing such instances to Java code.
* Minor refactoring for clarity and readability.
* Removed usage of deprecated `/:`, unused imports and unnecessary `var`s.
* Include exception in `AdminClientIntegrationTest` failure message.
* Move StopReplicaRequest verification in `AuthorizerIntegrationTest` to the end
to match the comment.
Reviewers: Colin Patrick McCabe <cmccabe@apache.org>
Profiling while benchmarking shows unnecessary calls to
`responseDataToLogString` in FetchSessionHandler when logging was set to
INFO level. This leads to 1.47% of the JVM CPU time going to this method.
Fix it by checking if debug logging is enabled.
Reviewers: Ismael Juma <ismael@juma.me.uk>
The Kafka Clients library includes a version file that contains its version and Git commit ID.
Since Kafka Streams wants to expose version and commit ID in the metrics it needs to read the version file. To enable the users to check during runtime for version mismatches between the Streams library and the Clients library, the version file is copied from Clients during build
time and during runtime only the Streams version file is read.
If Streams would read Clients' version file during runtime, it would read a wrong version and commit ID if the libraries where not build from repositories in different states.
Reviewers: John Roesler <vvcephei@users.noreply.github.com>, Guozhang Wang <wangguoz@gmail.com>
Previous KafkaStreamsTest takes 2min20s on my local laptop, because lots of its integration test which is producing / consuming records, and checking state directory file system takes lots of time. On the other hand, these tests should be well simplified with mocks.
This test reduces the test from a clumsy integration test class into a unit tests with mocks of its internal modules. And some other test functions should not be in KafkaStreamsTest actually and have been moved to other modular test classes. Now it takes 2s.
Also it helps removing the potential flakiness of the following (some of them are claimed resolved only because we have not seen them recently, but after looking at the test code I can verify they are still flaky):
* KAFKA-5818 (the original JIRA ticket indeed exposed a real issue that has been fixed, but the test itself remains flaky)
* KAFKA-6215
* KAFKA-7921
* KAFKA-7990
* KAFKA-8319
* KAFKA-8427
Reviewers: Bill Bejeck <bill@confluent.io>, John Roesler <john@confluent.io>, Bruno Cadonna <bruno@confluent.io>