The client caches metadata fetched from Metadata requests. Previously, each metadata response overwrote all of the metadata from the previous one, so we could rely on the expectation that the broker only returned the leaderId for a partition if it had connection information available. This behavior changed with KIP-320, since having the leader epoch allows the client to filter out partition metadata which is known to be stale. However, because of this, we can no longer rely on the request-level guarantee of leader availability. There is no mechanism similar to the leader epoch to track the staleness of broker metadata, so we still overwrite all of the broker metadata from each response, which means that the partition metadata can get out of sync with the broker metadata in the client's cache. Hence it is no longer safe to validate inside the `Cluster` constructor that each leader has an associated `Node`.
Fixing this issue was unfortunately not straightforward because the cache was built to maintain references to broker metadata through the `Node` object at the partition level. In order to keep the state consistent, each `Node` reference would need to be updated based on the new broker metadata. Instead of doing that, this patch changes the cache so that it is structured more closely with the Metadata response schema. Broker node information is maintained at the top level in a single collection and cached partition metadata only references the id of the broker. To accommodate this, we have removed `PartitionInfoAndEpoch` and we have altered `MetadataResponse.PartitionMetadata` to eliminate its `Node` references.
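As a rough sketch of the new shape (illustrative class and field names only, not the actual client code), the cache now looks roughly like this, with node lookup happening lazily against the top-level broker map:

```java
import java.util.Map;
import java.util.Optional;
import org.apache.kafka.common.Node;
import org.apache.kafka.common.TopicPartition;

// Hedged sketch, not the actual client code: broker nodes live in one top-level map and
// partition metadata refers to its leader and replicas by broker id only.
class MetadataCacheSketch {
    static class PartitionSketch {
        final int leaderId;
        final int[] replicaIds;
        PartitionSketch(int leaderId, int[] replicaIds) {
            this.leaderId = leaderId;
            this.replicaIds = replicaIds;
        }
    }

    private final Map<Integer, Node> nodesById;                     // broker metadata, cached at the top level
    private final Map<TopicPartition, PartitionSketch> partitions;  // partition metadata, ids only

    MetadataCacheSketch(Map<Integer, Node> nodesById, Map<TopicPartition, PartitionSketch> partitions) {
        this.nodesById = nodesById;
        this.partitions = partitions;
    }

    // The leader id may have no matching Node if broker metadata and partition metadata
    // have drifted apart, so the result is optional rather than validated up front.
    Optional<Node> leaderFor(TopicPartition tp) {
        PartitionSketch info = partitions.get(tp);
        return info == null ? Optional.empty() : Optional.ofNullable(nodesById.get(info.leaderId));
    }
}
```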
Note that one of the side benefits of the refactor here is that we virtually eliminate one of the hotspots in Metadata request handling in `MetadataCache.getEndpoints` (which was renamed to `maybeFilterAliveReplicas`). The only reason this was expensive was because we had to build a new collection for the `Node` representations of each of the replica lists. This information was doomed to just get discarded on serialization, so the whole effort was wasteful. Now, we work with the lower level id lists and no copy of the replicas is needed (at least for all versions other than 0).
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Ismael Juma <ismael@juma.me.uk>
A few references to KIP-559 in the schema definitions needed to be fixed.
Reviewers: Brajesh Kumar <bristy@users.noreply.github.com>, Ron Dagostino <rdagostino@confluent.io>, Jason Gustafson <jason@confluent.io>
Since `ValueAndTimestampSerializer` wraps an unknown `Serializer`, the output of that serializer can be null, in which case the line
`.allocate(rawTimestamp.length + rawValue.length)`
will throw a `NullPointerException`.
This pull request returns null instead.
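A minimal sketch of the guard, assuming a generic wrapping serializer with hypothetical names (the real `ValueAndTimestampSerializer` differs in its details):

```java
import java.nio.ByteBuffer;
import org.apache.kafka.common.serialization.LongSerializer;
import org.apache.kafka.common.serialization.Serializer;

// Hedged sketch: propagate null from the wrapped serializer instead of hitting an NPE
// when sizing the output buffer.
class TimestampedSerializerSketch<V> {
    private final Serializer<V> inner;                               // unknown user serializer
    private final LongSerializer timestampSerializer = new LongSerializer();

    TimestampedSerializerSketch(Serializer<V> inner) {
        this.inner = inner;
    }

    byte[] serialize(String topic, long timestamp, V value) {
        byte[] rawValue = inner.serialize(topic, value);
        if (rawValue == null)
            return null;                                             // the fix: return null rather than NPE
        byte[] rawTimestamp = timestampSerializer.serialize(topic, timestamp);
        return ByteBuffer.allocate(rawTimestamp.length + rawValue.length)
                .put(rawTimestamp)
                .put(rawValue)
                .array();
    }
}
```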
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
This PR is a collaboration between Guozhang Wang and John Roesler. It is a significant tech debt cleanup on task management and state management, and is broken down into the sub-tasks listed below:
Extract the embedded clients (producer and consumer) from StreamTask into RecordCollector.
guozhangwang#2
guozhangwang#5
Consolidate the standby updating and active restoring logic into ChangelogReader and extract it out of StreamThread.
guozhangwang#3
guozhangwang#4
Introduce Task state life cycle (created, restoring, running, suspended, closing), and refactor the task operations based on the current state (a rough sketch of such a state machine follows this list).
guozhangwang#6
guozhangwang#7
Consolidate AssignedTasks into TaskManager and simplify the logic of changelog management and task management (since they have already been moved out in steps 2) and 3)).
guozhangwang#8
guozhangwang#9
Also simplified the StreamThread logic a bit, as the embedded clients / changelog restoration logic has been moved out in steps 1) and 2).
guozhangwang#10
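As referenced in step 3) above, here is a rough sketch of such a task state machine (illustrative only; the actual state enum and its allowed transitions in the codebase may differ):

```java
// Hedged sketch of a task state machine: each state only permits a small set of successors,
// and task operations are dispatched based on the current state.
enum TaskStateSketch {
    CREATED, RESTORING, RUNNING, SUSPENDED, CLOSING;

    boolean canTransitionTo(TaskStateSketch next) {
        switch (this) {
            case CREATED:   return next == RESTORING || next == CLOSING;
            case RESTORING: return next == RUNNING || next == SUSPENDED || next == CLOSING;
            case RUNNING:   return next == SUSPENDED || next == CLOSING;
            case SUSPENDED: return next == RESTORING || next == RUNNING || next == CLOSING;
            case CLOSING:   return false;       // terminal state
            default:        return false;
        }
    }
}
```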
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Bruno Cadonna <bruno@confluent.io>, Boyang Chen <boyang@confluent.io>
When a follower's fetch offset is behind the leader's log start offset, the
follower will do a full log truncation. When it does so, it must update both
its log start offset and high watermark. The previous code did the former,
but not the latter. Failure to update the high watermark in this case can lead
to out of range errors if the follower becomes leader before getting the latest
high watermark from the previous leader. The out of range errors occur when
we attempt to resolve the log position of the high watermark in DelayedFetch
in order to determine if a fetch is satisfied.
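As a hedged illustration of the intended behavior (hypothetical names, not the actual broker code), full truncation should move the high watermark together with the log start offset:

```java
// Hedged sketch: after a full truncation the log is empty and starts at the new offset,
// so a high watermark left at its old value would point at a position that no longer
// exists and can later surface as an out-of-range error in DelayedFetch.
class FollowerLogSketch {
    long logStartOffset;
    long logEndOffset;
    long highWatermark;

    void truncateFullyAndStartAt(long newStartOffset) {
        logStartOffset = newStartOffset;
        logEndOffset = newStartOffset;
        highWatermark = newStartOffset;   // previously left stale, which is the bug described above
    }
}
```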
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Chia-Ping Tsai <chia7712@gmail.com>, Ismael Juma <ismael@juma.me.uk>
* KAFKA-9074: Correct Connect’s `Values.parseString` to properly parse a time and timestamp literal
Time and timestamp literal strings contain a `:` character, but the internal parser used in the `Values.parseString(String)` method splits on the colon character when tokenizing and parsing map entries. The colon could be escaped, but then the backslash character used to escape the colon is not removed and the parser fails to match the literal as a time or timestamp value.
This fix corrects the parsing logic to properly parse timestamp and time literal strings whose colon characters are either escaped or unescaped. Additional unit tests were added to first verify the incorrect behavior and then to validate the correction.
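A brief usage sketch of the corrected behavior (the exact literal formats shown are assumptions based on ISO-8601 style values):

```java
import org.apache.kafka.connect.data.SchemaAndValue;
import org.apache.kafka.connect.data.Values;

public class ParseTimeLiterals {
    public static void main(String[] args) {
        // Hedged sketch: these literals contain ':' but should still parse as logical types.
        SchemaAndValue time = Values.parseString("14:34:54.346Z");
        SchemaAndValue timestamp = Values.parseString("2019-08-23T14:34:54.346Z");
        System.out.println(time.schema() + " / " + timestamp.schema());
    }
}
```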
Author: Randall Hauch <rhauch@gmail.com>
Reviewers: Chris Egerton <chrise@confluent.io>, Nigel Liang <nigel@nigelliang.com>, Jason Gustafson <jason@confluent.io>
Fixes NPE in brokers when processing record errors in produce response for older versions.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Also add support for flexible versions to both protocol types.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Colin Patrick McCabe <cmccabe@apache.org>
Co-authored-by: Rajini Sivaram <rajinisivaram@googlemail.com>
Co-authored-by: Jason Gustafson <jason@confluent.io>
Refactors CheckpointFile such that buffers can be read in lieu of files. This
is a relatively simple refactoring as we already create a buffered reader over
the checkpoint file.
#6742, which improves the performance of the checkpointing code, requires
a similar refactoring of code, although it does go further than this.
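A minimal sketch of the idea (hypothetical method, not the actual `CheckpointFile` code, and ignoring its header lines): once parsing is expressed against a `BufferedReader`, the same logic can consume either a file or an in-memory buffer.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.common.TopicPartition;

class CheckpointReadSketch {
    // Hedged sketch: parse "topic partition offset" lines from any reader source.
    static Map<TopicPartition, Long> read(BufferedReader reader) throws IOException {
        Map<TopicPartition, Long> offsets = new HashMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length == 3)
                offsets.put(new TopicPartition(parts[0], Integer.parseInt(parts[1])),
                            Long.parseLong(parts[2]));
        }
        return offsets;
    }
}
```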
Reviewers: Colin Patrick McCabe <cmccabe@apache.org>, Ismael Juma <ismael@juma.me.uk>
This PR implements KIP-559: https://cwiki.apache.org/confluence/display/KAFKA/KIP-559%3A+Make+the+Kafka+Protocol+Friendlier+with+L7+Proxies
- it adds the Protocol Type and the Protocol Name fields in JoinGroup and SyncGroup API;
- it validates that the fields are provided by the client when the new version of the API is used and ensures that they are consistent, erroring out otherwise;
- it validates that the fields are consistent on the client side and errors out otherwise;
- it adds many tests related to the API changes but also extends the testing coverage of the requests/responses themselves.
- it standardises the naming in the coordinator: `ProtocolType` and `ProtocolName` are now used across the board in the coordinator instead of a mix of protocol type, protocol name, subprotocol, protocol, etc.
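A hedged sketch of the kind of consistency check described above (hypothetical names, not the coordinator's actual code):

```java
// Hedged sketch: with the new request versions the client-provided protocol type and
// protocol name must match what the group already agreed on; otherwise the request fails.
final class ProtocolConsistencySketch {
    static boolean consistent(String provided, String expected) {
        // Older versions may leave the field empty/null; only validate when it is provided.
        return provided == null || provided.isEmpty() || provided.equals(expected);
    }

    static boolean validate(String requestProtocolType, String requestProtocolName,
                            String groupProtocolType, String groupProtocolName) {
        return consistent(requestProtocolType, groupProtocolType)
            && consistent(requestProtocolName, groupProtocolName);
    }
}
```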
Reviewers: Jason Gustafson <jason@confluent.io>
As the feature freeze approaches, we should support `2.5` as the inter.broker.protocol.version value. There are no new APIs so far, so `2.5` is effectively equivalent to `2.4`.
This PR implements `default.api.timeout.ms` as documented by KIP-533. This is a rebased version of #6913 with some additional test cases and small cleanups.
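A brief usage sketch of the new config with the admin client, which KIP-533 targets (broker address is a placeholder):

```java
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;

public class AdminTimeoutExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        // KIP-533: default timeout used by admin operations that do not set an explicit timeout
        props.put(AdminClientConfig.DEFAULT_API_TIMEOUT_MS_CONFIG, 60_000);
        try (Admin admin = Admin.create(props)) {
            System.out.println(admin.listTopics().names().get());
        }
    }
}
```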
Reviewers: David Arthur <mumrah@gmail.com>
Co-authored-by: huxi <huxi_2b@hotmail.com>
Attempt to load multiple IBM classes but fall back to loading the Sun class if the IBM one is not found.
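A hedged sketch of the pattern (the class names here are illustrative and may not be the ones the patch touches):

```java
final class LoginModuleLoaderSketch {
    // Hedged sketch: prefer an IBM JDK class when present, otherwise fall back to the Sun class.
    static Class<?> loadKrb5LoginModule() throws ClassNotFoundException {
        try {
            return Class.forName("com.ibm.security.auth.module.Krb5LoginModule"); // illustrative name
        } catch (ClassNotFoundException e) {
            return Class.forName("com.sun.security.auth.module.Krb5LoginModule");
        }
    }
}
```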
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Ismael Juma <ismael@juma.me.uk>
`emit.heartbeats.enabled` and `emit.checkpoints.enabled` are supposed to be the knobs that control whether the heartbeat message or checkpoint message will be sent to the respective topics. In our experiments, setting them to false does not suspend the activity in the corresponding SourceTasks, e.g. MirrorHeartbeatTask and MirrorCheckpointTask.
The observation is that, when those knobs are set to false, a huge volume of `SourceRecord`s is sent without any interval, causing significantly high CPU usage and GC time in the MirrorMaker 2 instance and congesting the single partition of the heartbeat topic and checkpoint topic.
The proposed fix in this PR is to (1) explicitly check whether `interval` is set to a negative value (e.g. -1), which is the case when `emit.heartbeats.enabled` or `emit.checkpoints.enabled` is off, and (2) create no task if `interval` is indeed negative.
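A hedged sketch of the check (hypothetical connector shape; the real MirrorMaker 2 connectors differ in detail):

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hedged sketch: when emission is disabled the interval is negative, and in that case
// the connector should simply create no source task.
abstract class EmitAwareConnectorSketch {
    abstract long emitIntervalMillis();          // negative when emit.*.enabled is false
    abstract Map<String, String> taskConfig();

    public List<Map<String, String>> taskConfigs(int maxTasks) {
        if (emitIntervalMillis() < 0)
            return Collections.emptyList();      // the fix: no task, no record flood
        return Collections.singletonList(taskConfig());
    }
}
```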
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Ryanne Dolan <ryannedolan@gmail.com>
This feature corresponds to KIP-558 and extends how the internal status topic (set via the `status.storage.topic` distributed worker config) is used to include information that allows Kafka Connect to keep track of which topics a connector is using.
The set of topics a connector is actively using is exposed via a new endpoint added to the REST API of Connect workers.
* A `GET /connectors/{name}/topics` request will return the set of topics that have been recorded as active since a connector started or since the set of topics was reset for this connector.
An additional endpoint allows users to reset the set of active topics for a connector:
* A `PUT /connectors/{name}/topics/reset` request clears the set of active topics.
The `topic.tracking.enable` worker config property (true by default) allows an operator to enable/disable the entire feature. If the feature is enabled, the `topic.tracking.allow.reset` worker config property (true by default) allows an operator to control whether reset requests submitted to the Connect REST API are allowed.
Author: Konstantine Karantasis <konstantine@confluent.io>
Reviewer: Randall Hauch <rhauch@gmail.com>
Also: improve error message, add test, minor code quality fixes
Verified that the test fails if the broker default for max message bytes is lower or higher than the currently set value.
Reviewers: Andrew Choi <andchoi@linkedin.com>, Viktor Somogyi <viktorsomogyi@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
* lz4: fixes identified by oss-fuzz
* jetty: fixes a few recent regressions
* powermock: better support for Java 12+
* zstd-jni: minor fixes
* httpclient: minor fixes
* spotless-plugin: minor fixes
* jmh: minor fixes
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Currently, when a dynamic change is made to the broker-level default log configuration, existing log configs are recreated with empty overridden configs. In that case, when dynamic broker configs are updated a second time, the topic-level configs are lost. This can cause unexpected data loss, for example if the cleanup policy changes from "compact" to "delete."
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Jason Gustafson <jason@confluent.io>
This ticket improves two aspects of sensor retrieval:
https://issues.apache.org/jira/browse/KAFKA-9152
Currently, when a sensor is retrieved with `*Metrics.*Sensor()` (e.g. `ThreadMetrics.createTaskSensor()`) after it was created with the same method, the sensor is added again to the corresponding queue of sensors (e.g. `threadLevelSensors`) in `StreamsMetricsImpl`. Those queues are used to remove the sensors when `removeAllLevelSensors()` is called. Having the same sensor multiple times in these queues is not an issue from a correctness point of view, but storing each sensor only once would reduce the footprint.
When a sensor is retrieved, the current code also attempts to create a new sensor and to add the corresponding metrics to it again. This could be avoided.
Both aspects could be improved by checking whether a sensor already exists by calling getSensor() on the Metrics object and checking the return value.
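A hedged sketch of the suggested pattern (hypothetical surrounding code; the actual `StreamsMetricsImpl` bookkeeping is more involved):

```java
import java.util.Deque;
import org.apache.kafka.common.metrics.Metrics;
import org.apache.kafka.common.metrics.Sensor;

final class SensorLookupSketch {
    // Hedged sketch: only create, register, and track the sensor when it does not exist yet.
    static Sensor taskSensor(Metrics metrics, String fullSensorName, Sensor.RecordingLevel level,
                             Deque<String> threadLevelSensors) {
        Sensor existing = metrics.getSensor(fullSensorName);
        if (existing != null)
            return existing;                         // reuse: no duplicate queue entry, no re-added metrics
        Sensor sensor = metrics.sensor(fullSensorName, level);
        threadLevelSensors.push(fullSensorName);     // remembered exactly once for later removal
        return sensor;
    }
}
```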
Reviewers: Bruno Cadonna <bruno@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
Also addresses KAFKA-8821
Note that we still have to fall back to using pattern subscription if the user has added any regex-based source nodes to the topology. Includes some minor cleanup on the side
Reviewers: Bill Bejeck <bbejeck@gmail.com>
Previously the idempotent producer and the transactional producer used separate logic when initializing the producerId. This patch consolidates the two paths. We also do some cleanup in `TransactionManagerTest` to eliminate brittle expectations on `Sender`.
Reviewers: Bob Barrett <bob.barrett@confluent.io>, Viktor Somogyi <viktorsomogyi@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
This patch adds a new API to the producer to implement transactional offset commit fencing through the group coordinator, as proposed in KIP-447. The changes are mainly on the producer end, keeping the old `sendOffsetsToTxn(offsets, groupId)` path compatible alongside the new `sendOffsetsToTxn(offsets, groupMetadata)` path.
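A brief usage sketch of the new path (producer/consumer setup omitted; `offsets` is assumed to be built from the consumed records):

```java
import java.util.Map;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.TopicPartition;

final class Kip447UsageSketch {
    static void commit(KafkaProducer<byte[], byte[]> producer,
                       KafkaConsumer<byte[], byte[]> consumer,
                       Map<TopicPartition, OffsetAndMetadata> offsets) {
        producer.beginTransaction();
        // ... send the produced records here ...
        // New overload: pass the consumer's group metadata so the coordinator can fence zombies.
        producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
        producer.commitTransaction();
    }
}
```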
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>
This commit makes `DistributedHerder` log an error about task reconfiguration only when an error has actually happened.
Author: Ivan Yurchenko <ivan0yurchenko@gmail.com>
Reviewer: Randall Hauch <rhauch@gmail.com>