As noted in KIP-467, the updated ProduceResponse is:
```
Produce Response (Version: 8) => [responses] throttle_time_ms
  responses => topic [partition_responses]
    topic => STRING
    partition_responses => partition error_code base_offset log_append_time log_start_offset [error_records] error_message
      partition => INT32
      error_code => INT16
      base_offset => INT64
      log_append_time => INT64
      log_start_offset => INT64
      error_records => [INT32]  // new field: encodes the relative offsets of the records that caused the error
      error_message => STRING   // new field: encodes an error message that the client can use for its own logging
  throttle_time_ms => INT32
```
with a new error code:
```
INVALID_RECORD(86, "Some record has failed the validation on broker and hence will be rejected.", InvalidRecordException::new);
```
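For context, here is a minimal sketch of how a producer application might surface this error, assuming the client maps INVALID_RECORD to the public org.apache.kafka.common.InvalidRecordException; the topic name and configs are placeholders:
```
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.InvalidRecordException;

public class InvalidRecordHandlingExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // A record with a null key would be rejected by the broker on a compacted topic.
            producer.send(new ProducerRecord<>("my-topic", null, "value"), (metadata, exception) -> {
                if (exception instanceof InvalidRecordException) {
                    // Non-retriable: the broker rejected this specific record.
                    // Log the broker-provided error message for debugging.
                    System.err.println("Record rejected by broker: " + exception.getMessage());
                }
            });
        }
    }
}
```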
Reviewers: Jason Gustafson <jason@confluent.io>, Magnus Edenhill <magnus@edenhill.se>, Guozhang Wang <wangguoz@gmail.com>
Replaced UpdateMetadata{Request, Response}, LeaderAndIsr{Request, Response}
and StopReplica{Request, Response} with the automated protocol classes.
Updated the JSON schema for the 3 request types to be more consistent and
less strict (where needed to avoid duplication).
The general approach is to avoid generating new collections in the request
classes. Normalization happens in the constructor to make this possible. Builders
still have to group by topic to maintain the external ungrouped view.
Introduced new tests for LeaderAndIsrRequest and UpdateMetadataRequest to
verify that the new logic is correct.
A few other clean-ups/fixes in code that was touched due to these changes:
* KAFKA-8956: Refactor DelayedCreatePartitions#updateWaiting to avoid modifying
a collection while iterating over it in foreach.
* Avoid unnecessary allocation for state change trace logging if trace logging is not enabled
* Use `toBuffer` instead of `toList`, `toIndexedSeq` or `toSeq` as it generally performs
better and it matches the performance characteristics of `java.util.ArrayList`. This is
particularly important when passing such instances to Java code.
* Minor refactoring for clarity and readability.
* Removed usage of deprecated `/:`, unused imports and unnecessary `var`s.
* Include exception in `AdminClientIntegrationTest` failure message.
* Move StopReplicaRequest verification in `AuthorizerIntegrationTest` to the end
to match the comment.
Reviewers: Colin Patrick McCabe <cmccabe@apache.org>
Add a version number to request and response headers. The header
version is determined by the first two 16 bit fields read (API key and
API version). For now, ControlledShutdown v0 has header version 0, and
all other requests have v1. Once KIP-482 is implemented, there will be
a v2 of the header which supports tagged fields.
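As a rough, non-authoritative sketch (not the generated code), the rule above amounts to:
```
public final class RequestHeaderVersionSketch {
    // ApiKeys.CONTROLLED_SHUTDOWN has api key 7.
    private static final short CONTROLLED_SHUTDOWN = 7;

    // Derive the request header version from the first two 16-bit fields of the request.
    static short requestHeaderVersion(short apiKey, short apiVersion) {
        // ControlledShutdown v0 predates the client id field in the header, so it keeps header v0.
        if (apiKey == CONTROLLED_SHUTDOWN && apiVersion == 0) {
            return 0;
        }
        // All other requests currently use header v1; KIP-482 will add v2 with tagged fields.
        return 1;
    }
}
```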
1. Add the overloaded functions.
2. Update the code in Streams to use the batch API for better latency; this applies both to active StreamTasks when initializing offsets and to StandbyTasks when updating offset limits (see the sketch after this list).
3. Also update all unit tests to replace the deprecated APIs.
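For reference, a minimal sketch of the batch lookup mentioned in step 2, assuming the new `committed(Set<TopicPartition>)` overload added by this change:
```
import java.util.Map;
import java.util.Set;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public final class BatchCommittedExample {
    // One round trip for all assigned partitions instead of one committed(tp) call per partition.
    static Map<TopicPartition, OffsetAndMetadata> committedOffsets(Consumer<?, ?> consumer) {
        Set<TopicPartition> partitions = consumer.assignment();
        return consumer.committed(partitions);
    }
}
```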
Reviewers: Christopher Pettitt <cpettitt@confluent.io>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Bill Bejeck <bill@confluent.io>
I realized that some flaky tests fail during setup or in calls that try to create the offsets topic, and I think one partition and one replica would be sufficient in these cases (see the config sketch below).
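For illustration, the kind of broker override this refers to; the config names are the standard broker configs, and the test plumbing around them is omitted:
```
import java.util.Properties;

public final class OffsetsTopicTestConfig {
    // Make the internal __consumer_offsets topic cheap to create in embedded test clusters.
    static Properties brokerOverrides() {
        Properties props = new Properties();
        props.put("offsets.topic.num.partitions", "1");
        props.put("offsets.topic.replication.factor", "1");
        return props;
    }
}
```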
Reviewers: Bill Bejeck <bill@confluent.io>
KIP-455 (18d4e57f6e8c67ffa7937fc855707d3a03cc165a) bumped the LeaderAndIsr version to 3 but did not change the Controller code to actually send the new version. The ControllerChannelManagerTest had a bug which made it assert incorrectly, which is why it did not catch this. This patch fixes said test.
Because the new fields in LeaderAndIsr are not used yet, the gap was not caught by integration tests either.
Reviewers: Jason Gustafson <jason@confluent.io>
When the cleaner runs, it is useful to know the last modified time of the
segment and the deletion horizon. The current log message only lets you
infer that one is greater than the other.
Reviewers: Jun Rao <junrao@gmail.com>
This PR makes two changes to the ReplicaManager.updateFollowerFetchState path, which is in the hot path for follower fetches. Although a single call to ReplicaManager.updateFollowerFetchState is inexpensive, it is made once for each partition every time a follower fetch occurs.
1. updateFollowerFetchState no longer calls maybeExpandIsr when the follower is already in the ISR. This avoids repeated expansion checks.
2. Partition.maybeIncrementLeaderHW is also in the hot path for ReplicaManager.updateFollowerFetchState. It used to call Partition.remoteReplicas four times per iteration and performed a toSet conversion. maybeIncrementLeaderHW now avoids generating any intermediate collections when updating the HWM (illustrated below).
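To illustrate the second point (in Java rather than the actual Scala Partition code), the idea is to fold over the replica offsets in a single pass instead of mapping them to an intermediate collection and then taking its minimum:
```
public final class HighWatermarkSketch {
    // Hypothetical sketch: compute the lowest replica log end offset in one pass,
    // without mapping the replicas to an intermediate Set first.
    static long lowestLogEndOffset(long leaderLogEndOffset, long[] followerLogEndOffsets) {
        long min = leaderLogEndOffset;
        for (long offset : followerLogEndOffsets) {
            if (offset < min) {
                min = offset;
            }
        }
        return min;
    }
}
```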
**Benchmark results for Partition.updateFollowerFetchState on a r5.xlarge:**
Old:
```
1288.633 ±(99.9%) 1.170 ns/op [Average]
(min, avg, max) = (1287.343, 1288.633, 1290.398), stdev = 1.037
CI (99.9%): [1287.463, 1289.802] (assumes normal distribution)
```
New (when follower fetch offset is updated):
```
261.727 ±(99.9%) 0.122 ns/op [Average]
(min, avg, max) = (261.565, 261.727, 261.937), stdev = 0.114
CI (99.9%): [261.605, 261.848] (assumes normal distribution)
```
New (when follower fetch offset is the same):
```
68.484 ±(99.9%) 0.025 ns/op [Average]
(min, avg, max) = (68.446, 68.484, 68.520), stdev = 0.023
CI (99.9%): [68.460, 68.509] (assumes normal distribution)
```
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
This adds an administrative API to delete the consumer offsets of a group, and extends the mechanism for expiring offsets of consumer groups.
It makes the group coordinator aware of the set of topics a consumer group (protocol type == 'consumer') is actively subscribed to, allowing offsets of topics which are not actively subscribed to by the group to be either expired or administratively deleted. The expiration rules remain the same.
For other (non-consumer) groups, the API allows deleting offsets only when the group is empty, and expiration remains unchanged.
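A sketch of how the new admin API might be used, assuming the `Admin#deleteConsumerGroupOffsets` method introduced for this feature; the topic and group names are placeholders:
```
import java.util.Collections;
import java.util.Set;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.common.TopicPartition;

public final class DeleteOffsetsExample {
    // Delete the committed offset of a single partition for the given group; this is
    // expected to fail if the consumer group is still actively subscribed to the topic.
    static void deleteOffsets(Admin admin, String groupId) throws Exception {
        Set<TopicPartition> partitions = Collections.singleton(new TopicPartition("my-topic", 0));
        admin.deleteConsumerGroupOffsets(groupId, partitions).all().get();
    }
}
```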
Reviewers: Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Jason Gustafson <jason@confluent.io>
Replace the `<table>` elements with `<ul>` elements so the full page width can be used for the configuration descriptions instead of only a very narrow column. I moved the other fields (Type, Default Value, etc.) below each entry.
Reviewers: Boyang Chen <boyang@confluent.io>, Jason Gustafson <jason@confluent.io>
If the topic already exists, `handleCreateTopicsRequest` should return a TopicExistsException even when the request also contains an invalid config (an invalid replication factor, for instance).
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Jason Gustafson <jason@confluent.io>
Implement the revisions to the controller state machine and reassignment logic needed for KIP-455.
Add the addingReplicas and removingReplicas fields to the topics ZNode (see the sketch of the znode layout below).
Deprecate the methods initiating a reassignment via direct ZK access in KafkaZkClient.
Add ControllerContextTest, and add some test cases to ReassignPartitionsClusterTest.
Add a note to upgrade.html recommending not initiating reassignments during an upgrade.
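For reference, a rough sketch of the shape of the topics znode after this change; the field names and values here are illustrative, not the exact ZooKeeper payload:
```
{
  "version": 2,
  "partitions": { "0": [1, 2, 3, 4] },
  "adding_replicas": { "0": [4] },
  "removing_replicas": { "0": [1] }
}
```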
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Viktor Somogyi <viktorsomogyi@gmail.com>
When we have an unhandled exception in the request handler, we print some details about the request, such as the api key and payload. It is also useful to see the version of the request, which is not always apparent from the request payload.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
This patch adds an atomic counter in the test to ensure we have processed all the events before we assert the metrics. There was a race condition with the previous assertion, which asserted that the event queue is empty before checking the metrics.
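A minimal, hypothetical illustration of the pattern (not the actual test code): the handler counts processed events, and the assertion waits on the counter instead of on queue emptiness:
```
import java.util.concurrent.atomic.AtomicInteger;

public final class ProcessedEventCounter {
    private final AtomicInteger processed = new AtomicInteger();

    // Called by the (hypothetical) event handler once an event has been fully processed.
    void onEventProcessed() {
        processed.incrementAndGet();
    }

    // Wait until all submitted events have actually been processed before asserting on metrics.
    void awaitProcessed(int expected, long timeoutMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (processed.get() < expected && System.currentTimeMillis() < deadline) {
            Thread.sleep(10);
        }
        if (processed.get() < expected) {
            throw new AssertionError("Processed " + processed.get() + " of " + expected + " events");
        }
    }
}
```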
Reviewers: Jason Gustafson <jason@confluent.io>
This PR adds supporting features for static membership: it allows batch-removing consumers from a group given a list of group.instance.id values.
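For illustration, the admin-side call this enables might look like the following, assuming the `removeMembersFromConsumerGroup` Admin API; the group and instance ids are placeholders:
```
import java.util.Arrays;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.MemberToRemove;
import org.apache.kafka.clients.admin.RemoveMembersFromConsumerGroupOptions;

public final class RemoveStaticMembersExample {
    // Remove two static members from the group in a single batched request,
    // identified by their group.instance.id values.
    static void removeStaticMembers(Admin admin, String groupId) throws Exception {
        RemoveMembersFromConsumerGroupOptions options = new RemoveMembersFromConsumerGroupOptions(
            Arrays.asList(new MemberToRemove("instance-1"), new MemberToRemove("instance-2")));
        admin.removeMembersFromConsumerGroup(groupId, options).all().get();
    }
}
```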
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Unexpected exceptions are caught during topic deletion in `TopicCommand`. The caught exception is currently lost and we raise `AdminOperationException`. This patch fixes the problem by chaining the caught exception.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
This PR changes the TxnOffsetCommit protocol to auto-generated types, and adds more unit test coverage to the plain OffsetCommit protocol.
Reviewers: Jason Gustafson <jason@confluent.io>
This patch contains a few improvements on the offset range handling when computing the cleanable range of offsets.
1. It adds bounds checking to ensure the dirty offset cannot be larger than the log end offset. If it is, we reset it to the log start offset (see the sketch below).
2. It adds a method to get the non-active segments in the log while holding the lock. This ensures that a truncation cannot lead to an invalid segment range.
3. It improves exception messages in the case that an inconsistent segment range is provided so that we have more information to find the root cause.
The patch also fixes a few problems in `LogCleanerManagerTest` due to unintended reuse of the underlying log directory.
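An illustrative Java rendering of the bounds check from item 1; the real logic lives in the Scala LogCleanerManager:
```
public final class CleanableOffsetSketch {
    // If the checkpointed dirty offset is outside [logStartOffset, logEndOffset]
    // (e.g. after truncation or retention), restart cleaning from the log start offset.
    static long firstDirtyOffset(long checkpointed, long logStartOffset, long logEndOffset) {
        if (checkpointed < logStartOffset || checkpointed > logEndOffset) {
            return logStartOffset;
        }
        return checkpointed;
    }
}
```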
Reviewers: Vikas Singh <soondenana@users.noreply.github.com>, Jun Rao <junrao@gmail.com>
Right now we only have the very generic FailedProduceRequestsPerSec and FailedFetchRequestsPerSec metrics, which are marked whenever a record fails on the broker side. To improve the debugging experience, I added 4 new metrics in BrokerTopicStats to track the various scenarios in which an InvalidRecordException is thrown because LogValidator fails to validate a record:
-- NoKeyCompactedTopicRecordsPerSec: counter of failures caused by compacted-topic records with no key
-- InvalidMagicNumberRecordsPerSec: counter of failures caused by records with an invalid magic number
-- InvalidMessageCrcRecordsPerSec: counter of failures caused by records with CRC corruption
-- NonIncreasingOffsetRecordsPerSec: counter of failures caused by records with invalid (non-increasing) offsets
Reviewers: Robert Yokota <rayokota@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
This patch ensures that relevant system configurations are included in exception messages when zk security validation fails.
Reviewers: Vikas Singh <soondenana@users.noreply.github.com>, José Armando García Sancio <jsancio@users.noreply.github.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
New Java Authorizer API and a new out-of-the-box authorizer (AclAuthorizer) that implements the new interface.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
- Call `assertNoNonDaemonThreads` in the test method instead of the tear-down method
to avoid the situation where the parent class's tear-down is not invoked.
- Pass the thread prefix in tests that call `assertNoNonDaemonThreads` so that it
works correctly.
- Rename `verifyNonDaemonThreadsStatus` to `assertNoNonDaemonThreads` to
make it clear that it may throw.
Reviewers: Anna Povzner <anna@confluent.io>, Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>
This patch fixes the quota system test whose JMX tool relies on the existence
of these metrics.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Nikhil Bhatia <nikhil@confluent.io>, Tu V. Tran <tuvtran97@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Make sure to show the message key, even when the message value is null.
This changes the output of one of the tools. Is the output of the tool considered a public API? Does this need a discussion or a KIP?
Testing: Ran the tool on a compacted topic. Previously, the tool did not show any message keys for tombstone messages (messages where the value is null). Now, the tool shows message keys.
Reviewers: Mickael Maison <mimaison@users.noreply.github.com>, Guozhang Wang <wangguoz@gmail.com>
This is the implementation for [KIP-503](https://cwiki.apache.org/confluence/display/KAFKA/KIP-503%3A+Add+metric+for+number+of+topics+marked+for+deletion)
When deleting a large number of topics, the Controller can get quite bogged down. One problem with this is the lack of visibility into the progress of the Controller. We can look into the ZK path for topics marked for deletion, but in a production environment this is inconvenient. This PR adds a JMX metric `kafka.controller:type=KafkaController,name=TopicsToDeleteCount` to make it easier to see how many topics are being deleted.
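For example, the new gauge can be read over JMX roughly like this, assuming JMX is exposed on port 9999 and that the Yammer-backed gauge exposes its value through a `Value` attribute:
```
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public final class TopicsToDeleteCountReader {
    static long topicsToDeleteCount(String host) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://" + host + ":9999/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            ObjectName name = new ObjectName(
                "kafka.controller:type=KafkaController,name=TopicsToDeleteCount");
            // Read the gauge's current value from the controller's MBean.
            return ((Number) connection.getAttribute(name, "Value")).longValue();
        } finally {
            connector.close();
        }
    }
}
```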
Reviewers: Stanislav Kozlovski <stanislav@confluent.io>, Jun Rao <junrao@gmail.com>, Jason Gustafson <jason@confluent.io>
`testProduceAfterLogDirFailureOnLeader` currently disables producer retries in
order to catch and validate the exception thrown by a failure, and then tries to
produce successfully once the leadership changes. This second produce can
intermittently fail, causing test flakiness. This patch splits these validations
into two tests in order to allow retries for the produce request after the
leadership change.
Reviewers: Colin P. Mccabe <cmccabe@confluent.io>, Ismael Juma <ismael@juma.me.uk>
If retrieving metadata during `poll(Duration)` repeatedly takes longer than the
poll timeout, we don't make progress and `waitUntilTrue` eventually times out
causing the test to fail. This behaviour differs from the older `poll(long)` that
would block until metadata retrieval had completed. The flakiness was likely
introduced when we switched from the latter to the former.
Reviewers: Ismael Juma <ismael@juma.me.uk>
Refactors DescribeDelegationToken to use the generated RPC classes.
Author: Viktor Somogyi-Vass <viktorsomogyi@gmail.com>
Author: Viktor Somogyi <viktorsomogyi@gmail.com>
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
Closes #7154 from viktorsomogyi/refactor-describe-dt
This patch updates ConsumerGroupCommandTest.scala to use the maximum possible number of AdminClient retries. The test runs are still bounded by the request timeout. This addresses flakiness in tests such as testResetOffsetsNotExistingGroup and testResetOffsetsExistingTopic, which was caused by group coordinators being intermittently unavailable.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Update the javadoc of SocketServer#startup(). SocketServer#startProcessors() no longer exists; it has been replaced by SocketServer#startDataPlaneProcessors() and SocketServer#startControlPlaneProcessor().
Reviewers: Jason Gustafson <jason@confluent.io>
This patch stores `clusterId` in the `meta.properties` file. During startup, the broker checks that it joins the correct cluster and fails fast otherwise.
The `meta.properties` file is versioned. I have decided not to bump the version because 1) the clusterId is null anyway if not present in the file; and 2) bumping it would mean that rolling back to a previous version won't work.
I have refactored the way the metadata is read and written as it was strongly coupled with the brokerId bits. Now, the metadata is read independently during the startup and used to 1) check the clusterId and 2) get or generate the brokerId (as before).
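For illustration, a `meta.properties` file after this change might look like the following; the values are made up, and the version is intentionally left at 0 as described above:
```
version=0
broker.id=1
cluster.id=lrpH7LYUQ9WSxZ2Yn1PVvQ
```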
Reviewers: Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Jason Gustafson <jason@confluent.io>
The assignReplicasToBrokers method has a helpful but large unformatted
javadoc comment that results in a big blob in the generated HTML. This
change formats the comment so that the generated javadoc renders nicely.
Reviewers: Stanislav Kozlovski <familyguyuser192@windowslive.com>, Jason Gustafson <jason@confluent.io>