This PR makes two changes to code in the ReplicaManager.updateFollowerFetchState path, which is in the hot path for follower fetches. Although calling ReplicaManager.updateFollowerFetchState is inexpensive on its own, it is called once for each partition every time a follower fetch occurs.
1. updateFollowerFetchState no longer calls maybeExpandIsr when the follower is already in the ISR. This avoids repeated expansion checks.
2. Partition.maybeIncrementLeaderHW is also in the hot path for ReplicaManager.updateFollowerFetchState. It previously called Partition.remoteReplicas four times per iteration and performed a toSet conversion; maybeIncrementLeaderHW now avoids generating any intermediate collections when updating the HWM (see the sketch below).
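As an illustration of the second change, here is a minimal sketch (hypothetical names, not the actual Partition code) of computing the new high watermark in a single pass over the replicas, with no intermediate collections:
```
// Hypothetical sketch: track the minimum caught-up log end offset in one
// pass instead of building an intermediate collection (map/toSet) first.
final case class Replica(brokerId: Int, logEndOffset: Long, isCaughtUp: Boolean)

def maybeIncrementLeaderHW(replicas: Seq[Replica], currentHW: Long): Long = {
  var newHW = Long.MaxValue
  replicas.foreach { r =>
    // Only replicas that are caught up (e.g. in the ISR) bound the HWM.
    if (r.isCaughtUp && r.logEndOffset < newHW)
      newHW = r.logEndOffset
  }
  // Never move the high watermark backwards.
  if (newHW != Long.MaxValue && newHW > currentHW) newHW else currentHW
}
```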
**Benchmark results for Partition.updateFollowerFetchState on a r5.xlarge:**
Old:
```
1288.633 ±(99.9%) 1.170 ns/op [Average]
(min, avg, max) = (1287.343, 1288.633, 1290.398), stdev = 1.037
CI (99.9%): [1287.463, 1289.802] (assumes normal distribution)
```
New (when follower fetch offset is updated):
```
261.727 ±(99.9%) 0.122 ns/op [Average]
(min, avg, max) = (261.565, 261.727, 261.937), stdev = 0.114
CI (99.9%): [261.605, 261.848] (assumes normal distribution)
```
New (when follower fetch offset is the same):
```
68.484 ±(99.9%) 0.025 ns/op [Average]
(min, avg, max) = (68.446, 68.484, 68.520), stdev = 0.023
CI (99.9%): [68.460, 68.509] (assumes normal distribution)
```
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
* Log lock acquisition failures on the state store
* Document required uniqueness of the state.dir path
* Move a bunch of log calls around task state changes to DEBUG
* More readable log messages during partition assignment
Reviewers: Matthias J. Sax <mjsax@apache.org>, A. Sophie Blee-Goldman <sophie@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Given that the tests do not create clusters with more than 3 brokers, we do not gain much by creating 50 partitions for that topic. Reducing the count should slightly speed up test startup and shutdown.
Reviewers: Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bbejeck@gmail.com>
The streams config built.in.metrics.version is needed to add metrics in
a backward-compatible way. However, a streams config is not available in
every location where metrics are added and built.in.metrics.version needs
to be checked. Thus, the config value needs to be exposed through the
StreamsMetricsImpl object.
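As a rough sketch of the resulting shape (class layout and version strings here are illustrative, not the actual Streams code):
```
// Illustrative sketch: capture the built.in.metrics.version value once and
// expose it, so metric-registration code without a StreamsConfig can branch.
class StreamsMetricsImpl(val builtInMetricsVersion: String)

object MetricsExample extends App {
  val metrics = new StreamsMetricsImpl("latest")
  if (metrics.builtInMetricsVersion == "latest") {
    // register metrics using the current names
  } else {
    // register metrics using the legacy, backward-compatible names
  }
}
```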
Reviewers: John Roesler <vvcephei@users.noreply.github.com>, Guozhang Wang <wangguoz@gmail.com>
This adds an administrative API to delete consumer offsets of a group and extends the mechanism to expire offsets of consumer groups.
It makes the group coordinator aware of the set of topics a consumer group (protocol type == 'consumer') is actively subscribed to, allowing offsets of topics which are not actively subscribed to by the group to be either expired or administratively deleted. The expiration rules remain the same.
For the other (non-consumer) groups, the API allows deleting offsets when the group is empty; expiration remains the same.
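As a usage example (the broker address, group, topic, and partition are illustrative), the new admin API can be called like this:
```
import java.util.Collections
import org.apache.kafka.clients.admin.AdminClient
import org.apache.kafka.common.TopicPartition

object DeleteOffsetsExample extends App {
  val props = new java.util.Properties()
  props.put("bootstrap.servers", "localhost:9092")
  val admin = AdminClient.create(props)
  try {
    // Delete the committed offset for a partition of a topic the group
    // (protocol type == 'consumer') is no longer actively subscribed to.
    val partitions = Collections.singleton(new TopicPartition("old-topic", 0))
    admin.deleteConsumerGroupOffsets("my-group", partitions).all().get()
  } finally admin.close()
}
```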
Reviewers: Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Jason Gustafson <jason@confluent.io>
Replace the `<table>` elements with `<ul>` so the full page width can be used for the configuration descriptions instead of only a very narrow column. I moved the other fields (Type, Default Value, etc.) below each entry.
Reviewers: Boyang Chen <boyang@confluent.io>, Jason Gustafson <jason@confluent.io>
Key changes include:
1. Moves general offset limit updates down to StandbyTask.
2. Updates offsets for StandbyTask at most once per commit and only when we need an updated offset limit to make progress (a sketch of this follows the list).
3. Avoids writing a 0 checkpoint when StandbyTask.update is called but we cannot apply any of the records.
4. Avoids going into a restoring state when the last checkpoint is greater than or equal to the offset limit (the consumer's committed offset). This needs special review attention; the code is in StoreChangelogReader.
5. Still updates offset limits initially for StreamTask, because doing so prevents replaying too many records from the changelog (which is also the input topic with an optimized topology).
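For point 2, a heavily simplified, hypothetical sketch of the idea (not the actual StandbyTask code): the offset limit is refreshed at most once per commit, and only when buffered records would otherwise cross the current limit:
```
// Hypothetical sketch: fetch the committed offset (the limit) lazily, at most
// once per commit interval, and only when progress is blocked by the limit.
class StandbyOffsetLimit(fetchCommittedOffset: () => Long) {
  private var limit: Long = -1L
  private var refreshedSinceCommit = false

  // Called when the task commits; allows one more refresh afterwards.
  def onCommit(): Unit = refreshedSinceCommit = false

  // Returns true if the record at this offset may be applied to the store.
  def mayApply(recordOffset: Long): Boolean = {
    if (recordOffset >= limit && !refreshedSinceCommit) {
      limit = fetchCommittedOffset() // at most one lookup per commit
      refreshedSinceCommit = true
    }
    recordOffset < limit
  }
}
```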
NOTE: this PR depends on KAFKA-8816, which is under review separately. Fortunately the changes involved are few. You can focus just on the KAFKA-8755 commit if you prefer.
Reviewers: Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>
The purpose of this PR is to add static membership support to the range assignor. More details on the motivation can be found in the linked ticket.
Similar to the round robin assignor, if we are able to persist member identity across generations, we reach a much more stable assignment.
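A minimal sketch of the effect (illustrative names, not the real assignor code): ordering members by a persistent group.instance.id, when present, rather than by the ephemeral member id keeps each member's partition range stable across generations:
```
final case class Member(memberId: String, groupInstanceId: Option[String])

// Order by the persistent instance id when available, so the same member
// occupies the same position (and thus gets the same range) every generation.
def stableOrder(members: Seq[Member]): Seq[Member] =
  members.sortBy(m => m.groupInstanceId.getOrElse(m.memberId))

def assignRanges(members: Seq[Member], numPartitions: Int): Map[String, Seq[Int]] = {
  require(members.nonEmpty, "need at least one member")
  val ordered = stableOrder(members)
  val base = numPartitions / ordered.size
  val extra = numPartitions % ordered.size
  var start = 0
  ordered.zipWithIndex.map { case (m, i) =>
    val count = base + (if (i < extra) 1 else 0)
    val range = (start until start + count).toSeq
    start += count
    m.memberId -> range
  }.toMap
}
```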
Reviewers: John Roesler <vvcephei@users.noreply.github.com>, Guozhang Wang <wangguoz@gmail.com>, Bruno Cadonna <bruno@confluent.io>
If the topic already exists, `handleCreateTopicsRequest` should return TopicExistsException even when given an otherwise invalid config (an invalid replication factor, for instance).
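A sketch of the validation order the fix establishes (hypothetical names; standard exceptions stand in for Kafka's error types):
```
// Sketch: check existence first, so a duplicate create surfaces
// TOPIC_ALREADY_EXISTS instead of an error about the invalid config.
def validateCreate(topic: String, replicationFactor: Int, existing: Set[String]): Unit = {
  if (existing.contains(topic))
    throw new IllegalStateException(s"Topic '$topic' already exists") // stands in for TopicExistsException
  if (replicationFactor < 1)
    throw new IllegalArgumentException(s"Invalid replication factor: $replicationFactor")
}
```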
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Jason Gustafson <jason@confluent.io>
Implement the revisions to the controller state machine and reassignment logic needed for KIP-455.
Add the addingReplicas and removingReplicas fields to the topics ZNode.
Deprecate the methods initiating a reassignment via direct ZK access in KafkaZkClient.
Add ControllerContextTest, and add some test cases to ReassignPartitionsClusterTest.
Add a note to upgrade.html recommending not initiating reassignments during an upgrade.
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Viktor Somogyi <viktorsomogyi@gmail.com>
When we have an unhandled exception in the request handler, we print some details about the request, such as the API key and payload. It is also useful to see the version of the request, which is not always apparent from the payload.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
I bumped into this flaky test while working on another PR. It's a bit different from the reported failure, in that it actually timed out already while parsing localhost:port. I think what we should really test is that the closing call can complete, so I removed the whole test timeout and added a timeout on the shutdown latch instead.
Reviewers: Jason Gustafson <jason@confluent.io>, cpettitt-confluent <53191309+cpettitt-confluent@users.noreply.github.com>
This patch adds an atomic counter in the test to ensure we have processed all the events before we assert the metrics. There was a race condition with the previous assertion, which asserted that the event queue is empty before checking the metrics.
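A simplified sketch of the pattern (illustrative, not the actual test code): the assertion waits on a counter that is incremented only after each event's side effects complete, instead of checking that the queue is empty:
```
import java.util.concurrent.atomic.AtomicInteger

object AwaitProcessed {
  val processed = new AtomicInteger(0)

  // The handler bumps the counter only after the metrics are updated, so an
  // empty queue can no longer be observed ahead of the last metric update.
  def handleEvent(): Unit = {
    // ... update metrics here ...
    processed.incrementAndGet()
  }

  // Test side: wait for the counter instead of asserting queue emptiness.
  def awaitAll(expected: Int, timeoutMs: Long): Boolean = {
    val deadline = System.currentTimeMillis() + timeoutMs
    while (processed.get() < expected && System.currentTimeMillis() < deadline)
      Thread.sleep(1)
    processed.get() >= expected
  }
}
```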
Reviewers: Jason Gustafson <jason@confluent.io>
The previous approach to testing KAFKA-8412 was to look at the logs and
determine if an error occurred during close. There was no direct way to
detect that an exception occurred, because the exception was eaten in
AssignedTasks.close. In the PR for that ticket (#7207) it was
acknowledged that this was a brittle way to test for the exception. We
now see occasional failures because an unrelated ERROR level log entry
is made while closing the task.
This change eliminates the brittle log checking by rethrowing any time
an exception occurs in close, even when a subsequent unclean close
succeeds. This has the potential benefit of uncovering other suppressed
exceptions down the road.
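In sketch form (hypothetical names): the clean close is attempted first, an unclean close is used as a fallback, and the original exception is rethrown either way:
```
trait Task {
  def close(clean: Boolean): Unit
}

// Attempt a clean close; on failure, fall back to an unclean close but still
// rethrow the original exception rather than swallowing it.
def closeTask(task: Task): Unit = {
  try task.close(clean = true)
  catch {
    case e: RuntimeException =>
      task.close(clean = false) // best-effort cleanup
      throw e
  }
}
```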
I've verified that all tests pass even with the rethrow on closeUnclean.
Reviewers: Matthias J. Sax <mjsax@apache.org>, Bill Bejeck <bbejeck@gmail.com>
We need the stack trace of the error to understand the root cause and to troubleshoot the underlying problem.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
This PR adds supporting features for static membership: it can batch-remove consumers from the group given a list of group.instance.id values.
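Usage example (the broker address, group id, and instance ids are illustrative):
```
import java.util.Arrays
import org.apache.kafka.clients.admin.{AdminClient, MemberToRemove, RemoveMembersFromConsumerGroupOptions}

object RemoveStaticMembers extends App {
  val props = new java.util.Properties()
  props.put("bootstrap.servers", "localhost:9092")
  val admin = AdminClient.create(props)
  try {
    // Remove two static members from the group in one call, identified by
    // their group.instance.id values.
    val options = new RemoveMembersFromConsumerGroupOptions(
      Arrays.asList(new MemberToRemove("instance-1"), new MemberToRemove("instance-2")))
    admin.removeMembersFromConsumerGroup("my-group", options).all().get()
  } finally admin.close()
}
```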
Reviewers: Guozhang Wang <wangguoz@gmail.com>
This change adds a command line option to the `ducker-ak up` command to enable exposing ports from docker containers. The exposed ports will be mapped to ephemeral ports on the host. The option is called `expose-ports` and can take either a single value (like 5005) or a range (like 5005-5009). These ports will then be exposed from each docker container that ducker-ak sets up.
Reviewers: Colin P. McCabe <cmccabe@apache.org>, José Armando García Sancio <jsancio@users.noreply.github.com>
This creates a test that generates sustained connections against Kafka. There
are three different components we can stress with this: KafkaConsumer,
KafkaProducer, and AdminClient. This test tries to use minimal bandwidth per
connection to reduce overhead impacts.
This test works by creating a threadpool that creates connections and then
maintains a central pool of connections at a specified keepalive rate (see the
sketch after the list below). The keepalive action varies by which component
is being stressed:
* KafkaProducer: Sends a single produce record. The configuration for
the produce request uses the same key/value generator as the ProduceBench
test.
* KafkaConsumer: Subscribes to a single partition, seeks to the end, and
then polls a minimal number of records. Each consumer connection is its
own consumer group, and FETCH_MAX_BYTES defaults to 1024 bytes to keep
traffic to a minimum.
* AdminClient: Makes an API call to get the nodes in the cluster.
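A conceptual sketch of the keepalive loop (illustrative names, not the actual Trogdor workload code):
```
import java.util.concurrent.{Executors, TimeUnit}

// Each connection gets a cheap periodic keepalive action: one produce, one
// minimal poll, or one describe-cluster call, depending on the component.
trait Connection { def keepAlive(): Unit }

def maintain(connections: Seq[Connection], keepaliveMs: Long): Unit = {
  val pool = Executors.newScheduledThreadPool(4)
  connections.foreach { c =>
    pool.scheduleAtFixedRate(() => c.keepAlive(), 0L, keepaliveMs, TimeUnit.MILLISECONDS)
  }
}
```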
NOTE: This test is designed to be run alongside a ProduceBench test for a
specific topic, due to the way the Consumer test polls a single partition.
There may be no data returned by the consumer test if this is run on its own.
The connection should still be kept alive, but with no data returned.
Author: Scott Hendricks <scott.hendricks@confluent.io>
Reviewers: Stanislav Kozlovski, Gwen Shapira
Closes #7289 from scott-hendricks/trunk
Unexpected exceptions are caught during topic deletion in `TopicCommand`. The caught exception is currently lost and we raise `AdminOperationException`. This patch fixes the problem by chaining the caught exception.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
This patch fixes a couple of problems with logging of request/response objects which include records data. First, it adds a missing `toString` to `LazyDownConversionRecords`. Second, it changes the `toString` of `MemoryRecords` to not print record-level information. This was always a dangerous practice, but it was especially bad when these objects ended up in request logs. With this patch, implementations use `toString` only to print summary details.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
This PR changes the TxnOffsetCommit protocol to auto-generated types and adds more unit test coverage to the plain OffsetCommit protocol.
Reviewers: Jason Gustafson <jason@confluent.io>
The `CompletedFetch` and `PartitionRecords` types are somewhat redundant. This PR consolidates the two types.
Reviewers: Jason Gustafson <jason@confluent.io>
* Leader instance uses dictionary encoding on the wire to send topic partitions (see the sketch below)
* Topic names (the most expensive component) are mapped to an integer using the dictionary
* Follower instances receive the dictionary and decode topic names back
* Purely an on-the-wire optimization; no in-memory structures changed
* Test case added for version 5 AssignmentInfo
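An illustrative sketch of the encoding idea (not the actual AssignmentInfo wire format):
```
final case class TopicPartition(topic: String, partition: Int)

// Leader side: send each distinct topic name once, then refer to topics by
// their integer index in the dictionary.
def encode(assignment: Seq[TopicPartition]): (Seq[String], Seq[(Int, Int)]) = {
  val dict = assignment.map(_.topic).distinct
  val index = dict.zipWithIndex.toMap
  (dict, assignment.map(tp => (index(tp.topic), tp.partition)))
}

// Follower side: invert the dictionary to recover the topic names.
def decode(dict: Seq[String], encoded: Seq[(Int, Int)]): Seq[TopicPartition] =
  encoded.map { case (i, p) => TopicPartition(dict(i), p) }
```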
Reviewers: Guozhang Wang <wangguoz@gmail.com>
This patch contains a few improvements to the offset range handling when computing the cleanable range of offsets.
1. It adds bounds checking to ensure the dirty offset cannot be larger than the log end offset. If it is, we reset it to the log start offset (see the sketch after this list).
2. It adds a method to get the non-active segments in the log while holding the lock. This ensures that a truncation cannot lead to an invalid segment range.
3. It improves exception messages in the case that an inconsistent segment range is provided so that we have more information to find the root cause.
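A sketch of the bounds check in point 1 (hypothetical names):
```
// If the checkpointed dirty offset is past the log end offset (or before the
// log start offset), cleaning restarts from the log start offset.
def cleanableDirtyOffset(checkpointedDirtyOffset: Long,
                         logStartOffset: Long,
                         logEndOffset: Long): Long =
  if (checkpointedDirtyOffset < logStartOffset || checkpointedDirtyOffset > logEndOffset)
    logStartOffset
  else
    checkpointedDirtyOffset
```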
The patch also fixes a few problems in `LogCleanerManagerTest` due to unintended reuse of the underlying log directory.
Reviewers: Vikas Singh <soondenana@users.noreply.github.com>, Jun Rao <junrao@gmail.com>
Right now we only have the very generic FailedProduceRequestsPerSec and FailedFetchRequestsPerSec metrics, which mark whenever a record fails on the broker side. To improve the debugging UX, I added 4 new metrics in BrokerTopicStats to log the various scenarios where an InvalidRecordException is thrown because LogValidator fails to validate a record:
-- NoKeyCompactedTopicRecordsPerSec: counter of failures caused by compacted records with no key
-- InvalidMagicNumberRecordsPerSec: counter of failures caused by records with an invalid magic number
-- InvalidMessageCrcRecordsPerSec: counter of failures caused by records with CRC corruption
-- NonIncreasingOffsetRecordsPerSec: counter of failures caused by records with invalid offsets
Reviewers: Robert Yokota <rayokota@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
This patch ensures that relevant system configurations are included in exception messages when zk security validation fails.
Reviewers: Vikas Singh <soondenana@users.noreply.github.com>, José Armando García Sancio <jsancio@users.noreply.github.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
Added unit test for recent fix of `KafkaConfigBackingStore`.
Author: Konstantine Karantasis <konstantine@confluent.io>
Reviewer: Randall Hauch <rhauch@gmail.com>
Corrected the `KafkaConfigBackingStore` logic to notify of only the changed tasks, rather than all tasks. This was not noticed before because Connect always stopped and restarted all tasks during a rebalance, but since 2.3 the incremental rebalancing logic has exposed this bug.
Author: Luying Liu <lyliu@lyliu-mac.freewheelmedia.net>
Reviewers: Konstantine Karantasis <konstantine@confluent.io>, Randall Hauch <rhauch@gmail.com>
New Java Authorizer API and a new out-of-the-box authorizer (AclAuthorizer) that implements the new interface.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Fix a bug where ClientCompatibilityFeaturesTest fails when running multiple iterations.
Also, fix a typo in tests/docker/Dockerfile.
Reviewers: Ismael Juma <ismael@juma.me.uk>