For metadata request version 6 and above, use a different error code to indicate a missing listener on the leader broker, which makes listener configuration issues easier to diagnose.
Reviewers: Ismael Juma <ismael@juma.me.uk>
Also:
- Remove exceptions in `kafka.common` that are no longer used.
- Keep `kafka.common.KafkaException` as it's still used by `ZkUtils`,
`kafka.admin.AdminClient` and `kafka.security.auth` classes and
we would like to maintain compatibility for now.
- Add deprecated annotation to `kafka.admin.AdminClient`. The scaladoc
stated that the class is deprecated, but the annotation was missing.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
A broker with multiple log dirs will die on startup if dir.getCanonicalPath() throws
IOException for one of the log dirs. We should mark such log directory as offline
instead and the broker should start if there is a healthy log dir.
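A minimal sketch of the intended startup behaviour, not the broker's actual code. `LogDirStartupSketch`, `PathResolver` and `Result` are hypothetical names introduced for illustration, and failing when no directory is healthy is an assumption drawn from the description above.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LogDirStartupSketch {
    /** Hypothetical hook so the canonical-path lookup can be made to fail in a test. */
    interface PathResolver {
        String canonicalPath(File dir) throws IOException;
    }

    /** Outcome of checking every configured log dir. */
    static final class Result {
        final List<File> live = new ArrayList<>();
        final List<File> offline = new ArrayList<>();
    }

    static Result validate(List<File> dirs, PathResolver resolver) {
        Result result = new Result();
        for (File dir : dirs) {
            try {
                resolver.canonicalPath(dir);
                result.live.add(dir);
            } catch (IOException e) {
                // Mark only this dir offline instead of aborting startup.
                result.offline.add(dir);
            }
        }
        if (result.live.isEmpty()) {
            // Assumption: with no healthy log dir there is nothing to start with.
            throw new IllegalStateException("No healthy log directory available");
        }
        return result;
    }

    public static void main(String[] args) {
        List<File> dirs = Arrays.asList(new File("logs-1"), new File("logs-2"));
        Result r = validate(dirs, File::getCanonicalPath);
        System.out.println("live=" + r.live + " offline=" + r.offline);
    }
}
```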
Reviewers: Ismael Juma <ismael@juma.me.uk>
This patch fixes the following issues in the log splitting logic added to address KAFKA-6264:
1. We were not handling the case when all messages in the segment overflowed the index. In this case, there is only one resulting segment following the split.
2. There was an off-by-one error in the recovery logic when completing a swap operation which caused an unintended segment deletion.
Additionally, this patch factors out of `splitOverflowedSegment` a method that writes to a segment from an instance of `FileRecords`. This allows for future reuse and isolated testing.
Reviewers: Dhruvil Shah <dhruvil@confluent.io>, Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
- Removed Scala consumers (`SimpleConsumer` and `ZooKeeperConsumerConnector`)
and their tests.
- Removed Scala request/response/message classes.
- Removed any mention of new consumer or new producer in the code
with the exception of MirrorMaker where the new.consumer option was
never deprecated so we have to keep it for now. The non-code
documentation has not been updated either; that will be done
separately.
- Removed a number of tools that only made sense in the context
of the Scala consumers (see upgrade notes).
- Updated some tools that worked with both Scala and Java consumers
so that they only support the latter (see upgrade notes).
- Removed `BaseConsumer` and related classes apart from `BaseRecord`
which is used in `MirrorMakerMessageHandler`. The latter is a pluggable
interface so effectively public API.
- Removed `ZkUtils` methods that were only used by the old consumers.
- Removed `ZkUtils.registerBroker` and `ZKCheckedEphemeral` since
the broker now uses the methods in `KafkaZkClient` and no-one else
should be using that method.
- Updated system tests so that they don't use the Scala consumers except
for multi-version tests.
- Updated LogDirFailureTest so that the consumer offsets topic would
continue to be available after all the failures. This was necessary for it
to work with the Java consumer.
- Fixed some multi-version system tests that had not been updated to
include recently released Kafka versions.
- Updated findBugs and checkstyle configs not to refer to deleted
classes and packages.
Reviewers: Dong Lin <lindong28@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
Support configuration of dynamic broker configs in ZooKeeper before starting brokers using ConfigCommand. This will allow password configs to be encrypted and stored in ZooKeeper, without requiring clear passwords in server.properties to bootstrap the broker first.
Reviewers: Jason Gustafson <jason@confluent.io>
The initial PR for KIP-290 #5117 added a new `ResourceNameType`, which was initially a field on `Resource` and `ResourceFilter`. However, follow on PRs have now moved the name type fields to new `ResourcePattern` and `ResourcePatternFilter` classes. This means the old name is no longer valid and may be confusing. The PR looks to rename the class to a more intuitive `resource.PatternType`.
@cmccabe also requested that the current `ANY` value for this class be renamed to avoid confusion. `PatternType.ANY` currently causes `ResourcePatternFilter` to bring back all ACLs that would affect the supplied resource, i.e. it brings back literal, wildcard ACLs, and also does pattern matching to work out which prefix acls would affect the resource. This is very different from the behaviour of `ResourceType.ANY`, which just means the filter ignores the type of resources.
`ANY` is to be renamed to `MATCH` to disambiguate it from other `ANY` filter types. A new `ANY` will be added that works in the same way as others, i.e. it will cause the filter to ignore the pattern type, (but won't do any pattern matching).
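The difference between `MATCH` and the new `ANY` can be sketched as follows. This is an illustrative model only, not the `ResourcePatternFilter` implementation: `PatternTypeSketch` and `StoredPattern` are hypothetical names, and the `*` wildcard name is an assumption.

```java
final class PatternTypeSketch {
    enum PatternType { LITERAL, PREFIXED, ANY, MATCH }

    // Assumption: the wildcard resource name is "*".
    static final String WILDCARD = "*";

    /** Hypothetical stand-in for a stored ACL resource pattern. */
    static final class StoredPattern {
        final String name;
        final PatternType type;

        StoredPattern(String name, PatternType type) {
            this.name = name;
            this.type = type;
        }
    }

    /** Does a filter on (filterName, filterType) select the stored pattern? */
    static boolean matches(String filterName, PatternType filterType, StoredPattern stored) {
        switch (filterType) {
            case LITERAL:
            case PREFIXED:
                // Exact pattern type and exact name.
                return stored.type == filterType && stored.name.equals(filterName);
            case ANY:
                // Ignore the pattern type, but do no pattern matching.
                return stored.name.equals(filterName);
            case MATCH:
                // Everything that would affect the resource named filterName.
                if (stored.type == PatternType.LITERAL) {
                    return stored.name.equals(filterName) || stored.name.equals(WILDCARD);
                }
                return stored.type == PatternType.PREFIXED && filterName.startsWith(stored.name);
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        StoredPattern prefixed = new StoredPattern("payments-", PatternType.PREFIXED);
        System.out.println("MATCH: " + matches("payments-eu", PatternType.MATCH, prefixed));
        System.out.println("ANY:   " + matches("payments-eu", PatternType.ANY, prefixed));
    }
}
```

In this model a prefixed ACL on `payments-` is selected by a `MATCH` filter for `payments-eu` but not by an `ANY` filter for the same name.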
Reviewers: Colin Patrick McCabe <colin@cmccabe.xyz>, Jun Rao <junrao@gmail.com>
This does the minimal amount of work so that the tool
relies on public non-deprecated APIs (i.e. it no longer
relies on Scala clients code).
Additional improvements (not included here) have
been proposed via KIP-308.
There are a few other PRs that touch this class with
overlapping goals:
- https://github.com/apache/kafka/pull/2891
- https://github.com/apache/kafka/pull/3051
- https://github.com/apache/kafka/pull/3320
One of them remains relevant in the context of KIP-308, but
the others have been superseded. I included the authors of
the 3 PRs as co-authors.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Vahid Hashemian <vahidhashemian@us.ibm.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
Co-authored-by: Arseniy Tashoyan <tashoyan@gmail.com>
Co-authored-by: Vahid Hashemian <vahidhashemian@us.ibm.com>
Co-authored-by: Mohammed Amine GARMES
Co-authored-by: Ismael Juma <ismael@juma.me.uk>
It is possible that the log start offset falls in the middle of a batch after AdminClient#deleteRecords(). This causes a follower starting from the log start offset to fail fetching (all records). Use cases in which a follower starts fetching from the log start offset include: 1) a new replica due to partition reassignment; 2) a new local replica created as a result of AdminClient#alterReplicaLogDirs(); 3) a broker that was down for some time while AdminClient#deleteRecords() moved the log start offset beyond its HW.
Added two integration tests:
1) Produce and then call AdminClient#deleteRecords() while one of the followers is down, so that restarting the follower requires fetching from the log start offset;
2) AdminClient#alterReplicaLogDirs() after AdminClient#deleteRecords().
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>, Jason Gustafson <jason@confluent.io>
For tests that are not testing the old consumers functionality. As part of this,
consolidate `TopicMetadataTest` into `MetadataRequestTest`. Finally,
remove `ProducerBounceTest` which has no tests left in it.
Reviewers: Jason Gustafson <jason@confluent.io>
This patch changes the default `request.timeout.ms` of the consumer to 30 seconds. Additionally, it adds logic to `NetworkClient` and related components to support timeouts at the request level. We use this to handle the special case of the JoinGroup request, which may block for as long as the value configured by `max.poll.interval.ms`.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Guozhang Wang <guozhang@confluent.io>
- Update some tests to use the Java consumer.
- Remove ignored `ProducerBounceTest.testBrokerFailure`. This test
is flaky and it has been superseded by `TransactionBounceTest`.
- Use non-blocking poll for consumption methods in `TestUtils`.
This is a step on the road to remove the old consumers.
A bug in the original KIP-277 submission was caught during code review,
but it was not detected by the tests. Fix that gap.
Co-authored-by: Edoardo Comar <ecomar@uk.ibm.com>
Co-authored-by: Mickael Maison <mickael.maison@gmail.com>
Keep Literal ACLs on the old paths, using the old formats, to maintain backwards compatibility.
Have Prefixed, and any later types, go on new paths, using JSON (old brokers are not aware of them).
Add checks to reject any adminClient requests to add prefixed acls before the cluster is fully upgraded.
Reviewers: Colin Patrick McCabe <colin@cmccabe.xyz>, Jun Rao <junrao@gmail.com>
It takes O(n^2) time to instantiate an MBean with n attributes, which can be very slow if the MBean has a large number of attributes. This PR fixes the performance issue by removing metrics whose number of attributes can grow with the number of partitions in the cluster. These metrics have already been marked for removal in 2.0 by KIP-225.
Author: Dong Lin <lindong28@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes #5172 from lindong28/remove-deprecated-metrics
This is a follow-on change requested as part of the initial PR for KIP-290 #5117. @cmccabe requested that the `resource.Resource` class be factored out in favour of `ConfigResource` to avoid confusion between all the `Resource` implementations.
Reviewers: Colin Patrick McCabe <colin@cmccabe.xyz>, Jun Rao <junrao@gmail.com>
Currently, a throttled fetch response is returned with INVALID_SESSION_ID, which causes dropping the current fetch session if incremental fetch is in progress. This patch fixes this by returning the correct session id.
Author: Jon Lee <jonlee@linkedin.com>
Reviewers: Colin Patrick McCabe <colin@cmccabe.xyz>, Dong Lin <lindong28@gmail.com>
Closes #5164 from jonlee2/KAFKA-6946
This patch adds logic to detect and fix segments which have overflowed offsets as a result of bugs in older versions of Kafka.
Reviewers: Jun Rao <junrao@gmail.com>, Jason Gustafson <jason@confluent.io>
Remove the duplicate Scala ResourceNameType in preference to the Java ResourceNameType.
This is follow on work for KIP-290 and PR #5117, which saw the Scala ResourceNameType class introduced.
I've added tests to ensure AclBindings can't be created with ResourceNameType.ANY or UNKNOWN.
Reviewers: Colin Patrick McCabe <colin@cmccabe.xyz>, Jun Rao <junrao@gmail.com>
The initial PR for KIP-290 #5117 added a `ResourceNameType` field to the Java and Scala `Resource` classes to introduce the concept of Prefixed ACLS. This does not make a lot of sense as these classes are meant to represent cluster resources, which would not have a concept of 'name type'. This work has not been released yet, so we have time to change it.
This PR looks to refactor the code to remove the name type field from the Java `Resource` class. (The Scala one will age out once KIP-290 is done, and removing it would involve changes to the `Authorizer` interface, so this class was not touched).
This is achieved by replacing the use of `Resource` with `ResourcePattern` and `ResourceFilter` with `ResourcePatternFilter`. A `ResourcePattern` is a combination of resource type, name and name type, where each field needs to be defined. A `ResourcePatternFilter` is used to select patterns during describe and delete operations.
The adminClient uses `AclBinding` and `AclBindingFilter`. These types have been switched over to use the new pattern types.
The AclCommand class, used by kafka-acls.sh, has been converted to use the new pattern types.
The result is that the original `Resource` and `ResourceFilter` classes are not really used anywhere, except deprecated methods. However, the `Resource` class will be used if/when KIP-50 is done.
Reviewers: Colin Patrick McCabe <colin@cmccabe.xyz>, Jun Rao <junrao@gmail.com>
Reviewers: Colin Patrick McCabe <colin@cmccabe.xyz>, Jun Rao <junrao@gmail.com>
Co-authored-by: Piyush Vijay <pvijay@apple.com>
Co-authored-by: Andy Coates <big-andy-coates@users.noreply.github.com>
- CreateTopicsRequest now requires Create auth on Topic resource
or Create on Cluster resource.
- AclCommand --producer option adjusted
- Existing unit and Integration tests adjusted accordingly and
new tests added.
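The new authorization rule for topic creation can be sketched as below. This is a simplified model rather than the broker's request handler: the `Authorizer` interface here is a hypothetical stand-in, and the cluster resource name `kafka-cluster` is an assumption.

```java
final class CreateTopicAuthSketch {
    enum ResourceType { TOPIC, CLUSTER }

    // Assumption: the single cluster resource is named "kafka-cluster".
    static final String CLUSTER_NAME = "kafka-cluster";

    /** Hypothetical, simplified authorizer interface. */
    interface Authorizer {
        boolean authorize(String principal, String operation, ResourceType type, String name);
    }

    /** Topic creation is allowed with Create on the topic OR Create on the cluster. */
    static boolean mayCreateTopic(Authorizer authorizer, String principal, String topic) {
        return authorizer.authorize(principal, "Create", ResourceType.TOPIC, topic)
            || authorizer.authorize(principal, "Create", ResourceType.CLUSTER, CLUSTER_NAME);
    }

    public static void main(String[] args) {
        // This principal only holds Create on the topic "orders".
        Authorizer topicOnly = (p, op, type, name) ->
            type == ResourceType.TOPIC && name.equals("orders");
        System.out.println(mayCreateTopic(topicOnly, "User:alice", "orders"));
        System.out.println(mayCreateTopic(topicOnly, "User:alice", "payments"));
    }
}
```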
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Co-authored-by: Edoardo Comar <ecomar@uk.ibm.com>
Co-authored-by: Mickael Maison <mickael.maison@gmail.com>
Reviewers: Viktor Somogyi <viktorsomogyi@gmail.com>, Vahid Hashemian <vahidhashemian@us.ibm.com>, Manikumar Reddy <manikumar.reddy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
PrincipalBuilder implementations can now take the listener into account
when creating the Principal. This is especially interesting in deployments
where inter-broker traffic is on a different listener than client traffic or
when the same protocol is used by multiple listeners.
The change in itself is mostly "plumbing" as the listener name needs to be
passed from ChannelBuilders all the way down to all classes implementing
AuthenticationContext.
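As an illustration of why the listener is useful when building a principal, here is a self-contained sketch. It does not use the real `PrincipalBuilder` or `AuthenticationContext` types: `AuthContext`, the listener name `INTERNAL` and the principal strings are all hypothetical.

```java
final class ListenerAwarePrincipalSketch {
    /** Hypothetical stand-in for an authentication context that carries the listener name. */
    static final class AuthContext {
        final String listenerName;
        final String authenticatedName;

        AuthContext(String listenerName, String authenticatedName) {
            this.listenerName = listenerName;
            this.authenticatedName = authenticatedName;
        }
    }

    // Assumption: inter-broker traffic uses a dedicated listener called "INTERNAL".
    static final String INTER_BROKER_LISTENER = "INTERNAL";

    static String buildPrincipal(AuthContext context) {
        if (INTER_BROKER_LISTENER.equals(context.listenerName)) {
            // All inter-broker connections map to one broker principal.
            return "User:broker";
        }
        // Client traffic keeps the authenticated name.
        return "User:" + context.authenticatedName;
    }

    public static void main(String[] args) {
        System.out.println(buildPrincipal(new AuthContext("INTERNAL", "node-1")));
        System.out.println(buildPrincipal(new AuthContext("CLIENT", "alice")));
    }
}
```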
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Ismael Juma <ismael@juma.me.uk>
Co-authored-by: Edoardo Comar <ecomar@uk.ibm.com>
Co-authored-by: Mickael Maison <mickael.maison@gmail.com>
Adding checks on "version" field for tools using it.
This is a new version of the closed PR #3887 (see that PR for more comments and related discussion).
Author: Paolo Patierno <ppatierno@live.com>
Reviewers: Dong Lin <lindong28@gmail.com>
Closes #5126 from ppatierno/kafka-5919-update
We added logic to reassign nodes in callToSend after a connection failure, but we do not handle the case when there is no node currently available to reassign the request to. This can happen when using MetadataUpdateNodeIdProvider if all of the known nodes are blacked out awaiting the retry backoff. To fix this, we need to ensure that the call is added to pendingCalls if a new node cannot be found.
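The fix can be pictured with the following simplified model, which is not the AdminClient code itself. `PendingCallsSketch` and `NodeProvider` are hypothetical; calls are plain strings and a `null` node stands for "no node currently available".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PendingCallsSketch {
    /** Hypothetical provider; returns null when every known node is backing off. */
    interface NodeProvider {
        Integer provide();
    }

    final List<String> pendingCalls = new ArrayList<>();
    final Map<Integer, List<String>> callsToSend = new HashMap<>();

    /** Reassign a call after its connection failed. */
    void reassign(String call, NodeProvider provider) {
        Integer node = provider.provide();
        if (node == null) {
            // No node available right now: park the call instead of dropping it.
            pendingCalls.add(call);
        } else {
            callsToSend.computeIfAbsent(node, k -> new ArrayList<>()).add(call);
        }
    }

    public static void main(String[] args) {
        PendingCallsSketch sketch = new PendingCallsSketch();
        sketch.reassign("describeTopics", () -> null);
        sketch.reassign("listTopics", () -> 2);
        System.out.println("pending=" + sketch.pendingCalls + " toSend=" + sketch.callsToSend);
    }
}
```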
This patch implements KIP-281, which adds a configurable timeout to the consumer performance tool with a default value of 10 seconds. The old timeout was hard-coded as 1 second. Additionally, this patch adds a warning message when the tool exits after a timeout rather than returning silently.
Reviewers: Dhruvil Shah <dhruvil@confluent.io>, Jason Gustafson <jason@confluent.io>
- Removed internal kafka.admin.AdminClient.deleteRecordsBefore since it's
no longer used.
- Removed redundant tests and rewrote non redundant ones to use the Java
AdminClient.
Reviewers: Viktor Somogyi <viktor.somogyi@cloudera.com>, Manikumar Reddy <manikumar.reddy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Specifying an invalid config (i.e. something other than `CreateTime` or
`LogAppendTime`) via `TopicCommand` would previously cause the
broker to fail on start-up.
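One way to guard against such values is to validate them before they are stored. The sketch below is an assumption about the shape of such a check, not the actual change; `TimestampTypeSketch` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;

final class TimestampTypeSketch {
    // The two values named in the description above.
    static final List<String> VALID = Arrays.asList("CreateTime", "LogAppendTime");

    /** Returns the value unchanged if valid; otherwise rejects it up front. */
    static String validate(String value) {
        if (!VALID.contains(value)) {
            throw new IllegalArgumentException(
                "Invalid timestamp type '" + value + "', expected one of " + VALID);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(validate("LogAppendTime"));
        try {
            validate("logappendtime");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```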
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Implementation for lazy down-conversion in a chunked manner for efficient memory usage during down-conversion. This pull request is mainly to get initial feedback on the direction of the patch. The patch includes all the main components from KIP-283.
Reviewers: Jason Gustafson <jason@confluent.io>
This implements KIP-219, where a broker returns a response with throttle time on
quota violation immediately after processing the corresponding request. After
the response is sent out, the broker will keep the channel muted until the
throttle time is over. Also, on receiving a response with throttle time, client
will block outgoing communication to the broker for the specified throttle time.
See PR 4830, 5064 and 5094 for all the review history
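The client-side half of this behaviour can be sketched as a per-node "throttled until" timestamp. This is an illustrative model only; `ClientThrottleSketch` and its method names are hypothetical and do not mirror the client's internals.

```java
import java.util.HashMap;
import java.util.Map;

final class ClientThrottleSketch {
    // Node id -> time (ms) until which the client must not send to that node.
    private final Map<String, Long> throttledUntilMs = new HashMap<>();

    /** Record the throttle time carried by a response from the given node. */
    void onResponse(String nodeId, int throttleTimeMs, long nowMs) {
        if (throttleTimeMs > 0) {
            // Keep the later deadline if the node is already throttled.
            throttledUntilMs.merge(nodeId, nowMs + throttleTimeMs, Long::max);
        }
    }

    /** How long the client still has to wait before sending to the node. */
    long throttleDelayMs(String nodeId, long nowMs) {
        Long until = throttledUntilMs.get(nodeId);
        return until == null ? 0L : Math.max(0L, until - nowMs);
    }

    boolean canSend(String nodeId, long nowMs) {
        return throttleDelayMs(nodeId, nowMs) == 0L;
    }

    public static void main(String[] args) {
        ClientThrottleSketch throttle = new ClientThrottleSketch();
        throttle.onResponse("broker-1", 500, 1_000L);
        System.out.println(throttle.canSend("broker-1", 1_200L));
        System.out.println(throttle.canSend("broker-1", 1_500L));
    }
}
```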
Author: Jon Lee <jonlee@jonlee-ld1.linkedin.biz>
Reviewers: Jun Rao <junrao@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>, Ismael Juma <ismael@juma.me.uk>, Dong Lin <lindong28@gmail.com>
Closes #5064 from jonlee2/kip-219
KafkaController currently writes reassignment znode once for every partition that has been successfully reassigned. This is unnecessary and controller should be able to update reassignment znode once to remove all partitions that have been reassigned from the reassignment znode.
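The idea of a single batched update can be sketched with an in-memory stand-in for the znode. `ReassignmentZnodeSketch` is hypothetical and counts writes only to make the difference visible; it does not talk to ZooKeeper.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ReassignmentZnodeSketch {
    // In-memory stand-in for the reassignment znode: partition -> target replicas.
    Map<String, List<Integer>> znode = new LinkedHashMap<>();
    int writes = 0;

    private void write(Map<String, List<Integer>> data) {
        znode = data;
        writes++;
    }

    /** Remove all completed partitions with one write instead of one write per partition. */
    void removeCompleted(Set<String> completed) {
        Map<String, List<Integer>> remaining = new LinkedHashMap<>(znode);
        remaining.keySet().removeAll(completed);
        write(remaining);
    }

    public static void main(String[] args) {
        ReassignmentZnodeSketch sketch = new ReassignmentZnodeSketch();
        sketch.znode.put("t-0", Arrays.asList(1, 2));
        sketch.znode.put("t-1", Arrays.asList(2, 3));
        sketch.znode.put("t-2", Arrays.asList(3, 1));
        sketch.removeCompleted(new HashSet<>(Arrays.asList("t-0", "t-1")));
        System.out.println("remaining=" + sketch.znode.keySet() + " writes=" + sketch.writes);
    }
}
```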
Author: Dong Lin <dolin@linkedin.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes #4659 from lindong28/KAFKA-6617
Refresh metadata if broker connection fails so that new calls are sent only to nodes that are alive and requests to controller are sent to the new controller if controller changes due to broker failure. Also reassign calls that could not be sent.
Reviewers: Dong Lin <lindong28@gmail.com>, Jason Gustafson <jason@confluent.io>
Document cases where `IllegalStateException` is raised when attempting an invalid operation on an unassigned partition. Also change `position()` to raise `IllegalStateException` when called on an unassigned partition for consistency.
Add the new stricter-timeout version of `poll` proposed in KIP-266.
The pre-existing variant `poll(long timeout)` would block indefinitely for metadata
updates if they were needed, then it would issue a fetch and poll for `timeout` ms
for new records. The initial indefinite metadata block caused applications to become
stuck when the brokers became unavailable. The existence of the timeout parameter
made the indefinite block especially unintuitive.
This PR adds `poll(Duration timeout)` with the semantics:
1. iff a metadata update is needed:
    1. send (asynchronous) metadata requests
    2. poll for metadata responses (counts against timeout)
        - if no response within timeout, **return an empty collection immediately**
2. if there is fetch data available, **return it immediately**
3. if there is no fetch request in flight, send fetch requests
4. poll for fetch responses (counts against timeout)
    - if no response within timeout, **return an empty collection** (leaving async fetch request for the next poll)
    - if we get a response, **return the response**
The old method, `poll(long timeout)`, is deprecated, but we do not change its semantics, so it remains:
1. iff a metadata update is needed:
    1. send (asynchronous) metadata requests
    2. poll for metadata responses *indefinitely until we get it*
2. if there is fetch data available, **return it immediately**
3. if there is no fetch request in flight, send fetch requests
4. poll for fetch responses (counts against timeout)
    - if no response within timeout, **return an empty collection** (leaving async fetch request for the next poll)
    - if we get a response, **return the response**
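The key difference in the new `poll(Duration timeout)` is that the metadata wait and the fetch wait share one deadline. The sketch below models only that aspect: `PollTimeoutSketch` and its `Network` interface are hypothetical, records are plain strings, and steps 2 and 3 are folded into `awaitFetch` for brevity.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.List;
import java.util.function.LongSupplier;

final class PollTimeoutSketch {
    /** Hypothetical stand-in for the network layer. */
    interface Network {
        /** Waits up to maxWaitMs; returns true if metadata arrived in time. */
        boolean awaitMetadata(long maxWaitMs);

        /** Waits up to maxWaitMs; returns fetched records, possibly empty. */
        List<String> awaitFetch(long maxWaitMs);
    }

    static List<String> poll(Duration timeout, boolean metadataNeeded,
                             Network network, LongSupplier clockMs) {
        // One deadline covers both the metadata wait and the fetch wait.
        long deadlineMs = clockMs.getAsLong() + timeout.toMillis();
        if (metadataNeeded) {
            long metadataWaitMs = Math.max(0L, deadlineMs - clockMs.getAsLong());
            if (!network.awaitMetadata(metadataWaitMs)) {
                // No metadata within the timeout: return empty rather than block.
                return Collections.emptyList();
            }
        }
        // Only the time left after the metadata wait is available for fetching.
        long fetchWaitMs = Math.max(0L, deadlineMs - clockMs.getAsLong());
        return network.awaitFetch(fetchWaitMs);
    }

    public static void main(String[] args) {
        Network unavailable = new Network() {
            public boolean awaitMetadata(long maxWaitMs) { return false; }
            public List<String> awaitFetch(long maxWaitMs) { return Collections.emptyList(); }
        };
        List<String> records =
            poll(Duration.ofMillis(100), true, unavailable, System::currentTimeMillis);
        System.out.println("records=" + records);
    }
}
```

Under the old `poll(long timeout)` semantics, the metadata wait in this model would be unbounded and only the fetch wait would be limited by the timeout.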
One notable usage is prohibited by the new `poll`: previously, you could call `poll(0)` to block for metadata updates, for example to initialize the client, supposedly without fetching records. Note, though, that this behavior is not according to any contract, and there is no guarantee that `poll(0)` won't return records the first time it's called. Therefore, it has always been unsafe to ignore the response.
This KIP adds the following functionality related to SASL/OAUTHBEARER:
1) Allow clients (both brokers when SASL/OAUTHBEARER is the inter-broker protocol as well as non-broker clients) to flexibly retrieve an access token from an OAuth 2 authorization server based on the declaration of a custom login CallbackHandler implementation and have that access token transparently and automatically transmitted to a broker for authentication.
2) Allow brokers to flexibly validate provided access tokens when a client establishes a connection based on the declaration of a custom SASL Server CallbackHandler implementation.
3) Provide implementations of the above retrieval and validation features based on an unsecured JSON Web Token that function out-of-the-box with minimal configuration required (i.e. implementations of the two types of callback handlers mentioned above will be used by default with no need to explicitly declare them).
4) Allow clients (both brokers when SASL/OAUTHBEARER is the inter-broker protocol as well as non-broker clients) to transparently retrieve a new access token in the background before the existing access token expires in case the client has to open new connections.
The consumer offset path in ZooKeeper should be /consumers/${group}/offsets/${topic}/${partition} instead of /consumers/${group}/offset/${topic}/${partition}. Added `s` to the word `offset`.
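For reference, the corrected path layout can be built as in this small sketch; the class and method names are hypothetical helpers, not part of `ZkUtils`.

```java
final class ConsumerOffsetPathSketch {
    /** Builds /consumers/${group}/offsets/${topic}/${partition} (note the plural "offsets"). */
    static String offsetPath(String group, String topic, int partition) {
        return "/consumers/" + group + "/offsets/" + topic + "/" + partition;
    }

    public static void main(String[] args) {
        System.out.println(offsetPath("my-group", "orders", 0));
    }
}
```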
Reviewers: Ismael Juma <ismael@juma.me.uk>, Manikumar Reddy O <manikumar.reddy@gmail.com>, Jun Rao <junrao@gmail.com>