Let the consumer back off and retry the offset fetch when the relevant offset topic partitions have pending transactional commits.
The major change is in the broker-side offset fetch logic: a request with the WaitTransaction flag set to true is required to back off while a pending transactional commit is in progress. This prevents an ongoing transaction from being modified by a third party, thus guaranteeing correctness when input partition writers are shuffled.
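A minimal self-contained sketch of the idea, using made-up names (`waitTransaction`, `PENDING_TRANSACTION`) rather than the actual broker handler code:
```java
// Toy model: an offset fetch that opts into waiting for transactions gets a
// retriable error while a transactional commit for the partition is pending,
// so the consumer backs off and retries instead of reading an unstable offset.
import java.util.Set;

class OffsetFetchBackoffSketch {
    enum FetchError { NONE, PENDING_TRANSACTION } // PENDING_TRANSACTION is retriable

    private final Set<String> partitionsWithPendingTxnCommit;

    OffsetFetchBackoffSketch(Set<String> partitionsWithPendingTxnCommit) {
        this.partitionsWithPendingTxnCommit = partitionsWithPendingTxnCommit;
    }

    FetchError errorFor(String topicPartition, boolean waitTransaction) {
        if (waitTransaction && partitionsWithPendingTxnCommit.contains(topicPartition))
            return FetchError.PENDING_TRANSACTION; // consumer should back off and retry
        return FetchError.NONE;
    }
}
```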
Reviewers: Matthias J. Sax <matthias@confluent.io>, Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Calling Consumer.endOffsets in a loop throws TimeoutException ("Failed to get offsets by times in 30000ms") after a leader change.
Reviewers: Steven Lu, Guozhang Wang <wangguoz@gmail.com>
Prior to this commit, the broker's Selector used to read all requests available on the socket when the socket was ready for read. These were queued up as staged receives. This can result in excessive memory usage when the socket read buffer size is large. We mute the channel and stop reading any more data until all the staged requests are processed. This behaviour is slightly inconsistent, since for the initial read we drain the socket buffer, allowing it to get filled up again, but if data arrives slightly after the initial read, then we don't read from the socket buffer until pending requests are processed.
To avoid holding onto requests for longer than required, this commit removes staged receives and reads one request at a time even if more data is available in the socket buffer. This is especially useful for produce requests which may be large and may take long to process. Additional data read from the socket for SSL is limited to the pre-allocated intermediate SSL buffers.
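A toy sketch of the new read pattern (illustrative only, not the actual Selector code):
```java
// Read at most one complete request per poll and keep the channel "muted"
// until that request has been processed, instead of draining the socket
// buffer into staged receives.
import java.util.ArrayDeque;
import java.util.Queue;

class SingleRequestReadSketch {
    private final Queue<byte[]> socketBuffer = new ArrayDeque<>(); // stands in for the socket
    private boolean muted = false;

    byte[] pollOneRequest() {
        if (muted || socketBuffer.isEmpty())
            return null;
        muted = true;                 // stop reading until this request is processed
        return socketBuffer.poll();   // at most one request per poll
    }

    void onRequestProcessed() {
        muted = false;                // only now may the next request be read
    }

    void onDataArrived(byte[] request) {
        socketBuffer.add(request);
    }
}
```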
Reviewers: Jun Rao <junrao@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Do not wait until updateAssignmentMetadataIfNeeded returns true; instead, call it only once with a zero timeout. Also, do not return empty results while a rebalance is in progress.
Trim the pre-fetched records after the long poll, since the assignment may have changed.
We also need to update SubscriptionState to retain the state in assignFromSubscribed if it already exists (similar to assignFromUser), so that we do not need the transition from INITIALIZING to FETCHING.
Unit test: this actually took me the most time :)
Reviewers: John Roesler <john@confluent.io>, Bill Bejeck <bill@confluent.io>, Bruno Cadonna <bruno@confluent.io>, Sophie Blee-Goldman <sophie@confluent.io>, Jason Gustafson <jason@confluent.io>, Richard Yu <yohan.richard.yu@gmail.com>, dengziming <dengziming1993@gmail.com>
Existing producer state expiration uses timestamps from data records only and not from transaction markers. This can cause premature producer expiration when the coordinator times out a transaction because we drop the state from existing batches. This in turn can allow the coordinator epoch to revert to a previous value, which can lead to validation failures during log recovery. This patch fixes the problem by also leveraging the timestamp from transaction markers.
We also change the validation logic so that coordinator epoch is verified only for new marker appends. When replicating from the leader and when recovering the log, we only log a warning if we notice that the coordinator epoch has gone backwards. This allows recovery from previous occurrences of this bug.
Finally, this patch fixes one minor issue when loading producer state from the snapshot file. When the only record for a given producer is a control record, the "last offset" field will be set to -1 in the snapshot. We should check for this case when loading to be sure we recover the state consistently.
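A hedged sketch of both points with made-up types (the real producer state management logic is more involved):
```java
// (1) Transaction markers (control batches) advance the producer's last
//     timestamp, so a producer that recently committed is not expired early.
// (2) When loading from a snapshot, a "last offset" of -1 means the only
//     record seen for the producer was a control record.
class ProducerStateSketch {
    long lastTimestamp = -1L;
    long lastDataOffset = -1L;

    void onBatch(long maxTimestamp, long lastOffset, boolean isControlBatch) {
        lastTimestamp = Math.max(lastTimestamp, maxTimestamp); // markers count too
        if (!isControlBatch)
            lastDataOffset = lastOffset;
    }

    boolean isExpired(long nowMs, long expirationMs) {
        return lastTimestamp >= 0 && nowMs - lastTimestamp > expirationMs;
    }

    static ProducerStateSketch fromSnapshot(long lastOffset, long lastTimestamp) {
        ProducerStateSketch state = new ProducerStateSketch();
        state.lastTimestamp = lastTimestamp;
        state.lastDataOffset = lastOffset >= 0 ? lastOffset : -1L; // -1: control record only
        return state;
    }
}
```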
Reviewers: Ismael Juma <ismael@juma.me.uk>, Guozhang Wang <wangguoz@gmail.com>
It is possible for the user to call `commitTransaction` before a pending `AddPartitionsToTxn` call has returned. If the `AddPartitionsToTxn` call returns an abortable error, we need to cancel any pending batches in the accumulator and we need to fail the pending commit. The problem in this case is that `Sender.runOnce`, after failing the batches, enters `NetworkClient.poll` before it has a chance to cancel the commit. Since there are no pending requests at this time, this will block unnecessarily and prevent completion of the doomed commit. This patch fixes the problem by returning from `runOnce` if any batches have been aborted, which allows the commit to also fail without the delay.
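A toy illustration of the control-flow change (not the actual Sender code):
```java
// If batches were just aborted, return before the blocking network poll; with
// nothing in flight, poll() would only add delay before the commit can fail.
class RunOnceSketch {
    interface Network { void poll(long timeoutMs); }

    void runOnce(Network client, int numAbortedBatches) {
        if (numAbortedBatches > 0)
            return;            // let the doomed commit fail without waiting out the poll
        client.poll(100L);     // stands in for NetworkClient.poll(...)
    }
}
```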
Note that this was the cause of the delay executing `AuthorizerIntegrationTest.testTransactionalProducerTopicAuthorizationExceptionInCommit` compared to other tests in this class. After fixing the bug, the delay is gone and the test time is similar to other test cases in the class.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Also document the parameters of `ReplicaAssignment` in `ControllerContext`.
Reviewers: Colin Patrick McCabe <cmccabe@apache.org>, Jason Gustafson <jason@confluent.io>
This change updates the CreatePartitions request and response API objects
to use the generated protocol classes.
Reviewers: Jason Gustafson <jason@confluent.io>
The close method calls `Thread.join` to wait for the AdminClient thread to
die, but if close is called from an API completion callback, `join`
waits forever, because the thread calling `join` is the same thread it
is waiting on.
This change checks for this condition and skips the join. The thread
will then return to the main loop, where it will check for this condition
and exit.
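A minimal sketch of the guard (plain `Thread` calls, not the actual KafkaAdminClient internals):
```java
import java.util.concurrent.atomic.AtomicBoolean;

// Skip the self-join when close() is invoked from the client thread itself,
// e.g. from an API completion callback; joining there would wait forever.
class SelfJoinGuard {
    private final AtomicBoolean shutdown = new AtomicBoolean(false);
    private final Thread clientThread;

    SelfJoinGuard(Thread clientThread) {
        this.clientThread = clientThread;
    }

    void close() throws InterruptedException {
        shutdown.set(true);                      // the main loop checks this flag and exits
        if (Thread.currentThread() != clientThread)
            clientThread.join();                 // normal case: wait for the thread to die
        // else: we *are* the client thread; return to the main loop, which
        // will observe the shutdown flag and exit on its own.
    }
}
```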
Reviewers: Jason Gustafson <jason@confluent.io>
The log context is useful when debugging applications which have multiple clients. This patch propagates the context to the channel builders and the SASL authenticator.
Reviewers: Ron Dagostino <rndgstn@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
* Adjust build and documentation.
* Use lambda syntax for SAM types in `core`, `streams-scala` and
`connect-runtime` modules.
* Remove `runnable` and `newThread` from `CoreUtils` as lambda
syntax for SAM types make them unnecessary.
* Remove stale comment in `FunctionsCompatConversions`,
`KGroupedStream`, `KGroupedTable` and `KStream` about Scala 2.11;
the conversions are needed for Scala 2.12 too.
* Deprecate `org.apache.kafka.streams.scala.kstream.Suppressed`
and use `org.apache.kafka.streams.kstream.Suppressed` instead.
* Use `Admin.create` instead of `AdminClient.create`. Static methods
in Java interfaces can be invoked since Scala 2.12. I noticed that
MirrorMaker 2 uses `AdminClient.create`, but I did not change them
as Connectors have restrictions on newer client APIs.
* Improve efficiency in a few `Gauge` implementations by avoiding
unnecessary intermediate collections.
* Remove pointless `Option.apply` in `ZookeeperClient`
`SessionState` metric.
* Fix unused import/variable and other compiler warnings.
* Reduce visibility of some vals/defs.
Reviewers: Manikumar Reddy <manikumar@confluent.io>, Guozhang Wang <wangguoz@gmail.com>, Gwen Shapira <gwen@confluent.io>
When the consumer's fetch request is throttled by the KIP-219 mechanism,
it receives an empty fetch response. The consumer should not log this
as an error.
Reviewers: Jason Gustafson <jason@confluent.io>
Allow ConfigCommand to handle more operations without using direct ZooKeeper access, as described by KIP-543. Also allow specifying the entity type and name via a single flag, again as the KIP describes.
Reviewers: Colin P. McCabe <cmccabe@apache.org>
Adds support for TLSv1.3 in SslTransportLayer. Note that TLSv1.3 is only enabled from Java 11 onwards, so we test the code only when running with Java11 and above.
Tests run on this PR:
- SslTransportLayerTest: This covers testing of our SslTransportLayer and all tests are run with TLSv1.3 when running with Java 11. These tests are also run with TLSv1.2 for all Java versions.
- SslFactoryTest: Also run with TLSv1.3 on Java 11 onwards in addition to TLSv1.2 for all Java versions.
- SslEndToEndAuthorizationTest - Run only with TLSv1.3 on Java 11 onwards and only with TLSv1.2 on earlier Java versions. We have other versions of this test which use SSL that continue to run with TLSv1.2 on Java 11, to avoid reducing test coverage for TLSv1.2.
Additional testing done for TLSv1.3:
- Most tests that use SSL use TestSslUtils.DEFAULT_TLS_PROTOCOL_FOR_TESTS which is set to TLSv1.2. I have run all clients and core tests with DEFAULT_TLS_PROTOCOL_FOR_TESTS=TLSv1.3 with Java 11.
- Ran a few system tests locally with TLSv1.3
Reviewers: Ismael Juma <ismael@juma.me.uk>, Manikumar Reddy <manikumar.reddy@gmail.com>
`KafkaAdminClientTest` contains many code repetitions which could be removed. This PR removes most of the boilerplate code.
Reviewers: Jason Gustafson <jason@confluent.io>
The create topic API does not work with older versions of the API. This can be reproduced by trying to create a topic with kafka-topics.sh from 2.3: it times out.
b94c7f4 has added a check which raises an exception if a field has been set to a non-default value unless the field is marked as "ignorable".
The fields added in version 5 of the response are always set regardless of the version used by the client. If an older version is used, an exception is thrown during serialization because the fields have non-default values. We should either not set the fields for older versions in the API layer or mark them as ignorable. I have chosen the latter in this case because it looks cleaner.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>, Mickael Maison <mickael.maison@gmail.com>
Collect and expose the KIP-511 client name and version information the clients now provide to the server as part of ApiVersionsRequests. Also refactor how we handle selector metrics by creating a ChannelMetadataRegistry class. This will make it easier for various parts of the networking code to modify channel metrics.
Reviewers: Colin P. McCabe <cmccabe@apache.org>
Provide custom configs to ChannelBuilders in addition to parsed `values()`
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Andrew Choi <andchoi@linkedin.com>
When sun.security.jgss.native=true, we currently create a new server credential for every new client connection and add to the private credential set of the server's Subject. This is expensive and can result in an unbounded number of private credentials in the Subject used by the broker. The PR creates a credential immediately after login so that a single credential can be reused.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Brokers are supposed to force SASL clients to re-authenticate (and kill such connections in the absence of a timely and successful re-authentication) when KIP-368 SASL Re-Authentication is enabled via a positive connections.max.reauth.ms configuration value. There was a flaw in the logic that caused connections to not be killed in the absence of a timely and successful re-authentication if the client did not leverage the SaslAuthenticateRequest API (which was defined in KIP-152).
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
KIP-320 improved fetch semantics by adding leader epoch validation. This relies on
reliable propagation of leader epoch information from the controller. Unfortunately, we
have encountered a bug during partition reassignment in which the leader epoch in the
controller context does not get properly updated. This causes UpdateMetadata requests
to be sent with stale epoch information which results in the metadata caches on the
brokers falling out of sync.
This bug has existed for a long time, but it is only a problem due to the new epoch
validation done by the client. Because the client includes the stale leader epoch in its
requests, the leader rejects them, yet the stale metadata cache on the brokers prevents
the consumer from getting the latest epoch. Hence the consumer cannot make progress
while a reassignment is ongoing.
Although it is straightforward to fix this problem in the controller for the new releases
(which this patch does), it is not so easy to fix older brokers which means new clients
could still encounter brokers with this bug. To address this problem, this patch also
modifies the client to treat the leader epoch returned from the Metadata response as
"unreliable" if it comes from an older version of the protocol. The client in this case will
discard the returned epoch and it won't be included in any requests.
Also, note that the correct epoch is still forwarded to replicas correctly in the
LeaderAndIsr request, so this bug does not affect replication.
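A hedged sketch of the client-side mitigation (illustrative names; the real change lives in the client's metadata handling):
```java
import java.util.Optional;

// Leader epochs from metadata responses older than the version that introduced
// reliable epoch propagation are treated as "unreliable" and discarded, so a
// stale epoch is never echoed back in subsequent requests.
static Optional<Integer> usableLeaderEpoch(short responseVersion, int leaderEpoch,
                                           short firstReliableEpochVersion) {
    if (responseVersion < firstReliableEpochVersion)
        return Optional.empty();   // old broker: epoch may be stale, drop it
    return Optional.of(leaderEpoch);
}
```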
Reviewers: Jun Rao <junrao@gmail.com>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Ismael Juma <ismael@juma.me.uk>
Remove the logic in the catch clause and move it to the callback.
Reviewers: John Roesler <john@confluent.io>, Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>
If a non-consumer group is specified in `describeConsumerGroup`, the future will hang indefinitely because the future callback is never completed. This patch fixes the problem by completing the future exceptionally with an `IllegalArgumentException`.
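A toy version of the fix using a plain `CompletableFuture` (the admin client uses its own future type, but the idea is the same):
```java
import java.util.concurrent.CompletableFuture;

// Complete the future exceptionally instead of leaving it forever incomplete
// when the described group is not a consumer group.
static CompletableFuture<String> describeConsumerGroup(String protocolType) {
    CompletableFuture<String> future = new CompletableFuture<>();
    if (!"consumer".equals(protocolType)) {
        future.completeExceptionally(new IllegalArgumentException(
                "Group is not a consumer group"));
        return future;
    }
    future.complete("consumer group description");
    return future;
}
```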
Reviewers: Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Jason Gustafson <jason@confluent.io>
This patch fixes an NPE in `DefaultMetadataUpdater` due to an inconsistency in event expectations. Whenever there was an authentication failure, we were treating it as a failed update even if it came from a connection separate from the inflight metadata request. This patch fixes the problem by making the `MetadataUpdater` API clearer in terms of the events that are handled.
Reviewers: Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
This patch fixes a bug in `SocketServer` in the expiration of connections which have not re-authenticated quickly enough. Previously these connections were left hanging, but now they are properly closed and cleaned up. This was one cause of the flaky test failures in `EndToEndAuthorizationTest.testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl`.
Reviewers: Jason Gustafson <jason@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
This reverts commit 90043d5f as it caused a regression in some cases:
> Caused by: java.io.IOException: Stream frame descriptor corrupted
> at org.apache.kafka.common.record.KafkaLZ4BlockInputStream.readHeader(KafkaLZ4BlockInputStream.java:132)
> at org.apache.kafka.common.record.KafkaLZ4BlockInputStream.<init>(KafkaLZ4BlockInputStream.java:78)
> at org.apache.kafka.common.record.CompressionType$4.wrapForInput(CompressionType.java:110)
I will investigate why after, but I want to get the safe fix into 2.4.0.
The reporter of KAFKA-9203 has verified that reverting this change
makes the problem go away.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Changes the ProducerMetadata to no longer record a sentinel TOPIC_EXPIRY_NEEDS_UPDATE upon topic map emplacement, and instead set the expiry time directly. Previously the expiry time was being updated for all touched topics after a metadata fetch was processed, which could be seconds/minutes in the future.
Additionally propagates the current time further in the Producer, which should reduce the total number of current-time calls.
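A toy sketch of the difference (not the actual ProducerMetadata code; the 5-minute value is only illustrative):
```java
import java.util.HashMap;
import java.util.Map;

// Set the topic's expiry time when it is first added to the map, instead of
// storing a TOPIC_EXPIRY_NEEDS_UPDATE sentinel and filling in the real time
// only after the next metadata fetch completes.
class TopicExpirySketch {
    static final long TOPIC_EXPIRY_MS = 5 * 60 * 1000L;
    private final Map<String, Long> topicExpiryTimes = new HashMap<>();

    void addTopic(String topic, long nowMs) {
        topicExpiryTimes.put(topic, nowMs + TOPIC_EXPIRY_MS); // expiry set directly
    }

    boolean retainTopic(String topic, long nowMs) {
        Long expiryMs = topicExpiryTimes.get(topic);
        return expiryMs != null && nowMs < expiryMs;
    }
}
```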
Reviewers: Ismael Juma <ismael@juma.me.uk>, Rajini Sivaram <rajinisivaram@googlemail.com>
This PR is a follow-up of #7087, fixing typos, styles, etc.
cc/ big-andy-coates ijuma
Author: Lee Dongjin <dongjin@apache.org>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Closes #7217 from dongjinleekr/feature/trivial-admin-javadoc
If a field is not marked as ignorable, we should raise an exception if it has been set to a non-default value. This check already exists in `Message.write`, so this patch adds it to `Message.toStruct`. Additionally, we fix several fields which should have been marked ignorable and we fix some related test assertions.
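A hedged sketch of the check (illustrative; the generated code's exact exception and structure may differ):
```java
import java.util.Objects;
import org.apache.kafka.common.errors.UnsupportedVersionException;

// A field that does not exist in the target version may only carry a
// non-default value if it is marked ignorable; otherwise serialization fails.
static void checkNonIgnorableField(String name, boolean ignorable,
                                   Object value, Object defaultValue) {
    if (!ignorable && !Objects.equals(value, defaultValue)) {
        throw new UnsupportedVersionException("Attempted to write a non-default value "
                + "for field " + name + ", which is not supported in this version "
                + "and is not marked ignorable");
    }
}
```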
Reviewers: Ismael Juma <ismael@juma.me.uk>, Manikumar Reddy <manikumar.reddy@gmail.com>, Colin Patrick McCabe <cmccabe@apache.org>
Allow null as a valid default for tagged fields. Fix a bunch of cases where this would previously result in null pointer dereferences.
Also allow inferring FieldSpec#versions based on FieldSpec#taggedVersions. Prefix 'key' with an underscore when it is used in the generated code, to avoid potential name collisions if someone names an RPC field "key".
Allow setting hexadecimal constants and 64-bit constants.
Add a lot more test cases to SimpleExampleMessage.json.
Reviewers: Jason Gustafson <jason@confluent.io>
After a number of last minute bugs were found stemming from the incremental closing of lost tasks in StreamsRebalanceListener#onPartitionsLost, a safer approach to this edge case seems warranted. We initially wanted to be as "future-proof" as possible, and avoid baking further protocol assumptions into the code that may be broken as the protocol evolves. This meant that rather than simply closing all active tasks and clearing all associated state in #onPartitionsLost(lostPartitions) we would loop through the lostPartitions/lost tasks and remove them one by one from the various data structures/assignments, then verify that everything was empty in the end. This verification in particular has caused us significant trouble, as it turns out to be nontrivial to determine what should in fact be empty, and if so whether it is also being correctly updated.
Therefore, before worrying about it being "future-proof" it seems we should make sure it is "present-day-proof" and implement this callback in the safest possible way, by blindly clearing and closing all active task state. We log all the relevant state (at debug level) before clearing it, so we can at least tell from the logs whether/which emptiness checks were being violated.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bbejeck@gmail.com>, Andrew Choi <andchoi@linkedin.com>
This patch removes the explicit version check pattern we used in `getErrorResponse`, which is a pain to maintain (as seen by KAFKA-9200). We already check that requests have a valid version range in the `AbstractRequest` constructor.
Reviewers: Andrew Choi <andrewchoi5@users.noreply.github.com>, Ismael Juma <ismael@juma.me.uk>
ListOffsetResponse getErrorResponse is missing a case for version 5, introduced
by 152292994e and released in 2.3.0.
```
java.lang.IllegalArgumentException: Version 5 is not valid. Valid versions for ListOffsetRequest are 0 to 5
at org.apache.kafka.common.requests.ListOffsetRequest.getErrorResponse(ListOffsetRequest.java:282)
at kafka.server.KafkaApis.sendErrorOrCloseConnection(KafkaApis.scala:3062)
at kafka.server.KafkaApis.sendErrorResponseMaybeThrottle(KafkaApis.scala:3045)
at kafka.server.KafkaApis.handleError(KafkaApis.scala:3027)
at kafka.server.KafkaApis.handle(KafkaApis.scala:209)
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:78)
at java.lang.Thread.run(Thread.java:748)
```
Reviewers: Ismael Juma <ismael@juma.me.uk>
The current Utils::readFileAsString method creates a FileChannel,
memory-maps the file, copies its content to a String, and returns it. But
that means we need to know the size of the file in advance, which
precludes us from reading files whose size is not known up front, e.g.
any file opened with the flag S_IFIFO.
This change updates the method to use a stream to read the content of the file.
This has a couple of practical advantages:
It allows bash process substitution to pass in strings as a file, so we can
say `./bin/kafka-reassign-partitions.sh --reassignment-json-file <(echo
"reassignment json")`.
When adding system tests for commands that take a file, we don't have to
create a real physical file. Instead we can just dump the content of the
file on the command line.
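A hedged sketch of the stream-based read (the committed implementation may differ in details), which works even when the size is unknown, e.g. a FIFO from process substitution:
```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Read the file through a stream rather than memory-mapping it, so the size
// does not need to be known in advance.
static String readFileAsString(String path) throws IOException {
    try (InputStream in = Files.newInputStream(Paths.get(path))) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1)
            out.write(buffer, 0, n);
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```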
Reviewers: Ismael Juma <ismael@juma.me.uk>
Explicitly mention that max.request.size validates uncompressed record sizes and max.message.bytes validates compressed record sizes.
Reviewers: Ismael Juma <ismael@juma.me.uk>
Sometimes, to remain backwards compatible regarding metrics, the simplest
solution is to create an empty sensor. Recording an empty sensor on
the hot path may negatively impact performance. With hasMetrics(),
recordings of empty sensors on the hot path can be avoided without
being too invasive.
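For example, a hot-path caller can guard the recording (a sketch; `Sensor#hasMetrics` is the method this change adds, the helper around it is illustrative):
```java
import org.apache.kafka.common.metrics.Sensor;

// Record only if the sensor actually has metrics attached; an empty sensor
// kept for backwards compatibility then costs nothing on the hot path.
static void maybeRecord(Sensor sensor, double value) {
    if (sensor.hasMetrics())
        sensor.record(value);
}
```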
Reviewers: Bill Bejeck <bbejeck@gmail.com>
This patch implements the broker-side changes for KIP-360. It adds two new fields to InitProducerId: lastEpoch and producerId. Passing these values allows the TransactionCoordinator to safely bump a producer's epoch after some failures (such as UNKNOWN_PRODUCER_ID and INVALID_PRODUCER_ID_MAPPING). When a producer calls InitProducerId after a failure, the coordinator first checks the producer ID from the request to make sure no other producer has been started using the same transactional ID. If it is safe to continue, the coordinator checks the epoch from the request; if it matches the existing epoch, the epoch is bumped and the producer can safely continue. If it matches the previous epoch, the current epoch is returned without bumping. Otherwise, the producer is fenced.
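A hedged sketch of the coordinator's decision (illustrative names, not the actual TransactionCoordinator code):
```java
import java.util.OptionalInt;

// Bump on a current-epoch match, return the current epoch on a previous-epoch
// match (a retried InitProducerId), otherwise fence the producer.
class EpochBumpSketch {
    short currentEpoch;
    short previousEpoch;

    OptionalInt initProducerId(short requestEpoch) {
        if (requestEpoch == currentEpoch) {
            previousEpoch = currentEpoch;
            currentEpoch++;                        // safe to bump and continue
            return OptionalInt.of(currentEpoch);
        } else if (requestEpoch == previousEpoch) {
            return OptionalInt.of(currentEpoch);   // retry: return current epoch, no re-bump
        } else {
            return OptionalInt.empty();            // fenced: another producer took over
        }
    }
}
```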
Reviewers: Boyang Chen <boyang@confluent.io>, Jason Gustafson <jason@confluent.io>