Here we introduce client and broker changes to support multiple inflight requests while still guaranteeing idempotence. Two major problems to be solved:
1. Sequence number management on the client when there are request failures. When a batch fails, future inflight batches will also fail with `OutOfOrderSequenceException`. This must be handled on the client with intelligent sequence reassignment. We must also deal with the fatal failure of some batch: the future batches must get different sequence numbers when the come back.
2. On the broker, when we have multiple inflights, we can get duplicates of multiple old batches. With this patch, we retain the record metadata for 5 older batches.
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#3743 from apurvam/KAFKA-5494-increase-max-in-flight-for-idempotent-producer
It adds complexity for no benefit since we don't use
it anywhere.
Also removed a few unused imports, variables and
default parameters.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>, Jason Gustafson <jason@confluent.io>
Closes#3856 from ijuma/remove-security-protocol-trace
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#3852 from hachikuji/minor-use-log-context-txn-manager
Author: Vahid Hashemian <vahidhashemian@us.ibm.com>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#3854 from vahidhashemian/minor/fix_log_context_message_format
`SelectorTest.testCloseConnectionInClosingState` creates a channel with some staging receives and moves time forward to expire the channel. To ensure that the channel will be expired on the next poll, the channel must be muted to avoid expiry time being updated if more data is available for read.
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#3823 from rajinisivaram/MINOR-SelectorTest-closingChannel
Changes:
1. When an exception is encountered in any of the methods in `Processor` while processing a channel, log the exception and close the connection. Continue to process other channels.
2. Fixes KAFKA-5790: SocketServer.processNewResponses should not skip a response if exception is thrown.
3. For `IllegalStateException` and `IOException` in `poll()`, don't close the `Selector`. Log the exception and continue.
4. Close channel on any failed send in `Selector`.
5. When closing channel fails or is closed, leave channel state as-is, indicating the state in which the channel was moved to closing.
6. Add tests for various failure scenarios.
7. Fix timing issue in `SocketServerTest.testConnectionIdReuse` by waiting for new connections to be processed by the server.
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Jun Rao <junrao@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#3548 from rajinisivaram/KAFKA-5607
Author: Erik Kringen <erik.kringen@icloud.com>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#3582 from ErikKringen/trunk
StickyAssignor javadoc has a bunch of formatting issues which make it pretty hard to read:
http://kafka.apache.org/0110/javadoc/org/apache/kafka/clients/consumer/StickyAssignor.html
cc vahidhashemian
Author: Mickael Maison <mickael.maison@gmail.com>
Reviewers: Vahid Hashemian <vahidhashemian@us.ibm.com>, Ismael Juma <ismael@juma.me.uk>
Closes#3670 from mimaison/sticky_assignor_javadoc
Also throw an exception if a null keystore type is seen
in `SecurityStore`. This should never happen.
The default keystore type has changed in Java 9 (
http://openjdk.java.net/jeps/229), so we need to
be explicit to have consistent behaviour across
Java versions.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Closes#3808 from ijuma/set-jks-explicitly-in-system-tests
- changed the interface & implementations
- updated tests to use the new method where applicable
Author: Attila Kreiner <attila@kreiner.hu>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Closes#3669 from attilakreiner/KAFKA-5726
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#3719 from mjsax/kafka-5603-dont-abort-tx-for-zombie-tasks-2
Author: Dong Lin <lindong28@gmail.com>
Reviewers: Jun Rao <junrao@gmail.com>, Jiangjie Qin <becket.qin@gmail.com>, Colin P. Mccabe <cmccabe@confluent.io>
Closes#3621 from lindong28/KAFKA-5694
If a request for a broker configuration failed due to a timeout or
the broker not being available, we would fail the futures
associated with the non-broker request instead (and never fail the
broker future, which would be left uncompleted forever).
We would also do an unnecessary request if only broker configs were
requested.
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#3585 from cmccabe/KAFKA-5659
- client id is part of the log context, so removed ad-hoc usages
- Fixed an issue where the response was not printed correctly,
use `toString(version)` instead of `toString()`
- Capitalized all log statements for consistency
- Fixed a number of double spaces after period
Author: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#3741 from Kamal15/kafka-5762
* When a call is aborted, that should count as a "try" in the failure log message.
* FailureInjectingTimeoutProcessorFactory should fail the first request it is asked about.
* testCallTimeouts should expect the first request it makes to fail because of the timeout we injected.
* FailureInjectingTimeoutProcessorFactory should track how many failures it has injected, and the test should verify that one has been injected.
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#3731 from cmccabe/KAFKA-5720
Expose the ClassCastException as the cause for the SerializationException.
Author: Jeremy Custenborder <jcustenborder@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#3556 from jcustenborder/KAFKA-5620
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Rajini Sivaram <rajinisivaram@googlemail.com>
Closes#3699 from cmccabe/trogdor-review
With LogContext, each producer log item is automatically prefixed with client id and transactional id.
Author: huxihx <huxi_2b@hotmail.com>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#3703 from huxihx/KAFKA-5755
This patch contains a few small improvements to make request/response handling more consistent. Primarily it consolidates request/response serialization logic so that `SaslServerAuthenticator` and `KafkaApis` follow the same path. It also reduces the amount of custom logic needed to handle unsupported versions of the ApiVersions requests.
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#3673 from hachikuji/consolidate-response-handling
Two requests sent together may not always trigger a staged receive since the requests may not be received in a single poll and the channel is muted when receives are complete. Hence attempt to stage multiple times until a receive is staged to make the test more stable.
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Apurva Mehta <apurva@confluent.io>
Closes#3712 from rajinisivaram/MINOR-connectionidreuse-test
This patch improves documentation on the handling of errors for the idempotent/transactional producer. It also fixes a couple minor inconsistencies and improves test coverage. In particular:
- UnsupportedForMessageFormat should be a fatal error for TxnOffsetCommit responses
- UnsupportedVersion should be fatal for Produce responses and should be returned instead of InvalidRequest
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Apurva Mehta <apurva@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#3716 from hachikuji/KAFKA-5342
If prepare throws an exception in the same poll when the connection
is established, the channel id should be in `disconnected`, but
not in `connected`.
Author: dongeforever <dongeforever@apache.org>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#3282 from dongeforever/KAFKA-5417
And make AppInfo's static fields final.
Author: Manikumar Reddy <manikumar.reddy@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#3696 from omkreddy/KAFKA-4856
Author: Kamal C <kamal.chandraprakash@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3695 from Kamal15/typo_error
This patch ensures that the consumer groupId and clientId are available in all log messages which makes debugging much easier when a single application has multiple consumer instances. To make this easier, I've added a new `LogContext` object which builds a log prefix similar to the broker-side `kafka.utils.Logging` mixin. Additionally this patch changes the log level for a couple minor cases:
- Consumer wakeup events are now logged at DEBUG instead of TRACE
- Heartbeat enabling/disabling is now logged at DEBUG instead of TRACE
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#3676 from hachikuji/log-consumer-wakeups
When there are broker delays that cause a response to take longer
than `connections.max.idle.ms`, connections may be closed by the
broker (as well as by the client) before the response is processed.
If the port is reused, broker may send the outstanding response to
a new connection with the reused port. The new connection will end
up with correlation id mismatch, requiring process restart. This
is also a security exposure since clients receive response
intended for the wrong connection.
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#3530 from rajinisivaram/KAFKA-5595
This implements two improvements for request handling prior to SASL authentication:
1. Only parse request types that are allowed prior to authentication.
2. Limit the maximum request size (the default is 100Mb).
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#3558 from hachikuji/minor-restrict-presasl-request-parsing
Also make "created" message more consistent across clients.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Jason Gustafson <jason@confluent.io>
Closes#3658 from ijuma/admin-client-should-register-with-app-info-parser
The AdminClient does not properly clear calls from the callsInFlight structure.
Later, in an effort to clear the lingering call objects, it closes the connection
they are associated with. This disrupts new incoming calls, which then get
BrokerNotAvailableException.
This patch fixes this bug by properly removing completed calls from the
callsInFlight structure. It also adds the Call#aborted flag, which
ensures that we throw the right exception (TimeoutException instead of
DisconnectException) and only abort a connection once -- even if there
is a similar bug in the future which causes old Call objects to linger.
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#3584 from cmccabe/KAFKA-5658
Producer should not drop header information when splitting batches. This PR also corrects a minor typo in Sender.java, where `spitting and retrying` should be `splitting and retrying`.
Author: huxihx <huxi_2b@hotmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jiangjie Qin <becket.qin@gmail.com>
Closes#3620 from huxihx/KAFKA-5700
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#3627 from hachikuji/minor-fix-controlled-shutdown-error-response
This patch replaces the legacy ControlledShutdown objects in `kafka.api` with the alternatives in `org.apache.kafka.common.requests`. Since this was the last API that needed updating, we have also dropped the reference in `RequestChannel.Request` to the legacy object type.
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#3612 from hachikuji/remove-old-controlled-shutdown-objects
This patch handles the case that a CorruptRecordException is thrown from the iterator directly.
The fix is a little tricky as exceptions can be thrown from a few different scenarios. The current approach is to let the same record go through the exact same process as last time when exception is thrown, so the exception will be thrown at the same step. The only problem for that is the iterator state will change once it throws an exception. To handle that we cache the first iterator exception and put it into the suppressed exception of the IllegalStateException thrown in the future.
Author: Jiangjie Qin <becket.qin@gmail.com>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#3573 from becketqin/KAFKA-5630
this is the initial implementation.
Author: radai-rosenblatt <radai.rosenblatt@gmail.com>
Reviewers: Ewen Cheslack-Postava <me@ewencp.org>, Ismael Juma <ismael@juma.me.uk>, Rajini Sivaram <rajinisivaram@googlemail.com>, Jun Rao <junrao@gmail.com>
Closes#2330 from radai-rosenblatt/broker-memory-pool-with-muting
Author: Dong Lin <lindong28@gmail.com>
Reviewers: Jiangjie Qin <becket.qin@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#3565 from lindong28/KAFKA-5627