If request logging is enabled, `ProduceRequest` can be accessed
and mutated concurrently from a network thread (which calls
`toString`) and a request handler thread (which calls
`clearPartitionRecords()`).
That can lead to a `ConcurrentModificationException` when iterating
the `partitionRecords` map.
The underlying thread-safety issue has existed since the server
started using the Java implementation of ProduceRequest in 0.10.0.
However, we were incorrectly not clearing the underlying struct until
0.10.2, so `toString` itself was thread-safe until that change. In 0.10.2,
`toString` is no longer thread-safe and we could potentially see a
`NullPointerException` given the right set of interleavings between
`toString` and `clearPartitionRecords`, although we haven't seen that
happen yet.
In trunk, we changed the requests to have a `toStruct` method
instead of creating a struct in the constructor, and `toString`
no longer printed the contents of the `Struct`. This accidentally
fixed the race condition, but it meant that request logging was less
useful.
A couple of days ago, `AbstractRequest.toString` was changed to
print the contents of the request by calling `toStruct().toString()`,
which reintroduced the race condition. The impact is more visible
because we iterate over a `HashMap`, which proactively
checks for concurrent modification (unlike arrays).
We will need a separate PR for 0.10.2.
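To illustrate the fail-fast behaviour mentioned above, here is a minimal single-threaded sketch. It is not the Kafka code: the class and method names are hypothetical, and the map is cleared from inside the loop only to make the failure deterministic (with two real threads it would depend on timing).

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class FailFastIterationSketch {
    // Returns true if clearing the map while iterating it makes the iterator throw.
    static boolean clearWhileIterating() {
        // Hypothetical stand-in for the partitionRecords map; keys and values are made up.
        Map<String, Integer> partitionRecords = new HashMap<>();
        partitionRecords.put("topic-0", 1);
        partitionRecords.put("topic-1", 2);
        partitionRecords.put("topic-2", 3);
        try {
            for (Map.Entry<String, Integer> entry : partitionRecords.entrySet()) {
                // Structural modification during iteration, standing in for
                // what another thread would do in the scenario described above.
                partitionRecords.clear();
            }
            return false;
        } catch (ConcurrentModificationException expected) {
            // HashMap iterators detect the modification on the next call to next().
            return true;
        }
    }
}
```

An array-backed structure would not perform this check, which is why the problem is easier to notice with a `HashMap`.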
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jiangjie Qin <becket.qin@gmail.com>, Onur Karaman <okaraman@linkedin.com>, Jun Rao <junrao@gmail.com>
Closes #2689 from ijuma/produce-request-thread-safety
hachikuji
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes #2679 from guozhangwang/KHotfix-serde-headers
Both the headers and requests have regressed to just show object ids instead of their contents from their underlying structs. I'm guessing this regression came from commit [fc1cfe475e](fc1cfe475e)
Author: Onur Karaman <okaraman@linkedin.com>
Reviewers: Jiangjie Qin <becket.qin@gmail.com>
Closes #2678 from onurkaraman/KAFKA-4891
Author: Michael G. Noll <michael@confluent.io>
Reviewers: Dongjin Lee, Eno Thereska, Damian Guy, Colin P. McCabe, Matthias J. Sax, Guozhang Wang
Closes #2554 from miguno/KAFKA-4769
It’s a minor optimisation, but simple enough.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Apurva Mehta <apurva.1618@gmail.com>, Jason Gustafson <jason@confluent.io>
Closes #2658 from ijuma/has-in-flight-requests
Remove unnecessary `flush`; the underlying stream should handle it on close.
Author: Will Droste <william.droste@arris.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes #2528 from wdroste/trunk
Also fix a potential reordering bug and include a few clean-ups.
Author: Dong Lin <lindong28@gmail.com>
Reviewers: Jiangjie (Becket) Qin <becket.qin@gmail.com>, Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes #2641 from lindong28/KAFKA-4820-followup
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Apurva Mehta <apurva.1618@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes #2646 from hachikuji/rename-record-batch
Using `Long.MaxValue` for the linger overflows in `RecordBatch#maybeExpire` when it is
added to the current timestamp.
The overflow then causes `Sender` to set an error for the batch (this does not happen every
time since it depends on the timing of `Sender`).
That error in turn causes a call to `ProduceRequestResult#done` on the batch, which then
makes the check for "not done" fail.
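The overflow itself can be reproduced with plain `long` arithmetic. The sketch below is an illustration only: the method names and the exact comparison are assumptions, not the code in `RecordBatch#maybeExpire`, and the subtraction-based variant is just one common way to avoid the wrap-around, not necessarily the fix applied here.

```java
public class LingerOverflowSketch {
    // Naive deadline check: createdMs + lingerMs wraps to a negative value
    // when lingerMs is Long.MAX_VALUE, so the batch looks expired immediately.
    static boolean expiredNaive(long createdMs, long lingerMs, long nowMs) {
        return createdMs + lingerMs <= nowMs;
    }

    // Rearranged check: compare the elapsed time against the linger instead of
    // adding the linger to a timestamp. For non-negative timestamps with
    // nowMs >= createdMs, the subtraction cannot overflow.
    static boolean expiredSafe(long createdMs, long lingerMs, long nowMs) {
        return nowMs - createdMs >= lingerMs;
    }
}
```

With `createdMs = 1000`, `nowMs = 2000` and `lingerMs = Long.MAX_VALUE`, the naive check reports the batch as expired while the rearranged check does not.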
Author: Armin Braun <me@obrown.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes #2639 from original-brownbear/KAFKA-3155
Author: Dong Lin <lindong28@gmail.com>
Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>, Jiangjie Qin <becket.qin@gmail.com>
Closes #2619 from lindong28/KAFKA-4820
If the leader nodes of one or more partitions in a consumer subscription are temporarily unavailable, request a metadata refresh so that partitions skipped for assignment don't have to wait for metadata expiry before reassignment. A metadata refresh is also requested if a subscribed topic or assigned partition doesn't exist.
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Vahid Hashemian <vahidhashemian@us.ibm.com>, Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Closes #2622 from rajinisivaram/KAFKA-4631
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Ewen Cheslack-Postava <me@ewencp.org>, Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes #2594 from dguy/checkstyle
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes #2303 from mjsax/licenseHeader
The consumer properties get logged twice because two instances
of `ConsumerConfig` are created while constructing a `KafkaConsumer`.
I added a `ConsumerConfig` constructor that accepts the boolean
parameter `doLog`, which `AbstractConfig` already supports,
and set it to false for the second `ConsumerConfig` created
in the `KafkaConsumer` constructor.
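A minimal sketch of the idea, using hypothetical stand-in classes rather than Kafka's `ConsumerConfig`/`AbstractConfig` (the `SketchConfig` name, the in-memory `LOGGED` list and the `group.id` value are all assumptions for illustration):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class ConfigLoggingSketch {
    // Stand-in for a logger so the effect can be observed.
    static final List<String> LOGGED = new ArrayList<>();

    // Hypothetical config class; not Kafka's ConsumerConfig.
    static class SketchConfig {
        final Map<String, String> values;

        SketchConfig(Map<String, String> values) {
            this(values, true); // default behaviour: log the values
        }

        SketchConfig(Map<String, String> values, boolean doLog) {
            this.values = values;
            if (doLog) {
                LOGGED.add("config values: " + values);
            }
        }
    }

    // Creates two config instances from the same properties and returns
    // how many times the values were logged.
    static int logCount(boolean secondInstanceLogs) {
        LOGGED.clear();
        Map<String, String> props = Collections.singletonMap("group.id", "demo");
        new SketchConfig(props);
        new SketchConfig(props, secondInstanceLogs);
        return LOGGED.size();
    }
}
```

Passing `false` for the second instance leaves a single log entry instead of two.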
Author: Marco Ebert <marco_ebert@icloud.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes #2600 from Gacko/trunk
Finding the protocol associated with an API key can be a challenge in the lengthy [web page](http://kafka.apache.org/protocol.html#protocol_api_keys).
Adding hyperlinks would definitely help with that.
Co-authored with imandhan.
Author: Vahid Hashemian <vahidhashemian@us.ibm.com>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes #2467 from vahidhashemian/minor/hyperlinks_in_kafka_protocol_guide
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Apurva Mehta <apurva.1618@gmail.com>, Jason Gustafson <jason@confluent.io>
Closes #2585 from ijuma/overflow-and-format-fixes
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Onur Karaman <okaraman@linkedin.com>, Ismael Juma <ismael@juma.me.uk>
Closes #2596 from hachikuji/ensure-poll-with-inflight-requests
Author: Armin Braun <me@obrown.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Closes #2582 from original-brownbear/cleanup-nonfinal-close
Replace one-by-one initialization of state stores with bulk initialization.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Eno Thereska, Guozhang Wang
Closes #2560 from dguy/kafka-4494
- Minor Javadoc fixes
- Used final modifier if possible
- Unnecessary type casts removed
- Other minor clean-ups
Author: Kamal C <kamal.chandraprakash@gmail.com>
Reviewers: Colin P. Mccabe <cmccabe@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes #2578 from Kamal15/config
It contained this step:
    val canShutdown = isShuttingDown.compareAndSet(false, true)
    if (canShutdown && shutdownLatch.getCount > 0) {
without any fallback for the case of `shutdownLatch.getCount == 0`. So in the case
of `shutdownLatch.getCount == 0` (when a previous call to the shutdown method
was right about to finish) you would set `isShuttingDown` to true again without any
possibility of ever getting the server started (since `startup` will check
`isShuttingDown` before setting up a new latch with count 1).
Long story short: concurrent calls to shutdown can get the server locked in a broken state.
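The stuck state can be sketched in isolation as follows. This is a simplified Java model, not the `KafkaServer` code: the class, the `canStartup` helper and the `shutdownWithFallback` variant are assumptions for illustration, and the fallback shown is only one possible way to avoid the problem, not necessarily what this change does.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

public class ShutdownFlagSketch {
    final AtomicBoolean isShuttingDown = new AtomicBoolean(false);
    // Count 0 models the moment right after a previous shutdown has counted the latch down.
    final CountDownLatch shutdownLatch = new CountDownLatch(0);

    // Mirrors the step quoted above: the flag is flipped to true, but when the
    // latch count is already 0 nothing ever resets it.
    void shutdownBroken() {
        boolean canShutdown = isShuttingDown.compareAndSet(false, true);
        if (canShutdown && shutdownLatch.getCount() > 0) {
            shutdownLatch.countDown();
            isShuttingDown.set(false);
        }
    }

    // Hypothetical variant with a fallback: the flag is always reset by the
    // caller that won the compareAndSet, whether or not there was work to do.
    void shutdownWithFallback() {
        boolean canShutdown = isShuttingDown.compareAndSet(false, true);
        if (canShutdown) {
            try {
                if (shutdownLatch.getCount() > 0) {
                    shutdownLatch.countDown();
                }
            } finally {
                isShuttingDown.set(false);
            }
        }
    }

    // Startup is refused while the flag is set.
    boolean canStartup() {
        return !isShuttingDown.get();
    }
}
```

After `shutdownBroken()` on an instance whose latch is already at 0, `canStartup()` stays false forever; with the fallback variant it returns true again.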
This fixes the reported error:
    java.lang.IllegalStateException: Kafka server is still shutting down, cannot re-start!
        at kafka.server.KafkaServer.startup(KafkaServer.scala:184)
        at kafka.integration.KafkaServerTestHarness$$anonfun$restartDeadBrokers$2.apply$mcVI$sp(KafkaServerTestHarness.scala:117)
        at kafka.integration.KafkaServerTestHarness$$anonfun$restartDeadBrokers$2.apply(KafkaServerTestHarness.scala:116)
        at kafka.integration.KafkaServerTestHarness$$anonfun$restartDeadBrokers$2.apply(KafkaServerTestHarness.scala:116)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
        at scala.collection.immutable.Range.foreach(Range.scala:160)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
        at kafka.integration.KafkaServerTestHarness$class.restartDeadBrokers(KafkaServerTestHarness.scala:116)
        at kafka.api.ConsumerBounceTest.restartDeadBrokers(ConsumerBounceTest.scala:34)
        at kafka.api.ConsumerBounceTest$BounceBrokerScheduler.doWork(ConsumerBounceTest.scala:158)
Author: Armin Braun <me@obrown.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes #2568 from original-brownbear/KAFKA-4198
Also move `requireTimestamp` to `minVersion` logic from `Fetcher` to
`ListOffsetRequest.Builder.forConsumer()`.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Colin P. Mccabe <cmccabe@confluent.io>, Jason Gustafson <jason@confluent.io>
Closes #2580 from ijuma/move-proto-utils-to-api-keys
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Apurva Mehta <apurva@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes #2543 from hachikuji/remove-message-writer
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Colin P. Mccabe <cmccabe@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes #2561 from hachikuji/fix-npe-api-version-tostring
…lass should be static
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes #2558 from cmccabe/KAFKA-4774
More details:
* Replaced `struct` field in Request/Response with a `toStruct` method. This
makes the performance model (including memory usage) easier to understand.
Note that requests have `toStruct()` while responses have `toStruct(version)`.
* Replaced mutable `version` field in `Request.Builder` with an immutable
field `desiredVersion` and a `version` parameter passed to the `build` method.
* Optimised `handleFetchRequest` to avoid unnecessary creation of `Struct`
instances (from 4 to 2 in the worst case and 2 to 1 in the best case).
* Various clean-ups in request/response classes and their tests. In particular,
it is now clear what we are testing. Previously, it looked like we were testing
more than we really were.
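As a rough illustration of the second bullet, the sketch below shows a builder with an immutable `desiredVersion` field and a `version` argument on `build`. The class names and the use of a nullable `Short` are assumptions made for this example; they are not the actual `AbstractRequest.Builder` API.

```java
public class VersionedBuilderSketch {
    // Hypothetical request type that records the version it was built with.
    static final class SketchRequest {
        final short version;

        SketchRequest(short version) {
            this.version = version;
        }
    }

    // Hypothetical builder: the preferred version is fixed at construction,
    // while the version actually used is supplied on each build call.
    static final class SketchBuilder {
        private final Short desiredVersion; // null means "no preference" in this sketch

        SketchBuilder(Short desiredVersion) {
            this.desiredVersion = desiredVersion;
        }

        Short desiredVersion() {
            return desiredVersion;
        }

        SketchRequest build(short version) {
            return new SketchRequest(version);
        }
    }
}
```

Because the builder holds no mutable version state, the same instance can build requests for different versions without one call affecting the next.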
With this in place, we could remove `AbstractRequest.Builder` in the future by
doing the following:
* Change `AbstractRequest.toStruct` to accept a version (like responses).
* Change `AbstractRequest.version` to be `desiredVersion` (like `Builder`).
* Change `ClientRequest` to take `AbstractRequest`.
* Move validation from the `build` methods to the request constructors or
static factory methods.
* Anything else required for the code to compile again.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Apurva Mehta <apurva.1618@gmail.com>, Jason Gustafson <jason@confluent.io>
Closes #2513 from ijuma/separate-struct
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes #2536 from hachikuji/minor-move-compression-io-construction
We now use artificially broken hosts that some DNS servers do not resolve
as potential name collisions (`127.0.53.53`).
This change is the only way to build while using Google's `8.8.8.8` (at least in my network).
Author: Armin Braun <me@obrown.io>
Reviewers: Dong Lin <lindong28@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes #2552 from original-brownbear/KAFKA-4765
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Apurva Mehta <apurva.1618@gmail.com>, Vahid Hashemian <vahidhashemian@us.ibm.com>, Ismael Juma <ismael@juma.me.uk>, Guozhang Wang <wangguoz@gmail.com>
Closes #2545 from hachikuji/KAFKA-4761
…porter.configure
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Closes #2540 from cmccabe/KAFKA-4756
Author: Vahid Hashemian <vahidhashemian@us.ibm.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Closes #2475 from vahidhashemian/minor/use_explicit_Errors_type_when_possible
Author: Grant Henke <ghenke@cloudera.com>
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Ismael Juma <ismael@juma.me.uk>
Closes #2246 from granthenke/truststore-password
`GroupCoordinatorMetrics` currently sets up `join-time-max` and `sync-time-max` incorrectly, using a `new Avg()` `MeasurableStat` instead of `new Max()`.
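To show why the distinction matters, here is a small standalone sketch with plain Java helpers. It does not use Kafka's metrics classes; the `avg` and `max` methods and the sample values are made up for illustration.

```java
public class AvgVsMaxSketch {
    // Arithmetic mean of the recorded samples (NaN when there are none).
    static double avg(double... samples) {
        double sum = 0.0;
        for (double sample : samples) {
            sum += sample;
        }
        return sum / samples.length;
    }

    // Largest recorded sample (negative infinity when there are none).
    static double max(double... samples) {
        double result = Double.NEGATIVE_INFINITY;
        for (double sample : samples) {
            result = Math.max(result, sample);
        }
        return result;
    }
}
```

For join times of 10, 20 and 90 ms, an average-based stat reports 40 while a max-based stat reports 90, so a metric named `*-max` backed by an average understates the worst case.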
Author: Onur Karaman <okaraman@linkedin.com>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes #2520 from onurkaraman/KAFKA-4749
Author: Satish Duggana <sduggana@hortonworks.com>
Reviewers: Dong Lin <lindong28@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes #2509 from satishd/buffer-cleanup
This change is in response to [KAFKA-4725](https://issues.apache.org/jira/browse/KAFKA-4725).
When a produce request is received, if the user/client is exceeding their produce quota, the response will be delayed until the quota is refilled appropriately.
Unfortunately, the request body is still referenced in the callback which in turn leaks the messages contained within the request.
This change allows the `KafkaApis` method to take ownership of the request body from the `RequestChannel.Request` object.
I am not sure whether this breaks other invariants which are assumed within other parts of Kafka.
Author: Tim Carey-Smith <tim@spork.in>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Closes #2496 from halorgium/fix-throttled-response-leak
Author: Maysam Yabandeh <myabandeh@dropbox.com>
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jun Rao <junrao@gmail.com>
Closes #2474 from ijuma/kafka-4039-deadlock-during-shutdown
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Manikumar reddy O <manikumar.reddy@gmail.com>, Ewen Cheslack-Postava <ewen@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes #2469 from hachikuji/improve-consumer-logging
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes #2414 from cmccabe/KAFKA-4635