This is a minor change but it helps to improve the log readability.
Author: Kamal C <kamal.chandraprakash@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes #2709 from Kamal15/util
The maxBytes field should be set to DEFAULT_RESPONSE_MAX_BYTES,
the same way as the constructor using the Struct does.
Co-developed with mimaison
Author: Edoardo Comar <ecomar@uk.ibm.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes #2694 from edoardocomar/MINOR-FetchRequest
- Improves streams efficiency by more than 200K requests/second (small 100 byte requests)
- Gets streams efficiency very close to pure consumer (see results in https://jenkins.confluent.io/job/system-test-kafka-branch-builder/746/console)
- Maintains same fairness across tasks
- Schedules all records in the queue in-between poll() calls, not just one per task.
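A minimal sketch of the scheduling idea in the list above, assuming each task is modelled as a plain queue of records. The class and method names are hypothetical and this is not the actual Streams code; it only illustrates draining every queue between poll() calls while still taking one record per task per pass.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical model: one Deque of records per task.
final class RoundRobinSketch {

    // Processes one record per task per pass and keeps passing over the
    // tasks until every queue is empty. Returns the processing order.
    static List<String> drain(List<Deque<String>> taskQueues) {
        List<String> order = new ArrayList<>();
        boolean processedAny;
        do {
            processedAny = false;
            for (Deque<String> queue : taskQueues) {
                String record = queue.poll(); // null when the queue is empty
                if (record != null) {
                    order.add(record);
                    processedAny = true;
                }
            }
        } while (processedAny);
        return order;
    }

    // Two tasks with uneven backlogs.
    static List<String> demo() {
        Deque<String> task0 = new ArrayDeque<>(Arrays.asList("t0-a", "t0-b", "t0-c"));
        Deque<String> task1 = new ArrayDeque<>(Arrays.asList("t1-a"));
        return drain(Arrays.asList(task0, task1));
    }
}
```

In this sketch `RoundRobinSketch.demo()` interleaves the two tasks first and then finishes the longer backlog, rather than stopping after a single record per task.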
Author: Eno Thereska <eno@confluent.io>
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Damian Guy, Matthias J. Sax, Guozhang Wang
Closes #2643 from enothereska/minor-schedule-round-robin
I manually tested that Crc32CTest and AbstractChecksums pass with JDK 9. I also verified that `Java9ChecksumFactory` is used in that case.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes #2739 from ijuma/kafka-1449-crc32c
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Michael G. Noll <michael@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes #2727 from cmccabe/KAFKA-4944
The transient failures make it harder to spot real failures and we can live without what is being tested (adding security to ZK via a rolling upgrade) until KIP-101 lands.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Apurva Mehta <apurva@confluent.io>, Jun Rao <junrao@gmail.com>
Closes #2742 from ijuma/disable-zk-upgrade-test
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Apurva Mehta <apurva@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes #2738 from hachikuji/streaming-compressed-iterator
This brought down a cluster by causing continuous controller moves.
ZkClient's ZkEventThread and a RequestSendThread can concurrently use objects that aren't thread-safe:
* Selector
* NetworkClient
* SSLEngine (this was the big one for us. We turn on SSL for interbroker communication).
As per the "Concurrency Notes" section from https://docs.oracle.com/javase/7/docs/api/javax/net/ssl/SSLEngine.html:
> two threads must not attempt to call the same method (either wrap() or unwrap()) concurrently
SSLEngine.wrap gets called in:
* SslTransportLayer.write
* SslTransportLayer.handshake
* SslTransportLayer.close
It turns out that the ZkEventThread and RequestSendThread can concurrently call SSLEngine.wrap:
* ZkEventThread calls SslTransportLayer.close from ControllerChannelManager.removeExistingBroker
* RequestSendThread can call SslTransportLayer.write or SslTransportLayer.handshake from NetworkClient.poll
Suppose the controller moves for whatever reason. The former controller could have had a RequestSendThread that
was in the middle of sending out messages to the cluster while the ZkEventThread began executing
KafkaController.onControllerResignation, which calls ControllerChannelManager.shutdown, which sequentially cleans
up the controller-to-broker queue and connection for every broker in the cluster. This cleanup includes the call
to ControllerChannelManager.removeExistingBroker as mentioned earlier, causing the concurrent call to SSLEngine.wrap.
This concurrent call throws a BufferOverflowException which ControllerChannelManager.removeExistingBroker catches so
ControllerChannelManager.shutdown moves on to cleaning up the next controller-to-broker queue and connection,
skipping the cleanup steps such as clearing the queue, stopping the RequestSendThread, and removing the entry from its
brokerStateInfo map.
Because Selector.close failed partway through, the sensors corresponding to the broker connection have not been cleaned up. Any
later attempt at initializing an identical Selector will result in a sensor collision and therefore cause Selector
initialization to throw an exception. In other words, any later attempts by this broker to become controller again
will fail on initialization. When controller initialization fails, the controller deletes the /controller znode and
lets another broker take over.
Now suppose the controller moves enough times such that every broker hits the BufferOverflowException concurrency
issue. We're now guaranteed to fail controller initialization due to the sensor collision on every controller
transition, so the controller will move across brokers continuously.
This patch avoids the concurrent use of non-threadsafe classes in ControllerChannelManager.removeExistingBroker
by shutting down the RequestSendThread before closing the NetworkClient.
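The ordering described above can be sketched as follows. This is a simplified illustration, not the actual ControllerChannelManager code: `FakeClient`, `SenderThread` and `ShutdownOrderSketch` are hypothetical stand-ins, and the only point is that the thread using the non-thread-safe object is stopped and joined before that object is closed.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for a client that is NOT thread-safe.
final class FakeClient {
    private boolean closed = false;

    void send(String message) {
        if (closed) throw new IllegalStateException("send after close");
    }

    void close() { closed = true; }

    boolean isClosed() { return closed; }
}

// Hypothetical sender thread: the only other user of the client.
final class SenderThread extends Thread {
    private final FakeClient client;
    private final BlockingQueue<String> queue;
    private volatile boolean running = true;

    SenderThread(FakeClient client, BlockingQueue<String> queue) {
        this.client = client;
        this.queue = queue;
    }

    @Override
    public void run() {
        try {
            while (running) {
                client.send(queue.take());
            }
        } catch (InterruptedException e) {
            // Interrupted by shutdown(): fall through and exit.
        }
    }

    // Stops the loop and waits until the thread has really finished.
    void shutdown() {
        running = false;
        interrupt();
        try {
            join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

final class ShutdownOrderSketch {
    // Step 1: stop the sender. Step 2: close the client, now used by one thread only.
    static boolean removeBroker(SenderThread sender, FakeClient client) {
        sender.shutdown();
        client.close();
        return !sender.isAlive() && client.isClosed();
    }

    static boolean demo() {
        FakeClient client = new FakeClient();
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        SenderThread sender = new SenderThread(client, queue);
        sender.start();
        queue.add("request");
        return removeBroker(sender, client);
    }
}
```

Because `join()` returns only after the sender has exited, `close()` in this sketch can never overlap with a `send()` on the same client.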
Author: Onur Karaman <okaraman@linkedin.com>
Reviewers: Joel Koshy <jjkoshy.w@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes #2746 from onurkaraman/KAFKA-4959
Author: Dong Lin <lindong28@gmail.com>
Reviewers: Jun Rao <junrao@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Jiangjie Qin <becket.qin@gmail.com>
Closes #2476 from lindong28/KAFKA-4586
Author: Armin Braun <me@obrown.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Closes #2699 from original-brownbear/KAFKA-4569
1. Use Initialization-on-demand holder idiom that relies on JVM lazy-loading instead of explicit initialization check.
2. Method handles were designed to be faster than Core Reflection, particularly if the method handle can be stored in a static final field (the JVM can then optimise the call as if it was a regular method call). Since the code is of similar complexity (and simpler if we consider the whole PR), I am treating this as a clean-up instead of a performance improvement (which would require doing benchmarks).
3. Remove unused `ByteBufferReceive`.
4. I removed the snappy library from the classpath and verified that `CompressionTypeTest` (which uses LZ4) still passes. This shows that the right level of laziness is achieved even if we use one of the lazily loaded compression algorithms.
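As an illustration of points 1 and 2, here is a minimal sketch of the initialization-on-demand holder idiom combined with a method handle kept in a static final field. The class names and the use of `String.toUpperCase` are assumptions chosen so the example needs no compression library; the real code looks up compression stream constructors instead.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class LazyHandleSketch {
    // Counts how often the holder class was initialized (for demonstration only).
    static int holderInitCount = 0;

    // The JVM initializes this nested class on first access, so the lookup
    // below runs lazily and at most once, without an explicit "initialized" check.
    private static final class Holder {
        static final MethodHandle TO_UPPER = lookup();

        static {
            LazyHandleSketch.holderInitCount++;
        }

        private static MethodHandle lookup() {
            try {
                return MethodHandles.publicLookup().findVirtual(
                        String.class, "toUpperCase", MethodType.methodType(String.class));
            } catch (ReflectiveOperationException e) {
                throw new ExceptionInInitializerError(e);
            }
        }
    }

    static String upper(String value) {
        try {
            // invokeExact requires the call site to match (String)String exactly.
            return (String) Holder.TO_UPPER.invokeExact(value);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

Touching `LazyHandleSketch` alone does not initialize `Holder`; the first call to `upper` does, which mirrors the laziness check described in point 4.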
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes #2740 from ijuma/use-method-handles-for-compressed-stream-supplier
Thanks to Dong Lin for finding the lastStableOffset issue.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Dong Lin <lindong28@gmail.com>, Jason Gustafson <jason@confluent.io>
Closes #2737 from ijuma/fix-fetch-response-lso
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Jun Rao <junrao@gmail.com>, Apurva Mehta <apurva@confluent.io>, Guozhang Wang <wangguoz@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes #2614 from hachikuji/exactly-once-message-format
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Michael G. Noll, Eno Thereska, Matthias J. Sax, Elias Levy, Guozhang Wang
Closes #2725 from dguy/minor-processor-java-doc
1. add thread id as prefix in state directory classes; also added logs for lock activities.
2. add logging for task creation / suspension.
3. add more information in rebalance listener logging.
4. add restored number of records into changelog reader.
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Eno Thereska, Damian Guy, Ewen Cheslack-Postava
Closes #2702 from guozhangwang/KMinor-streams-task-creation-log4j-improvements
Changing getCanonicalName() references to getName() so that docs update with "$" instead of ".". Also added a connect-plugin-discovery.sh CLI to list all of the transformations available.
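A small sketch of the difference between the two methods for static nested classes. `OuterTransform` is a hypothetical class, not one of the real transformations. `getName()` returns the binary name, which is also the form `Class.forName` accepts.

```java
// Hypothetical transformation with static nested Key / Value variants.
final class OuterTransform {
    static final class Key { }
    static final class Value { }
}

final class ClassNameSketch {
    // Binary name: nested classes are separated by '$'.
    static String binaryName() {
        return OuterTransform.Key.class.getName();
    }

    // Canonical name: nested classes are separated by '.'.
    static String canonicalName() {
        return OuterTransform.Key.class.getCanonicalName();
    }

    // Reports whether a class can be loaded by the given name.
    static boolean loadable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }
}
```

Here `binaryName()` ends in `OuterTransform$Key` and `canonicalName()` ends in `OuterTransform.Key`; only the former can be passed to `loadable` successfully.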
Author: Bruce Szalwinski <bruce.szalwinski@cdk.com>
Reviewers: Gwen Shapira
Closes #2720 from bruce-szalwinski/transforms and squashes the following commits:
ec3b5b9 [Bruce Szalwinski] remove connect-plugin-discovery. will submit in a different PR
eba0af7 [Bruce Szalwinski] Key / Value transformations are static nested classes and so are referenced using OuterClass$Key and OuterClass$Value.
Sometimes `ResetIntegrationTest` hangs and thus the build times out. We suspect that this happens if no data is written into the input topics. Right now, input data is written once and reused for both test cases. If, for some reason, the broker gets recreated between the two test cases, no data will be available for the second test method and thus the test hangs.
This change ensures that input data is written for each test case individually.
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Ismael Juma, Eno Thereska, Guozhang Wang
Closes #2630 from mjsax/minor-reset-integration-test
Debug logging of the start and end offsets used during state store restoration
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes #2718 from dguy/log-restore-offsets
remove unused log field from KStreamTransformValuesProcessor
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes #2717 from dguy/remove-unused-log-para
iterate over all keys returned from the rocksdb iterator so we don't miss any results
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Xavier Léauté, Guozhang Wang
Closes #2713 from dguy/window-iter
This uses JUnit Categories to identify integration tests. Adds 2 new build targets:
`integrationTest` and `unitTest`.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Eno Thereska <eno@confluent.io>, Ewen Cheslack-Postava <ewen@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes #2695 from dguy/junit-categories
Fixes related to handling of MAX_POLL_INTERVAL_MS_CONFIG during deadlock and CommitFailedException on partition revoked.
Author: Sachin Mittal <sjmittal@gmail.com>
Reviewers: Matthias J. Sax, Damian Guy, Guozhang Wang
Closes #2642 from sjmittal/trunk
I considered catching errors to add further information about naming internal state stores. However, Topic.validate() will throw an error that prints the offending name, so I decided not to add too much complexity.
Author: Nikki Thean <nthean@etsy.com>
Reviewers: Matthias J. Sax, Guozhang Wang, Eno Thereska, Damian Guy, Ismael Juma
Closes #2331 from nixsticks/internal-topics
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Apurva Mehta <apurva@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes #2684 from cmccabe/KAFKA-4895
Improves the reliability of this test by decreasing the lower bound. This is to be expected, as throttling takes the first fetch to stabilise and will "over-fetch" on this first request.
Author: Ben Stopford <benstopford@gmail.com>
Reviewers: Apurva Mehta <apurva@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes #2698 from benstopford/throttling-test-fix
This problem is hard to debug otherwise, as the error
returned to the client (“Coordinator not available”) is not
very informative.
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes #2652 from enothereska/minor-warning-enforcement-offset
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Armin Braun, Damian Guy, Jason Gustafson
Closes #2682 from guozhangwang/K4859-cache-commit-interval
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Eno Thereska, Damian Guy, Jason Gustafson
Closes #2693 from guozhangwang/K4885-system-test-unexpected-exception-handler
If request logging is enabled, `ProduceRequest` can be accessed
and mutated concurrently from a network thread (which calls
`toString`) and a request handler thread (which calls
`clearPartitionRecords()`).
That can lead to a `ConcurrentModificationException` when iterating
the `partitionRecords` map.
The underlying thread-safety issue has existed since the server
started using the Java implementation of ProduceRequest in 0.10.0.
However, we were incorrectly not clearing the underlying struct until
0.10.2, so `toString` itself was thread-safe until that change. In 0.10.2,
`toString` is no longer thread-safe and we could potentially see a
`NullPointerException` given the right set of interleavings between
`toString` and `clearPartitionRecords` although we haven't seen that
happen yet.
In trunk, we changed the requests to have a `toStruct` method
instead of creating a struct in the constructor and `toString` was
no longer printing the contents of the `Struct`. This accidentally
fixed the race condition, but it meant that request logging was less
useful.
A couple of days ago, `AbstractRequest.toString` was changed to
print the contents of the request by calling `toStruct().toString()`
and reintroduced the race condition. The impact is more visible
because we iterate over a `HashMap`, which proactively
checks for concurrent modification (unlike arrays).
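The fail-fast behaviour mentioned above can be reproduced deterministically in a single thread. The sketch below is an illustration only: the map name mirrors `partitionRecords`, but the class is hypothetical and the `clear()` inside the loop merely stands in for the mutation that the request handler thread would perform while the network thread iterates.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

final class FailFastSketch {
    // Returns true if mutating the map during iteration is detected.
    static boolean iterationFailsFast() {
        Map<String, Integer> partitionRecords = new HashMap<>();
        partitionRecords.put("topic-0", 1);
        partitionRecords.put("topic-1", 2);
        try {
            for (String partition : partitionRecords.keySet()) {
                // Stand-in for clearPartitionRecords() running on another thread.
                partitionRecords.clear();
            }
            return false;
        } catch (ConcurrentModificationException e) {
            // The iterator noticed that the map changed underneath it.
            return true;
        }
    }
}
```

With real threads the exception is not guaranteed, since the check is best-effort, which is consistent with the race showing up only intermittently.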
We will need a separate PR for 0.10.2.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jiangjie Qin <becket.qin@gmail.com>, Onur Karaman <okaraman@linkedin.com>, Jun Rao <junrao@gmail.com>
Closes #2689 from ijuma/produce-request-thread-safety
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Eno Thereska <eno@confluent.io>, Damian Guy <damian.guy@gmail.com>, Jason Gustafson <jason@confluent.io>
Closes #2685 from guozhangwang/KMinor-improve-log4j
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Manikumar reddy O <manikumar.reddy@gmail.com>
Closes #2690 from ijuma/fix-header-in-byte-array-converter
Make sure that the iterator returned from `WindowStore.fetch(..)` only returns matching keys, rather than all keys that are a prefix match.
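A simplified sketch of why a range scan can return prefix matches, and of filtering them out. It assumes a composite key made of the record key followed by a fixed-width timestamp, stored in a sorted map. The class, the key layout and the 4-digit timestamp are all assumptions for illustration and do not reflect the real store's byte format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

final class WindowFetchSketch {
    private static final int TS_WIDTH = 4; // assumed fixed-width timestamp suffix
    private final TreeMap<String, String> store = new TreeMap<>();

    private static String composite(String key, int timestamp) {
        return key + String.format(Locale.ROOT, "%04d", timestamp);
    }

    void put(String key, int timestamp, String value) {
        store.put(composite(key, timestamp), value);
    }

    // Range scan over [key+from, key+to], keeping only exact key matches.
    List<String> fetch(String key, int from, int to) {
        List<String> results = new ArrayList<>();
        Map<String, String> range =
                store.subMap(composite(key, from), true, composite(key, to), true);
        for (Map.Entry<String, String> entry : range.entrySet()) {
            String stored = entry.getKey();
            String storedKey = stored.substring(0, stored.length() - TS_WIDTH);
            // Without this check, "a1" at timestamp 5 ("a10005") would be
            // returned for key "a", because it sorts inside ["a0000", "a9999"].
            if (storedKey.equals(key)) {
                results.add(entry.getValue());
            }
        }
        return results;
    }

    static List<String> demo() {
        WindowFetchSketch sketch = new WindowFetchSketch();
        sketch.put("a", 1, "a@1");
        sketch.put("a1", 5, "a1@5");
        sketch.put("a", 7, "a@7");
        return sketch.fetch("a", 0, 9999);
    }
}
```

In this sketch `WindowFetchSketch.demo()` returns only the two values stored under key `a`, even though the entry for `a1` falls inside the scanned range.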
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Eno Thereska, Guozhang Wang
Closes #2662 from dguy/kafka-4863
hachikuji
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes #2679 from guozhangwang/KHotfix-serde-headers