junrao added a config `--print.metrics` to control whether ProducerPerformance prints out metrics at the end of the test. If it's okay, I will add the counterpart for the consumer.
Author: huxi <huxi@zhenrongbao.com>
Reviewers: Jun Rao <junrao@gmail.com>
Closes#2860 from amethystic/kafka-5068_print_metrics_in_perf_tests
Test the various controller protocols by observing zookeeper and broker state.
Author: Onur Karaman <okaraman@linkedin.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Closes#2853 from onurkaraman/KAFKA-5069
This PR covers point (2) and point (5) from KAFKA-5036:
**Commit 1:**
2. Currently, we update the leader epoch in epochCache after log append in the follower but before log append in the leader. It would be more consistent to always do this after log append. This also avoids issues related to failure in log append.
5. The constructor of LeaderEpochFileCache has the following:
lock synchronized { ListBuffer(checkpoint.read(): _*) }
But everywhere else uses a read or write lock. We should use consistent locking.
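A minimal Java sketch of the consistent-locking idea in point (5) above; the real LeaderEpochFileCache is Scala and differs in detail, this only illustrates routing the constructor-time load through the same lock as every other access instead of a one-off `synchronized` block:
```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical cache illustrating consistent read/write locking.
class EpochCacheSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<Integer> epochs = new ArrayList<>();

    // Constructor-time load uses the same write lock as everything else,
    // rather than a plain synchronized block.
    EpochCacheSketch(List<Integer> checkpointed) {
        lock.writeLock().lock();
        try {
            epochs.addAll(checkpointed);
        } finally {
            lock.writeLock().unlock();
        }
    }

    int latestEpoch() {
        lock.readLock().lock();
        try {
            return epochs.isEmpty() ? -1 : epochs.get(epochs.size() - 1);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```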
This is a refactor of the way epochs are cached, replacing the code that caches the latest epoch in the LeaderEpochFileCache by reusing the cached value in Partition. There is no functional change.
**Commit 2:**
Adds an assert(epoch >= 0) as epochs are written. Refactors tests so they never hit this assert.
Author: Ben Stopford <benstopford@gmail.com>
Reviewers: Jun Rao <junrao@gmail.com>
Closes#2831 from benstopford/KAFKA-5036-part2-second-try
Author: Dong Lin <lindong28@gmail.com>
Author: Dong Lin <lindong28@users.noreply.github.com>
Reviewers: Jiangjie Qin <becket.qin@gmail.com>
Closes#2859 from lindong28/KAFKA-5075
Enable one producer per task if the exactly-once config is enabled.
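A rough Java sketch of the idea, not the actual Streams code; the class name and the flag wiring are illustrative, and `producerConfigs` is assumed to already contain serializer settings:
```java
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;

// Illustrative only: choose between one shared producer per thread and one
// producer per task, depending on an exactly-once flag.
class TaskProducerFactory {
    private final boolean exactlyOnceEnabled;
    private final Producer<byte[], byte[]> threadProducer;
    private final Map<String, Object> producerConfigs;   // assumed to include serializers

    TaskProducerFactory(boolean exactlyOnceEnabled,
                        Producer<byte[], byte[]> threadProducer,
                        Map<String, Object> producerConfigs) {
        this.exactlyOnceEnabled = exactlyOnceEnabled;
        this.threadProducer = threadProducer;
        this.producerConfigs = producerConfigs;
    }

    Producer<byte[], byte[]> producerFor(String taskId) {
        if (!exactlyOnceEnabled)
            return threadProducer;                        // shared producer, as before
        Map<String, Object> configs = new HashMap<>(producerConfigs);
        // a per-task transactional id is what makes a per-task producer useful
        configs.put("transactional.id", "app-" + taskId);
        return new KafkaProducer<>(configs);
    }
}
```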
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Eno Thereska <eno@confluent.io>, Damian Guy <damian.guy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#2773 from mjsax/exactly-once-streams-producer-per-task
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Damian Guy <damian.guy@gmail.com>, Matthias J. Sax <matthias@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2848 from enothereska/KAFKA-5038-trunk
Set the internal consumer config internal.leave.group.on.close in
`StreamsConfig`. This is to reduce the number of rebalances we get
during bounces.
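A hedged sketch of what this looks like from the config side; the property name comes from the description above, while the surrounding class is illustrative rather than the actual `StreamsConfig` code:
```java
import java.util.HashMap;
import java.util.Map;

// Illustrative: a Streams-style config class forcing the internal consumer flag.
class ConsumerConfigSketch {
    static Map<String, Object> consumerConfigs(Map<String, Object> userConfig) {
        Map<String, Object> configs = new HashMap<>(userConfig);
        // internal (non-public) consumer config mentioned above: do not leave
        // the group on close, so bounced instances cause fewer rebalances
        configs.put("internal.leave.group.on.close", false);
        return configs;
    }
}
```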
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#2750 from dguy/kafka-4965
Introduced in PR #2824. Already fixed in the
website GitHub repository.
Author: Gwen Shapira <cshapi@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#2844 from gwenshap/docs-hotfix
(cherry picked from commit 5f728532ac)
Signed-off-by: Ismael Juma <ismael@juma.me.uk>
Also include a few code readability improvements.
Author: jozi-k <jozef.koval@protonmail.ch>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#2731 from jozi-k/immutable_LeaderAndIsr
This property is mentioned in the quickstart.
Author: huxi <huxi@zhenrongbao.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#2661 from amethystic/kafka4866_consoleconsumer_ignore_printvalue
Author: Michael G. Noll <michael@confluent.io>
Reviewers: Damian Guy <damian.guy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#2751 from miguno/trunk-streams-window-iterator-doc-fixes
Also:
1. FindCoordinator is more general and takes a coordinator_type
so that it can be used for the group and transaction coordinators
(see the sketch after this list).
2. Include an error message in FindCoordinatorResponse to make the
errors at the client side more informative. We have just added the
field to the protocol in this PR, a subsequent PR will update the
code to use it.
3. Rename `Errors` names for FindCoordinator to be more generic. This
is a compatible change as the ids remain the same.
4. Since the exception classes for the error codes are in a public
package, we introduce new ones and deprecate the old ones.
The classes were not thrown back to the user (KAFKA-5052 aside),
so this is a compatible change.
5. Update InitPidRequest for transactions. Since this protocol API
was introduced recently and is not used by default, we did not bump
its version.
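A hypothetical sketch of point (1): the request carries a coordinator type alongside the key, so the same RPC can locate either the group or the transaction coordinator, and the response carries the error message from point (2). The field and enum names below are for explanation only, not the actual protocol classes or wire format:
```java
// Illustrative model of a FindCoordinator request/response pair.
class FindCoordinatorSketch {
    enum CoordinatorType { GROUP, TRANSACTION }

    static final class Request {
        final String coordinatorKey;      // group id or transactional id
        final CoordinatorType type;       // new coordinator_type field
        Request(String coordinatorKey, CoordinatorType type) {
            this.coordinatorKey = coordinatorKey;
            this.type = type;
        }
    }

    static final class Response {
        final short errorCode;
        final String errorMessage;        // new field described in point (2)
        final int coordinatorNodeId;
        Response(short errorCode, String errorMessage, int coordinatorNodeId) {
            this.errorCode = errorCode;
            this.errorMessage = errorMessage;
            this.coordinatorNodeId = coordinatorNodeId;
        }
    }
}
```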
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#2825 from apurvam/exactly-once-rpc-stubs
In 67fc2a91a6, we are using an empty collection and comparing via
value equality, so if a user passes an empty collection, they will
get the default ACLs instead of no ACLs. We fix that issue here.
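A small Java analogue of the fix, with made-up names: comparing against an empty collection by value cannot distinguish "caller wants no ACLs" from "caller did not specify ACLs", whereas a dedicated sentinel compared by reference can:
```java
import java.util.Collections;
import java.util.List;

class AclDefaultsSketch {
    // Sentinel compared by reference, so an explicitly passed empty list is
    // not mistaken for "use the defaults".
    static final List<String> USE_DEFAULT_ACLS = Collections.unmodifiableList(Collections.emptyList());
    static final List<String> DEFAULT_ACLS = Collections.singletonList("world:read");

    static List<String> resolveAcls(List<String> requested) {
        // reference comparison on purpose; requested.isEmpty() would wrongly
        // treat a user-supplied empty list as the default
        return requested == USE_DEFAULT_ACLS ? DEFAULT_ACLS : requested;
    }
}
```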
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Rajini Sivaram
Closes#2829 from ijuma/zk-utils-default-acls-improvement and squashes the following commits:
0846172 [Ismael Juma] Add missing import
2dc84f3 [Ismael Juma] Simplify logic in `sensitivePath`
8122f27 [Ismael Juma] Use a true sentinel instead of an empty collection for `UseDefaultAcls`
Also add a test and refactor things a little to make testing easier.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Ben Stopford <benstopford@gmail.com>, Jun Rao <junrao@gmail.com>
Closes#2822 from ijuma/hotfix-checkpoint-file
- Consistent validation across different code paths in LogValidator
- Validate baseOffset for message format V2
- Flesh out LogValidatorTest to check producerId, baseSequence, producerEpoch and partitionLeaderEpoch.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jun Rao <junrao@gmail.com>
Closes#2802 from ijuma/validate-base-offset
Author: Eno Thereska <eno@confluent.io>
Reviewers: Damian Guy <damian.guy@gmail.com>, Matthias J. Sax <matthias@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2819 from enothereska/minor-increase-retries
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Jozef Koval <jozef.koval@protonmail.ch>, Ismael Juma <ismael@juma.me.uk>
Closes#2687 from cmccabe/KAFKA-4899
Author: Matthias J. Sax <matthias@confluent.io>
Author: Guozhang Wang <wangguoz@gmail.com>
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Apurva Mehta <apurva@confluent.io>, Jun Rao <junrao@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#2799 from mjsax/kafka-4990-add-api-stub-config-parameters-request-types
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Eno Thereska <eno@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2780 from cmccabe/KAFKA-4995
This PR replaces https://github.com/apache/kafka/pull/2743 (just re-raised from the Confluent repo).
This PR describes the addition of Partition Level Leader Epochs to messages in Kafka as a mechanism for fixing some known issues in the replication protocol. Full details can be found here:
[KIP-101 Reference](https://cwiki.apache.org/confluence/display/KAFKA/KIP-101+-+Alter+Replication+Protocol+to+use+Leader+Epoch+rather+than+High+Watermark+for+Truncation)
*The key elements are*:
- Epochs are stamped on messages as they enter the leader.
- Epochs are tracked in both leader and follower in a new checkpoint file.
- A new API allows followers to retrieve the leader's latest offset for a particular epoch (see the sketch after this list).
- The logic for truncating the log, when a replica becomes a follower, has been moved from Partition into the ReplicaFetcherThread.
- When partitions are added to the ReplicaFetcherThread, they are added in an initialising state. Initialising partitions request leader epochs and then truncate their logs appropriately.
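A toy Java sketch of the epoch bookkeeping described above (assumed names, not the broker code): the leader records the start offset of each epoch, and a follower can ask for the end offset of a given epoch to decide where to truncate.
```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of a per-partition leader epoch cache; the real implementation
// is Scala and persists entries to a checkpoint file.
class LeaderEpochSketch {
    // epoch -> first offset written in that epoch
    private final TreeMap<Integer, Long> epochStartOffsets = new TreeMap<>();
    private long logEndOffset = 0L;

    void append(int epoch, int numRecords) {
        // remember where this epoch started, stamped as messages enter the leader
        epochStartOffsets.putIfAbsent(epoch, logEndOffset);
        logEndOffset += numRecords;
    }

    // Answers the follower's question "what was the last offset for this epoch?":
    // the start offset of the next epoch, or the log end offset if it is the latest.
    long endOffsetFor(int requestedEpoch) {
        Map.Entry<Integer, Long> next = epochStartOffsets.higherEntry(requestedEpoch);
        return next != null ? next.getValue() : logEndOffset;
    }
}
```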
The test `EpochDrivenReplicationProtocolAcceptanceTest.shouldFollowLeaderEpochBasicWorkflow()` provides a good overview of the workflow.
The corrupted log use case is covered by the test
`EpochDrivenReplicationProtocolAcceptanceTest.offsetsShouldNotGoBackwards()`
Remaining work: there is a to-do list here: https://docs.google.com/document/d/1edmMo70MfHEZH9x38OQfTWsHr7UGTvg-NOxeFhOeRew/edit?usp=sharing
Author: Ben Stopford <benstopford@gmail.com>
Author: Jun Rao <junrao@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Closes#2808 from benstopford/kip-101-v2
Highlight that the range in `fetch` is inclusive of both `timeFrom` and `timeTo`
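For example, with the clarified javadoc in mind, a fetch over `[timeFrom, timeTo]` returns every window whose start timestamp lies in that inclusive range. A small usage sketch (the key `"key"` and the Long value type are assumptions for illustration):
```java
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.state.ReadOnlyWindowStore;
import org.apache.kafka.streams.state.WindowStoreIterator;

class WindowFetchExample {
    // Prints every window for "key" whose start timestamp lies in the
    // inclusive range [timeFrom, timeTo].
    static void printWindows(ReadOnlyWindowStore<String, Long> store,
                             long timeFrom, long timeTo) {
        try (WindowStoreIterator<Long> iter = store.fetch("key", timeFrom, timeTo)) {
            while (iter.hasNext()) {
                KeyValue<Long, Long> entry = iter.next();   // key = window start timestamp
                System.out.println("window start " + entry.key + " -> " + entry.value);
            }
        }
    }
}
```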
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Michael G. Noll <michael@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2811 from dguy/minor-window-fetch-java-doc
Author: Michael G. Noll <michael@confluent.io>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Eno Thereska <eno@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2812 from miguno/trunk-streams-examples-docs
We should catch `InvalidTopicException` and not just
`NoOffsetForPartitionException`. Also, we need to step through
all partitions that might be affected and reset those.
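A simplified sketch of the pattern described above, not the actual Streams code: catch both exception types and reset the partitions that might be affected (here, always to the earliest offset; the real code follows the configured reset policy).
```java
import java.util.Collection;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.NoOffsetForPartitionException;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.errors.InvalidTopicException;

class ResetSketch {
    // Simplified: on either exception, step through all partitions that might
    // be affected and reset them.
    static void pollWithReset(Consumer<byte[], byte[]> consumer,
                              Collection<TopicPartition> candidatePartitions,
                              long timeoutMs) {
        try {
            consumer.poll(timeoutMs);
        } catch (InvalidTopicException | NoOffsetForPartitionException e) {
            // this sketch always falls back to the beginning of each affected partition
            consumer.seekToBeginning(candidatePartitions);
        }
    }
}
```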
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Bill Bejeck <bbejeck@gmail.com>, Eno Thereska <eno@confluent.io>, Damian Guy <damian.guy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#2747 from mjsax/minor-fix-reset
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2779 from cmccabe/KAFKA-4993
There should only be a single `KafkaStreams.StreamStateListener` to
ensure synchronization of operations on
`KafkaStreams.StreamStateListener#threadState`.
Author: Armin Braun <me@obrown.io>
Reviewers: Damian Guy <damian.guy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#2801 from original-brownbear/fix-stream-state-listener
Author: Ewen Cheslack-Postava <me@ewencp.org>
Reviewers: Damian Guy <damian.guy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#2660 from ewencp/minor-make-configdef-safer
This fixes:
```
java.lang.AssertionError: expected:<2> but was:<3>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:834)
at org.junit.Assert.assertEquals(Assert.java:645)
at org.junit.Assert.assertEquals(Assert.java:631)
at org.apache.kafka.streams.processor.internals.StateDirectoryTest.shouldCleanUpTaskStateDirectoriesThatAreNotCurrentlyLocked(StateDirectoryTest.java:145)
```
While running the test in an infinite loop, we hit other problems:
- fixed file management (release all locks and close everything)
- increased sleep time for `shouldCleanupStateDirectoriesWhenLastModifiedIsLessThanNowMinusCleanupDelay` too (was flaky as well)
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Eno Thereska <eno@confluent.io>, Damian Guy <damian.guy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#2781 from mjsax/minor-fix-stateDirectoryTest
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Eno Thereska <eno@confluent.io>, Damian Guy <damian.guy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#2777 from mjsax/hotfix-window-serdes-trunk
Several fixes for handling broker failures:
- default replication value for internal topics is now 3 in the test itself (not in streams code, as that will require a KIP).
- streams producer waits for acks from all replicas in the test itself (not in streams code, as that will require a KIP); see the config sketch after this list.
- backoff time for the streams client to try again after a failure to contact the controller.
- fixed a bug related to state store locks (this helps in multi-threaded scenarios).
- fixed catching exceptions properly for network errors.
- system test for all the above
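A sketch of the test-side settings referred to in the first two items; `acks` is the standard producer config and 3 is the replication factor mentioned above, while the class and method names are illustrative:
```java
import java.util.Properties;

class BrokerBounceTestConfigSketch {
    // Test-side overrides only; neither value is changed in Streams itself.
    static Properties producerOverrides() {
        Properties props = new Properties();
        props.put("acks", "all");   // wait for acks from all replicas
        return props;
    }

    static int internalTopicReplicationFactor() {
        return 3;                   // applied when the test creates its topics
    }
}
```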
Author: Eno Thereska <eno@confluent.io>
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Damian Guy <damian.guy@gmail.com>, Guozhang Wang <wangguoz@gmail.com>, Dan Norwood <norwood@confluent.io>, Ismael Juma <ismael@juma.me.uk>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#2719 from enothereska/KAFKA-4916-broker-bounce-test
https://issues.apache.org/jira/browse/KAFKA-4810
> Currently SchemaBuilder is strict when checking that certain fields have not been set yet (e.g. version, name, doc). It just checks that the field is null. This is intended to protect the user from buggy code that overwrites a field with different values, but it's a bit too strict currently. In generic code for converting schemas (e.g. Converters) you will sometimes initialize a builder with these values (e.g. because you get a SchemaBuilder for a logical type, which sets name & version), but then have generic code for setting name & version from the source schema.
Changed the validation method so that it not only checks whether the field is null, but also whether the new value being set is the same as the field's current value.
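A minimal Java sketch of the relaxed check described above; the method and parameter names are illustrative, not the actual SchemaBuilder internals:
```java
import java.util.Objects;
import org.apache.kafka.connect.errors.SchemaBuilderException;

class CheckCanSetSketch {
    // Old behaviour: reject whenever the field was already set.
    // New behaviour: also allow the call when the new value equals the current one.
    static void checkCanSet(String fieldName, Object currentValue, Object newValue) {
        if (currentValue != null && !Objects.equals(currentValue, newValue))
            throw new SchemaBuilderException("Invalid SchemaBuilder call: " + fieldName
                    + " has already been set to a different value.");
    }
}
```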
ewencp
Author: Vitaly Pushkar <vitaly.pushkar@gmail.com>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes#2806 from vitaly-pushkar/KAFKA-4810-schema-builder-default-fields-validation
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Konstantine Karantasis <konstantine@confluent.io>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#2763 from cmccabe/KAFKA-4977
Though MirrorMaker uses the `producer.type` value of the
producer properties, ProducerConfig shows the warning:
`The configuration 'producer.type' was supplied but
isn't a known config.`
Author: Shun Takebayashi <shun@takebayashi.asia>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#2676 from takebayashi/suppress-mirrormaker-warning
Of particular importance are compression buffers (64 KB for LZ4, for example).
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#2796 from apurvam/idempotent-producer-close-data-stream
Addresses https://issues.apache.org/jira/browse/KAFKA-4878
* Adjusted the error message to explicitly state errors and their number
* Dried up the logic for generating the message between standalone and distributed
Example: two misspelled config keys in the file source config:
```
namse=local-file-source
connector.class=FileStreamSource
tasks.max=1
fisle=test.txt
topic=connect-test
```
Produces:
```
[2017-03-22 08:57:11,896] ERROR Stopping after connector error (org.apache.kafka.connect.cli.ConnectStandalone:99)
java.util.concurrent.ExecutionException: org.apache.kafka.connect.runtime.rest.errors.BadRequestException: Connector configuration is invalid and contains the following 2 error(s):
Missing required configuration "file" which has no default value.
Missing required configuration "name" which has no default value.
You can also find the above list of errors at the endpoint `/{connectorType}/config/validate`
```
Author: Armin Braun <me@obrown.io>
Reviewers: Gwen Shapira, Konstantine Karantasis, Ewen Cheslack-Postava
Closes#2722 from original-brownbear/KAFKA-4878
fixes:
```
java.nio.file.NoSuchFileException: /tmp/test7863510415433793941/topic2-Canonized/topic2-Canonized-197001010000/000015.sst
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
at sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
at sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:97)
at java.nio.file.Files.readAttributes(Files.java:1686)
at java.nio.file.FileTreeWalker.walk(FileTreeWalker.java:105)
at java.nio.file.FileTreeWalker.walk(FileTreeWalker.java:199)
at java.nio.file.FileTreeWalker.walk(FileTreeWalker.java:199)
at java.nio.file.FileTreeWalker.walk(FileTreeWalker.java:199)
at java.nio.file.FileTreeWalker.walk(FileTreeWalker.java:69)
at java.nio.file.Files.walkFileTree(Files.java:2602)
at java.nio.file.Files.walkFileTree(Files.java:2635)
at org.apache.kafka.common.utils.Utils.delete(Utils.java:555)
at org.apache.kafka.streams.kstream.internals.KStreamWindowAggregateTest.testJoin(KStreamWindowAggregateTest.java:320)
```
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Eno Thereska <eno@confluent.io>, Damian Guy <damian.guy@gmail.com>, Jun Rao <junrao@gmail.com>
Closes#2778 from mjsax/minor-fix-kstreamWindowAggregateTest
The bug meant that the base offset was the same as the batch size instead of 0, so the broker would always recompress batches.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jun Rao <junrao@gmail.com>
Closes#2794 from ijuma/fix-records-builder-construction
Fixes a deadlock scenario found during a local test run:
The main thread was waiting for the coordinator lock.
The thread performing close() was holding the
coordinator lock and polling to find the coordinator.
The test expected close() to time out, but for it to
time out, the main thread had to update time, which it
couldn't do since it was waiting for the lock. This fix
avoids using the coordinator in the main thread during
the close task.
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#2792 from rajinisivaram/MINOR-closetest-deadlock
This may be a reason why we sometimes see Jenkins jobs time out.
I can reproduce it locally.
With current trunk it is possible to run into this:
```sh
"kafka-streams-close-thread" #585 daemon prio=5 os_prio=0 tid=0x00007f66d052d800 nid=0x7e02 waiting for monitor entry [0x00007f66ae2e5000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.kafka.streams.processor.internals.StreamThread.close(StreamThread.java:345)
- waiting to lock <0x000000077d33c538> (a org.apache.kafka.streams.processor.internals.StreamThread)
at org.apache.kafka.streams.KafkaStreams$1.run(KafkaStreams.java:474)
at java.lang.Thread.run(Thread.java:745)
"appId-bd262a91-5155-4a35-bc46-c6432552c2c5-StreamThread-97" #583 prio=5 os_prio=0 tid=0x00007f66d052f000 nid=0x7e01 waiting for monitor entry [0x00007f66ae4e6000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.kafka.streams.KafkaStreams.setState(KafkaStreams.java:219)
- waiting to lock <0x000000077d335760> (a org.apache.kafka.streams.KafkaStreams)
at org.apache.kafka.streams.KafkaStreams.access$100(KafkaStreams.java:117)
at org.apache.kafka.streams.KafkaStreams$StreamStateListener.onChange(KafkaStreams.java:259)
- locked <0x000000077d42f138> (a org.apache.kafka.streams.KafkaStreams$StreamStateListener)
at org.apache.kafka.streams.processor.internals.StreamThread.setState(StreamThread.java:168)
- locked <0x000000077d33c538> (a org.apache.kafka.streams.processor.internals.StreamThread)
at org.apache.kafka.streams.processor.internals.StreamThread.setStateWhenNotInPendingShutdown(StreamThread.java:176)
- locked <0x000000077d33c538> (a org.apache.kafka.streams.processor.internals.StreamThread)
at org.apache.kafka.streams.processor.internals.StreamThread.access$1600(StreamThread.java:70)
at org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener.onPartitionsRevoked(StreamThread.java:1321)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinPrepare(ConsumerCoordinator.java:406)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:349)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:310)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:296)
at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1037)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1002)
at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:531)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:669)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:326)
```
In a nutshell: `KafkaStreams` and `StreamThread` are both
waiting for each other, since another intermittent `close`
(e.g. from a test) comes along, also trying to lock on
`KafkaStreams`:
```sh
"main" #1 prio=5 os_prio=0 tid=0x00007f66d000c800 nid=0x78bb in Object.wait() [0x00007f66d7a15000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1249)
- locked <0x000000077d45a590> (a java.lang.Thread)
at org.apache.kafka.streams.KafkaStreams.close(KafkaStreams.java:503)
- locked <0x000000077d335760> (a org.apache.kafka.streams.KafkaStreams)
at org.apache.kafka.streams.KafkaStreams.close(KafkaStreams.java:447)
at org.apache.kafka.streams.KafkaStreamsTest.testCannotStartOnceClosed(KafkaStreamsTest.java:115)
```
=> causing a deadlock.
Fixed this with softer locking on the state change that guarantees
atomic changes to the state but does not lock on the whole object
(at least I could not find another method that would require more
than atomically-locked access, except for `setState`).
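A simplified Java illustration of this "softer locking" idea, with made-up class and state names: the transition synchronizes on a small dedicated lock rather than on the whole object, so a caller holding the outer monitor cannot deadlock with the listener callback.
```java
// Illustrative only: guard just the state transition, not the whole object.
class StateHolderSketch {
    enum State { CREATED, RUNNING, PENDING_SHUTDOWN, NOT_RUNNING }

    private final Object stateLock = new Object();   // small, dedicated lock
    private volatile State state = State.CREATED;

    // Atomically validates and applies the transition without holding the
    // monitor of the enclosing object, which other threads may already own.
    boolean setState(State newState) {
        synchronized (stateLock) {
            if (state == State.PENDING_SHUTDOWN && newState != State.NOT_RUNNING)
                return false;                         // ignore late transitions
            state = newState;
        }
        // listener callbacks happen outside the lock in this sketch
        return true;
    }

    State state() {
        return state;
    }
}
```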
Also qualified the state listeners with their outer class to make
the whole code flow around this more readable (having two
interfaces with the same name for interface and method, and then
using them between their two outer classes, is very hard to read
imo :)).
Easy to reproduce yourself by running
`org.apache.kafka.streams.KafkaStreamsTest` in a loop for a bit
(save yourself some time by running 2-4 in parallel :)). Eventually
it will lock on one of the tests (for me this takes less than 1 min
with 4 parallel runs).
Author: Armin Braun <me@obrown.io>
Author: Armin <me@obrown.io>
Reviewers: Eno Thereska <eno@confluent.io>, Damian Guy <damian.guy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#2791 from original-brownbear/fix-streams-deadlock