1. added functions for KafkaStreams and KafkaClientSupplier.
2. added prefix for admin client in StreamsConfig.
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Jason Gustafson <jason@confluent.io>, Matthias J. Sax <matthias@confluent.io>
Closes#4338 from guozhangwang/K6150-doc-changes
Utils with static methods should not be instantiated, hence marking the classes `final` and adding a `private` constructor.
Reviewers: Randall Hauch <rhauch@gmail.com>, Jason Gustafson <jason@confluent.io>
Updates:
- Gradle, gradle plugins and maven artifact updated
- Bug fix updates for ZooKeeper, Jackson, EasyMock and Snappy
Not updated:
- RocksDB as it often causes issues, so better done separately
- args4j as our test coverage is weak and the update was a
feature release
Also fixed scala-reflect version to match scala-library.
Release notes for ZooKeeper 3.4.11:
https://zookeeper.apache.org/doc/r3.4.11/releasenotes.html
A notable fix is improved handling of UnknownHostException:
https://issues.apache.org/jira/browse/ZOOKEEPER-2614
Manually tested that IntelliJ import and build still works.
Relying on existing test suite otherwise.
Reviewers: Jun Rao <junrao@gmail.com>
* Moved metrics in KafkaHealthCheck to ZookeeperClient.
* Converted remaining ZkUtils usage in KafkaServer to ZookeeperClient and removed ZkUtils from KafkaServer.
* Made the re-creation of ZooKeeper during ZK session expiration with infinite retries.
* Added unit tests for all new methods in KafkaZkClient.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
We should start the process only within the `with` block, otherwise the bytes parameter would cause a race condition that result in false alarms of system test failures.
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Ewen Cheslack-Postava <me@ewencp.org>
Closes#4348 from guozhangwang/KMinor-fix-eos-test
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#4322 from mjsax/kafka-6126-remove-topic-check-on-rebalance-2
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Damian Guy <damian.guy@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
Closes#4342 from mjsax/kafka-4263-concurrentAccess
* Implement process stop faults via SIGSTOP / SIGCONT
* Implement RoundTripWorkload, which both sends messages, and confirms that they are received at least once.
* Allow Trogdor tasks to block until other Trogdor tasks are complete.
* Add CreateTopicsWorker, which can be a building block for a lot of tests.
* Simplify how TaskSpec subclasses in ducktape serialize themselves to JSON.
* Implement some fault injection tests in round_trip_workload_test.py
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Rajini Sivaram <rajinisivaram@googlemail.com>
Closes#4323 from cmccabe/KAFKA-5849
* Use KafkaZkClient in ReassignPartitionsCommand
* Use KafkaZkClient in PreferredReplicaLeaderElectionCommand
* Updated test classes to use new methods
* All existing tests should pass
Author: Manikumar Reddy <manikumar.reddy@gmail.com>
Reviewers: Jun Rao <junrao@gmail.com>
Closes#4260 from omkreddy/KAFKA-5647-ADMINCOMMANDS
1. Create default internal topic configs in StreamsConfig, especially for repartition topics change the segment size and time to smaller value.
2. Consolidate the default internal topic settings to InternalTopicManager and simplify InternalTopicConfig correspondingly.
3. Add an integration test for purging data.
4. MINOR: change TopologyBuilderException to IllegalStateException in StreamPartitionAssignor (part of https://issues.apache.org/jira/browse/KAFKA-5660).
Here are a few public facing APIs that get added:
1. AbstractConfig#originalsWithPrefix(String prefix, boolean strip): this for simplify the logic of passing admin and topic prefixed configs to consumer properties.
2. KafkaStreams constructor with Time object for convienent mocking in tests.
Will update KIP-204 accordingly if people re-votes these changes.
Author: Guozhang Wang <wangguoz@gmail.com>
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>, Damian Guy <damian.guy@gmail.com>
Closes#4315 from guozhangwang/K6150-segment-size
It should only depend on slf4j-api (like kafka-clients). The
release tarball still includes log4j and slf4j-log4j12.
Manually verified that there are no duplicate dependencies
in the release tarball and `./gradlew core:dependencies`
looks good.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Closes#4297 from ijuma/kafka-6317-kafka-slf4j-api-only
This is a workaround until KIP-91 is merged. We tried increasing the timeout multiple times already but tests are still flaky.
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Bill Bejeck <bill@confluent.io>, Apurva Mehta <apurva@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#4329 from mjsax/hotfix-system-tests
When consumer uses plaintext and there is remaining data in consumer's buffer, consumer.poll() will read all data available from the socket buffer to consumer buffer. However, if consumer uses ssl and there is remaining data, consumer.poll() may only read 16 KB (the size of SslTransportLayer.appReadBuffer) from socket buffer. This will reduce efficient of consumer.poll() by asking user to call more poll() to get the same amount of data.
Furthermore, we observe that for users who naively sleep a constant time after each consumer.poll(), some partition will lag behind after they switch from plaintext to ssl. Here is the explanation why this can happen.
Say there are 1 partition of 1MB/sec and 9 partition of 32KB/sec. Leaders of these partitions are all different and consumer is consuming these 10 partitions. Let's also assume that socket read buffer size is large enough and consume sleeps 1 sec between consumer.poll(). 1 sec is long enough for consumer to receive the FetchResponse back from broker.
When consumer uses plaintext, each consumer.poll() will read all data from the socket buffer and it means 1 MB data is read from each partition.
When consumer uses ssl, each consumer.poll() is likely to find that there is some data available in the memory. In this case consumer only reads 16 KB data from other sockets, particularly the socket for the broker with the large partition. Then the throughput of the large partition will be limited to 16KB/sec.
Arguably user should not sleep 1 sec if its consumer is lagging behind. But on Kafka dev side it is nice to keep the previous behavior and optimize consumer.poll() to read as much data from socket as possible.
Author: Dong Lin <lindong28@gmail.com>
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Jason Gustafson <jason@confluent.io>
Closes#4248 from lindong28/KAFKA-6258
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Apurva Mehta <apurva@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#4335 from mjsax/minor-improve-KafkaStreams-javadocs
(WIP: this commit isn't ready to be reviewed yet. I was checking the travis-ci build with the configuration changes in my account and opened the PR prematurely against trunk. I will make it consistent with Contribution guidelines once it's well tested.)
https://issues.apache.org/jira/browse/KAFKA-5473
Design:
`zookeeper.connection.retry.timeout.ms` => this determines how long to wait before triggering the shutdown. The default is 60000ms.
Currently the implementation only handles the `handleSessionEstablishmentError` by waiting for the sessionTimeout.
Author: Prasanna Gautam <prasannagautam@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Closes#3990 from prasincs/KAFKA-5473
The `--describe` option of ConsumerGroupCommand is expanded, as proposed in [KIP-175](https://cwiki.apache.org/confluence/display/KAFKA/KIP-175%3A+Additional+%27--describe%27+views+for+ConsumerGroupCommand), to support:
* `--describe` or `--describe --offsets`: listing of current group offsets
* `--describe --members` or `--describe --members --verbose`: listing of group members
* `--describe --state`: group status
Example: With a single partition topic `test1` and a double partition topic `test2`, consumers `consumer1` and `consumer11` subscribed to `test`, consumers `consumer2` and `consumer22` and `consumer222` subscribed to `test2`, and all consumers belonging to group `test-group`, this is an output example of the new options above for `test-group`:
```
--describe, or --describe --offsets:
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
test2 0 0 0 0 consumer2-bad9496d-0889-47ab-98ff-af17d9460382 /127.0.0.1 consumer2
test2 1 0 0 0 consumer22-c45e6ee2-0c7d-44a3-94a8-9627f63fb411 /127.0.0.1 consumer22
test1 0 0 0 0 consumer1-d51b0345-3194-4305-80db-81a68fa6c5bf /127.0.0.1 consumer1
```
```
--describe --members
CONSUMER-ID HOST CLIENT-ID #PARTITIONS
consumer2-bad9496d-0889-47ab-98ff-af17d9460382 /127.0.0.1 consumer2 1
consumer222-ed2108cd-d368-41f1-8514-5b72aa835bcc /127.0.0.1 consumer222 0
consumer11-dc8295d7-8f3f-4438-9b11-7270bab46760 /127.0.0.1 consumer11 0
consumer22-c45e6ee2-0c7d-44a3-94a8-9627f63fb411 /127.0.0.1 consumer22 1
consumer1-d51b0345-3194-4305-80db-81a68fa6c5bf /127.0.0.1 consumer1 1
```
```
--describe --members --verbose
CONSUMER-ID HOST CLIENT-ID #PARTITIONS ASSIGNMENT
consumer2-bad9496d-0889-47ab-98ff-af17d9460382 /127.0.0.1 consumer2 1 test2(0)
consumer222-ed2108cd-d368-41f1-8514-5b72aa835bcc /127.0.0.1 consumer222 0 -
consumer11-dc8295d7-8f3f-4438-9b11-7270bab46760 /127.0.0.1 consumer11 0 -
consumer22-c45e6ee2-0c7d-44a3-94a8-9627f63fb411 /127.0.0.1 consumer22 1 test2(1)
consumer1-d51b0345-3194-4305-80db-81a68fa6c5bf /127.0.0.1 consumer1 1 test1(0)
```
```
--describe --state
COORDINATOR (ID) ASSIGNMENT-STRATEGY STATE #MEMBERS
localhost:9092 (0) range Stable 5
```
Note that this PR also addresses the issue reported in [KAFKA-6158](https://issues.apache.org/jira/browse/KAFKA-6158) by dynamically setting the width of columns `TOPIC`, `CONSUMER-ID`, `HOST`, `CLIENT-ID` and `COORDINATOR (ID)`. This avoid truncation of column values when they go over the current fixed width of these columns.
The code has been restructured to better support testing of individual values and also the console output. Unit tests have been updated and extended to take advantage of this restructuring.
Author: Vahid Hashemian <vahidhashemian@us.ibm.com>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#4271 from vahidhashemian/KAFKA-5526
This PR creates and implements the `ProductionExceptionHandler` as described in [KIP-210](https://cwiki.apache.org/confluence/display/KAFKA/KIP-210+-+Provide+for+custom+error+handling++when+Kafka+Streams+fails+to+produce).
I've additionally provided a default implementation preserving the existing behavior. I fixed various compile errors in the tests that resulted from my changing of method signatures, and added tests to cover the new behavior.
Author: Matt Farmer <mfarmer@rsglab.com>
Author: Matt Farmer <matt@frmr.me>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>, Damian Guy <damian.guy@gmail.com>
Closes#4165 from farmdawgnation/msf/kafka-6086
This changes the Struct's equals and hashCode method to use Arrays#deepEquals and Arrays#deepHashCode, respectively. This resolves a problem where two structs with values of type byte[] would not be considered equal even though the byte arrays' contents are equal. By using deepEquals, the byte arrays' contents are compared instead of ther identity.
Since this changes the behavior of the equals method for byte array values, the behavior of hashCode must change alongside it to ensure the methods still fulfill the general contract of "equal objects must have equal hashCodes".
Test rationale:
All existing unit tests for equals were untouched and continue to work. A new test method was added to verify the behavior of equals and hashCode for Struct instances that contain a byte array value. I verify the reflixivity and transitivity of equals as well as the fact that equal Structs have equal hashCodes
and not-equal structs do not have equal hashCodes.
Author: Tobias Gies <tobias.gies@trivago.com>
Author: Tobias Gies <tobias@tobiasgies.de>
Reviewers: Randall Hauch <rhauch@gmail.com>, Jason Gustafson <jason@confluent.io>
Closes#4293 from tobiasgies/feature/kafka-6308-deepequals
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Randall Hauch <rhauch@gmail.com>, Jason Gustafson <jason@confluent.io>
Closes#4105 from cmccabe/KAFKA-6102
Now that we support re-initializing state stores, we need to clear the segments when the store is closed so that they can be re-opened.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Bill Bejeck <bbejeck@gmail.com>, Guozhang Wang <wangguoz@gmail.com>, Ted Yu <yuzhihong@gmail.com>
Closes#4324 from dguy/kafka-6360
Fixes a `ConcurrentModificationException` in`AbstractStateManager` that is triggered when a `StateStore` is re-initialized and there are multiple stores in the context.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>, GuozhangWang <wangguoz@gmail.com>
Closes#4317 from dguy/kafka-6349
Fix warn log message in RecordCollectorImpl so it prints the exception message rather than `{}`
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>, GuozhangWang <wangguoz@gmail.com>
Closes#4318 from dguy/minor-logging-record-collector
- Rename `encode` to `legacyEncodeAsString`, we
can remove this when we remove `ZkUtils`.
- Introduce `encodeAsString` that uses Jackson.
- Change `encodeAsBytes` to use Jackson.
- Avoid intermediate string when converting
Broker to json bytes.
The methods that use Jackson only support
Java collections unlike `legacyEncodeAsString`.
Tests were added `encodeAsString` and
`encodeAsBytes`.
Author: umesh chaudhary <umesh9794@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#4259 from umesh9794/KAFKA-5631
This is required for ACLs where SSL principals contain
special characters (e.g. comma) that are escaped using
backslash. The strings need to be quoted for JSON to
ensure that the JSON stored in ZK is valid.
Also converted `SslEndToEndAuthorizationTest` to use a
principal with special characters to ensure that this
path is tested.
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#4303 from rajinisivaram/KAFKA-6319
- Rename `delete()` to `deleteIfExists()` in `LogSegment`, `AbstractIndex`
and `TxnIndex`. Throw exception in case of IO errors for more informative
errors and to make it less likely that errors are ignored, `boolean` is used
for the case where the file does not exist (like `Files.deleteIfExists()`).
- Fix an instance of delete while open (should fix KAFKA-6322 and
KAFKA-6075).
- `LogSegment.deleteIfExists` no longer throws an exception if any of
the files it tries to delete does not exist (fixes KAFKA-6194).
- Remove unnecessary `FileChannel.force(true)` when deleting file.
- Introduce `LogSegment.open()` and use it to improve encapsulation
and reduce duplication.
- Expand functionality of `LogSegment.onBecomeInactiveSegment()`
to reduce duplication and improve encapsulation.
- Use `AbstractIndex.deleteIfExists()` instead of deleting files manually.
- Improve logging when deleting swap files.
- Use CorruptIndexException instead of IllegalArgumentException.
- Simplify `LogCleaner.cleanSegments()` to reduce duplication and
improve encapsulation.
- A few other clean-ups in Log, LogSegment, etc.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jun Rao <junrao@gmail.com>, Ted Yu <yuzhihong@gmail.com>
Closes#4040 from ijuma/kafka-5829-follow-up
Increase the number of messages produced to make the test more reliable. The test failed in a recent build and also fails intermittently when run locally. Since the producer uses acks=0 and the test stops as soon as a lag is observed, the change shouldn't have a big impact on the time taken to run when lag is observed sooner.
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#4312 from rajinisivaram/MINOR-replicaverification-test
- set auto.offset.reste to "none" for restore and global consumer
- handle InvalidOffsetException for restore and global consumer
- add corresponding tests
- some minor cleanup
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Damian Guy <damian.guy@gmail.com, Bill Bejeck <bill@confluent.io>, GuozhangWang <wangguoz@gmail.com>
Closes#4215 from mjsax/kafka-6121-restore-global-consumer-handle-reset
The NetworkClient internally ApiVersion requests to each broker following connection establishment. If this request happens to fail (perhaps due to an incompatible broker), the NetworkClient includes the response in the result of poll(). Applications will generally not be expecting this response which may lead to failed assertions (or in the case of AdminClient, an obscure log message).
I've added test cases which await the ApiVersion request sent by NetworkClient to be in-flight, and then disconnect the connection and verify that the response is not included from poll().
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Closes#4280 from hachikuji/KAFKA-6289
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Bill Bejeck <bbejeck@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
Closes#4242 from mjsax/kafka-4857-admit-client
Recent changes are now directly using the SLF4J API, so we should have a direct dependency.
Author: Randall Hauch <rhauch@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#4296 from rhauch/kafka-6313
Also:
- Fix bug in result type of `createSequentialPersistentPath`
- Remove duplicated code from `ReplicationUtils`
- Move `propagateIsrChanges` from `ReplicationUtils` to `KafkaZkClient`
- Add tests
- Minor clean-ups
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jun Rao <junrao@gmail.com>, Ted Yu <yuzhihong@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
Closes#4261 from ijuma/zk-data-improvements
Measures the latency of each request.
Updated existing `ZkUtils` test to use `KafkaZkClient`
instead.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jun Rao <junrao@gmail.com>
Closes#4265 from ijuma/kafka-6065-async-zk-metrics
- Ensure that `partitionsBeingReassigned` is fully populated before
`removePartitionFromReassignedPartitions` is invoked. This is
necessary to avoid premature deletion of the `reassign_partitions`
znode.
- Modify and add tests to verify the fixes.
- Add documentation.
- Use `info` log message if assignedReplicas == newReplicas and
remove control flow based on exceptions.
- General logging improvements.
- Simplify `initializePartitionAssignment` by relying on logic already
present in `maybeTriggerPartitionReassignment`.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jun Rao <junrao@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
Closes#4283 from ijuma/kafka-6193-flaky-shouldPerformMultipleReassignmentOperationsOverVariousTopics
* Add maybeThrow method to AsyncResponse
* Update KafkaZkClient to use newly introduced maybeThrow
* Change AsyncResponse from trait to abstract class for
more readable stacktraces (there's no benefit in using a
trait here)
Author: Manikumar Reddy <manikumar.reddy@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#4266 from omkreddy/KAFKAZKCLEINT_EXCEPTION_CLEANUP
From 0.11 to 1.0, we moved `DescribeClusterOptions timeoutMs(Integer timeoutMs)` from
DescribeClusterOptions to AbstractOptions (similarly for other Options classes). This can
cause code compiled against 0.11.0.x to fail when it is executed with 1.0 kafka-clients jar.
This patch adds back these methods to restore binary compatibility with 0.11.
Author: Dong Lin <lindong28@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#4257 from lindong28/KAFKA-6174
1. Add the repartition topics information into ProcessorTopology: personally I do not like leaking this information into the topology but it seems not other simple way around.
2. StreamTask: added one more function to expose the consumed offsets from repartition topics only.
3. TaskManager: use the AdminClient to send the gathered offsets to delete only if a) previous call has completed and client intentionally ignore-and-log any errors, or b) no requests have ever called before.
NOTE that this code depends on the assumption that purge is only called right after the commit has succeeded, hence we presume all consumed offsets are committed.
4. MINOR: Added a few more constructor for ProcessorTopology for cleaner unit tests.
5. MINOR: Extracted MockStateStore out of the deprecated class.
6. MINOR: Made a pass over some unit test classes for clean ups.
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Damian Guy <damian.guy@gmail.com>
Closes#4270 from guozhangwang/K6150-purge-repartition-topics
Increase `REQUEST_TIMOUT_MS` to improve a flaky system test until KIP-91 merged
Author: Bill Bejeck <bill@confluent.io>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#4291 from bbejeck/MINOR_increase_request_timeout_for_streams_bounce_test
Updated the System test `stream_broker_compatibility_test.py` to address system test failures as we have removed explicit broker version checking
- Ignore the `0.8.2.2` and `0.9.0.0` tests because the `NetworkClient` only logs `UnsupportedVersionException`s that occur and will continue to retry connecting. Once issue https://issues.apache.org/jira/browse/KAFKA-6297 is addressed, we may re-enable these tests.
- Updated existing tests expected error messages
- Updated Streams code in test for to make sure we fail fast for incompatible brokers
Author: Bill Bejeck <bill@confluent.io>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#4286 from bbejeck/MINOR_fix_broker_compatibility_tests
It's rare, but it can happen that the initial FindCoordinator request returns before the first Metadata request. Both authorization errors are fine for this test case.
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#4287 from hachikuji/KAFKA-6118