Handle InvalidTypeIdException as NOT_IMPLEMENTED and add unit tests for all exceptions.
Reviewers: Colin P. Mccabe <cmccabe@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
We are closing the metricGroups created in a Worker, Source task and Sink task before populating them with new metrics. This helps in cases where an Exception is thrown when previously created groups were not cleaned up correctly.
Signed-off-by: Arjun Satish <arjunconfluent.io>
Author: Arjun Satish <arjun@confluent.io>
Reviewers: Randall Hauch <rhauch@gmail.com>, Konstantine Karantasis <konstantine@confluent.io>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#4397 from wicknicks/KAFKA-6252
When the source file of `FileStreamSource` is a large file, `FileStreamSourceTask.poll()` will result in OOM. This pull request added `batch.size` parameter which can restrict the poll size.
*More detailed description of your change,
if necessary. The PR title and PR message become
the squashed commit message, so use a separate
comment to ping reviewers.*
*Summary of testing strategy (including rationale)
for the feature or bug fix. Unit and/or integration
tests are expected for any behaviour change and
system tests should be considered for larger changes.*
Author: Study <ph.study@gmail.com>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes#4356 from phstudy/KAFKA-4335
- Add missing locking/volatile in MetadataCache.aliveEndPoint
- Fix topic metadata not to throw BrokerNotAvailableException
when listeners are inconsistent. Add test verifying the fix. As
part of this fix, renamed Broker methods to follow Map
convention where the `get` version returns `Option`.
Reviewers: Jason Gustafson <jason@confluent.io>
LogContext to have two different implementations of Logger. One will be picked based on availability of LocationAwareLogger API.
Reviewers: Jason Gustafson <jason@confluent.io>
- Fix zk session state and session change rate metric names: type
should be SessionExpireListener instead of KafkaHealthCheck. Test
verifying the fix was included.
- Handle missing controller in controlled shutdown in the same way as if
the broker is not registered (i.e. retry after backoff).
- Restructure BrokerInfo to reduce duplication. It now contains a
Broker instance and the JSON serde is done in BrokerIdZNode
since `Broker` does not contain all the fields.
- Remove dead code from `ZooKeeperClient.initialize` and remove
redundant `close` calls.
- Move ACL handling and persistent paths definition from ZkUtils to
ZkData (and call ZkData from ZkUtils).
- Remove ZooKeeperClientWrapper and ZooKeeperClientMetrics from
ZkUtils (avoids metrics clash if third party users create a ZkUtils
instance in the same process as the broker).
- Introduce factory method in KafkaZkClient that creates
ZooKeeperClient and remove metric name defaults from
ZooKeeperClient.
- Fix a few instances where ZooKeeperClient was not closed in tests.
- Update a few TestUtils methods to use KafkaZkClient instead of
ZkUtils.
- Add test verifying SessionState metric.
- Various clean-ups.
Testing: mostly relying on existing tests, but added a couple
of new tests as mentioned above.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Jun Rao <junrao@gmail.com>
Closes#4359 from ijuma/kafka-6320-kafka-health-zk-metrics-follow-up
When a log entry is appended to a Kafka topic using `KafkaLog4jAppender`, the producer.send operation may block waiting for metadata. This can result in deadlocks in a couple of scenarios if a log entry from the producer network thread is also at a log level that results in the entry being appended to a Kafka topic.
1. Producer's network thread will attempt to send data to a Kafka topic and this is unsafe since producer.send may block waiting for metadata, causing a deadlock since the thread will not process the metadata request/response.
2. `KafkaLog4jAppender#append` is invoked while holding the lock of the logger. So the thread waiting for metadata in the initial send will be holding the logger lock. If the producer network thread has.a log entry that needs to be appended, it will attempt to acquire the logger lock and deadlock.
This is a temporary workaround to avoid deadlocks in system tests by setting log level to WARN for `Metadata` in `VerifiableLog4jAppender`. The fix has been verified using the system tests log4j_appender_test.py which started failing when the info-level log entry was introduced.
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Satish Duggana <satish.duggana@gmail.com>, tedyu <yuzhihong@gmail.com>
Closes#4375 from rajinisivaram/KAFKA-6415-log4jappender
Increase commit interval to make it less likely that we flush the cache in-between.
To make it fool-proof, only compare the "final" result records if cache is enabled.
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#4364 from mjsax/kafka-6256-flaky-kstream-ktable-join-with-caching-test
* Removed code duplicate from GlobalProcessorContextImpl and ProcessorContextImpl to parent class AbstractProcessorContext
* Exchanged concrete implementations with interfaces to make code more maintainable
* Refactored major code duplicates in InternalTopologyBuilder
* Formatted function parameters as per code review
Added final to code introduced in this PR
* Added missing finals to putNodeGroupName function
Rearranged parameters for resetTopicsPattern function
Reviewers: Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bbejeck@gmail.com>
* ensure topics are created with correct partitions BEFORE building the metadata for our stream tasks
* Added a test case. The test should fail with the old logic, because:
While stream-partition-assignor-test-KSTREAM-MAP-0000000001-repartition is created correctly with four partitions, the StreamPartitionAssignor will only assign three tasks to the topic. Test passes with the new logic.
Reviewers: Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>, Ted Yu <yuzhihong@gmail.com>
* Return offset of next record of records left after restore completed
* Changed check for restoring partition to remove the "+1" in the guard condition
Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>
* KAFKA-6383: complete shutdown for CREATED StreamThreads
When transitioning StreamThread from CREATED to PENDING_SHUTDOWN
free up resources from the caller, rather than the stream thread,
since in this case the stream thread was never actually started.
Have StreamThread.setState return the old state. If the old state is
CREATED in StreamThread.shutdown() then start the thread so that it
can free the resources owned by the StreamThread.
Add a KafkaStreams test to verify that the producer gets closed even
if KafkaStreams was not started
Reviewers: Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>
We should remove the map entry from mbeans if it becomes
empty during metric removal.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Satish Duggana <satish.duggana@gmail.com>, Ismael Juma <ismael@juma.me.uk>
1. added functions for KafkaStreams and KafkaClientSupplier.
2. added prefix for admin client in StreamsConfig.
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Jason Gustafson <jason@confluent.io>, Matthias J. Sax <matthias@confluent.io>
Closes#4338 from guozhangwang/K6150-doc-changes
Utils with static methods should not be instantiated, hence marking the classes `final` and adding a `private` constructor.
Reviewers: Randall Hauch <rhauch@gmail.com>, Jason Gustafson <jason@confluent.io>
Updates:
- Gradle, gradle plugins and maven artifact updated
- Bug fix updates for ZooKeeper, Jackson, EasyMock and Snappy
Not updated:
- RocksDB as it often causes issues, so better done separately
- args4j as our test coverage is weak and the update was a
feature release
Also fixed scala-reflect version to match scala-library.
Release notes for ZooKeeper 3.4.11:
https://zookeeper.apache.org/doc/r3.4.11/releasenotes.html
A notable fix is improved handling of UnknownHostException:
https://issues.apache.org/jira/browse/ZOOKEEPER-2614
Manually tested that IntelliJ import and build still works.
Relying on existing test suite otherwise.
Reviewers: Jun Rao <junrao@gmail.com>
* Moved metrics in KafkaHealthCheck to ZookeeperClient.
* Converted remaining ZkUtils usage in KafkaServer to ZookeeperClient and removed ZkUtils from KafkaServer.
* Made the re-creation of ZooKeeper during ZK session expiration with infinite retries.
* Added unit tests for all new methods in KafkaZkClient.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
We should start the process only within the `with` block, otherwise the bytes parameter would cause a race condition that result in false alarms of system test failures.
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Ewen Cheslack-Postava <me@ewencp.org>
Closes#4348 from guozhangwang/KMinor-fix-eos-test
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#4322 from mjsax/kafka-6126-remove-topic-check-on-rebalance-2
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Damian Guy <damian.guy@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
Closes#4342 from mjsax/kafka-4263-concurrentAccess
* Implement process stop faults via SIGSTOP / SIGCONT
* Implement RoundTripWorkload, which both sends messages, and confirms that they are received at least once.
* Allow Trogdor tasks to block until other Trogdor tasks are complete.
* Add CreateTopicsWorker, which can be a building block for a lot of tests.
* Simplify how TaskSpec subclasses in ducktape serialize themselves to JSON.
* Implement some fault injection tests in round_trip_workload_test.py
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Rajini Sivaram <rajinisivaram@googlemail.com>
Closes#4323 from cmccabe/KAFKA-5849
* Use KafkaZkClient in ReassignPartitionsCommand
* Use KafkaZkClient in PreferredReplicaLeaderElectionCommand
* Updated test classes to use new methods
* All existing tests should pass
Author: Manikumar Reddy <manikumar.reddy@gmail.com>
Reviewers: Jun Rao <junrao@gmail.com>
Closes#4260 from omkreddy/KAFKA-5647-ADMINCOMMANDS
1. Create default internal topic configs in StreamsConfig, especially for repartition topics change the segment size and time to smaller value.
2. Consolidate the default internal topic settings to InternalTopicManager and simplify InternalTopicConfig correspondingly.
3. Add an integration test for purging data.
4. MINOR: change TopologyBuilderException to IllegalStateException in StreamPartitionAssignor (part of https://issues.apache.org/jira/browse/KAFKA-5660).
Here are a few public facing APIs that get added:
1. AbstractConfig#originalsWithPrefix(String prefix, boolean strip): this for simplify the logic of passing admin and topic prefixed configs to consumer properties.
2. KafkaStreams constructor with Time object for convienent mocking in tests.
Will update KIP-204 accordingly if people re-votes these changes.
Author: Guozhang Wang <wangguoz@gmail.com>
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>, Damian Guy <damian.guy@gmail.com>
Closes#4315 from guozhangwang/K6150-segment-size
It should only depend on slf4j-api (like kafka-clients). The
release tarball still includes log4j and slf4j-log4j12.
Manually verified that there are no duplicate dependencies
in the release tarball and `./gradlew core:dependencies`
looks good.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Closes#4297 from ijuma/kafka-6317-kafka-slf4j-api-only
This is a workaround until KIP-91 is merged. We tried increasing the timeout multiple times already but tests are still flaky.
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Bill Bejeck <bill@confluent.io>, Apurva Mehta <apurva@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#4329 from mjsax/hotfix-system-tests
When consumer uses plaintext and there is remaining data in consumer's buffer, consumer.poll() will read all data available from the socket buffer to consumer buffer. However, if consumer uses ssl and there is remaining data, consumer.poll() may only read 16 KB (the size of SslTransportLayer.appReadBuffer) from socket buffer. This will reduce efficient of consumer.poll() by asking user to call more poll() to get the same amount of data.
Furthermore, we observe that for users who naively sleep a constant time after each consumer.poll(), some partition will lag behind after they switch from plaintext to ssl. Here is the explanation why this can happen.
Say there are 1 partition of 1MB/sec and 9 partition of 32KB/sec. Leaders of these partitions are all different and consumer is consuming these 10 partitions. Let's also assume that socket read buffer size is large enough and consume sleeps 1 sec between consumer.poll(). 1 sec is long enough for consumer to receive the FetchResponse back from broker.
When consumer uses plaintext, each consumer.poll() will read all data from the socket buffer and it means 1 MB data is read from each partition.
When consumer uses ssl, each consumer.poll() is likely to find that there is some data available in the memory. In this case consumer only reads 16 KB data from other sockets, particularly the socket for the broker with the large partition. Then the throughput of the large partition will be limited to 16KB/sec.
Arguably user should not sleep 1 sec if its consumer is lagging behind. But on Kafka dev side it is nice to keep the previous behavior and optimize consumer.poll() to read as much data from socket as possible.
Author: Dong Lin <lindong28@gmail.com>
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Jason Gustafson <jason@confluent.io>
Closes#4248 from lindong28/KAFKA-6258
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Apurva Mehta <apurva@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#4335 from mjsax/minor-improve-KafkaStreams-javadocs
(WIP: this commit isn't ready to be reviewed yet. I was checking the travis-ci build with the configuration changes in my account and opened the PR prematurely against trunk. I will make it consistent with Contribution guidelines once it's well tested.)
https://issues.apache.org/jira/browse/KAFKA-5473
Design:
`zookeeper.connection.retry.timeout.ms` => this determines how long to wait before triggering the shutdown. The default is 60000ms.
Currently the implementation only handles the `handleSessionEstablishmentError` by waiting for the sessionTimeout.
Author: Prasanna Gautam <prasannagautam@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Closes#3990 from prasincs/KAFKA-5473
The `--describe` option of ConsumerGroupCommand is expanded, as proposed in [KIP-175](https://cwiki.apache.org/confluence/display/KAFKA/KIP-175%3A+Additional+%27--describe%27+views+for+ConsumerGroupCommand), to support:
* `--describe` or `--describe --offsets`: listing of current group offsets
* `--describe --members` or `--describe --members --verbose`: listing of group members
* `--describe --state`: group status
Example: With a single partition topic `test1` and a double partition topic `test2`, consumers `consumer1` and `consumer11` subscribed to `test`, consumers `consumer2` and `consumer22` and `consumer222` subscribed to `test2`, and all consumers belonging to group `test-group`, this is an output example of the new options above for `test-group`:
```
--describe, or --describe --offsets:
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
test2 0 0 0 0 consumer2-bad9496d-0889-47ab-98ff-af17d9460382 /127.0.0.1 consumer2
test2 1 0 0 0 consumer22-c45e6ee2-0c7d-44a3-94a8-9627f63fb411 /127.0.0.1 consumer22
test1 0 0 0 0 consumer1-d51b0345-3194-4305-80db-81a68fa6c5bf /127.0.0.1 consumer1
```
```
--describe --members
CONSUMER-ID HOST CLIENT-ID #PARTITIONS
consumer2-bad9496d-0889-47ab-98ff-af17d9460382 /127.0.0.1 consumer2 1
consumer222-ed2108cd-d368-41f1-8514-5b72aa835bcc /127.0.0.1 consumer222 0
consumer11-dc8295d7-8f3f-4438-9b11-7270bab46760 /127.0.0.1 consumer11 0
consumer22-c45e6ee2-0c7d-44a3-94a8-9627f63fb411 /127.0.0.1 consumer22 1
consumer1-d51b0345-3194-4305-80db-81a68fa6c5bf /127.0.0.1 consumer1 1
```
```
--describe --members --verbose
CONSUMER-ID HOST CLIENT-ID #PARTITIONS ASSIGNMENT
consumer2-bad9496d-0889-47ab-98ff-af17d9460382 /127.0.0.1 consumer2 1 test2(0)
consumer222-ed2108cd-d368-41f1-8514-5b72aa835bcc /127.0.0.1 consumer222 0 -
consumer11-dc8295d7-8f3f-4438-9b11-7270bab46760 /127.0.0.1 consumer11 0 -
consumer22-c45e6ee2-0c7d-44a3-94a8-9627f63fb411 /127.0.0.1 consumer22 1 test2(1)
consumer1-d51b0345-3194-4305-80db-81a68fa6c5bf /127.0.0.1 consumer1 1 test1(0)
```
```
--describe --state
COORDINATOR (ID) ASSIGNMENT-STRATEGY STATE #MEMBERS
localhost:9092 (0) range Stable 5
```
Note that this PR also addresses the issue reported in [KAFKA-6158](https://issues.apache.org/jira/browse/KAFKA-6158) by dynamically setting the width of columns `TOPIC`, `CONSUMER-ID`, `HOST`, `CLIENT-ID` and `COORDINATOR (ID)`. This avoid truncation of column values when they go over the current fixed width of these columns.
The code has been restructured to better support testing of individual values and also the console output. Unit tests have been updated and extended to take advantage of this restructuring.
Author: Vahid Hashemian <vahidhashemian@us.ibm.com>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#4271 from vahidhashemian/KAFKA-5526
This PR creates and implements the `ProductionExceptionHandler` as described in [KIP-210](https://cwiki.apache.org/confluence/display/KAFKA/KIP-210+-+Provide+for+custom+error+handling++when+Kafka+Streams+fails+to+produce).
I've additionally provided a default implementation preserving the existing behavior. I fixed various compile errors in the tests that resulted from my changing of method signatures, and added tests to cover the new behavior.
Author: Matt Farmer <mfarmer@rsglab.com>
Author: Matt Farmer <matt@frmr.me>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>, Damian Guy <damian.guy@gmail.com>
Closes#4165 from farmdawgnation/msf/kafka-6086