Currently if the client sends a produce request or a fetch request to a broker which isn't a replica,
we return UNKNOWN_TOPIC_OR_PARTITION. This is a bit surprising to see when the topic actually
exists. It would be better to return NOT_LEADER to avoid confusion. Clients typically handle both errors by refreshing metadata and retrying, so changing this should not cause any change in behavior on the client. This case can be hit following a partition reassignment after the leader is moved and the local replica is deleted.
To validate the current behavior and the fix, I've added integration tests for the fetch and produce APIs.
This PR implements a Scala wrapper library for Kafka Streams. The library is implemented as a project under streams, namely `:streams:streams-scala`. The PR contains the following:
* the library implementation of the wrapper abstractions
* the test suite
* the changes in `build.gradle` to build the library jar
The library has been tested running the tests as follows:
```
$ ./gradlew -Dtest.single=StreamToTableJoinScalaIntegrationTestImplicitSerdes streams:streams-scala:test
$ ./gradlew -Dtest.single=StreamToTableJoinScalaIntegrationTestImplicitSerdesWithAvro streams:streams-scala:test
$ ./gradlew -Dtest.single=WordCountTest streams:streams-scala:test
```
Author: Debasish Ghosh <ghosh.debasish@gmail.com>
Author: Sean Glover <seglo@randonom.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Ismael Juma <ismael@juma.me.uk>, John Roesler <john@confluent.io>, Damian Guy <damian@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#4756 from debasishg/scala-streams
* unify skipped records metering
* log warnings when things get skipped
* tighten up metrics usage a bit
### Testing strategy:
Unit testing of the metrics and the logs should be sufficient.
Author: John Roesler <john@confluent.io>
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#4812 from vvcephei/kip-274-streams-skip-metrics
There are a couple minor additions in this PR:
1. Add a new test for window store, to range query upon receiving each record.
2. In the non-windowed state store case, add a get call before the put call.
3. Enable caching by default to be consistent with other Join / Aggregate cases, where caching is enabled by default.
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
We need to match to the end of the line to make it work with Java 10 as explained in the expanded comment.
Tested manually for all supported versions:
```shell
echo $(.../jdk1.8.0_152.jdk/Contents/Home/bin/java -version 2>&1 | sed -E -n 's/.* version "([0-9]*).*$/\1/p')
1
echo $(.../jdk-9.0.4.jdk/Contents/Home/bin/java -version 2>&1 | sed -E -n 's/.* version "([0-9]*).*$/\1/p')
9
echo $(.../jdk-10.jdk/Contents/Home/bin/java -version 2>&1 | sed -E -n 's/.* version "([0-9]*).*$/\1/p')
10
echo $(.../jdk-10.0.1.jdk/Contents/Home/bin/java -version 2>&1 | sed -E -n 's/.* version "([0-9]*).*$/\1/p')
10
```
In the AbstractResetIntegrationTest we can have a transient error when setting the time for the test where the new time is less than the original time, for those cases we should catch the exception and re-try setting the time once versus letting the test fail.
For testing, ran the entire streams test suite.
Reviewers: Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>
If users don't create all topics before starting a streams application, they could get unexpected results. For example, sharing a state store between sub-topologies where one input topic is not created ahead time results in log message that that "Partition X is not assigned to any tasks" does not give any clues as to how this could have occurred.
Also, this PR changes the log level from INFO to WARN when metadata does not have partitions for a given topic.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Some anonymous classes of AbstractProcessor didn't initialize their superclass. This will not set up ProcessorContext context at AbstractProcessor.
Reviewers: Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>
Start processing client connections only after completing KafkaServer initialization to ensure that credentials are loaded from ZK into cache before authentications are processed. Acceptors are started earlier so that bound port is known for registering in ZK.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Jun Rao <junrao@gmail.com>, Ismael Juma <ismael@juma.me.uk>
- adds Streams upgrade tests for 1.1 release
- introduces metadata version 3
Reviewers: John Roesler <john@confluent.io>, Guozhang Wang <guozhang@confluent.io>
General cleanup of Streams code, mostly resolving compiler warnings and re-formatting.
The regular testing suite should be sufficient.
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
guozhangwang
While TopologyTestDriver works well with stores created from KTable it does not with stores from GlobalKTable.
Moreover, for my testing purposes but I think it can be useful to others, I need to get access to the MockProducer inside TopologyTestDriver.
I have added 4 new tests to TopologyTestDriverTest, two for stores from KTable and two for stores from GlobalKTable.
While I was changing the TopologyTestDriver I've also make it implement java.io.Closeable.
Author: Valentino Proietti <valentino.proietti@kydea.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, John Roesler <john@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#4823 from Vale68/KAFKA-6742
minor renaming
Quota tests wait for throttle metric to be updated without waiting for requests to complete to avoid waiting for potentially large throttle times. This requires the test to read metric values while a broker may be updating the value, resulting in exception in the test. Since this issue can also occur with JMX metrics reporter, change synchronization on metrics with sensors to use the sensor as lock.
A partially deleted topic can end up with some partitions having no leadership info.
For the partially deleted topic, a new controller should be able to finish the topic deletion
by transitioning the rogue partition's replicas to OfflineReplica state.
This patch adds logic to transition replicas to OfflineReplica state whose partitions have
no leadership info.
Added a new test method to cover the partially deleted topic case.
Reviewers: Jun Rao <junrao@gmail.com>
Reviewers: Manikumar Reddy O <manikumar.reddy@gmail.com>, Guozhang Wang <wangguoz@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Implement destroying tasks and workers. This means erasing all record of them on the Coordinator and the Agent.
Workers should be identified by unique 64-bit worker IDs, rather than by the names of the tasks they are implementing. This ensures that when a task is destroyed and re-created with the same task ID, the old workers will be not be treated as part of the new task instance.
Fix some return results from RPCs. In some cases RPCs were returning values that were never used. Attempting to re-create the same task ID with different arguments should fail. Add RequestConflictException to represent HTTP error code 409 (CONFLICT) for this scenario.
If only one worker in a task stops, don't stop all the other workers for that task, unless the worker that stopped had an error.
Reviewers: Anna Povzner <anna@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
SimpleBenchmark:
1.a Do not rely on manual num.records / bytes collection on atomic integers.
1.b Rely on config files for num.threads, bootstrap.servers, etc.
1.c Add parameters for key skewness and value size.
1.d Refactor the tests for loading phase, adding tumbling-windowed count.
1.e For consumer / consumeproduce, collect metrics on consumer instead.
1.f Force stop the test after 3 minutes, this is based on empirical numbers of 10M records.
Other tests: use config for kafka bootstrap servers.
streams_simple_benchmark.py: only use scale 1 for system test, remove yahoo from benchmark tests.
Note that the JMX based metrics is more accurate than the manually collected metrics.
Reviewers: John Roesler <john@confluent.io>, Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
Refactored the return types in consumer group APIs the following way:
```
Map<TopicPartition, KafkaFuture<Void>> DeleteConsumerGroupsResult#deletedGroups()
Map<TopicPartition, KafkaFuture<ConsumerGroupDescription>> DescribeConsumerGroupsResult#describedGroups()
KafkaFuture<Collection<ConsumerGroupListing>> ListConsumerGroupsResult#listings()
KafkaFuture<Map<TopicPartition, OffsetAndMetadata>> ListConsumerGroupOffsetsResult#partitionsToOffsetAndMetadata()
```
* For DeleteConsumerGroupsResult and DescribeConsumerGroupsResult, for each group id we have two round-trips to get the coordinator, and then send the delete / describe request; I leave the potential optimization of batching requests for future work.
* For ListConsumerGroupOffsetsResult, it is a simple single round-trip and hence the whole map is wrapped as a Future.
* ListConsumerGroupsResult, it is the most tricky one: we would only know how many futures we should wait for after the first listNode returns, and hence I constructed the flattened future in the middle wrapped with the underlying map of futures; also added an iterator API to compensate the "fail the whole future if any broker returns error" behavior. The iterator future will throw exception on the failing brokers, while return the consumer for other succeeded brokers.
Reviewers: Colin Patrick McCabe <colin@cmccabe.xyz>, Jason Gustafson <jason@confluent.io>
Removed timeout from get call that caused the test to fail occasionally, this will instead fall back to the wrapping waitUntilTrue timeout. Also added unnesting of exceptions from ExecutionException that was originally missing and put the retrieved value for lowWatermark in the fail message for better readability in case of test failure.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
This change makes adding a metric to a sensor idempotent.
That is, if the metric is already added to the sensor, the method
returns with success.
The current behavior is that any attempt to register a second metric
with the same name is an error.
Testing strategy: There is a new unit test covering this behavior
Reviewers: Guozhang Wang <guozhang@confluent.io>, Bill Bejeck <bill@confluent.io>
We had a regression in #4788 which caused the offset commit/fetch/describe APIs to fail if the groupId was empty. This should be allowed for backwards compatibility. Additionally, I have modified DeleteGroups to allow removal of the empty group, which was missed in the initial implementation. I've added a test case to ensure that we do not miss this again in the future.
Reviewers: Ismael Juma <ismael@juma.me.uk>
AsyncProducerTest gets an error about an incorrect mock when the logging
level is turned up. Instead of usIng a mock, just create a real
SyncProducerConfig object, since the object is simple to create.
Reviewers: Ismael Juma <ismael@juma.me.uk>
Move creation of quota callback instance out of KafkaConfig constructor to QuotaFactory.instantiate to avoid creating a callback instance for every KafkaConfig since we create temporary KafkaConfigs during dynamic config updates.
Reviewers: Jun Rao <junrao@gmail.com>