This resolves the following issues in the ProcessorTopologyTestDriver:
- It should not create an internal changelog topic when using `through()` and `table()`
- It should forward the produced record back into the topology if it is to a source topic
Jira ticket: https://issues.apache.org/jira/browse/KAFKA-4828
The contribution is my original work and I license the work to the project under the project's open source license.
Author: Hamidreza Afzali <hrafzali@gmail.com>
Reviewers: Matthias J. Sax, Guozhang Wang
Closes#2629 from hrafzali/KAFKA-4828_ProcessorTopologyTestDriver_through
There were some minor differences in the basic consumer config and streams config that are now rectified. In addition, in AWS environments the socket size makes a big difference to performance and I've tuned it up accordingly. I've also increased the number of records now that perf is higher.
Author: Eno Thereska <eno@confluent.io>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#2634 from enothereska/minor-standardize-params
MINOR: Fix ResetIntegrationTest test failures
KAFKA-2857 follow-up.
Author: Vahid Hashemian <vahidhashemian@us.ibm.com>
Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2636 from vahidhashemian/minor/kafka-2857-followup
This applies to new-consumer based groups and would avoid scenarios in which user issues a `--describe` query while the group is initializing.
Example: The following could occur for a newly created group.
```
kafkakafka:~/workspace/kafka$ bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group g
Note: This will only show information about consumers that use the Java consumer API (non-ZooKeeper-based consumers).
Error: Executing consumer group command failed due to The group coordinator is not available.
```
With this PR the group is queried repeatedly at specific intervals within a preset (and configurable) timeout `group-init-timeout` to circumvent unfortunate situations like above.
Author: Vahid Hashemian <vahidhashemian@us.ibm.com>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#2538 from vahidhashemian/KAFKA-2857
If leader node of one more more partitions in a consumer subscription are temporarily unavailable, request metadata refresh so that partitions skipped for assignment dont have to wait for metadata expiry before reassignment. Metadata refresh is also requested if a subscribe topic or assigned partition doesn't exist.
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Vahid Hashemian <vahidhashemian@us.ibm.com>, Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Closes#2622 from rajinisivaram/KAFKA-4631
added \<pre> tags to not break javadoc display of the ASCII diagrams.
see broken ascii here:
https://kafka.apache.org/0102/javadoc/org/apache/kafka/streams/KafkaStreams.State.html
fix can be checked with gradle :streams:javadoc and then checking streams/build/docs/javadoc/org/apache/kafka/streams/KafkaStreams.State.html
I also fixed the diagram in StreamThread.java however currently no javadoc is generated for that one (since it's internal)
enothereska please have a look
Author: Clemens Valiente <clemens.valiente@trivago.com>
Reviewers: Eno Thereska, Matthias J. Sax, Guozhang Wang
Closes#2621 from cvaliente/KAFKA-4800-ASCII-diagrams
Avoid calling time.milliseconds more often than necessary. Cleaning and committing logic can use the timestamp at the start of the loop with minimal consequences. 5-10% improvements noticed with request rates of 450K records/second.
Also tidy up benchmark code a bit more.
Author: Eno Thereska <eno.thereska@gmail.com>
Author: Eno Thereska <eno@confluent.io>
Reviewers: Matthias J. Sax, Damian Guy, Guozhang Wang
Closes#2603 from enothereska/minor-reduce-milliseconds2
Detect when a rebalance has happened due to one or more existing nodes bouncing. Keep assignment of previous active tasks the same and only assign the tasks that were not active to the new clients.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#2609 from dguy/kstreams-575
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Ewen Cheslack-Postava <me@ewencp.org>, Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2594 from dguy/checkstyle
This resolves the issue in the ProcessorTopologyTestDriver that the extracted timestamp is not forwarded with the produced record to the internal topics.
JIRA ticket: https://issues.apache.org/jira/browse/KAFKA-4789
The contribution is my original work and I license the work to the project under the project's open source license.
guozhangwang dguy
Author: Hamidreza Afzali <hrafzali@gmail.com>
Reviewers: Damian Guy, Guozhang Wang
Closes#2590 from hrafzali/KAFKA-4789_ProcessorTopologyTestDriver_timestamp
Author: Michael G. Noll <michael@confluent.io>
Reviewers: Damian Guy, Matthias J. Sax, Guozhang Wang
Closes#2607 from miguno/trunk-flatMapValues-docstring
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#2303 from mjsax/licenseHeader
Replace one-by-one initialization of state stores with bulk initialization.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Eno Thereska, Guozhang Wang
Closes#2560 from dguy/kafka-4494
Reduces overheads by avoiding re-calling time.milliseconds().
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#2583 from enothereska/minor-reduce-milliseconds
* Turned 3 inner classes that weren't static but could be into `static` ones.
* Turned one `public` inner class that wasn't used publicly into a `private`.
Trivial but imo worthwhile to explicitly keep visibility and "staticness" correct in syntax (if only to be nice to the GC) :)
Author: Armin Braun <me@obrown.io>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Jason Gustafson <jason@confluent.io>
Closes#2574 from original-brownbear/cleanup-inner-nonstatic
…lass should be static
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2558 from cmccabe/KAFKA-4774
Currently the checkpoint file is deleted at state store initialization and it is only ever written again during a clean shutdown. This can result in significant delays during restarts as the entire store needs to be loaded from the changelog.
We can mitigate against this by frequently checkpointing the offsets. The checkpointing happens only during the commit phase, i.e, after we have manually flushed the store and the producer. So we guarantee that the checkpointed offsets are never greater than what has been flushed.
In the event of hard failure we can recover by reading the checkpoints and consuming from the stored offsets.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Eno Thereska, Matthias J. Sax, Guozhang Wang
Closes#2471 from dguy/kafka-4317
More details:
* Replaced `struct` field in Request/Response with a `toStruct` method. This
makes the performance model (including memory usage) easier to understand.
Note that requests have `toStruct()` while responses have `toStruct(version)`.
* Replaced mutable `version` field in `Request.Builder` with an immutable
field `desiredVersion` and a `version` parameter passed to the `build` method.
* Optimised `handleFetchRequest` to avoid unnecessary creation of `Struct`
instances (from 4 to 2 in the worst case and 2 to 1 in the best case).
* Various clean-ups in request/response classes and their test. In particular,
it is now clear what we are testing. Previously, it looked like we were testing
more than we really were.
With this in place, we could remove `AbstractRequest.Builder` in the future by
doing the following:
* Change `AbstractRequest.toStruct` to accept a version (like responses).
* Change `AbstractRequest.version` to be `desiredVersion` (like `Builder`).
* Change `ClientRequest` to take `AbstractRequest`.
* Move validation from the `build` methods to the request constructors or
static factory methods.
* Anything else required for the code to compile again.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Apurva Mehta <apurva.1618@gmail.com>, Jason Gustafson <jason@confluent.io>
Closes#2513 from ijuma/separate-struct
Lowered the default RocksDB settings for the block cache and write buffers
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Eno Thereska, Guozhang Wang
Closes#2525 from dguy/kafka-4484
https://issues.apache.org/jira/browse/KAFKA-4720
Peek is a handy method to have to insert diagnostics that do not affect the stream itself, but some external state such as logging or metrics collection.
Author: Steven Schlansker <sschlansker@opentable.com>
Reviewers: Damian Guy, Matthias J. Sax, Eno Thereska, Guozhang Wang
Closes#2493 from stevenschlansker/kafka-4720-peek
Add a note to `KafkaStreams.metadataForKey(String, K, Serializer<K>)` to point out that in the case of a Window Store the Serializer should still be the record key serializer and not a window serializer
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Xavier Léauté, Guozhang Wang
Closes#2532 from dguy/minor-docs
This PR fixes a blocker issue, where the streams client code cannot talk to the controller. It also enables a system test that was previously failing.
This PR is for trunk only. A separate PR with just the fix (but not the tests) will be created for 0.10.2.
Author: Eno Thereska <eno@confluent.io>
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Damian Guy, Ismael Juma, Matthias J. Sax, Guozhang Wang
Closes#2522 from enothereska/KAFKA-4716-metadata
Author: Eno Thereska <eno.thereska@gmail.com>
Author: Eno Thereska <eno@confluent.io>
Author: Ubuntu <ubuntu@ip-172-31-22-146.us-west-2.compute.internal>
Reviewers: Matthias J. Sax, Guozhang Wang
Closes#2478 from enothereska/minor-benchmark-args
Provide test coverage for exception paths in: `schedule()`, `closeTopology()`, and `punctuate()`
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Matthias J. Sax, Guozhang Wang
Closes#2451 from dguy/kafka-4640
Removed readKeyValues() that give UNLIMITED_MESSAGES which will doom to exhaust all wait time, as all its callers actually do provide the expected number of messages.
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Jason Gustafson <jason@confluent.io>
Closes#2507 from guozhangwang/KHotfix-not-use-limited-num-messages
This resolves an issue in driving tests using the ProcessorTopologyTestDriver when `groupBy()` is invoked downstream of a processor that flags repartitioning.
Ticket: https://issues.apache.org/jira/browse/KAFKA-4461
Discussion: http://search-hadoop.com/m/Kafka/uyzND1wbKeY1Q8nH1
dguy guozhangwang
The contribution is my original work and I license the work to the project under the project's open source license.
Author: Adrian McCague <amccague@gmail.com>
Reviewers: Damian Guy, Guozhang Wang
Closes#2499 from amccague/KAFKA-4461_ProcessorTopologyTestDriver_map_groupbykey
Delay the cleanup of state directories that are not locked and not owned by the current thread such that we only remove the directory if its last modified is < now - cleanupDelayMs.
This also helps to avoid a race between threads on the same instance, where during rebalance, one thread releases the lock on the state directory, and before another thread can take the lock, the cleanup runs and removes the data.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Matthias J. Sax, Guozhang Wang
Closes#2486 from dguy/KAFKA-4724
This is a refactoring follow-up of https://github.com/apache/kafka/pull/2166. Main refactoring changes:
1. Extract `InMemoryKeyValueStore` out of `InMemoryKeyValueStoreSupplier` and remove its duplicates in test package.
2. Add two abstract classes `AbstractKeyValueIterator` and `AbstractKeyValueStore` to collapse common functional logics.
3. Added specialized `BytesXXStore` to accommodate cases where key value types are Bytes / byte[] so that we can save calling the dummy serdes.
4. Make the key type in `ThreadCache` from byte[] to Bytes, as SessionStore / WindowStore's result serialized bytes are in the form of Bytes anyways, so that we can save unnecessary `Bytes.get()` and `Bytes.wrap(bytes)`.
Each of these should arguably be a separate PR and I apologize for the mess, this is because this branch was extracted from a rather large diff that has multiple refactoring mingled together and dguy and myself have already put lots of efforts to break it down to a few separate PRs, and this is the only left-over work. Such PR won't happen in the future.
Ping dguy enothereska mjsax for reviews
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Damian Guy, Matthias J. Sax, Jun Rao
Closes#2333 from guozhangwang/K3452-followup-state-store-refactor
Exception paths in `register()`, `topic()`, `partition()`, `offset()`, and `timestamp()`, were not covered by any existing tests
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Eno Thereska, Matthias J. Sax
Closes#2447 from dguy/KAFKA-4646
Add coverage for exception paths in `initialize()`
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Eno Thereska, Matthias J. Sax, Guozhang Wang
Closes#2452 from dguy/kafka-4649
…tReset
Author: bbejeck <bbejeck@gmail.com>
Reviewers: Damian Guy, Matthias J. Sax, Guozhang Wang
Closes#2464 from bbejeck/KAFKA-4662_improve_topology_builder_test_coverage
Author: Maysam Yabandeh <myabandeh@dropbox.com>
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jun Rao <junrao@gmail.com>
Closes#2474 from ijuma/kafka-4039-deadlock-during-shutdown
The issue of transiently having duplicates is due to the bad design of the left join itself: in order to ignore the partial joined results such as `A:null`, it lets the producer to potentially send twice to source stream one and rely on all the following conditions to be true in order to pass the test:
1. `receiveMessages` happen to have fetched all the produced results and have committed offsets.
2. streams app happen to have completed sending all result data.
3. consumer used in `receiveMessages` will complete getting all messages in a single poll().
If any of the above is not true, the test fails.
Fixed this test to add a filter right after left join to filter out partial joined results. Minor cleanup on integration test utils.
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Damian Guy, Ewen Cheslack-Postava
Closes#2485 from guozhangwang/K3896-duplicate-join-results
the toString method prints the topology, but had no tests making sure it works and/or doesn't cause exceptions
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Matthias J. Sax, Eno Thereska, Guozhang Wang
Closes#2444 from dguy/KAFKA-4645
Add a test to ensure a `StreamsException` is thrown when an exception other than `StreamsException` is caught
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Matthias J. Sax, Guozhang Wang
Closes#2450 from dguy/KAFKA-4647
Most of the exception paths weren't covered. Now they are.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Eno Thereska, Guozhang Wang
Closes#2442 from dguy/KAFKA-4642
Kafka brokers have a config called "offsets.topic.replication.factor" that specify the replication factor for the "__consumer_offsets" topic. The problem is that this config isn't being enforced. If an attempt to create the internal topic is made when there are fewer brokers than "offsets.topic.replication.factor", the topic ends up getting created anyway with the current number of live brokers. The current behavior is pretty surprising when you have clients or tooling running as the cluster is getting setup. Even if your cluster ends up being huge, you'll find out much later that __consumer_offsets was setup with no replication.
The cluster not meeting the "offsets.topic.replication.factor" requirement on the internal topic is another way of saying the cluster isn't fully setup yet.
The right behavior should be for "offsets.topic.replication.factor" to be enforced. Topic creation of the internal topic should fail with GROUP_COORDINATOR_NOT_AVAILABLE until the "offsets.topic.replication.factor" requirement is met. This closely resembles the behavior of regular topic creation when the requested replication factor exceeds the current size of the cluster, as the request fails with error INVALID_REPLICATION_FACTOR.
Author: Onur Karaman <okaraman@linkedin.com>
Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#2177 from onurkaraman/KAFKA-3959
Makes task assignment more sticky by preferring to assign tasks to clients that had previously had the task as active task. If there are no clients with the task previously active, then search for a standby. Finally falling back to the least loaded client.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Matthias J. Sax, Guozhang Wang
Closes#2429 from dguy/kafka-4677