There are a couple minor additions in this PR:
1. Add a new test for window store, to range query upon receiving each record.
2. In the non-windowed state store case, add a get call before the put call.
3. Enable caching by default to be consistent with other Join / Aggregate cases, where caching is enabled by default.
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
SimpleBenchmark:
1.a Do not rely on manual num.records / bytes collection on atomic integers.
1.b Rely on config files for num.threads, bootstrap.servers, etc.
1.c Add parameters for key skewness and value size.
1.d Refactor the tests for loading phase, adding tumbling-windowed count.
1.e For consumer / consumeproduce, collect metrics on consumer instead.
1.f Force stop the test after 3 minutes, this is based on empirical numbers of 10M records.
Other tests: use config for kafka bootstrap servers.
streams_simple_benchmark.py: only use scale 1 for system test, remove yahoo from benchmark tests.
Note that the JMX based metrics is more accurate than the manually collected metrics.
Reviewers: John Roesler <john@confluent.io>, Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
1. Use JmxMixin for SimpleBenchmark (will remove the self reporting in #4744), only when loading phase is false (i.e. we are in fact starting the streams app).
2. Reported the full jmx reported metrics in log files, and in the returned data only return the max values: this is because we want to skip the warming up and cooling down periods that will have lower rate numbers, while max represents the actual rate at full speed.
3. Incorporates two other improves to JMXTool: #1241 and #2950
Reviewers: John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Rohan Desai <desai.p.rohan@gmail.com>
We are occasionally hitting some timeouts due to processing not finishing. So rather than failing the build for these reasons it would be better to reduce the runtime.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#3725 from dguy/fix-system-test
The log_level parameter is used in system tests in kafka.py. However the log4j template accepted that parameter in only one place. This led to a large number of DEBUG lines printed even when the intention was to capture only INFO lines. Led to huge log files. Thanks to ijuma for noticing this.
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#3247 from enothereska/minor-log4j-template-fix
- Improves streams efficiency by more than 200K requests/second (small 100 byte requests)
- Gets streams efficiency very close to pure consumer (see results in https://jenkins.confluent.io/job/system-test-kafka-branch-builder/746/console)
- Maintains same fairness across tasks
- Schedules all records in the queue in-between poll() calls, not just one per task.
Author: Eno Thereska <eno@confluent.io>
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Damian Guy, Matthias J. Sax, Guozhang Wang
Closes#2643 from enothereska/minor-schedule-round-robin
There were some minor differences in the basic consumer config and streams config that are now rectified. In addition, in AWS environments the socket size makes a big difference to performance and I've tuned it up accordingly. I've also increased the number of records now that perf is higher.
Author: Eno Thereska <eno@confluent.io>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#2634 from enothereska/minor-standardize-params
This PR fixes a blocker issue, where the streams client code cannot talk to the controller. It also enables a system test that was previously failing.
This PR is for trunk only. A separate PR with just the fix (but not the tests) will be created for 0.10.2.
Author: Eno Thereska <eno@confluent.io>
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Damian Guy, Ismael Juma, Matthias J. Sax, Guozhang Wang
Closes#2522 from enothereska/KAFKA-4716-metadata
Author: Eno Thereska <eno.thereska@gmail.com>
Author: Eno Thereska <eno@confluent.io>
Author: Ubuntu <ubuntu@ip-172-31-22-146.us-west-2.compute.internal>
Reviewers: Matthias J. Sax, Guozhang Wang
Closes#2478 from enothereska/minor-benchmark-args
Updates to take advantage of soon-to-be-released ducktape features.
Author: Geoff Anderson <geoff@confluent.io>
Author: Ewen Cheslack-Postava <me@ewencp.org>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes#1834 from granders/systest-parallel-friendly
ijuma
As discussed in https://github.com/apache/kafka/pull/1645, this patch removes an extraneous line from several __init__.py files, and a few others as well
Author: Geoff Anderson <geoff@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#1659 from granders/minor-cleanup-init-files
Without this file the benchmark does not run nightly.
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Geoff Anderson <geoff@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#1645 from enothereska/hotfix-streams-test
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Geoff Anderson, Guozhang Wang, Ismael Juma
Closes#1621 from enothereska/simple-benchmark-streams-system-tests