Author: Jason Gustafson <jason@confluent.io>
Reviewers: Apurva Mehta <apurva@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2745 from hachikuji/add-replication-testcase-for-compression
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Eno Thereska <eno@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2897 from mjsax/KAFKA-4564-follow-up
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Ismael Juma, Eno Thereska, Matthias J. Sax, Guozhang Wang
Closes#2837 from mjsax/kafka-4564-fail-fast-test-stream-compatibility
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Magnus Edenhill, Eno Thereska, Damian Guy, Guozhang Wang
Closes#2836 from mjsax/minor-broker-comp-test
This PR replaces https://github.com/apache/kafka/pull/2743 (just raising from Confluent repo)
This PR describes the addition of Partition Level Leader Epochs to messages in Kafka as a mechanism for fixing some known issues in the replication protocol. Full details can be found here:
[KIP-101 Reference](https://cwiki.apache.org/confluence/display/KAFKA/KIP-101+-+Alter+Replication+Protocol+to+use+Leader+Epoch+rather+than+High+Watermark+for+Truncation)
*The key elements are*:
- Epochs are stamped on messages as they enter the leader.
- Epochs are tracked in both leader and follower in a new checkpoint file.
- A new API allows followers to retrieve the leader's latest offset for a particular epoch.
- The logic for truncating the log, when a replica becomes a follower, has been moved from Partition into the ReplicaFetcherThread
- When partitions are added to the ReplicaFetcherThread they are added in an initialising state. Initialising partitions request leader epochs and then truncate their logs appropriately.
This test provides a good overview of the workflow `EpochDrivenReplicationProtocolAcceptanceTest.shouldFollowLeaderEpochBasicWorkflow()`
The corrupted log use case is covered by the test
`EpochDrivenReplicationProtocolAcceptanceTest.offsetsShouldNotGoBackwards()`
Remaining work: There is a do list here: https://docs.google.com/document/d/1edmMo70MfHEZH9x38OQfTWsHr7UGTvg-NOxeFhOeRew/edit?usp=sharing
Author: Ben Stopford <benstopford@gmail.com>
Author: Jun Rao <junrao@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Closes#2808 from benstopford/kip-101-v2
Several fixes for handling broker failures:
- default replication value for internal topics is now 3 in test itself (not in streams code, that will require a KIP.
- streams producer waits for acks from all replicas in test itself (not in streams code, that will require a KIP.
- backoff time for streams client to try again after a failure to contact controller.
- fix bug related to state store locks (this helps in multi-threaded scenarios)
- fix related to catching exceptions property for network errors.
- system test for all the above
Author: Eno Thereska <eno@confluent.io>
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Damian Guy <damian.guy@gmail.com>, Guozhang Wang <wangguoz@gmail.com>, Dan Norwood <norwood@confluent.io>, Ismael Juma <ismael@juma.me.uk>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#2719 from enothereska/KAFKA-4916-broker-bounce-test
This is from the KIP-98 proposal.
The main points of discussion surround the correctness logic, particularly the Log class where incoming entries are validated and duplicates are dropped, and also the producer error handling to ensure that the semantics are sound from the users point of view.
There is some subtlety in the idempotent producer semantics. This patch only guarantees idempotent production upto the point where an error has to be returned to the user. Once we hit a such a non-recoverable error, we can no longer guarantee message ordering nor idempotence without additional logic at the application level.
In particular, if an application wants guaranteed message order without duplicates, then it needs to do the following in the error callback:
1. Close the producer so that no queued batches are sent. This is important for guaranteeing ordering.
2. Read the tail of the log to inspect the last message committed. This is important for avoiding duplicates.
Author: Apurva Mehta <apurva@confluent.io>
Author: hachikuji <jason@confluent.io>
Author: Apurva Mehta <apurva.1618@gmail.com>
Author: Guozhang Wang <wangguoz@gmail.com>
Author: fpj <fpj@apache.org>
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Closes#2735 from apurvam/exactly-once-idempotent-producer
See the JIRA for the full details. Essentially the test assertions depend on receiving reliable events from the consumer processes, but this is not generally possible in the presence of a hard failure (i.e. `kill -9`). Until we solve this problem, the hard failure scenarios will be turned off.
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Apurva Mehta <apurva@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2771 from hachikuji/KAFKA-4689
- Improves streams efficiency by more than 200K requests/second (small 100 byte requests)
- Gets streams efficiency very close to pure consumer (see results in https://jenkins.confluent.io/job/system-test-kafka-branch-builder/746/console)
- Maintains same fairness across tasks
- Schedules all records in the queue in-between poll() calls, not just one per task.
Author: Eno Thereska <eno@confluent.io>
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Damian Guy, Matthias J. Sax, Guozhang Wang
Closes#2643 from enothereska/minor-schedule-round-robin
The transient failures make it harder to spot real failures and we can live without what is being tested (adding security to ZK via a rolling upgrade) until KIP-101 lands.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Apurva Mehta <apurva@confluent.io>, Jun Rao <junrao@gmail.com>
Closes#2742 from ijuma/disable-zk-upgrade-test
Improves the reliability of this test by decreasing the lower bound (this is to be expected as throttling takes the first fetch to stabilise and will "over-fetch" for this first request)
Author: Ben Stopford <benstopford@gmail.com>
Reviewers: Apurva Mehta <apurva@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2698 from benstopford/throttling-test-fix
ijuma ewencp cmccabe harshach Please review.
Here is a sample run:
https://travis-ci.org/raghavgautam/kafka/builds/191714520
In this run 214 tests were run and 144 tests passed.
I will open separate jiras for fixing failures.
Author: Raghav Kumar Gautam <raghav@apache.org>
Reviewers: Sriharsha Chintalapani <harsha@hortonworks.com>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#2376 from raghavgautam/trunk
There were some minor differences in the basic consumer config and streams config that are now rectified. In addition, in AWS environments the socket size makes a big difference to performance and I've tuned it up accordingly. I've also increased the number of records now that perf is higher.
Author: Eno Thereska <eno@confluent.io>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#2634 from enothereska/minor-standardize-params
Fix tests/docker/Dockerfile to put the old Kafka distributions in the
correct spot for tests. Also, run_tests.sh should exit with an error
code if image rebuilding fails, rather than silently falling back to an
older image.
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes#2613 from cmccabe/dockerfix
There won't be a 0.10.3.0.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#2628 from ijuma/bump-version-to-0.11.0.0-SNAPSHOT
…0.x and not 0.8
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes#2602 from cmccabe/KAFKA-4809
The phase_two security upgrade test verifies upgrading inter-broker and client protocols to the same value as well as different values. The second case currently changes inter-broker protocol without first enabling the protocol, disrupting produce/consume until the whole cluster is updated. This commit changes the test to be a non-disruptive upgrade test that enables protocols first (simulating phase one of upgrade).
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Apurva Mehta <apurva.1618@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#2589 from rajinisivaram/KAFKA-4779
The throttling system test sometimes fail because it takes longer than the current 10 second time out for partitions to get assigned to the consumer.
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#2567 from apurvam/increase-timeout-for-partitions-assigned
This PR fixes a blocker issue, where the streams client code cannot talk to the controller. It also enables a system test that was previously failing.
This PR is for trunk only. A separate PR with just the fix (but not the tests) will be created for 0.10.2.
Author: Eno Thereska <eno@confluent.io>
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Damian Guy, Ismael Juma, Matthias J. Sax, Guozhang Wang
Closes#2522 from enothereska/KAFKA-4716-metadata
This caused the bounce and smoke tests to fail on trunk.
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#2524 from enothereska/hotfix-tests
Author: Eno Thereska <eno.thereska@gmail.com>
Author: Eno Thereska <eno@confluent.io>
Author: Ubuntu <ubuntu@ip-172-31-22-146.us-west-2.compute.internal>
Reviewers: Matthias J. Sax, Guozhang Wang
Closes#2478 from enothereska/minor-benchmark-args
Author: Ewen Cheslack-Postava <me@ewencp.org>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#2508 from ewencp/minor-streams-compatibility-trunk-dev-branch
With this change, the consumer will be considered initialized in the
ProduceConsumeValidate tests once its partitions have been assigned.
Author: Apurva Mehta <apurva.1618@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#2347 from apurvam/KAFKA-4588-fix-race-between-producer-consumer-start
Kafka brokers have a config called "offsets.topic.replication.factor" that specify the replication factor for the "__consumer_offsets" topic. The problem is that this config isn't being enforced. If an attempt to create the internal topic is made when there are fewer brokers than "offsets.topic.replication.factor", the topic ends up getting created anyway with the current number of live brokers. The current behavior is pretty surprising when you have clients or tooling running as the cluster is getting setup. Even if your cluster ends up being huge, you'll find out much later that __consumer_offsets was setup with no replication.
The cluster not meeting the "offsets.topic.replication.factor" requirement on the internal topic is another way of saying the cluster isn't fully setup yet.
The right behavior should be for "offsets.topic.replication.factor" to be enforced. Topic creation of the internal topic should fail with GROUP_COORDINATOR_NOT_AVAILABLE until the "offsets.topic.replication.factor" requirement is met. This closely resembles the behavior of regular topic creation when the requested replication factor exceeds the current size of the cluster, as the request fails with error INVALID_REPLICATION_FACTOR.
Author: Onur Karaman <okaraman@linkedin.com>
Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#2177 from onurkaraman/KAFKA-3959
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Damian Guy, Eno Thereska, Guozhang Wang
Closes#2403 from mjsax/addStreamsClientCompatibilityTest
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2390 from cmccabe/KAFKA-4630
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2424 from cmccabe/KAFKA-4688
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#2426 from hachikuji/improve-consumer-test-error-messages
Renames `HoistToStruct` SMT to `HoistField`.
Adds the following SMTs:
`ExtractField`
`MaskField`
`RegexRouter`
`ReplaceField`
`SetSchemaMetadata`
`ValueToKey`
Adds HTML doc generation and updates to `connect.html`.
Author: Shikhar Bhushan <shikhar@confluent.io>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes#2374 from shikhar/more-smt
Switched console_consumer, verifiable_consumer and verifiable_producer to use new sasl.jaas_config property instead of static JAAS configuration file when used with SASL_PLAINTEXT.
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2323 from rajinisivaram/KAFKA-4580
Runs sanity test and one replication test using SASL/SCRAM.
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2355 from rajinisivaram/KAFKA-4590
This way, if the ${KAFKA_NUM_CONTAINERS} is changed in docker/run_tests.sh, the json is still valid
Author: Emanuele Cesena <emanuele.cesena@gmail.com>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes#2370 from 0x0ece/patch-1
Besides API and runtime changes, this PR also includes 2 data transformations (`InsertField`, `HoistToStruct`) and 1 routing transformation (`TimestampRouter`).
There is some gnarliness in `ConnectorConfig` / `ConfigDef` around creating, parsing and validating a dynamic `ConfigDef`.
Author: Shikhar Bhushan <shikhar@confluent.io>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes#2299 from shikhar/smt-2017
Otherwise in this test the sink task goes through the pause/resume cycle with 0 assigned partitions, since the default metadata refresh interval is quite long
Author: Shikhar Bhushan <shikhar@confluent.io>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes#2313 from shikhar/kafka-4575
In reality, we’ll only test older brokers after KAFKA-4462 is fully implemented.
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Apurva Mehta <apurva.1618@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#2263 from cmccabe/KAFKA-4508