Author: Jason Gustafson <jason@confluent.io>
Reviewers: Apurva Mehta <apurva@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes #2745 from hachikuji/add-replication-testcase-for-compression
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Eno Thereska <eno@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes #2897 from mjsax/KAFKA-4564-follow-up
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Ismael Juma, Eno Thereska, Matthias J. Sax, Guozhang Wang
Closes #2837 from mjsax/kafka-4564-fail-fast-test-stream-compatibility
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Magnus Edenhill, Eno Thereska, Damian Guy, Guozhang Wang
Closes #2836 from mjsax/minor-broker-comp-test
This PR replaces https://github.com/apache/kafka/pull/2743 (just re-raised from the Confluent repo).
This PR adds partition-level leader epochs to messages in Kafka as a mechanism for fixing some known issues in the replication protocol. Full details can be found here:
[KIP-101 Reference](https://cwiki.apache.org/confluence/display/KAFKA/KIP-101+-+Alter+Replication+Protocol+to+use+Leader+Epoch+rather+than+High+Watermark+for+Truncation)
*The key elements are*:
- Epochs are stamped on messages as they enter the leader.
- Epochs are tracked in both leader and follower in a new checkpoint file.
- A new API allows followers to retrieve the leader's latest offset for a particular epoch.
- The logic for truncating the log when a replica becomes a follower has been moved from `Partition` into the `ReplicaFetcherThread`.
- When partitions are added to the ReplicaFetcherThread they are added in an initialising state. Initialising partitions request leader epochs and then truncate their logs appropriately.
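To make the initialisation step concrete, here is a minimal sketch of the follower-side flow under stated assumptions: `LeaderEpochClient` and `FollowerLog` are hypothetical stand-ins for the new leader-epoch request path and the local log, not the actual `ReplicaFetcherThread` code.

```java
// Hypothetical interfaces used only to illustrate the KIP-101 flow.
interface LeaderEpochClient {
    // Ask the leader for its end offset for the given epoch (the new API).
    long endOffsetForEpoch(String topic, int partition, int leaderEpoch);
}

interface FollowerLog {
    int latestEpoch();            // read from the new checkpoint file
    long logEndOffset();
    void truncateTo(long offset); // discard entries at or beyond this offset
}

class EpochBasedTruncation {
    // On becoming a follower, truncate to the leader's offset for our latest
    // epoch instead of blindly truncating to the high watermark.
    static void truncateBeforeFetching(LeaderEpochClient leader, FollowerLog log,
                                       String topic, int partition) {
        long leaderEndOffset = leader.endOffsetForEpoch(topic, partition, log.latestEpoch());
        long truncationPoint = Math.min(leaderEndOffset, log.logEndOffset());
        log.truncateTo(truncationPoint);
        // The partition then leaves the initialising state and normal fetching resumes.
    }
}
```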
This test provides a good overview of the workflow: `EpochDrivenReplicationProtocolAcceptanceTest.shouldFollowLeaderEpochBasicWorkflow()`
The corrupted log use case is covered by the test
`EpochDrivenReplicationProtocolAcceptanceTest.offsetsShouldNotGoBackwards()`
Remaining work: there is a to-do list here: https://docs.google.com/document/d/1edmMo70MfHEZH9x38OQfTWsHr7UGTvg-NOxeFhOeRew/edit?usp=sharing
Author: Ben Stopford <benstopford@gmail.com>
Author: Jun Rao <junrao@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Closes #2808 from benstopford/kip-101-v2
Several fixes for handling broker failures:
- the default replication factor for internal topics is now 3 in the test itself (not in the streams code; that will require a KIP).
- the streams producer waits for acks from all replicas in the test itself (not in the streams code; that will require a KIP).
- added a backoff time for the streams client to retry after a failure to contact the controller.
- fixed a bug related to state store locks (this helps in multi-threaded scenarios).
- fixed catching exceptions properly for network errors.
- added a system test for all of the above (a sketch of the test-side settings follows the list).
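As a rough illustration only, the settings described in the first two items might look like the sketch below; the config keys are assumed to be passed straight through the test's streams configuration to the embedded clients, and the code is not taken from the actual test:

```java
import java.util.Properties;

public class BrokerBounceTestSettings {
    // Sketch of the test-side overrides described above (illustrative values only).
    public static Properties streamsTestProps() {
        Properties props = new Properties();
        // Internal topics are created with 3 replicas by the test itself.
        props.put("replication.factor", "3");
        // The embedded producer waits for acks from all replicas.
        props.put("acks", "all");
        // Back off before retrying after a failure to contact the controller
        // (value chosen for illustration, not taken from the patch).
        props.put("retry.backoff.ms", "1000");
        return props;
    }
}
```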
Author: Eno Thereska <eno@confluent.io>
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Damian Guy <damian.guy@gmail.com>, Guozhang Wang <wangguoz@gmail.com>, Dan Norwood <norwood@confluent.io>, Ismael Juma <ismael@juma.me.uk>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes #2719 from enothereska/KAFKA-4916-broker-bounce-test
This is from the KIP-98 proposal.
The main points of discussion surround the correctness logic, particularly the `Log` class, where incoming entries are validated and duplicates are dropped, and also the producer error handling to ensure that the semantics are sound from the user's point of view.
There is some subtlety in the idempotent producer semantics. This patch only guarantees idempotent production up to the point where an error has to be returned to the user. Once we hit such a non-recoverable error, we can no longer guarantee message ordering or idempotence without additional logic at the application level.
In particular, if an application wants guaranteed message order without duplicates, then it needs to do the following in the error callback:
1. Close the producer so that no queued batches are sent. This is important for guaranteeing ordering.
2. Read the tail of the log to inspect the last message committed. This is important for avoiding duplicates.
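A minimal sketch of that pattern is below; the bootstrap server, topic and record contents are made up, and the `enable.idempotence` setting is assumed to be the knob introduced by this work:

```java
import java.util.Properties;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class StrictOrderingProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("enable.idempotence", "true");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        final KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.send(new ProducerRecord<>("my-topic", "key", "value"), (metadata, exception) -> {
            if (exception != null) {
                // 1. Close immediately so no queued batches are sent (preserves ordering).
                //    A zero timeout is used because close() is being called from the callback.
                producer.close(0, TimeUnit.MILLISECONDS);
                // 2. The application would then read the tail of the log (e.g. with a
                //    consumer) to see what was actually committed before resending.
            }
        });
        producer.close();
    }
}
```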
Author: Apurva Mehta <apurva@confluent.io>
Author: hachikuji <jason@confluent.io>
Author: Apurva Mehta <apurva.1618@gmail.com>
Author: Guozhang Wang <wangguoz@gmail.com>
Author: fpj <fpj@apache.org>
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Closes #2735 from apurvam/exactly-once-idempotent-producer
See the JIRA for the full details. Essentially the test assertions depend on receiving reliable events from the consumer processes, but this is not generally possible in the presence of a hard failure (i.e. `kill -9`). Until we solve this problem, the hard failure scenarios will be turned off.
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Apurva Mehta <apurva@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes #2771 from hachikuji/KAFKA-4689
The transient failures make it harder to spot real failures and we can live without what is being tested (adding security to ZK via a rolling upgrade) until KIP-101 lands.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Apurva Mehta <apurva@confluent.io>, Jun Rao <junrao@gmail.com>
Closes #2742 from ijuma/disable-zk-upgrade-test
Improves the reliability of this test by decreasing the lower bound (this is to be expected, as throttling takes the first fetch to stabilise and will "over-fetch" on that first request).
Author: Ben Stopford <benstopford@gmail.com>
Reviewers: Apurva Mehta <apurva@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes #2698 from benstopford/throttling-test-fix
The phase_two security upgrade test verifies upgrading inter-broker and client protocols to the same value as well as different values. The second case currently changes inter-broker protocol without first enabling the protocol, disrupting produce/consume until the whole cluster is updated. This commit changes the test to be a non-disruptive upgrade test that enables protocols first (simulating phase one of upgrade).
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Apurva Mehta <apurva.1618@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes #2589 from rajinisivaram/KAFKA-4779
The throttling system test sometimes fails because it takes longer than the current 10-second timeout for partitions to get assigned to the consumer.
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes #2567 from apurvam/increase-timeout-for-partitions-assigned
This PR fixes a blocker issue, where the streams client code cannot talk to the controller. It also enables a system test that was previously failing.
This PR is for trunk only. A separate PR with just the fix (but not the tests) will be created for 0.10.2.
Author: Eno Thereska <eno@confluent.io>
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Damian Guy, Ismael Juma, Matthias J. Sax, Guozhang Wang
Closes #2522 from enothereska/KAFKA-4716-metadata
Author: Ewen Cheslack-Postava <me@ewencp.org>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes #2508 from ewencp/minor-streams-compatibility-trunk-dev-branch
With this change, the consumer will be considered initialized in the
ProduceConsumeValidate tests once its partitions have been assigned.
Author: Apurva Mehta <apurva.1618@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes #2347 from apurvam/KAFKA-4588-fix-race-between-producer-consumer-start
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Damian Guy, Eno Thereska, Guozhang Wang
Closes #2403 from mjsax/addStreamsClientCompatibilityTest
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes #2390 from cmccabe/KAFKA-4630
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes #2426 from hachikuji/improve-consumer-test-error-messages
Renames `HoistToStruct` SMT to `HoistField`.
Adds the following SMTs:
`ExtractField`
`MaskField`
`RegexRouter`
`ReplaceField`
`SetSchemaMetadata`
`ValueToKey`
Adds HTML doc generation and updates to `connect.html`.
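For example, chaining two of these transforms in a connector's configuration might look like the sketch below; the `transforms.*` key names and `org.apache.kafka.connect.transforms` class locations are assumptions about the final layout, and the field/topic names are made up:

```java
import java.util.HashMap;
import java.util.Map;

public class TransformChainExample {
    // Sketch of a connector config that masks a field in the value and then
    // renames the topic via a regex (illustrative names only).
    public static Map<String, String> connectorProps() {
        Map<String, String> props = new HashMap<>();
        props.put("transforms", "mask,route");
        props.put("transforms.mask.type", "org.apache.kafka.connect.transforms.MaskField$Value");
        props.put("transforms.mask.fields", "ssn");
        props.put("transforms.route.type", "org.apache.kafka.connect.transforms.RegexRouter");
        props.put("transforms.route.regex", "(.*)-raw");
        props.put("transforms.route.replacement", "$1-clean");
        return props;
    }
}
```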
Author: Shikhar Bhushan <shikhar@confluent.io>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes #2374 from shikhar/more-smt
Runs sanity test and one replication test using SASL/SCRAM.
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes #2355 from rajinisivaram/KAFKA-4590
Besides API and runtime changes, this PR also includes 2 data transformations (`InsertField`, `HoistToStruct`) and 1 routing transformation (`TimestampRouter`).
There is some gnarliness in `ConnectorConfig` / `ConfigDef` around creating, parsing and validating a dynamic `ConfigDef`.
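At its core the new API is a per-record transformation contract, roughly along the lines of the simplified sketch below (not the exact signatures from the PR):

```java
import org.apache.kafka.common.Configurable;
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.ConnectRecord;

// Simplified sketch of the SMT contract: configure, apply per record, close.
public interface Transformation<R extends ConnectRecord<R>> extends Configurable {
    // Apply the transformation to a record; implementations may return a
    // modified record or null to drop it.
    R apply(R record);

    // The per-transform ConfigDef that ConnectorConfig parses and validates
    // dynamically, which is where the gnarliness mentioned above comes in.
    ConfigDef config();

    void close();
}
```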
Author: Shikhar Bhushan <shikhar@confluent.io>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes #2299 from shikhar/smt-2017
Otherwise, in this test, the sink task goes through the pause/resume cycle with 0 assigned partitions, since the default metadata refresh interval is quite long.
Author: Shikhar Bhushan <shikhar@confluent.io>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes #2313 from shikhar/kafka-4575
In reality, we’ll only test older brokers after KAFKA-4462 is fully implemented.
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Apurva Mehta <apurva.1618@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes #2263 from cmccabe/KAFKA-4508
At present, the test is fragile in the sense that the console consumer
has to start and be initialized before the verifiable producer begins
producing in the produce-consume-validate loop.
If this doesn't happen, the consumer will miss messages at the head of
the log and the test will fail.
At present, the consumer is considered initialized once it has a PID. This is
a weak assumption. The plan is to poll appropriate metrics (like
partition assignment), and use those as a proxy for consumer
initialization. That work will be tracked in a separate ticket. For now,
we will disable the tests so that we can get the builds healthy again.
Author: Apurva Mehta <apurva.1618@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes #2278 from apurvam/KAFKA-4526-throttling-test-failures
I ran it 3 times and it works again.
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes #2257 from enothereska/minor-reenable-smoke-test
Updates to take advantage of soon-to-be-released ducktape features.
Author: Geoff Anderson <geoff@confluent.io>
Author: Ewen Cheslack-Postava <me@ewencp.org>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes #1834 from granders/systest-parallel-friendly
This reverts commit e035fc039598127e88f31739458f705290b1fdba for the
following reasons:
1. License files are missing, causing local builds to fail during the
rat task (rat is not being run in Jenkins for some reason; filed
KAFKA-4459 for that)
2. It renames a number of system test files when there's a better
way to achieve the goal of running a subset of system tests to stay
under the Travis limit.
3. It adds the gradle wrapper binary even though this was removed
intentionally a while back.
A new PR will be submitted for KAFKA-4345 without the undesired
changes.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jason Gustafson <jason@confluent.io>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes #2187 from ijuma/kafka-4345-revert
As of now, the ducktape tests that we have for Kafka are not run for pull requests. We can run these tests using Travis CI. Here is a sample run:
https://travis-ci.org/raghavgautam/kafka/builds/170574293
Author: Raghav Kumar Gautam <raghav@apache.org>
Reviewers: Sriharsha Chintalapani <harsha@hortonworks.com>
Closes #2064 from raghavgautam/trunk
Added `timeout` and `timeUnit` to `KafkaStreams.close(..)`. The close is now done on a separate thread, which is `join`ed with the provided `timeout`.
Changed `state` in `KafkaStreams` to use an enum.
Added a system test to ensure we don't deadlock on close when an uncaught exception handler that calls `System.exit(..)` is used and there is also a shutdown hook that calls `KafkaStreams.close(...)`.
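A minimal sketch of the resulting usage, assuming a `KafkaStreams` instance built elsewhere (handler and hook bodies simplified for illustration):

```java
import java.util.concurrent.TimeUnit;
import org.apache.kafka.streams.KafkaStreams;

public class StreamsShutdownExample {
    public static void start(final KafkaStreams streams) {
        // Bound how long close() may block: it now runs the close on a thread
        // that is joined with the provided timeout.
        Runtime.getRuntime().addShutdownHook(
            new Thread(() -> streams.close(10, TimeUnit.SECONDS)));

        // An exiting uncaught exception handler no longer deadlocks against the
        // shutdown hook above, because close(timeout, unit) returns once the
        // timeout elapses.
        streams.setUncaughtExceptionHandler((thread, throwable) -> System.exit(1));

        streams.start();
    }
}
```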
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Matthias J. Sax, Eno Thereska, Guozhang Wang
Closes #2097 from dguy/kafka-4366
Update system test method signatures and method calls to use the new consumer by default.
Author: Vahid Hashemian <vahidhashemian@us.ibm.com>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes #2060 from vahidhashemian/KAFKA-4211
Since #1911 was merged it is hard to externally test a connector transitioning to FAILED state due to an initialization failure, which is what this test was attempting to verify.
The unit test added in #1778 already exercises exception-handling around Connector instantiation.
Author: Shikhar Bhushan <shikhar@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes #2131 from shikhar/test_bad_connector_class
In this patch, we test `kafka-reassign-partitions` when throttling is active.
This patch also fixes the following:
1. KafkaService.verify_reassign_partitions did not check whether
partition reassignment actually completed successfully (KAFKA-4204).
This patch works around those shortcomings so that we get the right
signal from this method.
2. ProduceConsumeValidateTest.annotate_missing_messages would call
`pop` on the list of missing messages, causing downstream methods to get
incomplete data. We fix that in this patch as well.
Author: Apurva Mehta <apurva.1618@gmail.com>
Reviewers: Geoff Anderson <geoff@confluent.io>, Ben Stopford <benstopford@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes #1904 from apurvam/throttling-tests
Fix the existing client-id quota test, which currently doesn't configure quota overrides correctly. Add new tests for user and (user, client-id) quota overrides and default quotas.
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Jun Rao <junrao@gmail.com>
Closes #1860 from rajinisivaram/KAFKA-4055
This PR implements KIP-78: Cluster Identifiers [(link)](https://cwiki.apache.org/confluence/display/KAFKA/KIP-78%3A+Cluster+Id#KIP-78:ClusterId-Overview) and includes the following changes:
1. Changes to broker code
- generate cluster id and store it in Zookeeper
- update protocol to add cluster id to metadata request and response
- add ClusterResourceListener interface, ClusterResource class and ClusterMetadataListeners utility class
- send ClusterResource events to the metric reporters
2. Changes to client code
- update Cluster and Metadata code to support cluster id
- update clients for sending ClusterResource events to interceptors, (de)serializers and metric reporters
3. Integration tests for interceptors, (de)serializers and metric reporters for clients, and for protocol changes and metric reporters for the broker.
4. System tests for upgrading from previous versions.
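For instance, a pluggable component such as a metric reporter or interceptor can receive the cluster id by also implementing the new listener interface; a minimal sketch follows (the class name here is made up, the interface is the one described above):

```java
import org.apache.kafka.common.ClusterResource;
import org.apache.kafka.common.ClusterResourceListener;

// Sketch: a pluggable component (metric reporter, interceptor, (de)serializer)
// that also implements ClusterResourceListener is handed the cluster id once known.
public class ClusterIdLoggingListener implements ClusterResourceListener {
    private volatile String clusterId;

    @Override
    public void onUpdate(ClusterResource clusterResource) {
        this.clusterId = clusterResource.clusterId();
        System.out.println("Connected to cluster " + clusterId);
    }
}
```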
Author: Sumit Arrawatia <sumit.arrawatia@gmail.com>
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jun Rao <junrao@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes #1830 from arrawatia/kip-78