src-kafka

Commit Graph

Author	SHA1	Message	Date
Guozhang Wang	4090f9a2b0	KAFKA-9113: Clean up task management and state management (#7997 ) This PR is collaborated by Guozhang Wang and John Roesler. It is a significant tech debt cleanup on task management and state management, and is broken down by several sub-tasks listed below: Extract embedded clients (producer and consumer) into RecordCollector from StreamTask. guozhangwang#2 guozhangwang#5 Consolidate the standby updating and active restoring logic into ChangelogReader and extract out of StreamThread. guozhangwang#3 guozhangwang#4 Introduce Task state life cycle (created, restoring, running, suspended, closing), and refactor the task operations based on the current state. guozhangwang#6 guozhangwang#7 Consolidate AssignedTasks into TaskManager and simplify the logic of changelog management and task management (since they are already moved in step 2) and 3)). guozhangwang#8 guozhangwang#9 Also simplified the StreamThread logic a bit as the embedded clients / changelog restoration logic has been moved into step 1) and 2). guozhangwang#10 Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Bruno Cadonna <bruno@confluent.io>, Boyang Chen <boyang@confluent.io>	5 years ago
David Arthur	7e776b0462	Bump trunk to 2.6.0-SNAPSHOT (#8026 )	5 years ago
vinoth chandar	71c5729a41	KAFKA-6144: Add KeyQueryMetadata APIs to KafkaStreams (#7960 ) Deprecate existing metadata query APIs in favor of new ones that include standby hosts as well as partition information. Closes: #7960 Implements: KIP-535 Co-authored-by: Navinder Pal Singh Brar <navinder_brar@yahoo.com> Reviewed-by: John Roesler <vvcephei@apache.org>	5 years ago
Brian Bushree	422bc1f0fa	MINOR: Disable JmxTool in kafkatest console-consumer by default (#7785 ) Do not initialize `JmxTool` by default when running console consumer. In order to support this, we remove `has_partitions_assigned` and its only usage in an assertion inside `ProduceConsumeValidateTest`, which did not seem to contribute much to the validation. Reviewers: David Arthur <mumrah@gmail.com>, Jason Gustafson <jason@confluent.io>	5 years ago
A. Sophie Blee-Goldman	3453e9e2ee	HOTFIX: fix system test race condition (#7836 ) In some system tests a Streams app is started and then prints a message to stdout, which the system test waits for to confirm the node has successfully been brought up. It then greps for certain log messages in a retriable loop. But waiting on the Streams app to start/print to stdout does not mean the log file has been created yet, so the grep may return an error. Although this occurs in a retriable loop, it is assumed that grep will not fail, and the result is piped to wc and then blindly converted to an int in the python function, which fails since the error message is a string (throws ValueError). We should catch the ValueError and return a 0 so it can try again rather than immediately crash. Reviewers: Bill Bejeck <bbejeck@gmail.com>, John Roesler <vvcephei@users.noreply.github.com>, Guozhang Wang <wangguoz@gmail.com>	5 years ago
Lucas Bradstreet	8fd7cd6a43	MINOR: upgrade system test should check for ISR rejoin on each roll (#7827 ) The upgrade system test correctly rolls by upgrading the broker and leaving the IBP, and then rolling again with the latest IBP version. Unfortunately, this is not sufficient to pick up many problems in our IBP gating as we charge through the rolls and after the second roll all of the brokers will rejoin the ISR and the test will be treated as a success. This test adds two new checks: 1. We wait for the ISR to stabilize for all partitions. This is best practice during rolls, and is enough to tell us if a broker hasn't rejoined after each roll. 2. We check the broker logs for some common protocol errors. This is a fail safe as it's possible for the test to be successful even if some protocols are incompatible and the ISR is rejoined. Reviewers: Nikhil Bhatia <nikhil@confluent.io>, Jason Gustafson <jason@confluent.io>	5 years ago
Bruno Cadonna	1d21cf166a	KAFKA-9305: Add version 2.4 to Streams system tests (#7841 ) Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Matthias J. Sax <matthias@confluent.io>	5 years ago
Manikumar Reddy	b50d925e07	MINOR: Add compatibility tests for 2.4.0 (#7838 ) Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>	5 years ago
John Roesler	717ce42a6d	KAFKA-9138: Add system test for relational joins (#7664 ) Add a system test to verify the new foreign-key join introduced in KIP-213 Reviewers: Guozhang Wang <wangguoz@gmail.com>	5 years ago
Randall Hauch	39d68de393	MINOR: Simplify the timeout logic to handle protocol in Connect distributed system tests (#7806 )	5 years ago
Randall Hauch	ccded348eb	MINOR: Bump system test version from 2.2.1 to 2.2.2 (#7765 ) Author: Randall Hauch <rhauch@gmail.com> Reviewer: Ismael Juma <ismael@confluent.io>	5 years ago
Randall Hauch	9d9139024a	MINOR: Increase the timeout in one of Connect's distributed system tests (#7789 ) Author: Randall Hauch <rhauch@gmail.com> Reviewers: Nigel Liang <nigel@nigelliang.com>, Roesler <john@confluent.io>	5 years ago
David Arthur	91f948763d	Actually run the delete topic command in kafka.py (#7776 ) Reviewed-By: Jason Gustafson <jason@confluent.io>	5 years ago
Jason Gustafson	e057d61050	MINOR: Fix --enable-autocommit flag in verifiable consumer (#7743 ) The --enable-autocommit argument is a flag. It does not take a parameter. This was broken in #7724. Reviewers: Ismael Juma <ismael@juma.me.uk>, Manikumar Reddy <manikumar.reddy@gmail.com>	5 years ago
David Arthur	b15e05d925	KAFKA-9123 Test a large number of replicas (#7621 ) Two tests using 50k replicas on 8 brokers: * Do a rolling restart with clean shutdown, delete topics * Run produce bench and consumer bench on a subset of topics Reviewed-By: David Jacot <djacot@confluent.io>, Vikas Singh <vikas@confluent.io>, Jason Gustafson <jason@confluent.io>	5 years ago
Jason Gustafson	9d8ab3a1a2	KAFKA-8509; Add downgrade system test (#7724 ) This patch adds a basic downgrade system test. It verifies that producing and consuming continues to work before and after the downgrade. Reviewers: Ismael Juma <ismael@juma.me.uk>, David Arthur <mumrah@gmail.com>	5 years ago
Bruno Cadonna	2604d39b9f	MINOR: Fix Streams EOS system tests by adding clean-up of state dir (#7693 ) Recently, system tests test_rebalance_[simple\|complex] failed repeatedly with a verification error. The cause was most probably the missing clean-up of a state directory of one of the processors. A node is cleaned up when a service on that node is started and when a test is torn down. If the clean-up flag clean_node_enabled of an EOS Streams service is unset, the clean-up of the node is skipped. The clean-up flag of processor1 in the EOS tests should stay set before its first start, so that the node is cleaned before the service is started. Afterwards for the multiple restarts of processor1 the clean-up flag should be unset to re-use the local state. After the multiple restarts are done, the clean-up flag of processor1 should again be set to trigger node clean-up during the test teardown. A dirty node can lead to test failures when tests from Streams EOS tests are scheduled on the same node, because the state store would not start empty since it reads the local state that was not cleaned up. Reviewers: Matthias J. Sax <mjsax@apache.org>, Andrew Choi <andchoi@linkedin.com>, Bill Bejeck <bbejeck@gmail.com>	5 years ago
David Arthur	cca0225390	Fix missing reference in kafka.py (#7715 ) Also fix a default value for a dictionary arg	5 years ago
David Arthur	d04699486d	KAFKA-8981 Add rate limiting to NetworkDegradeSpec (#7446 ) * Add rate limiting to tc * Feedback from PR * Add a sanity test for tc * Add iperf to vagrant scripts * Dynamically determine the network interface * Add some temp code for testing on AWS * Temp: use hostname instead of external IP * Temp: more AWS debugging * More AWS WIP * More AWS temp * Lower latency some * AWS wip * Trying this again now that ping should work * Add cluster decorator to tests * Fix broken import * Fix device name * Fix decorator arg * Remove errant import * Increase timeouts * Fix tbf command, relax assertion on latency test * Fix log line * Final bit of cleanup * Newline * Revert Trogdor retry count * PR feedback * More PR feedback * Feedback from PR * Remove unused argument	5 years ago
Brian Bushree	9fb22868fe	[MINOR] allow additional JVM args in KafkaService (#7297 ) Reviewers: Colin P. McCabe <cmccabe@apache.org>, Vikas Singh <vikas@confluent.io>	5 years ago
Colin Patrick McCabe	7f49674439	KAFKA-8746: Kibosh must handle an empty JSON string from Trogdor (#7155 ) When Trogdor wants to clear all the faults injected to Kibosh, it sends the empty JSON object {}. However, Kibosh expects {"faults":[]} instead. Kibosh should handle the empty JSON object, since that's consistent with how Trogdor handles empty JSON fields in general (if they're empty, they can be omitted). We should also have a test for this. Reviewers: David Arthur <mumrah@gmail.com>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>	5 years ago
John Roesler	cac85601a0	KAFKA-9169: fix standby checkpoint initialization (#7681 ) Instead of caching the checkpoint map during StandbyTask initialization, use the latest checkpoints (which would have been updated during suspend). Reviewers: Bill Bejeck <bill@confluent.io>	5 years ago
Colin P. Mccabe	67fd88050f	KAFKA-8984: Improve tagged fields documentation Author: Colin P. Mccabe <cmccabe@confluent.io> Reviewers: Vikas Singh <vikas@confluent.io>, Jason Gustafson <jason@confluent.io> Closes #7477 from cmccabe/KAFKA-8984	5 years ago
Jason Gustafson	903d66e2f9	KAFKA-9079: Fix reset logic in transactional message copier The consumer's `committed` API does not return an entry in the response map for a requested partition if there is no committed offset. The transactional message copier, which is used in the transaction system test, did not account for this. If the first transaction attempted by the copier was randomly aborted, then we would not seek to the beginning as expected, which means we would fail to copy some of the records. This patch fixes the problem by iterating over the assignment rather than the result of `committed` when resetting offsets. It also enables additional logging in the transaction message copier service to make finding problems easier in the future. Author: Jason Gustafson <jason@confluent.io> Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com> Closes #7653 from hachikuji/fix-transaction-system-test	5 years ago
Bruno Cadonna	065411aa22	KAFKA-9077: Fix reading of metrics of Streams' SimpleBenchmark (#7610 ) With KIP-444 the metrics definitions are refactored. Thus, Streams' SimpleBenchmark needs to be updated to correctly access the refactored metrics. Reviewers: Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <mjsax@apache.org>, Bill Bejeck <bbejeck@gmail.com>	5 years ago
Boyang Chen	465f810730	KAFKA-8972 (2.4 blocker): correctly release lost partitions during consumer.unsubscribe() (#7441 ) Inside onLeavePrepare we would look into the assignment and try to revoke the owned tasks and notify users via RebalanceListener#onPartitionsRevoked, and then clear the assignment. However, the subscription's assignment is already cleared in this.subscriptions.unsubscribe(); which means user's rebalance listener would never be triggered. In other words, from consumer client's pov nothing is owned after unsubscribe, but from the user caller's pov the partitions are not revoked yet. For callers like Kafka Streams which rely on the rebalance listener to maintain their internal state, this leads to inconsistent state management and failure cases. Before KIP-429 this issue is hidden away since every time the consumer re-joins the group later, it would still revoke everything anyways regardless of the passed-in parameters of the rebalance listener; with KIP-429 this is easier to reproduce now. Our fixes are as follows: • Inside unsubscribe, first do onLeavePrepare / maybeLeaveGroup and then subscription.unsubscribe. This way we are guaranteed that the streams' tasks are all closed as revoked by then. • [Optimization] If the generation is reset due to fatal error from join / hb response etc, then we know that all partitions are lost, and we should not trigger onPartitionRevoked, but instead just onPartitionsLost inside onLeavePrepare. This is because we don't want to commit for lost tracks during rebalance which is doomed to fail as we don't have any generation info. Reviewers: Matthias J. Sax <matthias@confluent.io>, A. Sophie Blee-Goldman <sophie@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <guozhang@confluent.io>	5 years ago
Bill Bejeck	c015169aa6	MINOR: Streams upgrade system test cleanup (#7571 ) Reviewers: Guozhang Wang <wangguoz@gmail.com>, Sophie Blee-Goldman <sophie@confluent.io>, Boyang Chen <boyang@confluent.io>	5 years ago
Konstantine Karantasis	fc0963a236	KAFKA-9078: Fix Connect system test after adding MM2 connector classes MM2 added a few connector classes in Connect's classpath and given that the assertion in the Connect REST system tests need to be adjusted to account for these additions. This fix makes sure that the loaded Connect plugins are a superset of the expected by the test connectors. Testing: The change is straightforward. The fix was tested with local system test runs. Author: Konstantine Karantasis <konstantine@confluent.io> Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com> Closes #7578 from kkonstantine/minor-fix-connect-test-after-mm2-classes	5 years ago
Bill Bejeck	6afe05fe89	MINOR: system test clean up (#7552 ) Reviewers: Guozhang Wang <wangguoz@gmail.com>, Sophie Blee-Goldman <sophie@confluent.io>	5 years ago
Bill Bejeck	b62f2a1123	KAFKA-8496: System test for KIP-429 upgrades and compatibility (#7529 ) Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Guozhang Wang <wangguoz@gmail.com>	5 years ago
A. Sophie Blee-Goldman	11a401d51d	delete (#7504 ) This system test was marked @Ignore around a year and a half ago pending the version probing work, but never turned on again. These days, it is made redundant by the suite of system tests in streams_upgrade_test, which cover rolling upgrades (including version probing and metadata change). Reviewers: Boyang Chen <boyang@confluent.io>, Bill Bejeck <bbejeck@gmail.com>	5 years ago
A. Sophie Blee-Goldman	cc6525a746	KAFKA-8743: Flaky Test Repartition{WithMerge}OptimizingIntegrationTest (#7472 ) All four flavors of the repartition/optimization tests have been reported as flaky and failed in one place or another: * RepartitionOptimizingIntegrationTest.shouldSendCorrectRecords_OPTIMIZED * RepartitionOptimizingIntegrationTest.shouldSendCorrectRecords_NO_OPTIMIZATION * RepartitionWithMergeOptimizingIntegrationTest.shouldSendCorrectRecords_OPTIMIZED * RepartitionWithMergeOptimizingIntegrationTest.shouldSendCorrectRecords_NO_OPTIMIZATION They're pretty similar so it makes sense to knock them all out at once. This PR does three things: * Switch to in-memory stores wherever possible * Name all operators and update the Topology accordingly (not really a flaky test fix, but had to update the topology names anyway because of the IM stores so figured might as well) * Port to TopologyTestDriver -- this is the "real" fix, should make a big difference as these repartition tests required multiple roundtrips with the Kafka cluster (while using only the default timeout) Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>	5 years ago
A. Sophie Blee-Goldman	f9934e7e93	MINOR: remove unused imports in Streams system tests (#7468 ) Reviewers: Guozhang Wang <wangguoz@gmail.com>	5 years ago
A. Sophie Blee-Goldman	d88f1048da	KAFKA-8179: Part 7, cooperative rebalancing in Streams (#7386 ) Key improvements with this PR: * tasks will remain available for IQ during a rebalance (but not during restore) * continue restoring and processing standby tasks during a rebalance * continue processing active tasks during rebalance until the RecordQueue is empty* * only revoked tasks must be suspended/closed * StreamsPartitionAssignor tries to return tasks to their previous consumers within a client * but do not try to commit, for now (pending KAFKA-7312) Reviewers: John Roesler <john@confluent.io>, Boyang Chen <boyang@confluent.io>, Guozhang Wang <wangguoz@gmail.com>	5 years ago
Manikumar Reddy	4c2bd567b1	MINOR: Bump version to 2.5.0-SNAPSHOT (#7455 )	5 years ago
A. Sophie Blee-Goldman	3b05dc685b	MINOR: just remove leader on trunk like we did on 2.3 (#7447 ) Small follow-up to trunk PR #7423 While debugging the 2.3 VP PR we realized we should remove the leader-tracking from the VP system test altogether. We'd already merged the corresponding trunk PR so I made a quick new PR for trunk (also fixes a missed version bump in one of the log messages) Reviewers: Guozhang Wang <wangguoz@gmail.com>	5 years ago
Chris Egerton	791d0d61bf	KAFKA-8804: Secure internal Connect REST endpoints (#7310 ) Implemented KIP-507 to secure the internal Connect REST endpoints that are only for intra-cluster communication. A new V2 of the Connect subprotocol enables this feature, where the leader generates a new session key, shares it with the other workers via the configuration topic, and workers send and validate requests to these internal endpoints using the shared key. Currently the internal `POST /connectors/<connector>/tasks` endpoint is the only one that is secured. This change adds unit tests and makes some small alterations to system tests to target the new `sessioned` Connect subprotocol. A new integration test ensures that the endpoint is actually secured (i.e., requests with missing/invalid signatures are rejected with a 400 BAD RESPONSE status). Author: Chris Egerton <chrise@confluent.io> Reviewed: Konstantine Karantasis <konstantine@confluent.io>, Randall Hauch <rhauch@gmail.com>	5 years ago
A. Sophie Blee-Goldman	8da69936a7	KAFKA-8649: Send latest commonly supported version in assignment (#7423 ) Instead of sending the leader's version and having older members try to blindly upgrade. The only other real change here is that we will also set the VERSION_PROBING error code and return early from onAssignment when we are upgrading our used subscription version (not just downgrading it) since this implies the whole group has finished the rolling upgrade and all members should rejoin with the new subscription version. Also piggy-backing on a fix for a potentially dangerous edge case, where every thread of an instance is assigned the same set of active tasks. Reviewers: Guozhang Wang <wangguoz@gmail.com>	5 years ago
Rajini Sivaram	0d31272b35	KAFKA-8848; Update system tests to use new AclAuthorizer (#7374 ) Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>	5 years ago
Randall Hauch	ada35d5ff4	Add recent versions of Kafka to the matrix of ConnectDistributedTest (#7024 ) Reviewers: Arjun Satish <arjun@confluent.io>, Konstantine Karantasis <k.karantasis@gmail.com>	5 years ago
Vikas Singh	312e4db590	MINOR. implement --expose-ports option in ducker-ak (#7269 ) This change adds a command line option to the `ducker-ak up` command to enable exposing ports from docker containers. The exposed ports will be mapped to the ephemeral ports on the host. The option is called `expose-ports` and can take either a single value (like 5005) or a range (like 5005-5009). This port will then be exposed from each docker container that ducker-ak sets up. Reviewers: Colin P. McCabe <cmccabe@apache.org>, José Armando García Sancio <jsancio@users.noreply.github.com>	5 years ago
vinoth chandar	ffef0871c2	KAFKA-7149 : Reducing streams assignment data size (#7185 ) * Leader instance uses dictionary encoding on the wire to send topic partitions * Topic names (most expensive component) are mapped to an integer using the dictionary * Follower instances receive the dictionary, decode topic names back * Purely an on-the-wire optimization, no in-memory structures changed * Test case added for version 5 AssignmentInfo Reviewers: Guozhang Wang <wangguoz@gmail.com>	5 years ago
Colin Patrick McCabe	a225347ff2	KAFKA-8840: Fix bug where ClientCompatibilityFeaturesTest fails when running multiple iterations (#7260 ) Fix a bug where ClientCompatibilityFeaturesTest fails when running multiple iterations. Also, fix a typo in tests/docker/Dockerfile. Reviewers: Ismael Juma <ismael@juma.me.uk>	5 years ago
Vikas Singh	09ad6b84c5	MINOR. Fix 2.3.0 streams systest dockerfile typo (#7272 ) As part of commit 4d1ee26a136997d31dbd6ddca07e09b34c41c77d streams version 2.3.0 test jar was added, but there was a simple typo in the path that specified the version. `ducker-ak up` was failing because of that. Fixed that. Reviewers: Guozhang Wang <wangguoz@gmail.com>	5 years ago
Matthias J. Sax	4d1ee26a13	KAFKA-8594: Add version 2.3 to Streams system tests (#7131 ) Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Boyang Chen <boyang@confluent.io>, Bill Bejeck <bill@confluent.io>	5 years ago
Brian Bushree	6c8f654d5f	MINOR: Upgrade ducktape to 0.7.6 Author: Brian Bushree <bbushree@confluent.io> Reviewers: Ewen Cheslack-Postava <ewen@confluent.io> Closes #7138 from brianbushree/update-ducktape	5 years ago
David Arthur	ff9e95cb09	MINOR: Add fetch from follower system test (#7166 ) This adds a basic system test that enables rack-aware brokers with the rack-aware replica selector for fetch from followers (KIP-392). The test asserts that the follower was read from at least once and that all the messages that were produced were successfully consumed. Reviewers: Jason Gustafson <jason@confluent.io>	5 years ago
Arjun Satish	794637232c	KAFKA-8774: Regex can be found anywhere in config value (#7197 ) Corrected the AbstractHerder to correctly identify task configs that contain variables for externalized secrets. The original method incorrectly used `matcher.matches()` instead of `matcher.find()`. The former method expects the entire string to match the regex, whereas the second one can find a pattern anywhere within the input string (which fits this use case more correctly). Added unit tests to cover various cases of a config with externalized secrets, and updated system tests to cover case where config value contains additional characters besides secret that requires regex pattern to be found anywhere in the string (as opposed to complete match). Author: Arjun Satish <arjun@confluent.io> Reviewer: Randall Hauch <rhauch@gmail.com>	5 years ago
Matthias J. Sax	e9a35fe02e	MINOR: Bump system test version from 2.2.0 to 2.2.1 (#6873 ) Reviewers: Boyang Chen <boyang@confluent.io>, Bill Bejeck <bill@confluent.io>	5 years ago
Rajini Sivaram	de8ce78a90	MINOR: Tolerate limited data loss for upgrade tests with old message format (#7102 ) To avoid transient system test failures, tolerate a small amount of data loss due to truncation in upgrade system tests using older message format prior to KIP-101, where data loss was possible. Reviewers: Ismael Juma <ismael@juma.me.uk>	5 years ago
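
The fix described in commit 3453e9e2ee (catch the ValueError and return 0 so the retry loop can try again) can be sketched roughly as follows. This is only an illustrative sketch, not the code from the commit: the function names `parse_match_count` and `wait_for_matches`, and the `read_output` callable standing in for the `grep ... | wc` pipeline, are hypothetical.

```python
def parse_match_count(raw_output):
    """Convert the text printed by a grep-and-count pipeline to an int.

    Returns 0 when the text is not a number, for example when grep
    printed an error because the log file does not exist yet.
    (Hypothetical helper; the name is not taken from the commit.)
    """
    try:
        return int(raw_output.strip())
    except ValueError:
        # Treat unparsable output as "no matches yet" so callers can retry.
        return 0


def wait_for_matches(read_output, expected, attempts=5):
    """Call read_output() up to `attempts` times until enough matches appear.

    `read_output` is a hypothetical stand-in for running the pipeline
    on the node and returning its output as a string.
    """
    for _ in range(attempts):
        if parse_match_count(read_output()) >= expected:
            return True
    return False
```

With this shape, an error message from grep counts as zero matches and the loop keeps polling instead of crashing on the first attempt.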

489 Commits (87eaa5396d991b53ef89e58c37b54b6c1517b554)