src-kafka

Commit Graph

Author	SHA1	Message	Date
Boyang Chen	77fc498889	KAFKA-8992; Redefine RemoveMembersFromGroup interface on AdminClient (#7478 ) This PR fixes the inconsistency involved in the `removeMembersFromGroup` admin API calls: 1. Fail the `all()` request when there is sub level error (either partition or member) 2. Change getMembers() to members() 3. Hide the actual Errors from user 4. Do not expose generated MemberIdentity type 5. Use more consistent naming for Options and Result types Reviewers: Guozhang Wang <wangguoz@gmail.com>, David Jacot <djacot@confluent.io>, Jason Gustafson <jason@confluent.io>	5 years ago
NIkhil Bhatia	43262d358a	KAFKA-9038; Allow creating partitions while a reassignment is in progress (#7582 ) Prior to this patch, partition creation would not be allowed for any topic while a reassignment is in progress. The PR makes this a topic level check. As long as a particular topic is not being reassigned, we allow partitions to be increased. Reviewers: Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Jason Gustafson <jason@confluent.io>	5 years ago
A. Sophie Blee-Goldman	5987c76153	KAFKA-8972: Need to flush state even on unclean close (#7589 ) In the case of unclean close we still need to make sure all the stores are flushed before closing any. Reviewers: Matthias J. Sax <matthias@confluent.io>, Boyang Chen <boyang@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <guozhang@confluent.io>	5 years ago
Guozhang Wang	4f682b3c0a	KAFKA-8729: Add upgrade docs for KIP-467 on augmented produce response (#7522 ) Add a paragraph explaining the producer caller's expected behavior change on record validation failure scenarios that are improved by KIP-467. Reviewers: Tu V. Tran <tu@confluent.io>, Jason Gustafson <jason@confluent.io>	5 years ago
Jason Gustafson	1df01d2583	KAFKA-9089; Reassignment should be resilient to unexpected errors (#7562 ) The purpose of this patch is to make the reassignment algorithm simpler and more resilient to unexpected errors. Specifically, it has the following improvements: 1. Remove `ReassignedPartitionContext`. We no longer need to track the previous reassignment through the context and we now use the assignment state as the single source of truth for the target replicas in a reassignment. 2. Remove the intermediate assignment state when overriding a previous reassignment. Instead, an overriding reassignment directly updates the assignment state and shuts down any unneeded replicas. Reassignments are _always_ persisted in Zookeeper before being updated in the controller context. 3. To fix race conditions with concurrent submissions, reassignment completion for a partition always checks for a zk partition reassignment to be removed. This means the controller no longer needs to track the source of the reassignment. 4. Api reassignments explicitly remove reassignment state from zk prior to beginning the new reassignment. This fixes an inconsistency in precedence. Upon controller failover, zookeeper reassignments always take precedence over any active reassignment. So if we do not have the logic to remove the zk reassignment when an api reassignment is triggered, then we can revert to the older zk reassignment. Reviewers: Viktor Somogyi <viktorsomogyi@gmail.com>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Jun Rao <junrao@gmail.com>	5 years ago
Stanislav Kozlovski	28ef7f1d6d	MINOR: Re-implement NewPartitionReassignment#of() (#7592 ) Re-implement NewPartitionReassignment#of. It now takes a list rather than a variable-length list of arguments. Reviewers: Colin P. McCabe <cmccabe@apache.org>, Vikas Singh <vikas@confluent.io>	5 years ago
Adam Bellemare	aab14fb843	MINOR: Add documentation for foreign-key joins (KIP-213) (#7535 ) Reviewers: John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>	5 years ago
Bill Bejeck	c015169aa6	MINOR: Streams upgrade system test cleanup (#7571 ) Reviewers: Guozhang Wang <wangguoz@gmail.com>, Sophie Blee-Goldman <sophie@confluent.io>, Boyang Chen <boyang@confluent.io>	5 years ago
Chris Pettitt	6975f1dfa9	KAFKA-8700: Flaky Test QueryableStateIntegrationTest#queryOnRebalance (#7548 ) This is not guaranteed to actually fix queryOnRebalance, since the failure could never be reproduced locally. I did not bump timeouts because it looks like that has been done in the past for this test without success. Instead this change makes the following improvements: It waits for the application to be in a RUNNING state before proceeding with the test. It waits for the remaining instance to return to RUNNING state within a timeout after rebalance. I observed once that we were able to do the KV queries but the instance was still in REBALANCING, so this should reduce some opportunity for flakiness. The meat of this change: we now iterate over all keys in one shot (vs. one at a time with a timeout) and collect various failures, all of which are reported at the end. This should help us to narrow down the cause of flakiness if it shows up again. Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>	5 years ago
Nikolay	adb2bdb122	KAFKA-8584: The RPC code generator should support ByteBuffer. (#7342 ) The RPC code generator should support using the ByteBuffer class in addition to byte arrays. By using the ByteBuffer class, we can avoid performing a copy in many situations. Also modify TestByteBufferDataTest to test the new feature. Reviewers: Colin P. McCabe <cmccabe@apache.org>, Guozhang Wang <wangguoz@gmail.com>	5 years ago
Jason Gustafson	0971f66ff5	KAFKA-9056; Inbound/outbound byte metrics should reflect incomplete sends/receives (#7551 ) Currently we only record completed sends and receives in the selector metrics. If there is a disconnect in the middle of the respective operation, then it is not counted. The metrics will be more accurate if we take into account partial sends and receives. Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>	5 years ago
Konstantine Karantasis	fc0963a236	KAFKA-9078: Fix Connect system test after adding MM2 connector classes MM2 added a few connector classes to Connect's classpath, so the assertion in the Connect REST system tests needs to be adjusted to account for these additions. This fix makes sure that the loaded Connect plugins are a superset of the connectors expected by the test. Testing: The change is straightforward. The fix was tested with local system test runs. Author: Konstantine Karantasis <konstantine@confluent.io> Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com> Closes #7578 from kkonstantine/minor-fix-connect-test-after-mm2-classes	5 years ago
Jason Gustafson	4fc649d85c	MINOR: Add toString to PartitionReassignment (#7579 ) This patch adds a `toString()` implementation to `PartitionReassignment`. It also makes the `ListPartitionReassignmentsResult` constructor use default access, which is the standard for the admin client *Result classes. Reviewers: Guozhang Wang <wangguoz@gmail.com>	5 years ago
José Armando García Sancio	c00bd38ab2	MINOR: Rename brokers to replicas in the reassignment API (#7570 ) Reviewers: Jason Gustafson <jason@confluent.io>, Manikumar Reddy <manikumar.reddy@gmail.com>, Vikas Singh <vikas@confluent.io>, Colin P. McCabe <cmccabe@apache.org>	5 years ago
Stanislav Kozlovski	2efee34b74	MINOR: Check against empty replicas in AlterPartitionReassignments (#7574 ) Do not allow an empty replica set to be passed into the reassignment API. Reviewers: Colin P. McCabe <cmccabe@apache.org>, José Armando García Sancio <jsancio@gmail.com>	5 years ago
Guozhang Wang	6d8da96ba8	MINOR: Reset timer when all the buffer is drained and empty (#7573 ) For scenarios where the incoming traffic of all input partitions are small, there's a pitfall that the enforced processing timer is not reset after we have enforce processed ALL records. The fix itself is pretty simple: we just reset the timer when there's no buffered records. Reviewers: Javier Holguera <javier.holguera@gmail.com>, Boyang Chen <boyang@confluent.io>, Bill Bejeck <bill@confluent.io>	5 years ago
Manikumar Reddy	f46eb292f5	MINOR: Add upgrade docs for 2.4.0 release Author: Manikumar Reddy <manikumar.reddy@gmail.com> Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com> Closes #7530 from omkreddy/2.4-upgrade-docs	5 years ago
Bill Bejeck	c7a3d1b52d	MINOR: Upgrade guide updates for KIP-479 (#7550 ) Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>	5 years ago
Bill Bejeck	7bef73d724	MINOR: KIP-307 upgrade guide docs (#7547 ) Reviewers: John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>	5 years ago
John Roesler	7bc8c0dcd1	MINOR: don't require key serde in join materialized (#7557 ) Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <guozhang@confluent.io>, Matthias J. Sax <matthias@confluent.io>	5 years ago
Guozhang Wang	f90b7e9cb4	KAFKA-8940: Tighten up SmokeTestDriver (#7565 ) After many runs of reproducing the failure (on my local MP5 it takes about 100 - 200 runs to get one) I think it is more likely a flaky test than a real bug in the rebalance protocol. What I've observed is that, when the verifying consumer is trying to fetch from the output topics (there are 11 of them), it calls poll(1sec) each time, retries 30 times if there's no more data to fetch, and then stops. It means that if no data is fetched from the output topics for 30 * 1 = 30 seconds then the verification would stop (potentially too early). And for the failure cases, we observe consistent rebalancing among the closing / newly created clients since the closing is async, i.e. while new clients are added it is possible that the rebalances triggered by closing clients have not completed yet (note that each instance is configured with 3 threads, and in the worst case there are 6 instances running / pending shutdown at the same time, so a group of 3 * 6 = 18 members is possible). However, there's still a possible bug in KIP-429 where the rebalance somehow never stabilizes and members keep rejoining, causing the test to fail. We have another unit test that has been bumped up to 3 rebalances by @ableegoldman, and if that fails again it may be better confirmation that such a bug exists. So what I've done so far for this test: 1. When closing a client, wait for it to complete closure before moving on to the next iteration and starting a new instance, to reduce rebalance churn. 2. Poll for 5 seconds instead of 1 to wait for a longer time: 5 * 30 = 150 seconds, and locally my laptop finished this test in about 50 seconds. 3. Minor debug logging improvements; in fact some of them reduce redundant debug logging, since it is too long and sometimes hides the key information. Reviewers: Bill Bejeck <bill@confluent.io>, John Roesler <john@confluent.io>	5 years ago
Bill Bejeck	6afe05fe89	MINOR: system test clean up (#7552 ) Reviewers: Guozhang Wang <wangguoz@gmail.com>, Sophie Blee-Goldman <sophie@confluent.io>	5 years ago
Manikumar Reddy	e20dcffa84	KAFKA-8943: Move SecurityProviderCreator to org.apache.kafka.common.security.auth package (#7564 ) Reviewers: Mickael Maison <mickael.maison@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>	5 years ago
Lee Dongjin	a9b0fc866a	KAFKA-8482: Improve documentation on AdminClient#alterReplicaLogDirs, AlterReplicaLogDirsResult (#7083 ) Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>	5 years ago
Mickael Maison	99a4068c5c	KAFKA-7689; Add AlterConsumerGroup/List Offsets to AdminClient [KIP-396] (#7296 ) This patch implements new AdminClient APIs to list offsets and alter consumer group offsets as documented in KIP-396: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=97551484. Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>	5 years ago
Bruno Cadonna	2298c7f84f	KAFKA-8964: Refactor thread-level metrics depending on built-in metrics version (#7474 ) * Made commit-over-tasks sensor and skipped-records sensor optional since they are removed in the latest version * Refactored methods for sensor creation * Adapted unit and integration tests Reviewers: Guozhang Wang <wangguoz@gmail.com>	5 years ago
Viktor Somogyi	8f639eb79c	KAFKA-9041; Flaky Test LogCleanerIntegrationTest#testIsThreadFailed (#7542 ) Aims to fix the flaky LogCleanerIntegrationTest#testIsThreadFailed by changing how metrics are cleaned. Reviewers: Jason Gustafson <jason@confluent.io>	5 years ago
Guozhang Wang	eac367c76a	MINOR: Add metrics in operations doc for KIP-429 / KIP-467 (#7527 ) Reviewers: Tu V. Tran <tu@confluent.io>, A. Sophie Blee-Goldman <sophie@confluent.io>, Matthias J. Sax <matthias@confluent.io>	5 years ago
Nikolay	4e094217f7	KAFKA-8455: Add VoidSerde to Serdes (#7485 ) Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Matthias J. Sax <matthias@confluent.io>	5 years ago
A. Sophie Blee-Goldman	8feb516dd9	MINOR: fix typo in TestInputTopic.getTimestampAndAdvance (#7553 ) Reviewers: Bill Bejeck <bill@confluent.io>, John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>	5 years ago
Dhruvil Shah	317089663c	KAFKA-8962; Use least loaded node for AdminClient#describeTopics (#7421 ) Allow routing of `AdminClient#describeTopics` to any broker in the cluster rather than just the controller, so that we don't create a hotspot for this API call. `AdminClient#describeTopics` uses the broker's metadata cache which is asynchronously maintained, so routing to brokers other than the controller is not expected to have a significant difference in terms of metadata consistency; all metadata requests are eventually consistent. This patch also fixes a few flaky test failures. Reviewers: Ismael Juma <ismael@juma.me.uk>, José Armando García Sancio <jsancio@gmail.com>, Jason Gustafson <jason@confluent.io>	5 years ago
Viktor Somogyi	6c0aed2fbe	KAFKA-8834; Add reassignment metrics and distinguish URPs (KIP-352) (#7361 ) KIP-352 aims to add several new metrics in order to track reassignments much better. We will be able to measure bytes in/out rate and the count of partitions under active reassignment. We also change the semantic of the UnderReplicatedPartitions metric to cater better for reassignment. Currently it reports under-replicated partitions when during reassignment extra partitions are added as part of the process but this PR changes it so it'll always take the original replica set into account when computing the URP metrics. The newly added metrics will be: - kafka.server:type=ReplicaManager,name=ReassigningPartitions - kafka.server:type=BrokerTopicMetrics,name=ReassignmentBytesOutPerSec - kafka.server:type=BrokerTopicMetrics,name=ReassignmentBytesInPerSec The changed URP metric: - kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions Reviewers: Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Jason Gustafson <jason@confluent.io>	5 years ago
Matthias J. Sax	9660ecd4ec	MINOR: Fix JavaDoc warning (#7546 ) Reviewers: Bill Bejeck<bbejeck@gmail.com>	5 years ago
John Roesler	fa2c61e23f	KAFKA-9058: Lift queriable and materialized restrictions on FK Join (#7541 ) Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>	5 years ago
Chris Pettitt	7a87a30f1f	MINOR: Add ability to wait for all instances in an application to be RUNNING (#7500 ) Reviewers: Matthias J. Sax <matthias@confluent.io>, A. Sophie Blee-Goldman <sophie@confluent.io>, Guozhang Wang <guozhang@confluent.io>	5 years ago
Ismael Juma	baaccbb018	MINOR: Upgrade zk to 3.5.6 (#7544 ) It includes an important fix for people running on k8s: * ZOOKEEPER-3320: Leader election port stop listen when hostname unresolvable for some time Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>	5 years ago
David Arthur	d9f0be4523	KAFKA-9004; Prevent older clients from fetching from a follower (#7531 ) With KIP-392, we allow consumers to fetch from followers. This capability is enabled when a replica selector has been provided in the configuration. When not in use, the intent is to preserve current behavior of fetching only from leader. The leader epoch is the mechanism that keeps us honest. When there is a leader change, the epoch gets bumped, consumer fetches fail due to the fenced epoch, and we find the new leader. However, for old consumers, there is no similar protection. The leader epoch was not available to clients until recently. If there is a preferred leader election (for example), the old consumer will happily continue fetching from the demoted leader until a periodic metadata fetch causes us to discover the new leader. This does not create any problems from a correctness perspective–fetches are still bound by the high watermark–but it is unexpected and may cause unexpected performance characteristics. This patch fixes this problem by enforcing leader-only fetching for older versions of the fetch request. Reviewers: Jason Gustafson <jason@confluent.io>	5 years ago
Colin Patrick McCabe	3cb8ccf63a	MINOR: AbstractRequestResponse should be an interface (#7513 ) AbstractRequestResponse should be an interface, since it has no concrete elements or implementation. Move AbstractRequestResponse#serialize to RequestUtils#serialize and make it package-private, since it doesn't need to be public. Reviewers: Ismael Juma <ismael@juma.me.uk>	5 years ago
Will James	2cf7b35d90	KAFKA-8950: Fix KafkaConsumer Fetcher breaking on concurrent disconnect (#7511 ) The KafkaConsumer Fetcher can sometimes get into an invalid state where it believes that there are ongoing fetch requests, but in fact there are none. This may be caused by the heartbeat thread concurrently handling a disconnection event just after the fetcher thread submits a request which would cause the Fetcher to enter an invalid state where it believes it has ongoing requests to the disconnected node but in fact it does not. This is due to a thread safety issue in the Fetcher where it was possible for the ordering of the modifications to the nodesWithPendingFetchRequests to be incorrect - the Fetcher was adding it after the listener had already been invoked, which would mean that pending node never gets removed again. This PR addresses that thread safety issue by ensuring that the pending node is added to the nodesWithPendingFetchRequests before the listener is added to the future, ensuring the finally block is called after the node is added. Reviewers: Tom Lee, Jason Gustafson <jason@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>	5 years ago
Kevin Lu	ed078bd702	KAFKA-8874: Add consumer metrics to observe user poll behavior (KIP-517) https://cwiki.apache.org/confluence/display/KAFKA/KIP-517%3A+Add+consumer+metrics+to+observe+user+poll+behavior Author: Kevin Lu <kelu@paypal.com> Reviewers: Sriharsha Chintalapani <sriharsha@apache.org>, Jason Gustafson <jason@confluent.io> Closes #7395 from KevinLiLu/KIP517-KAFKA8874	5 years ago
John Roesler	18bdcaa5a1	MINOR: log reason for fatal error in locking state dir (#7534 ) Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>	5 years ago
Antony Stubbs	f324754852	KAFKA-8884: class cast exception improvement (#7309 ) Reviewers: John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>	5 years ago
Nikolay	00374c3ddf	KAFKA-8104: Consumer cannot rejoin to the group after rebalancing (#7460 ) This PR contains the fix of race condition bug between "consumer thread" and "consumer coordinator heartbeat thread". It reproduces in many production environments. Condition for reproducing: 1. Consumer thread initiates rejoin to the group because of commit timeout. Call of AbstractCoordinator#joinGroupIfNeeded which leads to sendJoinGroupRequest. 2. JoinGroupResponseHandler writes to the AbstractCoordinator.this.generation new generation data and leaves the synchronized section. 3. Heartbeat thread executes maybeLeaveGroup and clears generation data via resetGenerationOnLeaveGroup. 4. Consumer thread executes onJoinComplete(generation.generationId, generation.memberId, generation.protocol, memberAssignment); with the cleared generation data. This leads to the corresponding exception. Reviewers: Guozhang Wang <wangguoz@gmail.com>	5 years ago
John Roesler	f93c473be1	KAFKA-9000: fix flaky FK join test by using TTD (#7517 ) Migrate this integration test to use TopologyTestDriver instead of running 3 Streams instances. Dropped one test that was attempting to produce specific interleavings. If anything, these should be verified deterministically by unit testing. Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>	5 years ago
Bill Bejeck	b62f2a1123	KAFKA-8496: System test for KIP-429 upgrades and compatibility (#7529 ) Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Guozhang Wang <wangguoz@gmail.com>	5 years ago
A. Sophie Blee-Goldman	78f5da914e	KAFKA-9053: AssignmentInfo#encode hardcodes the LATEST_SUPPORTED_VERSION (#7537 ) Also put in some additional logging that makes sense to add, and proved helpful in debugging this particular issue. Unit tests verifying the encoded supported version were added. This should get cherry-picked back to 2.1 Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>	5 years ago
John Roesler	2a54347a56	MINOR: Improve FK Join docs and optimize null-fk case (#7536 ) Fix the formatting and wording of the foreign-key join javadoc Optimize handling of null extracted foreign keys Reviewers: Sophie Blee-Goldman <sophie@confluent.io>, Bill Bejeck <bbejeck@gmail.com>	5 years ago
Greg Harris	ff68b60429	KAFKA-8340, KAFKA-8819: Use PluginClassLoader while statically initializing plugins (#7315 ) Added plugin isolation unit tests for various scenarios, with a `TestPlugins` class that compiles and builds multiple test plugins without them being on the classpath and verifies that the Plugins and DelegatingClassLoader behave properly. These initially failed for several cases, but now pass since the issues have been fixed. KAFKA-8340 and KAFKA-8819 are closely related, and this fix corrects the problems reported in both issues. Author: Greg Harris <gregh@confluent.io> Reviewers: Chris Egerton <chrise@confluent.io>, Magesh Nandakumar <mageshn@confluent.io>, Konstantine Karantasis <konstantine@confluent.io>, Randall Hauch <rhauch@gmail.com>	5 years ago
A. Sophie Blee-Goldman	3e30bf5439	Explain the separate upgrade paths for consumer groups and Streams (#7516 ) Document the upgrade path for the consumer and for Streams (note that they differ significantly). Needs to be cherry-picked to 2.4 Reviewers: Guozhang Wang <wangguoz@gmail.com>	5 years ago
Matthias J. Sax	1d6d3709d6	MINOR: Update Kafka Streams upgrade docs for KIP-444, KIP-470, KIP-471, KIP-474, KIP-528 (#7515 ) Reviewers: Jukka Karvanen <jukka.karvanen@jukinimi.com>, Guozhang Wang <guozhang@confluent.io>, A. Sophie Blee-Goldman <sophie@confluent.io>	5 years ago

7005 Commits (eb8e2a8e3b3e2e7f5c097593faf2c651f92f2caf)