src-kafka

Commit Graph

Author	SHA1	Message	Date
Stanislav Kozlovski	ea72edebf2	MINOR: Do not override retries for idempotent producers (#8097 ) The KafkaProducer code would set infinite retries (MAX_INT) if the producer was configured with idempotence and no retries were configured by the user. This is superfluous because KIP-91 changed the retry functionality to both be time-based and the default retries config to be MAX_INT. Reviewers: Jason Gustafson <jason@confluent.io>	5 years ago
Boyang Chen	07db26c20f	KAFKA-9417: New Integration Test for KIP-447 (#8000 ) This change mainly has 2 components: 1. extend the existing transactions_test.py to also try out new sendTxnOffsets(groupMetadata) API to make sure we are not introducing any regression or compatibility issue a. We shrink the time window to 10 seconds for the txn timeout scheduler on broker so that we could trigger expiration earlier than later 2. create a completely new system test class called group_mode_transactions_test which is more complicated than the existing system test, as we are taking rebalance into consideration and using multiple partitions instead of one. For further breakdown: a. The message count was done on partition level, instead of global as we need to visualize the per partition order throughout the test. For this sake, we extend ConsoleConsumer to print out the data partition as well to help message copier interpret the per partition data. b. The progress count includes the time for completing the pending txn offset expiration c. More visibility and feature improvements on TransactionMessageCopier to better work under either standalone or group mode. Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>	5 years ago
Matthias J. Sax	aa0d0ec32a	KAFKA-6607: Commit correct offsets for transactional input data (#8091 ) Reviewers: Guozhang Wang <guozhang@confluent.io>	5 years ago
Jason Gustafson	0a5dec0b3a	MINOR: Fix unnecessary metadata fetch before group assignment (#8095 ) The recent increase in the flakiness of one of the offset reset tests (KAFKA-9538) traces back to https://github.com/apache/kafka/pull/7941. After investigation, we found that following this patch, the consumer was sending an additional metadata request prior to performing the group assignment. This slight timing difference was enough to trigger the test failures. The problem turned out to be due to a bug in `SubscriptionState.groupSubscribe`, which no longer counted the local subscription when determining if there were new topics to fetch metadata for. Hence the extra metadata update. This patch restores the old logic. Without the fix, we saw 30-50% test failures locally. With it, I could no longer reproduce the failure. However, #6561 is probably still needed to improve the resilience of this test. Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>	5 years ago
Guozhang Wang	e70e5d913a	KAFKA-9505: Only loop over topics-to-validate in retries (#8039 ) Found this bug from the repeated flaky runs of system tests, it seems to be long lurking but also would only happen if there are frequent rebalances / topic creation within a short time, which is exactly the case in some of our smoke system tests. Also added a unit test. Reviewers: Boyang Chen <boyang@confluent.io>, A. Sophie Blee-Goldman <sophie@confluent.io>, Matthias J. Sax <matthias@confluent.io>	5 years ago
Manikumar Reddy	41fdae35df	MINOR: Update schema field names in DescribeAcls Request/Response Author: Manikumar Reddy <manikumar.reddy@gmail.com> Reviewers: Ismael Juma <ismael@juma.me.uk>, Colin Patrick McCabe <cmccabe@apache.org> Closes #8075 from omkreddy/KAFKA-9026-Fix	5 years ago
Sönke Liebau	3b1c61385b	KAFKA-9423: Refine layout of configuration options on website and make individual settings directly linkable (#7955 ) Reviewers: Mickael Maison <mickael.maison@gmail.com>	5 years ago
Brian Byrne	0f8698a329	KAFKA-8904: Improve producer's topic metadata fetching. (#7781 ) When the producer encounters new topic(s), it now only fetches the metadata for the new topics. For cases where a producer interacts with a lot of topics, this reduces the cost for the topic being evicted from the cache, and during startup when populating the topic cache. Additionally adds a new producer configuration variable 'metadata.max.idle.ms', which controls how long topic metadata may be idle (i.e. not produced to) before it's finally discarded from the metadata cache. Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, dengziming <dengziming1993@gmail.com>	5 years ago
Ron Dagostino	342f13a838	KAFKA-8843: KIP-515: Zookeeper TLS support Signed-off-by: Ron Dagostino <rdagostino@confluent.io> Author: Ron Dagostino <rdagostino@confluent.io> Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com> Closes #8003 from rondagostino/KAFKA-8843	5 years ago
Kun Song	87eaa5396d	MINOR: Simplify KafkaProducerTest (#8044 ) Reviewers: Mickael Maison <mickael.maison@gmail.com>, Ron Dagostino <rndgstn@gmail.com>	5 years ago
David Mao	7a2a198d1e	KAFKA-9507; AdminClient should check for missing committed offsets (#8057 ) Addresses exception being thrown by `AdminClient` when `listConsumerGroupOffsets` returns a negative offset. A negative offset indicates the absence of a committed offset for a requested partition, and should result in a null in the returned offset map. Reviewers: Anna Povzner <anna@confluent.io>, Jason Gustafson <jason@confluent.io>	5 years ago
Joel Hamill	83e1a8d71c	DOCS - clarify transactionalID and idempotent behavior (#7821 ) If transactional.id is set without setting enable.idempotence, the producer will set enable.idempotence to true implicitly. The docs should reflect this. Reviewers: Guozhang Wang <wangguoz@gmail.com>	5 years ago
Boyang Chen	9d17bf98b6	KAFKA-9447: Add new customized EOS model example (#8031 ) With the improvement of 447, we are now offering developers a better experience on writing their customized EOS apps with group subscription, instead of manual assignments. With the demo, user should be able to get started more quickly on writing their own EOS app, and understand the processing logic much better. Reviewers: Guozhang Wang <wangguoz@gmail.com>	5 years ago
Jason Gustafson	ae0c6e58e5	KAFKA-9261; Client should handle unavailable leader metadata (#7770 ) The client caches metadata fetched from Metadata requests. Previously, each metadata response overwrote all of the metadata from the previous one, so we could rely on the expectation that the broker only returned the leaderId for a partition if it had connection information available. This behavior changed with KIP-320 since having the leader epoch allows the client to filter out partition metadata which is known to be stale. However, because of this, we can no longer rely on the request-level guarantee of leader availability. There is no mechanism similar to the leader epoch to track the staleness of broker metadata, so we still overwrite all of the broker metadata from each response, which means that the partition metadata can get out of sync with the broker metadata in the client's cache. Hence it is no longer safe to validate inside the `Cluster` constructor that each leader has an associated `Node` Fixing this issue was unfortunately not straightforward because the cache was built to maintain references to broker metadata through the `Node` object at the partition level. In order to keep the state consistent, each `Node` reference would need to be updated based on the new broker metadata. Instead of doing that, this patch changes the cache so that it is structured more closely with the Metadata response schema. Broker node information is maintained at the top level in a single collection and cached partition metadata only references the id of the broker. To accommodate this, we have removed `PartitionInfoAndEpoch` and we have altered `MetadataResponse.PartitionMetadata` to eliminate its `Node` references. Note that one of the side benefits of the refactor here is that we virtually eliminate one of the hotspots in Metadata request handling in `MetadataCache.getEndpoints` (which was renamed to `maybeFilterAliveReplicas`). 
The only reason this was expensive was because we had to build a new collection for the `Node` representations of each of the replica lists. This information was doomed to just get discarded on serialization, so the whole effort was wasteful. Now, we work with the lower level id lists and no copy of the replicas is needed (at least for all versions other than 0). Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Ismael Juma <ismael@juma.me.uk>	5 years ago
David Jacot	5db02ead60	MINOR: Fix typos introduced in KIP-559 (#8042 ) A few references to KIP-559 in the schema definitions needed to be fixed. Reviewers: Brajesh Kumar <bristy@users.noreply.github.com>, Ron Dagostino <rdagostino@confluent.io>, Jason Gustafson <jason@confluent.io>	5 years ago
Guozhang Wang	4090f9a2b0	KAFKA-9113: Clean up task management and state management (#7997 ) This PR is collaborated by Guozhang Wang and John Roesler. It is a significant tech debt cleanup on task management and state management, and is broken down by several sub-tasks listed below: Extract embedded clients (producer and consumer) into RecordCollector from StreamTask. guozhangwang#2 guozhangwang#5 Consolidate the standby updating and active restoring logic into ChangelogReader and extract out of StreamThread. guozhangwang#3 guozhangwang#4 Introduce Task state life cycle (created, restoring, running, suspended, closing), and refactor the task operations based on the current state. guozhangwang#6 guozhangwang#7 Consolidate AssignedTasks into TaskManager and simplify the logic of changelog management and task management (since they are already moved in step 2) and 3)). guozhangwang#8 guozhangwang#9 Also simplified the StreamThread logic a bit as the embedded clients / changelog restoration logic has been moved into step 1) and 2). guozhangwang#10 Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Bruno Cadonna <bruno@confluent.io>, Boyang Chen <boyang@confluent.io>	5 years ago
Colin Patrick McCabe	a16dfe6739	MINOR: fix checkstyle issue in ConsumerConfig.java (#8038 ) Reviewers: Ismael Juma <ismael@juma.me.uk>	5 years ago
Alexandra Rodoni	7748fc2fc6	KAFKA-9477 Document RoundRobinAssignor as an option for partition.assignment.strategy (#8007 ) Reviewers: Colin P. McCabe <cmccabe@apache.org>	5 years ago
Rajini Sivaram	281ed90cd8	KAFKA-9492; Ignore record errors in ProduceResponse for older versions (#8030 ) Fixes NPE in brokers when processing record errors in produce response for older versions. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>	5 years ago
Ismael Juma	738e14edb8	KAFKA-9027, KAFKA-9028: Convert create/delete acls requests/response to use generated protocol (#7725 ) Also add support for flexible versions to both protocol types. Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Colin Patrick McCabe <cmccabe@apache.org> Co-authored-by: Rajini Sivaram <rajinisivaram@googlemail.com> Co-authored-by: Jason Gustafson <jason@confluent.io>	5 years ago
David Jacot	96c4ce4803	KAFKA-9437; Make the Kafka Protocol Friendlier with L7 Proxies [KIP-559] (#7994 ) This PR implements the KIP-559: https://cwiki.apache.org/confluence/display/KAFKA/KIP-559%3A+Make+the+Kafka+Protocol+Friendlier+with+L7+Proxies - it adds the Protocol Type and the Protocol Name fields in JoinGroup and SyncGroup API; - it validates that the fields are provided by the client when the new version of the API is used and ensure that they are consistent. it errors out otherwise; - it validates that the fields are consistent in the client and errors out otherwise; - it adds many tests related to the API changes but also extends the testing coverage of the requests/responses themselves. - it standardises the naming in the coordinator. now, `ProtocolType` and `ProtocolName` are used across the board in the coordinator instead of having a mix of protocol type, protocol name, subprotocol, protocol, etc. Reviewers: Jason Gustafson <jason@confluent.io>	5 years ago
Karan Kumar	c8d97c6d51	KAFKA-9375: Add names to all Connect threads (#7901 ) Reviewers: Mickael Maison <mickael.maison@gmail.com>, Ryanne Dolan <ryannedolan@gmail.com>, gcsaba2	5 years ago
Jason Gustafson	4317325fbc	KAFKA-8503; Add default api timeout to AdminClient (KIP-533) (#8011 ) This PR implements `default.api.timeout.ms` as documented by KIP-533. This is a rebased version of #6913 with some additional test cases and small cleanups. Reviewers: David Arthur <mumrah@gmail.com> Co-authored-by: huxi <huxi_2b@hotmail.com>	5 years ago
Edoardo Comar	d37d95b359	KAFKA-8162: IBM JDK Class not found error when handling SASL (#6524 ) Attempt to load multiple IBM classes but fallback on loading the Sun class if the IBM one is not found. Reviewers: Mickael Maison <mickael.maison@gmail.com>, Ismael Juma <ismael@juma.me.uk>	5 years ago
Brian Byrne	57cef765f5	KAFKA-9474: Adds 'float64' to the RPC protocol types (#8012 ) Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>	5 years ago
Ismael Juma	bd5a1c4d36	KAFKA-4203: Align broker default for max.message.bytes with Java producer default (#4154 ) Also: Improve error message, Add test, Minor code quality fixes Verified that the test fails if the broker default for max message bytes is lower or higher than the currently set value. Reviewers: Andrew Choi <andchoi@linkedin.com>, Viktor Somogyi <viktorsomogyi@gmail.com>, Guozhang Wang <wangguoz@gmail.com>	5 years ago
belugabehr	b4d7560b4f	KAFKA-9426: Use switch instead of chained if/else in OffsetsForLeaderEpochClient (#7959 ) Reviewers: Ismael Juma <ismael@juma.uk>	5 years ago
belugabehr	aecd3936a3	KAFKA-9405: Use Map.computeIfAbsent where applicable (#7937 ) Reviewers: Ismael Juma <ismael@juma.me.uk>	5 years ago
Mickael Maison	40b35178e8	KAFKA-9026: Use automatic RPC generation in DescribeAcls (#7560 ) Reviewers: Ismael Juma <ismael@juma.me.uk>	5 years ago
Nikolay	172409c44b	KAFKA-9460: Enable only TLSv1.2 by default and disable other TLS protocol versions (KIP-553) (#7998 ) Reviewers: Ron Dagostino <rndgstn@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>	5 years ago
Ron Dagostino	a3509c0870	MINOR: MiniKdc JVM shutdown hook fix (#7946 ) Also made all shutdown hooks consistent and added tests Reviewers: Ismael Juma <ismael@juma.me.uk>, Rajini Sivaram <rajinisivaram@googlemail.com>	5 years ago
Rajini Sivaram	a565d1a182	KAFKA-9181; Maintain clean separation between local and group subscriptions in consumer's SubscriptionState (#7941 ) Reviewers: Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>	5 years ago
Jason Gustafson	df13fc93d0	KAFKA-7737; Use single path in producer for initializing the producerId (#7920 ) Previously the idempotent producer and transactional producer use separate logic when initializing the producerId. This patch consolidates the two paths. We also do some cleanup in `TransactionManagerTest` to eliminate brittle expectations on `Sender`. Reviewers: Bob Barrett <bob.barrett@confluent.io>, Viktor Somogyi <viktorsomogyi@gmail.com>, Guozhang Wang <wangguoz@gmail.com>	5 years ago
Boyang Chen	de90175fc2	KAFKA-9418; Add new sendOffsetsToTransaction API to KafkaProducer (#7952 ) This patch adds a new API to the producer to implement transactional offset commit fencing through the group coordinator as proposed in KIP-447. This PR mainly changes on the Producer end for compatible paths to old `sendOffsetsToTxn(offsets, groupId)` vs new `sendOffsetsToTxn(offsets, groupMetadata)`. Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>	5 years ago
Eduardo Pinto	c8c0ec6ac9	KAFKA-7204: Avoid clearing records for paused partitions on poll of MockConsumer (#7505 ) The previous version of MockConsumer does not allow the clients to test consecutive calls to poll while consuming only from a partial set of partitions due to the fact that it clears all the records after each call. This change makes MockConsumer clearing the records only for the partitions that are not paused (whose records are actually returned by the poll). The remaining paused partitions will retain the records. Unit test added accordingly. Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>	5 years ago
Rajini Sivaram	1e823a6884	KAFKA-9457; Fix flaky test org.apache.kafka.common.network.SelectorTest.testGracefulClose (#7989 ) Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>	5 years ago
Brian Byrne	9eface2e73	KAFKA-9449; Adds support for closing the producer's BufferPool. (#7967 ) The producer's BufferPool may block allocations if its memory limit has hit capacity. If the producer is closed, it's possible for the allocation waiters to wait for max.block.ms if progress cannot be made, even when force-closed (immediate), which can cause indefinite blocking if max.block.ms is particularly high. This patch fixes the problem by adding a `close()` method to `BufferPool`, which wakes up any waiters that have pending allocations and throws an exception. Reviewers: Jason Gustafson <jason@confluent.io>	5 years ago
Edoardo Comar	a88703255b	KAFKA-9218: MirrorMaker 2 can fail to create topics (#7745 ) When the scheduled refreshTopicPartitions runs, check existing topics in both source and target clusters in order to compute topic partitions to be created on target. If a temporary failure to create the target topic is encountered (e.g. insufficient number of brokers), on the next refresh the target topic creation will be re-attempted. Co-authored-by: Edoardo Comar <ecomar@uk.ibm.com> Co-authored-by: Mickael Maison <mickael.maison@uk.ibm.com> Reviewers: Ryanne Dolan <ryannedolan@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>	5 years ago
Rajini Sivaram	dbeaba5d9e	KAFKA-8847; Deprecate and remove usage of supporting classes in kafka.security.auth (#7966 ) Removes references to the old scala Acl classes from kafka.security.auth (Acl, Operation, ResourceType, Resource etc.) and replaces these with the Java API. Only the old SimpleAclAuthorizer, AuthorizerWrapper used to wrap legacy authorizer instances and tests using SimpleAclAuthorizer continue to use the old API. Deprecates the old scala API. Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>	5 years ago
dengziming	8da12d4d37	[MINOR]: Fix typo in Fetcher comment (#7934 ) Reviewers: Ron Dagostino <rndgstn@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>	5 years ago
highluck	1e1110d8f3	MINOR: Remove unnecessary call to `super` in `MetricConfig` constructor (#7975 ) Reviewers: Ron Dagostino <rndgstn@gmail.com>, Jason Gustafson <jason@confluent.io>	5 years ago
belugabehr	6d87c12729	KAFKA-9410; Make groupId Optional in KafkaConsumer (#7943 ) Reviewers: Ron Dagostino <rndgstn@gmail.com>, Jason Gustafson <jason@confluent.io>	5 years ago
Boyang Chen	ed7c071e07	KAFKA-9365: Add server side change to include consumer group information within transaction commit (#7897 ) To be able to correctly fence zombie producer txn commit, we propose to add (member.id, group.instance.id, generation) into the transaction commit protocol to raise the same level of correctness guarantee as consumer commit. Major changes involve: 1. Upgrade transaction commit protocol with (member.id, group.instance.id, generation). The client will fail if the broker is not supporting the new protocol. 2. Refactor group coordinator logic to handle new txn commit errors such as FENCED_INSTANCE_ID, UNKNOWN_MEMBER_ID and ILLEGAL_GENERATION. We loosen the check on transaction commit when the member.id is set to empty. This is because the member.id check is an add-on safety for producer commit, and we also need to consider backward compatibility for old producer clients without member.id information. And if producer equips with group.instance.id, then it must provide a valid member.id (not empty definitely), the same as a consumer commit. Reviewers: Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>	5 years ago
Jason Gustafson	0a2569e2b9	KAFKA-9420; Add flexible version support for converted protocols (#7931 ) This patch bumps the following APIs in order to enable flexible version support: - SaslAuthenticate - SaslHandshake - CreatePartitions - DescribeDelegationToken - ExpireDelegationToken - RenewDelegationToken This change was documented and approved in KIP-482: https://cwiki.apache.org/confluence/display/KAFKA/KIP-482%3A+The+Kafka+Protocol+should+Support+Optional+Tagged+Fields. Reviewers: Colin Patrick McCabe <cmccabe@apache.org>	5 years ago
Boyang Chen	6da70f9b95	KAFKA-9346: Consumer back-off logic when fetching pending offsets (#7878 ) Let consumer back-off and retry offset fetch when the specific offset topic has pending commits. The major change lies in the broker side offset fetch logic, where a request configured with flag WaitTransaction to true will be required to back-off when some pending transactional commit is ongoing. This prevents any ongoing transaction being modified by third party, thus guaranteeing the correctness with input partition writer shuffling. Reviewers: Matthias J. Sax <matthias@confluent.io>, Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>	5 years ago
zzccctv	39278d3089	KAFKA-9159: The caller of sendListOffsetRequest need to handle retriable InvalidMetadata (#7665 ) Loop call Consumer.endOffsets Throw TimeoutException: Failed to get offsets by times in 30000ms after a leader change. Reviewers: Steven Lu, Guozhang Wang <wangguoz@gmail.com>	5 years ago
Rajini Sivaram	eb09efa9ac	KAFKA-7639: Read one request at a time from socket to avoid OOM (#5920 ) Prior to this commit, broker's Selector used to read all requests available on the socket when the socket is ready for read. These are queued up as staged receives. This can result in excessive memory usage when socket read buffer size is large. We mute the channel and stop reading any more data until all the staged requests are processed. This behaviour is slightly inconsistent since for the initial read we drain the socket buffer, allowing it to get filled up again, but if data arrives slightly after the initial read, then we don't read from the socket buffer until pending requests are processed. To avoid holding onto requests for longer than required, this commit removes staged receives and reads one request at a time even if more data is available in the socket buffer. This is especially useful for produce requests which may be large and may take long to process. Additional data read from the socket for SSL is limited to the pre-allocated intermediate SSL buffers. Reviewers: Jun Rao <junrao@gmail.com>, Ismael Juma <ismael@juma.me.uk>	5 years ago
Boyang Chen	63576dde57	KAFKA-8617: Use automated protocol for `EndTxn` API (#7029 ) Reviewers: Jason Gustafson <jason@confluent.io>	5 years ago
Guozhang Wang	505e8240cd	KAFKA-8421: Still return data during rebalance (#7312 ) Not wait until updateAssignmentMetadataIfNeeded returns true, but only call it once with 0 timeout. Also do not return empty if in rebalance. Trim the pre-fetched records after long polling since assignment may have been changed. Also need to update SubscriptionState to retain the state in assignFromSubscribed if it already exists (similar to assignFromUser), so that we do not need the transition of INITIALIZING to FETCHING. Unit test: this actually took me the most time :) Reviewers: John Roesler <john@confluent.io>, Bill Bejeck <bill@confluent.io>, Bruno Cadonna <bruno@confluent.io>, Sophie Blee-Goldman <sophie@confluent.io>, Jason Gustafson <jason@confluent.io>, Richard Yu <yohan.richard.yu@gmail.com>, dengziming <dengziming1993@gmail.com>	5 years ago
Jason Gustafson	dbc9cd638f	KAFKA-9144; Track timestamp from txn markers to prevent early producer expiration (#7687 ) Existing producer state expiration uses timestamps from data records only and not from transaction markers. This can cause premature producer expiration when the coordinator times out a transaction because we drop the state from existing batches. This in turn can allow the coordinator epoch to revert to a previous value, which can lead to validation failures during log recovery. This patch fixes the problem by also leveraging the timestamp from transaction markers. We also change the validation logic so that coordinator epoch is verified only for new marker appends. When replicating from the leader and when recovering the log, we only log a warning if we notice that the coordinator epoch has gone backwards. This allows recovery from previous occurrences of this bug. Finally, this patch fixes one minor issue when loading producer state from the snapshot file. When the only record for a given producer is a control record, the "last offset" field will be set to -1 in the snapshot. We should check for this case when loading to be sure we recover the state consistently. Reviewers: Ismael Juma <ismael@juma.me.uk>, Guozhang Wang <wangguoz@gmail.com>	5 years ago


1797 Commits (ea72edebf2d484e42a4251c53cd6e383743b5d1a)
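
The KAFKA-9507 entry above (7a2a198d1e) describes a simple rule: a negative offset means no offset has been committed for the requested partition, and it should surface as a null in the returned offset map. A minimal sketch of that rule in plain Java follows. It is not the actual `AdminClient` code: the class name `CommittedOffsetSketch`, the method `toCommittedOffsets`, and the use of plain string keys in place of Kafka's partition type are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not taken from the Kafka code base.
public class CommittedOffsetSketch {

    // Maps raw per-partition offsets to "committed offsets", using null
    // for partitions whose raw offset is negative (no committed offset).
    static Map<String, Long> toCommittedOffsets(Map<String, Long> rawOffsets) {
        Map<String, Long> result = new HashMap<>();
        for (Map.Entry<String, Long> entry : rawOffsets.entrySet()) {
            long raw = entry.getValue();
            // Keep the partition key in the map, but store null instead of
            // failing when the offset is negative.
            result.put(entry.getKey(), raw < 0 ? null : Long.valueOf(raw));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> raw = new HashMap<>();
        raw.put("topic-0", 42L);  // partition with a committed offset
        raw.put("topic-1", -1L);  // partition without a committed offset

        Map<String, Long> committed = toCommittedOffsets(raw);
        System.out.println("topic-0 -> " + committed.get("topic-0"));
        System.out.println("topic-1 -> " + committed.get("topic-1"));
    }
}
```

In this sketch the partition key stays present with a null value, so a caller can tell a requested partition with no commit apart from a partition that was never requested.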