src-kafka

Author	SHA1	Message	Date
Rajini Sivaram	0b23091066	MINOR: Don't process sasl.kerberos.principal.to.local.rules on client-side (#8362 ) Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>	5 years ago
Mickael Maison	3df5464fca	MINOR: Fix a number of warnings in mirror/mirror-client (#8074 ) Reviewers: Ismael Juma <ismael@juma.me.uk>, Ryanne Dolan <ryannedolan@gmail.com>, Andrew Choi <a24choi@edu.uwaterloo.ca>	5 years ago
Ismael Juma	222726d6f9	KAFKA-9373: Reduce shutdown time by avoiding unnecessary loading of indexes (#8346 ) KAFKA-7283 enabled lazy mmap on index files by initializing indices on-demand rather than performing costly disk/memory operations when creating all indices on broker startup. This helped reduce the startup time of brokers. However, segment indices are still created on closing segments, regardless of whether they need to be closed or not. This is a cleaned up version of #7900, which was submitted by @efeg. It eliminates unnecessary disk accesses and memory map operations while deleting, renaming or closing offset and time indexes. In a cluster with 31 brokers, where each broker has 13K to 20K segments, @efeg and team observed up to 2 orders of magnitude faster LogManager shutdown times - i.e. dropping the LogManager shutdown time of each broker from 10s of seconds to 100s of milliseconds. To avoid confusion between `renameTo` and `setFile`, I replaced the latter with the more restricted `updateParentDir` (it turns out that's all we need). Reviewers: Jun Rao <junrao@gmail.com>, Andrew Choi <a24choi@edu.uwaterloo.ca> Co-authored-by: Adem Efe Gencer <agencer@linkedin.com> Co-authored-by: Ismael Juma <ismael@juma.me.uk>	5 years ago
Gardner Vickers	8cf781ef01	MINOR: Improve performance of checkpointHighWatermarks, patch 1/2 (#6741 ) This PR works to improve high watermark checkpointing performance. `ReplicaManager.checkpointHighWatermarks()` was found to be a major contributor to GC pressure, especially on Kafka clusters with high partition counts and low throughput. Added a JMH benchmark for `checkpointHighWatermarks` which establishes a performance baseline. The parameterized benchmark was run with 100, 1000 and 2000 topics. Modified `ReplicaManager.checkpointHighWatermarks()` to avoid extra copies and cached the Log parent directory String to avoid frequent allocations when calculating `File.getParent()`. A few clean-ups: * Changed all usages of Log.dir.getParent to Log.parentDir and Log.dir.getParentFile to Log.parentDirFile. * Only expose public accessor for `Log.dir` (consistent with `Log.parentDir`) * Removed unused parameters in `Partition.makeLeader`, `Partition.makeFollower` and `Partition.createLogIfNotExists`. Benchmark results (change in Ops/ms, change in MB/sec allocated): 100 topics: +51%, -91%; 1000 topics: +143%, -49%; 2000 topics: +149%, -50%. Reviewers: Lucas Bradstreet <lucas@confluent.io>, Ismael Juma <ismael@juma.me.uk> Co-authored-by: Gardner Vickers <gardner@vickers.me> Co-authored-by: Ismael Juma <ismael@juma.me.uk>	5 years ago
Boyang Chen	08759b2531	MINOR: Add synchronization to the protocol name check (#8349 ) Reviewers: Guozhang Wang <wangguoz@gmail.com>	5 years ago
Rigel Bezerra de Melo	ee0ef09db4	MINOR: Allow topics with `null` leader on MockAdminClient createTopic. (#8345 ) Reviewers: Colin P. McCabe <cmccabe@apache.org>	5 years ago
Jim Galasyn	0f48446690	DOCS-3625: Add section to config topic: parameters controlled by Kafka Streams (#8268 ) Reviewer: Matthias J. Sax <matthias@confluent.io>	5 years ago
Nikolay	befd80b38d	KAFKA-9573: Fix JVM options to run early versions of Kafka on the latest JVMs (#8138 ) Startup scripts for early versions of Kafka contain removed JVM options like `-XX:+PrintGCDateStamps` or `-XX:+UseParNewGC`. When system tests run on a JVM that doesn't support these options, we should set up environment variables with correct options. Reviewers: Guozhang Wang <guozhang@confluent.io>, Ron Dagostino <rdagostino@confluent.io>, Ismael Juma <ismael@juma.me.uk>	5 years ago
Boyang Chen	4c9e56b358	KAFKA-9758: Doc changes for KIP-523 and KIP-527 (#8343 ) Reviewer: Matthias J. Sax <matthias@confluent.io>	5 years ago
Jason Gustafson	ef1cd3490a	KAFKA-9752; New member timeout can leave group rebalance stuck (#8339 ) Older versions of the JoinGroup rely on a new member timeout to keep the group from growing indefinitely in the case of client disconnects and retrying. The logic for resetting the heartbeat expiration task following completion of the rebalance failed to account for an implicit expectation that shouldKeepAlive would return false the first time it is invoked when a heartbeat expiration is scheduled. This patch fixes the issue by making heartbeat satisfaction logic explicit. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Guozhang Wang <wangguoz@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>	5 years ago
A. Sophie Blee-Goldman	e1cbefef60	HOTFIX: fix log message in version probing system test (#8341 ) Reviewer: Matthias J. Sax <matthias@confluent.io>	5 years ago
jiameixie	b5409b9c04	KAFKA-9700: Fix negative estimatedCompressionRatio (#8285 ) There are cases where `currentEstimation` is less than `COMPRESSION_RATIO_IMPROVING_STEP` causing `estimatedCompressionRatio` to be negative. This, in turn, may result in `MESSAGE_TOO_LARGE`. Reviewers: Ismael Juma <ismael@juma.me.uk>	5 years ago
Chia-Ping Tsai	b8e508c823	KAFKA-9711 The authentication failure caused by SSLEngine#beginHandshake is not properly caught and handled (#8287 ) SSLEngine#beginHandshake can throw authentication failures (for example, no suitable cipher suites), so we ought to catch SSLException and convert it to SslAuthenticationException in order to process authentication failures correctly. Reviewers: Jun Rao <junrao@gmail.com>	5 years ago
Chia-Ping Tsai	a5c257890e	MINOR: Add method to Herder to control logging of connector configs during validation (#8263 ) * The Herder interface is extended with a default method that allows choosing whether to log all the connector configurations during connector validation or not. * The `PUT /connector-plugins/{connector-type}/config/validate` is modified to stop logging the connector's configurations when a validation request is hitting this endpoint. Validations during creation or reconfiguration of a connector are still logging all the connector configurations at the INFO level, which is useful in general and during troubleshooting in particular. Co-authored-by: Konstantine Karantasis <konstantine@confluent.io> Reviewers: Konstantine Karantasis <konstantine@confluent.io>, Ismael Juma <github@juma.me.uk>	5 years ago
Bruno Cadonna	1fbddd853d	KAFKA-6145: Add balanced assignment algorithm (#8334 ) This algorithm assigns tasks to clients and tries to - balance the distribution of the partitions of the same input topic over stream threads and clients, i.e., data parallel workload balance - balance the distribution of work over stream threads. The algorithm does not take into account potentially existing states on the client. The assignment is considered balanced when the difference in assigned tasks between the stream thread with the most tasks and the stream thread with the least tasks does not exceed a given balance factor. The algorithm prioritizes balance over stream threads higher than balance over clients. Reviewers: John Roesler <vvcephei@apache.org>	5 years ago
A. Sophie Blee-Goldman	8012f47e2c	MINOR: move some client methods to new ClientUtils (#8328 ) * move to new ClientUtils * checkstyle * fix KafkaStreamsTest * checkstyle again	5 years ago
Bob Barrett	4c96b7deba	KAFKA-9749; Transaction coordinator should treat KAFKA_STORAGE_ERROR as retriable (#8336 ) When handling a WriteTxnResponse, the TransactionMarkerRequestCompletionHandler throws an IllegalStateException when the remote broker responds with a KAFKA_STORAGE_ERROR and does not retry the request. This leaves the transaction state stuck in PendingAbort or PendingCommit, with no way to change that state other than restarting the broker, because both EndTxnRequest and InitProducerIdRequest return CONCURRENT_TRANSACTIONS if the state is PendingAbort or PendingCommit. This patch changes the error handling behavior in TransactionMarkerRequestCompletionHandler to retry KAFKA_STORAGE_ERRORs. This matches the existing client behavior, and makes sense because KAFKA_STORAGE_ERROR causes the host broker to shut down, meaning that the partition being written to will move its leadership to a new, healthy broker. Reviewers: Boyang Chen <boyang@confluent.io>, Jason Gustafson <jason@confluent.io>	5 years ago
Guozhang Wang	1b36e11967	MINOR: Restore and global consumers should never have group.instance.id (#8322 ) And hence restore / global consumers should never expect FencedInstanceIdException. When such an exception is thrown, it means another instance with the same instance.id has taken over, and hence we should treat it as fatal and let this instance close instead of handling it as task-migrated. Reviewers: Boyang Chen <boyang@confluent.io>, Matthias J. Sax <matthias@confluent.io>	5 years ago
Boyang Chen	cc70614105	KAFKA-9743: Catch commit offset exception to eventually close dirty tasks (#8327 ) Reviewers: Guozhang Wang <guozhang@confluent.io>, Matthias J. Sax <matthias@confluent.io>	5 years ago
belugabehr	99eb0d1006	KAFKA-9407: Return an immutable list from poll in SchemaSourceTask (#7939 ) Simple fix to return in all cases an immutable list from poll in SchemaSourceTask. Co-authored-by: David Mollitor <dmollitor@apache.org> Reviewers: Ron Dagostino <rdagostino@confluent.io>, Ismael Juma <github@juma.me.uk>, Konstantine Karantasis <konstantine@confluent.io>	5 years ago
John Roesler	f538caf138	KAFKA-9742: Fix broken StandbyTaskEOSIntegrationTest (#8330 ) Relax the requirement that tasks' reported offsetSum is less than the endOffsetSum for those tasks. This was surfaced by a test for corrupted tasks, but it can happen with real corrupted tasks. Rather than throw an exception on the leader, we now de-prioritize the corrupted task. Ideally, that instance will not get assigned the task and the stateDirCleaner will make the problem "go away". If it does get assigned the task, then it will detect the corruption and delete the task directory before recovering the entire changelog. Thus, the estimate we provide accurately reflects the amount of lag such a corrupted task would have to recover (the whole log). Reviewers: Boyang Chen <boyang@confluent.io>, Guozhang Wang <guozhang@confluent.io>, Bruno Cadonna <bruno@confluent.io>, A. Sophie Blee-Goldman <sophie@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>	5 years ago
Sönke Liebau	eb7b8295ac	MINOR: Add clarifying "additionally" to streams aggregator docs that is missing in one place. (#8237 ) Reviewer: Matthias J. Sax <matthias@confluent.io>	5 years ago
Tom Bentley	6cfed8ad00	KAFKA-9651: Fix ArithmeticException (÷ by 0) in DefaultStreamPartitioner (#8226 ) In Streams `StreamsMetadataState.getMetadataWithKey`, we should use the inferred max topic partitions passed in directly from the caller rather than relying on the cluster to contain its topic-partition information. Reviewers: Guozhang Wang <wangguoz@gmail.com>	5 years ago
Hossein Torabi	afe26e2992	KAFKA-9563: Fix Kafka Connect documentation around consumer and producer overrides (#8124 ) Kafka Connect main doc required a fix to distinguish between worker level producer and consumer overrides and per-connector level producer and consumer overrides, after the latter were introduced with KIP-458. * [KAFKA-9563] Fix Kafka connect consumer and producer override documentation Co-authored-by: Konstantine Karantasis <konstantine@confluent.io> Reviewers: Konstantine Karantasis <konstantine@confluent.io>	5 years ago
Matthias J. Sax	1ad5f346cb	KAFKA-9451: Enable producer per thread for Streams EOS (#8318 ) - KIP-447 - add new configs to enable producer per thread EOS - updates TaskManager to use single shared producer for eos-beta Reviewers: Boyang Chen <boyang@confluent.io>, Guozhang Wang <guozhang@confluent.io>	5 years ago
Tom Bentley	8aa1861816	KAFKA-9634: Add note about thread safety in the ConfigProvider interface (#8205 ) In Kafka Connect, a ConfigProvider instance can be used concurrently (e.g. via a PUT request to the `/connector-plugins/{connectorType}/config/validate` REST endpoint), but there is no mention of concurrent usage in the Javadocs of the ConfigProvider interface. It's worth calling out that implementations need to be thread safe. Reviewers: Konstantine Karantasis <konstantine@confluent.io>	5 years ago
nicolasguyomar	f8173c2df5	MINOR: Update Connect error message to point to the correct config validation REST endpoint (#7991 ) When incorrect connector configuration is detected, the returned exception message suggests checking the connector's configuration against the `{connectorType}/config/validate` endpoint. This changes the error message to refer to the exact REST endpoint, which is `/connector-plugins/{connectorType}/config/validate`. This aligns the exception message with the documentation at: https://kafka.apache.org/documentation/#connect_rest Reviewers: Konstantine Karantasis <konstantine@confluent.io>	5 years ago
Chia-Ping Tsai	4f6907947a	MINOR: fix linking errors in javadoc (#8198 ) This improvement fixes several linking errors to classes and methods from within javadocs. Related to #8291 Reviewers: Konstantine Karantasis <konstantine@confluent.io>	5 years ago
Alaa Zbair	5ccd3cd46d	KAFKA-8842: Reading/Writing confused in Connect QuickStart Guide In step 7 of the QuickStart guide, "Writing data from the console and writing it back to the console" should be "Reading data from the console and writing it back to the console". Co-authored-by: Alaa Zbair <alaa.zbair@grenoble-inp.org> Reviewer: Konstantine Karantasis <konstantine@confluent.io>	5 years ago
A. Sophie Blee-Goldman	6cf27c9c77	KAFKA-6145: Pt 2.5 Compute overall task lag per client (#8252 ) Once we have encoded the offset sums per task for each client, we can compute the overall lag during assign by fetching the end offsets for all changelogs and subtracting. If the listOffsets request fails, we simply return a "completely sticky" assignment, i.e. all active tasks are given to previous owners regardless of balance. Builds (but does not yet use) the statefulTasksToRankedCandidates map with the ranking: Rank -1: active running task Rank 0: standby or restoring task whose overall lag is within acceptableRecoveryLag Rank 1: tasks whose lag is unknown (e.g. during version probing) Rank 1+: all other tasks are ranked according to their actual total lag Implements: KIP-441 Reviewers: Bruno Cadonna <bruno@confluent.io>, John Roesler <vvcephei@apache.org>	5 years ago
Lucas Bradstreet	c16938fd96	MINOR: allow retries for unitTest and integrationTest runs (#8323 ) We will currently retry if you run gradle test, but not unitTest or integrationTest, which are used directly in Jenkins. This means that we have not been achieving the expected retry behavior. Reviewers: Ismael Juma <ismael@juma.me.uk>	5 years ago
Matthias J. Sax	635b5fd47c	KAFKA-9741: Update ConsumerGroupMetadata before calling onPartitionsRevoked() (#8325 ) If partitions are revoked, an application may want to commit the current offsets. Using transactions, committing offsets would be done via the producer passing in the current ConsumerGroupMetadata. If the metadata is not updated before the callback, the call to commitTransaction(...) fails because an old generationId would be used. Reviewers: Guozhang Wang <wangguoz@gmail.com>	5 years ago
Boyang Chen	c75dc5e2e0	KAFKA-9701 (fix): Only check protocol name when generation is valid (#8324 ) This bug was introduced by #7994 with a too-strong consistency check. It occurs because a reset generation operation could be called in between the joinGroupRequest -> joinGroupResponse -> SyncGroupRequest -> SyncGroupResponse sequence of events, if the user calls unsubscribe in the middle of consumer#poll(). The proper fix is to avoid the protocol name check when the generation is invalid. Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>	5 years ago
Konstantine Karantasis	406635bcc9	MINOR: Use Exit.exit instead of System.exit in MM2 (#8321 ) Exit.exit needs to be used in code instead of System.exit. Particularly in integration tests, using System.exit is disruptive because it exits the JVM process and does not just fail the test correctly. Integration tests override procedures in Exit to protect against such cases. Reviewers: Ryanne Dolan <ryannedolan@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Randall Hauch <rhauch@gmail.com>	5 years ago
Boyang Chen	c249ea8e5d	KAFKA-9727: cleanup the state store for standby task dirty close and check null for changelogs (#8307 ) This PR fixes three things: * the state should be closed when standby task is restoring as well * the EOS standby task should also wipe out state under dirty close * the changelog reader should check for null as well Reviewers: Guozhang Wang <wangguoz@gmail.com>	5 years ago
Bruno Cadonna	cbdd0d6cf1	KAFKA-6145: Add constrained balanced assignment algorithm (#8262 ) Adds a currently unused component of the new Streams assignment algorithm. Implements: KIP-441 Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, John Roesler <vvcephei@apache.org>	5 years ago
John Roesler	960b216290	KAFKA-9734: Fix IllegalState in Streams transit to standby (#8319 ) Consolidate ChangelogReader state management inside of StreamThread to avoid having to reason about all execution paths in both StreamThread and TaskManager. Reviewers: Guozhang Wang <wangguoz@gmail.com>	5 years ago
Agam Brahma	24d05aa601	KAFKA-9553; Improve measurement for loading groups and transactions (#8155 ) This patch modifies the loading time metric to account for time spent waiting for the loading time task to be scheduled. Reviewers: Jason Gustafson <jason@confluent.io>	5 years ago
Colin Patrick McCabe	56051e7639	KAFKA-8820: kafka-reassign-partitions.sh should support the KIP-455 API (#8244 ) Rewrite ReassignPartitionsCommand to use the KIP-455 API when possible, rather than direct communication with ZooKeeper. Direct ZK access is still supported, but deprecated, as described in KIP-455. As specified in KIP-455, the tool has several new flags. --cancel stops an assignment which is in progress. --preserve-throttle causes the --verify and --cancel commands to leave the throttles alone. --additional allows users to execute another partition assignment even if there is already one in progress. Finally, --show displays all of the current partition reassignments. Reorganize the reassignment code and tests somewhat to rely more on unit testing using the MockAdminClient and less on integration testing. Each integration test where we bring up a cluster seems to take about 5 seconds, so it's good when we can get similar coverage from unit tests. To enable this, MockAdminClient now supports incrementalAlterConfigs, alterReplicaLogDirs, describeReplicaLogDirs, and some other APIs. MockAdminClient is also now thread-safe, to match the real AdminClient implementation. In DeleteTopicTest, use the KIP-455 API rather than invoking the reassignment command.	5 years ago
Chia-Ping Tsai	c27f629e95	KAFKA-9654; Update epoch in `ReplicaAlterLogDirsThread` after new LeaderAndIsr (#8223 ) Currently when there is a leader change with a log dir reassignment in progress, we do not update the leader epoch in the partition state maintained by `ReplicaAlterLogDirsThread`. This can lead to a FENCED_LEADER_EPOCH error, which results in the partition being marked as failed, which is a permanent failure until the broker is restarted. This patch fixes the problem by updating the epoch in `ReplicaAlterLogDirsThread` after receiving a new LeaderAndIsr request from the controller. Reviewers: Jun Rao <junrao@gmail.com>, Jason Gustafson <jason@confluent.io>	5 years ago
jiameixie	11e6aedff6	MINOR: Bump RocksDB version from 5.18.3 to 5.18.4 (#8284 ) Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Matthias J. Sax <matthias@confluent.io>	5 years ago
Matthias J. Sax	21cfd0b453	MINOR: Fix generic types in StreamsBuilder and Topology (#8273 ) Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Guozhang Wang <guozhang@confluent.io>, John Roesler <john@confluent.io>	5 years ago
Matthias J. Sax	89cd2f2a0b	KAFKA-9441: Unify committing within TaskManager (#8218 ) - part of KIP-447 - commit all tasks at once using non-eos (and eos-beta in follow up work) - unified commit logic into TaskManager - split existing methods of Task interface in pre/post parts Reviewers: Boyang Chen <boyang@confluent.io>, Guozhang Wang <guozhang@confluent.io>	5 years ago
A. Sophie Blee-Goldman	9ee8277cdd	KAFKA-6145: Add new assignment configs Add 4 new assignor configs in preparation for the new assignment algorithm: 1. acceptable.recovery.lag 2. balance.factor 3. max.warmup.replicas 4. probing.rebalance.interval.ms Implements: KIP-441 Reviewers: Bruno Cadonna <bruno@confluent.io>, John Roesler <vvcephei@apache.org>	5 years ago
Bruno Cadonna	0174c95f4f	MINOR: Fix javadoc warning in StreamsMetric (#8314 ) Reviewers: Matthias J. Sax <mjsax@apache.org>, Bill Bejeck <bbejeck@apache.org>	5 years ago
Tom Bentley	4f1e8331ff	KAFKA-9435: DescribeLogDirs automated protocol (#7972 ) Reviewers: Mickael Maison <mickael.maison@gmail.com>	5 years ago
Lucas Bradstreet	5c0cf02947	MINOR: return unmodifiableMap for PartitionStates.partitionStateMap. (#7637 ) Makes the map returned by partitionStateMap unmodifiable to prevent mutation. Reviewers: Ismael Juma <ismael@juma.me.uk>	5 years ago
Boyang Chen	c7164a3866	KAFKA-8618: Replace Txn marker with automated protocol (#7039 ) Reviewers: Mickael Maison <mickael.maison@gmail.com>	5 years ago
A. Sophie Blee-Goldman	85c96f5230	KAFKA-9568: enforce rebalance if client endpoint has changed (#8299 ) Since the assignment info includes a map with all members' host info, we can just check the received map to make sure our endpoint is contained. If not, we need to force the group to rebalance and get our updated endpoint info. Reviewers: Boyang Chen <boyang@confluent.io>, Guozhang Wang <wangguoz@gmail.com>	5 years ago
Guozhang Wang	b1999ba22d	KAFKA-5604: Remove the redundant TODO marker on the Streams side (#8313 ) The issue itself has been fixed a while ago on the producer side, so we can just remove this TODO marker now (we've removed the isZombie flag already anyways). Reviewers: John Roesler <vvcephei@apache.org>	5 years ago
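
Commit b5409b9c04 (KAFKA-9700) describes an estimate that turns negative when `currentEstimation` is smaller than `COMPRESSION_RATIO_IMPROVING_STEP`. The following is a minimal Java sketch of a clamped update, assuming the estimator steps down by a fixed amount when compression improves and jumps up when it deteriorates. The class name, method signature and step values are hypothetical and are not taken from the patch.

```java
public final class CompressionRatioSketch {
    // Hypothetical step sizes, chosen for illustration only.
    public static final float IMPROVING_STEP = 0.005f;
    public static final float DETERIORATE_STEP = 0.05f;

    /** Returns a new ratio estimate given the current estimate and an observed ratio. */
    public static float updateEstimation(float current, float observed) {
        if (observed > current) {
            // Compression got worse: move up by at least one step, or all the way to the observation.
            return Math.max(current + DETERIORATE_STEP, observed);
        } else if (observed < current) {
            // Compression improved: step down slowly, but never below the observed ratio.
            // Because observed >= 0, the estimate can no longer go negative even when current < step.
            return Math.max(current - IMPROVING_STEP, observed);
        }
        return current;
    }

    public static void main(String[] args) {
        float unclamped = 0.003f - IMPROVING_STEP;                  // the naive update goes below zero
        float clamped = updateEstimation(0.003f, 0.002f);           // clamped to the observed ratio
        System.out.println("unclamped negative: " + (unclamped < 0));
        System.out.println("clamped: " + clamped);
    }
}
```

Clamping to the observed ratio, rather than to zero, keeps the estimate from undershooting what was actually measured; that is a design choice of this sketch.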

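Commit 6cf27c9c77 (KIP-441) describes ranking candidate clients for each stateful task: -1 for the client running the active task, 0 for a standby or restoring task within `acceptableRecoveryLag`, 1 for unknown lag, and the actual total lag otherwise. A minimal Java sketch of that ranking follows, assuming each client's lag is available as a `long` with hypothetical sentinel values for "active running" and "unknown". `LagRankingSketch`, `rank` and `rankCandidates` are illustrative names, not the Kafka Streams API.

```java
import java.util.*;

public final class LagRankingSketch {
    // Hypothetical sentinel lag values; the real Streams encoding may differ.
    public static final long ACTIVE_RUNNING_LAG = -1L;
    public static final long UNKNOWN_LAG = -2L;

    /** Maps one client's total lag for a task to a rank; a lower rank is a better candidate. */
    public static long rank(long lag, long acceptableRecoveryLag) {
        if (lag == ACTIVE_RUNNING_LAG) return -1L;          // client is running the active task
        if (lag == UNKNOWN_LAG) return 1L;                  // lag unknown, e.g. during version probing
        if (lag <= acceptableRecoveryLag) return 0L;        // caught-up standby or restoring task
        return lag;                                         // otherwise rank by actual total lag
    }

    /** Orders clients by rank, breaking ties by client id so the result is deterministic. */
    public static List<String> rankCandidates(Map<String, Long> lagByClient, long acceptableRecoveryLag) {
        List<String> clients = new ArrayList<>(lagByClient.keySet());
        clients.sort(Comparator
                .comparingLong((String c) -> rank(lagByClient.get(c), acceptableRecoveryLag))
                .thenComparing(Comparator.<String>naturalOrder()));
        return clients;
    }

    public static void main(String[] args) {
        Map<String, Long> lags = new HashMap<>();
        lags.put("client-a", 500L);
        lags.put("client-b", ACTIVE_RUNNING_LAG);
        lags.put("client-c", 50L);
        lags.put("client-d", UNKNOWN_LAG);
        System.out.println(rankCandidates(lags, 100L));
    }
}
```

With an `acceptableRecoveryLag` of 100, the sketch orders the clients as active owner, caught-up standby, unknown lag, then the lagging client.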

7393 Commits (f7d2b1baf7745b0f505edd186e1731cc5c9f382c)