src-kafka

Author	SHA1	Message	Date
Boyang Chen	0eab92012b	HOTFIX: fix compilation error (#8424 ) Reviewers: Matthias J. Sax <matthias@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>, Guozhang Wang <guozhang@confluent.io	5 years ago
Matthias J. Sax	ab5e4f52ec	MINOR: Refactor StreamsProducer (#8380 ) Reviewers: Boyang Chen <boyang@confluent.io>, Guozhang Wang <guozhang@confluent.io>, Andrew Choi <a24choi@edu.uwaterloo.ca>	5 years ago
A. Sophie Blee-Goldman	6e0d553350	MINOR: clean up Streams assignment classes and tests (#8406 ) First set of cleanup pushed to followup PR after KIP-441 Pt. 5. Main changes are: 1. Moved `RankedClient` and the static `buildClientRankingsByTask` to a new file 2. Moved `Movement` and the static `getMovements` to a new file (also renamed to `TaskMovement`) 3. Consolidated the many common variables throughout the assignment tests to the new `AssignmentTestUtils` 4. New utility to generate comparable/predictable UUIDs for tests, and removed the generic from `TaskAssignor` and all related classes Reviewers: John Roesler <vvcephei@apache.org>, Andrew Choi <a24choi@edu.uwaterloo.ca>	5 years ago
John Roesler	726a7d5de2	KAFKA-9812: fix infinite loop in test code (#8411 ) Reviewers: Boyang Chen <boyang@confluent.io>	5 years ago
Bill Bejeck	9783b85fdd	KAFKA-9739: Fixes null key changing child node (#8400 ) For some context, when building a streams application, the optimizer keeps track of the key-changing operations and any repartition nodes that are descendants of the key-changer. During the optimization phase (if enabled), any repartition nodes are logically collapsed into one. The optimizer updates the graph by inserting the single repartition node between the key-changing node and its first child node. This graph update process is done by searching for a node that has the key-changing node as one of its direct parents, and the search starts from the repartition node, going up in the parent hierarchy. The one exception to this rule is if there is a merge node that is a descendant of the key-changing node, then during the optimization phase, the map tracking key-changers to repartition nodes is updated to have the merge node as the key. Then the optimization process updates the graph to place the single repartition node between the merge node and its first child node. The error in KAFKA-9739 occurred because there was an assumption that the repartition nodes are children of the merge node. But in the topology from KAFKA-9739, the repartition node was a parent of the merge node. So when attempting to find the first child of the merge node, nothing was found (obviously) resulting in StreamException(Found a null keyChangingChild node for..) This PR fixes this bug by first checking that all repartition nodes for optimization are children of the merge node. This PR includes a test with the topology from KAFKA-9739. Reviewers: John Roesler <john@confluent.io>	5 years ago
Boyang Chen	7f640f13b4	KAFKA-9776: Downgrade TxnCommit API v3 when broker doesn't support (#8375 ) Revert the decision for the sendOffsetsToTransaction(groupMetadata) API to fail with old version of brokers for the sake of making the application easier to adapt between versions. This PR silently downgrade the TxnOffsetCommit API when the build version is small than 3. Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>	5 years ago
Boyang Chen	6ddbf4d800	KAFKA-9809: Shrink transaction timeout for streams (#8407 ) As documented in the KIP: We shall set `transaction.timout.ms` default to 10000 ms (10 seconds) on Kafka Streams. Reviewer: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>	5 years ago
A. Sophie Blee-Goldman	2322bc0a6f	KAFKA-6145: Pt. 5 Implement high availability assignment (#8337 ) Adds a new TaskAssignor implementation, currently hidden behind an internal feature flag, that implements the high availability algorithm of KIP-441. Reviewers: Bruno Cadonna <bruno@confluent.io>, John Roesler <vvcephei@apache.org>	5 years ago
Guozhang Wang	a2092ecd7a	HOTFIX: remove redundant check for QueryableStateIntegrationTest	5 years ago
Matthias J. Sax	cc59150f40	KAFKA-9748: extend EosIntegrationTest for EOS-beta (#8331 ) Reviewers: Boyang Chen <boyang@confluent.io>, Guozhang Wang <guozhang@confluent.io>	5 years ago
Bruno Cadonna	e6a6b69acf	MINOR: Improve close tests of caching state store (#8386 ) Reviewers: John Roesler <vvcephei@apache.org>, Matthias J. Sax <matthias@confluent.io>, Andrew Choi <a24choi@edu.uwaterloo.ca>	5 years ago
Ismael Juma	90bbeedf52	MINOR: Fix Scala 2.13 compiler warnings (#8390 ) Once Scala 2.13.2 is officially released, I will submit a follow up PR that enables `-Xfatal-warnings` with the necessary warning exclusions. Compiler warning exclusions were only introduced in 2.13.2 and hence why we have to wait for that. I used a snapshot build to test it in the meantime. Changes: * Remove Deprecated annotation from internal request classes * Class.newInstance is deprecated in favor of Class.getConstructor().newInstance * Replace deprecated JavaConversions with CollectionConverters * Remove unused kafka.cluster.Cluster * Don't use Map and Set methods deprecated in 2.13: - collection.Map +, ++, -, --, mapValues, filterKeys, retain - collection.Set +, ++, -, -- * Add scala-collection-compat dependency to streams-scala and update version to 2.1.4. * Replace usages of deprecated Either.get and Either.right * Replace usage of deprecated Integer(String) constructor * `import scala.language.implicitConversions` is not needed in Scala 2.13 * Replace usage of deprecated `toIterator`, `Traversable`, `seq`, `reverseMap`, `hasDefiniteSize` * Replace usage of deprecated alterConfigs with incrementalAlterConfigs where possible * Fix implicit widening conversions from Long/Int to Double/Float * Avoid implicit conversions to String * Eliminate usage of deprecated procedure syntax * Remove `println`in `LogValidatorTest` instead of fixing the compiler warning since tests should not `println`. * Eliminate implicit conversion from Array to Seq * Remove unnecessary usage of 3 argument assertEquals * Replace `toStream` with `iterator` * Do not use deprecated SaslConfigs.DEFAULT_SASL_ENABLED_MECHANISMS * Replace StringBuilder.newBuilder with new StringBuilder * Rename AclBuffers to AclSeqs and remove usage of `filterKeys` * More consistent usage of Set/Map in Controller classes: this also fixes deprecated warnings with Scala 2.13 * Add spotBugs exclusion for inliner artifact in KafkaApis with Scala 2.12. Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>	5 years ago
Matthias J. Sax	6a49ede947	KAFKA-9441: Cleanup Streams metrics for removed task commit latency metrics (#8356 ) Reviewers: Boyang Chen <boyang@confluent.io>, Guozhang Wang <guozhang@confluent.io>, Bruno Cadonna <bruno@confluent.io>	5 years ago
Guozhang Wang	353aa6206d	KAFKA-9753: Add active tasks process ratio (#8370 ) Measure the percentage ratio the stream thread spent on processing each task among all assigned active tasks (KIP-444). Also add unit tests to cover the added metrics in this PR and the previous #8358. Also trying to fix the flaky test reported in KAFKA-5842 Co-authored-by: John Roesler <vvcephei@apache.org> Reviewers: Bruno Cadonna <bruno@confluent.io>, John Roesler <vvcephei@apache.org>	5 years ago
Matthias J. Sax	6ad5407350	KAFKA-9719: Streams with EOS-beta should fail fast for older brokers (#8367 ) Reviewers: Boyang Chen <boyang@confluent.io>, Guozhang Wang <guozhang@confluent.io>	5 years ago
Bruno Cadonna	c595470713	KAFKA-9770: Close underlying state store also when flush throws (#8368 ) When a caching state store is closed it calls its flush() method. If flush() throws an exception the underlying state store is not closed. This commit ensures that state stores underlying a wrapped state stores are closed even when preceding operations in the close method throw. Co-authored-by: John Roesler <vvcephei@apache.org> Reviewers: John Roesler <vvcephei@apache.org>, Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <matthias@confluent.io>	5 years ago
John Roesler	3f5ad4640b	MINOR: optimize integration test shutdown (#8366 ) * delete topics before tearing down multi-node clusters to avoid leader elections during shutdown * tear down all nodes concurrently instead of sequentially Reviewers: Matthias J. Sax <matthias@confluent.io>	5 years ago
Guozhang Wang	9d9b59fccc	KAFKA-9756: Process more than one record of one task at a time (#8358 ) 1. Within a single while loop, process the tasks in AAABBBCCC instead of ABCABCABC. This also helps the follow-up PR to time the per-task processing ratio to record less time, hence less overhead. 2. Add thread-level process / punctuate / poll / commit ratio metrics. 3. Fixed a few issues discovered (inline commented). Reviewers: John Roesler <vvcephei@apache.org>	5 years ago
A. Sophie Blee-Goldman	e1cbefef60	HOTFIX: fix log message in version probing system test (#8341 ) Reviewer: Matthias J. Sax <matthias@confluent.io>	5 years ago
Bruno Cadonna	1fbddd853d	KAFKA-6145: Add balanced assignment algorithm (#8334 ) This algorithm assigns tasks to clients and tries to - balance the distribution of the partitions of the same input topic over stream threads and clients, i.e., data parallel workload balance - balance the distribution of work over stream threads. The algorithm does not take into account potentially existing states on the client. The assignment is considered balanced when the difference in assigned tasks between the stream thread with the most tasks and the stream thread with the least tasks does not exceed a given balance factor. The algorithm prioritizes balance over stream threads higher than balance over clients. Reviewers: John Roesler <vvcephei@apache.org>	5 years ago
A. Sophie Blee-Goldman	8012f47e2c	MINOR: move some client methods to new ClientUtils (#8328 ) * move to new ClientUtils * checkstyle * fix KafkaStreamsTest * checkstyle again	5 years ago
Guozhang Wang	1b36e11967	MINOR: Restore and global consumers should never have group.instance.id (#8322 ) And hence restore / global consumers should never expect FencedInstanceIdException. When such exception is thrown, it means there's another instance with the same instance.id taken over, and hence we should treat it as fatal and let this instance to close out instead of handling as task-migrated. Reviewers: Boyang Chen <boyang@confluent.io>, Matthias J. Sax <matthias@confluent.io>	5 years ago
Boyang Chen	cc70614105	KAFKA-9743: Catch commit offset exception to eventually close dirty tasks (#8327 ) Reviewers: Guozhang Wang <guozhang@confluent.io>, Matthias J. Sax <matthias@confluent.io>	5 years ago
John Roesler	f538caf138	KAFKA-9742: Fix broken StandbyTaskEOSIntegrationTest (#8330 ) Relax the requirement that tasks' reported offsetSum is less than the endOffsetSum for those tasks. This was surfaced by a test for corrupted tasks, but it can happen with real corrupted tasks. Rather than throw an exception on the leader, we now de-prioritize the corrupted task. Ideally, that instance will not get assigned the task and the stateDirCleaner will make the problem "go away". If it does get assigned the task, then it will detect the corruption and delete the task directory before recovering the entire changelog. Thus, the estimate we provide accurately reflects the amount of lag such a corrupted task would have to recover (the whole log). Reviewers: Boyang Chen <boyang@confluent.io>, Guozhang Wang <guozhang@confluent.io>, Bruno Cadonna <bruno@confluent.io>, A. Sophie Blee-Goldman <sophie@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>	5 years ago
Tom Bentley	6cfed8ad00	KAFKA-9651: Fix ArithmeticException (÷ by 0) in DefaultStreamPartitioner (#8226 ) In Streams `StreamsMetadataState.getMetadataWithKey`, we should use the inferred max topic partitions passed in directly from the caller than relying on cluster to contain its topic-partition information. Reviewers: Guozhang Wang <wangguoz@gmail.com>	5 years ago
Matthias J. Sax	1ad5f346cb	KAFKA-9451: Enable producer per thread for Streams EOS (#8318 ) - KIP-447 - add new configs to enable producer per thread EOS - updates TaskManager to use single shared producer for eos-beta Reviewers: Boyang Chen <boyang@confluent.io>, Guozhang Wang <guozhang@confluent.io>	5 years ago
Chia-Ping Tsai	4f6907947a	MINOR: fix linking errors in javadoc (#8198 ) This improvement fixes several linking errors to classes and methods from within javadocs. Related to #8291 Reviewers: Konstantine Karantasis <konstantine@confluent.io>	5 years ago
A. Sophie Blee-Goldman	6cf27c9c77	KAFKA-6145: Pt 2.5 Compute overall task lag per client (#8252 ) Once we have encoded the offset sums per task for each client, we can compute the overall lag during assign by fetching the end offsets for all changelog and subtracting. If the listOffsets request fails, we simply return a "completely sticky" assignment, ie all active tasks are given to previous owners regardless of balance. Builds (but does not yet use) the statefulTasksToRankedCandidates map with the ranking: Rank -1: active running task Rank 0: standby or restoring task whose overall lag is within acceptableRecoveryLag Rank 1: tasks whose lag is unknown (eg during version probing) Rank 1+: all other tasks are ranked according to their actual total lag Implements: KIP-441 Reviewers: Bruno Cadonna <bruno@confluent.io>, John Roesler <vvcephei@apache.org>	5 years ago
Boyang Chen	c249ea8e5d	KAFKA-9727: cleanup the state store for standby task dirty close and check null for changelogs (#8307 ) This PR fixes three things: * the state should be closed when standby task is restoring as well * the EOS standby task should also wipe out state under dirty close * the changelog reader should check for null as well Reviewers: Guozhang Wang <wangguoz@gmail.com>	5 years ago
Bruno Cadonna	cbdd0d6cf1	KAFKA-6145: Add constrained balanced assignment algorithm (#8262 ) Adds a currently unused component of the new Streams assignment algorithm. Implements: KIP-441 Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, John Roesler <vvcephei@apache.org>	5 years ago
John Roesler	960b216290	KAFKA-9734: Fix IllegalState in Streams transit to standby (#8319 ) Consolidate ChangelogReader state management inside of StreamThread to avoid having to reason about all execution paths in both StreamThread and TaskManager. Reviewers: Guozhang Wang <wangguoz@gmail.com>	5 years ago
Colin Patrick McCabe	56051e7639	KAFKA-8820: kafka-reassign-partitions.sh should support the KIP-455 API (#8244 ) Rewrite ReassignPartitionsCommand to use the KIP-455 API when possible, rather than direct communication with ZooKeeper. Direct ZK access is still supported, but deprecated, as described in KIP-455. As specified in KIP-455, the tool has several new flags. --cancel stops an assignment which is in progress. --preserve-throttle causes the --verify and --cancel commands to leave the throttles alone. --additional allows users to execute another partition assignment even if there is already one in progress. Finally, --show displays all of the current partition reassignments. Reorganize the reassignment code and tests somewhat to rely more on unit testing using the MockAdminClient and less on integration testing. Each integration test where we bring up a cluster seems to take about 5 seconds, so it's good when we can get similar coverage from unit tests. To enable this, MockAdminClient now supports incrementalAlterConfigs, alterReplicaLogDirs, describeReplicaLogDirs, and some other APIs. MockAdminClient is also now thread-safe, to match the real AdminClient implementation. In DeleteTopicTest, use the KIP-455 API rather than invoking the reassignment command.	5 years ago
Matthias J. Sax	21cfd0b453	MINOR: Fix generic types in StreamsBuilder and Topology (#8273 ) Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Guozhang Wang <guozhang@confluent.io>, John Roesler <john@confluent.io>	5 years ago
Matthias J. Sax	89cd2f2a0b	KAFKA-9441: Unify committing within TaskManager (#8218 ) - part of KIP-447 - commit all tasks at once using non-eos (and eos-beta in follow up work) - unified commit logic into TaskManager - split existing methods of Task interface in pre/post parts Reviewers: Boyang Chen <boyang@confluent.io>, Guozhang Wang <guozhang@confluent.io>	5 years ago
A. Sophie Blee-Goldman	9ee8277cdd	KAFKA-6145: Add new assignment configs Add 4 new assignor configs in preparation for the new assignment algorithm: 1. acceptable.recovery.lag 2. balance.factor 3. max.warmup.replicas 4. probing.rebalance.interval.ms Implements: KIP-441 Reviewers: Bruno Cadonna <bruno@confluent.io>, John Roesler <vvcephei@apache.org>	5 years ago
Bruno Cadonna	0174c95f4f	MINOR: Fix javadoc warning in StreamsMetric (#8314 ) Reviewers: Matthias J. Sax <mjsax@apache.org>, Bill Bejeck <bbejeck@apache.org>	5 years ago
A. Sophie Blee-Goldman	85c96f5230	KAFKA-9568: enforce rebalance if client endpoint has changed (#8299 ) Since the assignment info includes a map with all member's host info, we can just check the received map to make sure our endpoint is contained. If not, we need to force the group to rebalance and get our updated endpoint info. Reviewers: Boyang Chen <boyang@confluent.io>, Guozhang Wang <wangguoz@gmail.com>	5 years ago
Guozhang Wang	b1999ba22d	KAFKA-5604: Remove the redundant TODO marker on the Streams side (#8313 ) The issue itself has been fixed a while ago on the producer side, so we can just remove this TODO marker now (we've removed the isZombie flag already anyways). Reviewers: John Roesler <vvcephei@apache.org>	5 years ago
Guozhang Wang	c0cff61e8c	HOTFIX: do not depend on file modified time in StateDirectoryTest	5 years ago
A. Sophie Blee-Goldman	d38e97e319	MINOR: clean up required setup for StreamsPartitionAssignorTest (#8306 ) No logical or behavioral changes, just a bit of cleanup in this class before we have to write and fix a lot of these tests for KIP-441: * Moved creation of streamsMetadata mock to setUp (in exactly one test it will be overwritten with a strict mock) * Tried to clean up the use of helper methods for configuring the assignor. Reviewers: Guozhang Wang <wangguoz@gmail.com>	5 years ago
Chia-Ping Tsai	f08c9c7de6	HOTFIX: fix flaky StateDirectoryTest.shouldReturnEmptyArrayIfListFilesReturnsNull (#8310 ) StateDirectoryTest.shouldReturnEmptyArrayIfListFilesReturnsNull always moves the stage dir to /tmp/state-renamed so it always fails if there is already a folder (for example, the stuff leaved by previous test). Reviewers: Boyang Chen <boyang@confluent.io>, A. Sophie Blee-Goldman <sophie@confluent.io>, Guozhang Wang <wangguoz@gmail.com>	5 years ago
A. Sophie Blee-Goldman	673018504f	MINOR: cleanup and add tests to StateDirectoryTest (#8304 ) Adds tests for edge conditions of listAllTaskDirectories Also includes some minor cleanup of the StateDirectoryTest class Reviewers: Guozhang Wang <wangguoz@gmail.com>	5 years ago
Boyang Chen	1e6d944813	HOTFIX: StateDirectoryTest should use Set instead of List (#8305 ) Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <guozhang@confluent.io>	5 years ago
A. Sophie Blee-Goldman	045c6c3c48	MINOR: enforce non-negative invariant for checkpointed offsets (#8297 ) While discussing KIP-441 we realize we don't strictly enforce that all checkpointed offset sums are positive (or 0, though there's not much point to checkingpoint a 0 offset is there)? Rather than awkwardly try handle this within every user/reader of the checkpoint file, we should just make a guarantee that all returned checkpointed offsets are positive. Reviewers: Guozhang Wang <wangguoz@gmail.com>	5 years ago
Guozhang Wang	605d55dc17	KAFKA-6647: Do note delete the lock file while holding the lock (#8267 ) 1. Inside StateDirectory#cleanRemovedTasks, skip deleting the lock file (and hence the parent directory) until releasing the lock. And after the lock is released only go ahead and delete the parent directory if manualUserCall == true. That is, this is triggered from KafkaStreams#cleanUp and users are responsible to make sure that Streams instance is not started and hence there are no other threads trying to grab that lock. 2. As a result, during scheduled cleanup the corresponding task.dir would not be empty but be left with only the lock file, so effectively we still achieve the goal of releasing disk spaces. For callers of listTaskDirectories like KIP-441 (cc @ableegoldman to take a look) I've introduced a new listNonEmptyTaskDirectories which excludes such dummy task.dirs with only the lock file left. 3. Also fixed KAFKA-8999 along the way to expose the exception while traversing the directory. Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, John Roesler <vvcephei@apache.org>	5 years ago
Matthias J. Sax	dd2c6a0a6f	KAFKA-9533: Fix JavaDocs of KStream.transformValues (#8298 ) Reviewers: Bill Bejeck <bill@confluent.io>	5 years ago
John Roesler	7945cbc73b	MINOR: reuse pseudo-topic in FKJoin (#8296 ) Reuse the same pseudo-topic for serializing the LHS value in the foreign-key join resolver as we originally used to serialize it before sending the subscription request. Reviewers: Boyang Chen <boyang@confluent.io>	5 years ago
A. Sophie Blee-Goldman	542853d99b	KAFKA-6145: Pt 2. Include offset sums in subscription (#8246 ) KIP-441 Pt. 2: Compute sum of offsets across all stores/changelogs in a task and include them in the subscription. Previously each thread would just encode every task on disk, but we now need to read the changelog file which is unsafe to do without a lock on the task directory. So, each thread now encodes only its assigned active and standby tasks, and ignores any already-locked tasks. In some cases there may be unowned and unlocked tasks on disk that were reassigned to another instance and haven't been cleaned up yet by the background thread. Each StreamThread makes a weak effort to lock any such task directories it finds, and if successful is then responsible for computing and reporting that task's offset sum (based on reading the checkpoint file) This PR therefore also addresses two orthogonal issues: 1. Prevent background cleaner thread from deleting unowned stores during a rebalance 2. Deduplicate standby tasks in subscription: each thread used to include every (non-active) task found on disk in its "standby task" set, which meant every active, standby, and unowned task was encoded by every thread. Reviewers: Bruno Cadonna <bruno@confluent.io>, John Roesler <vvcephei@apache.org>	5 years ago
Bruno Cadonna	0c256e16ab	KAFKA-9675: Fix bug that prevents RocksDB metrics to be updated (#8256 ) Reviewers: John Roesler <vvcephei@apache.org>	5 years ago
Matthias J. Sax	524182d7f4	MINOR: Update Streams IQ JavaDocs to not point to a deprecated method (#8271 ) Reviewers: John Roesler <john@confluent.io>, Guozhang Wang <guozhang@confluent.io>	5 years ago

... 3 4 5 6 7 ...

1881 Commits (12d98a3d7ad9b023c600bdbb5e3ffb42b07406b1)