These classes are used by `upgrade_test.py` with old Kafka versions, so they can
only use functionality that exists in all Kafka versions. This change fixes the test
for Kafka versions older than 0.11.0.
Reviewers: Ismael Juma <ismael@juma.me.uk>
As described in KIP-568.
Waiting on acceptance of the KIP to write the tests, on the off chance something changes. But rest assured unit tests are coming ⚡️
Will also kick off the existing Streams system tests that leverage this new API (e.g. version probing, sometimes broker bounce).
Reviewers: Boyang Chen <boyang@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
what/why
The throttling_test was broken by this PR (#7785), since it depends on the consumer having partitions assigned before the producer starts.
This PR provides the ability to wait for partitions to be assigned in the console consumer before considering it started.
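For context, the condition being waited on is simply "the consumer owns at least one partition". A minimal Java sketch of that condition with a plain KafkaConsumer (the bootstrap address, group id, and topic name are illustrative; the actual change lives in the Python system-test service):
```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;

public class WaitForAssignmentSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // illustrative address
        props.put("group.id", "throttling-test-consumer");  // illustrative group id
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singleton("test-topic"));
            // poll() drives the group protocol; the consumer only counts as
            // "started" once it actually owns partitions.
            while (consumer.assignment().isEmpty())
                consumer.poll(Duration.ofMillis(100));
            System.out.println("Partitions assigned: " + consumer.assignment());
        }
    }
}
```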
caveat
This does not support starting up the JmxTool inside the console consumer for custom metrics while using the wait_until_partitions_assigned flag, since the code assumes one JmxTool running per node.
I think a proper fix for this would be to make JmxTool its own standalone single-node service.
alternatives
We could use the EndToEnd test suite, which uses the verifiable producer/consumer under the hood, but unfortunately more changes would be necessary to get that working (specifically, the test suite doesn't seem to play nicely with the ProducerPerformanceService).
Reviewers: Mathew Wong <mwong@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
In the event that `CLASSPATH` does not end with a ":", the shell
can expand the CLASSPATH globs into a space-separated list of paths/jars,
which is not how the JVM CLI accepts arguments to the -cp switch. So
double-quote the variable to prevent pattern expansion, and pass the
glob pattern directly to the JVM.
Reviewers: Ismael Juma <ismael@juma.me.uk>
This bug was found via the flaky SmokeTestDriverIntegrationTest. Without this PR the test fails every 3-4 runs; after the fix we've run the test 20+ times locally without error.
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, John Roesler <john@confluent.io>
The test_throttled_reassignment test fails because the consumer used to validate the reassignment does not start in time to consume all messages. This does not seem to be an issue with the throttling of the reassignment, since increasing the timeout allowed the test to pass multiple consecutive runs locally.
The test seemed to rely on the default JmxTool for the console consumer, which was removed in commit 179d0d7.
The console consumer would check whether it had partitions assigned before beginning to consume. Although the test occasionally failed with the JmxTool, it began to fail much more often after the removal.
Error messages of failures followed the format below, with varying numbers of missed messages. The missing messages are the first messages sent by the producer.
> 535 acked message did not make it to the Consumer. They are: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19...plus 515 more. Total Acked: 192792, Total Consumed: 192259. We validated that the first 535 of these missing messages correctly made it into Kafka's data files. This suggests they were lost on their way to the consumer.
In the scope of the test, this error suggests that the test is falling into the race condition described in produce_consume_validate.py, where the timeout exists to prevent the consumer from missing initial messages.
This can serve as a temporary fix until the logic of consumer startup is addressed further.
Reviewers: Jason Gustafson <jason@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
Jetty 9.4.25 renamed closeOutput to completeOutput
(c5acf96506);
closeOutput is called by recent Jersey versions, including the
latest (2.30.1). An example of the error:
> java.lang.NoSuchMethodError: org.eclipse.jetty.server.Response.closeOutput()V
> at org.glassfish.jersey.jetty.JettyHttpContainer$ResponseWriter.commit(JettyHttpContainer.java:326)
The request still completes, which is why no test fails. We should think about how
to improve testing for this kind of problem, but I want to get the fix in before
2.5 RC0.
Credit to @rigelbm for finding this.
Reviewers: Ewen Cheslack-Postava <me@ewencp.org>, Andrew Choi <a24choi@edu.uwaterloo.ca>
Confirmed the test passes after this (minor) change.
Reviewers: Jason Gustafson <jason@confluent.io>, Boyang Chen <boyang@confluent.io>, Ismael Juma <ismael@juma.me.uk>
In handleRevocation, it is possible that a previous onAssignment callback has already cleaned up the stream tasks, which means no corresponding task can be found for the given partitions. We should not throw here, as this is expected behavior.
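A minimal sketch of the tolerant handling described above, with illustrative names (this is not the real Streams task-manager code):
```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

public class RevocationSketch {
    interface Task { void suspend(); }

    private final Map<String, Task> tasksByPartition = new HashMap<>();

    void handleRevocation(Collection<String> revokedPartitions) {
        for (String partition : revokedPartitions) {
            Task task = tasksByPartition.get(partition);
            if (task == null) {
                // Expected: a previous onAssignment callback may have already
                // cleaned the task up, so log and continue instead of throwing.
                System.out.println("No task for revoked partition " + partition);
                continue;
            }
            task.suspend();
        }
    }
}
```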
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
We only initialize the topology when transitioning from restoring -> running (see the sketch after the list below).
Also tighten some unit tests for this fix:
a. restoring -> suspended should just write the checkpoint file without committing.
b. suspended -> restoring should not need any inner updates.
c. restoring -> running should always try to fetch committed offsets, and forward timeout exceptions.
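A minimal sketch of these transitions, assuming illustrative state and method names rather than the real StreamTask internals:
```java
public class TaskTransitionSketch {
    enum State { CREATED, RESTORING, RUNNING, SUSPENDED }

    private State state = State.RESTORING;
    private boolean topologyInitialized = false;

    void completeRestoration() {
        // restoring -> running: the only transition that initializes the topology
        // and fetches committed offsets (timeout exceptions are forwarded).
        fetchCommittedOffsets();
        if (!topologyInitialized)
            topologyInitialized = true;
        state = State.RUNNING;
    }

    void suspendWhileRestoring() {
        // restoring -> suspended: persist restore progress, but do not commit.
        writeCheckpointFile();
        state = State.SUSPENDED;
    }

    void resume() {
        // suspended -> restoring: no inner updates needed.
        state = State.RESTORING;
    }

    private void fetchCommittedOffsets() { /* may throw a TimeoutException */ }
    private void writeCheckpointFile() { /* write restored offsets to disk */ }
}
```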
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, John Roesler <john@confluent.io>, Boyang Chen <boyang@confluent.io>
Adds a couple of extra checks to the test-output-capturing logic in our Gradle build.
Previously, we were seeing a lot of error logs while attempting to write output for a
test whose output file hadn't been initialized.
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
This PR adds a new lock that prevents the follower replica from being updated while ReplicaAlterDirThread is executing maybeReplaceCurrentWithFutureReplica() to replace the follower replica with the future replica.
Now doAppendRecordsToFollowerOrFutureReplica() doesn't need to hold leaderIsrUpdateLock for local replica updates, and ongoing log appends on the follower will not delay the makeFollower() call.
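A hypothetical sketch of the narrower locking scheme; the method names mirror the description above, but the log types are illustrative stand-ins, not Kafka's Log implementation:
```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

public class FutureReplicaLockSketch {
    // Guards only the current-vs-future log swap, so appends no longer
    // contend with makeFollower() on leaderIsrUpdateLock.
    private final ReentrantLock futureLogLock = new ReentrantLock();
    private List<byte[]> currentLog = new ArrayList<>();
    private List<byte[]> futureLog;  // non-null while an alter-log-dirs move is in flight

    void appendRecordsToFollowerOrFutureReplica(byte[] records, boolean toFuture) {
        futureLogLock.lock();
        try {
            List<byte[]> target = toFuture ? futureLog : currentLog;
            if (target != null)
                target.add(records);
        } finally {
            futureLogLock.unlock();
        }
    }

    void maybeReplaceCurrentWithFutureReplica() {
        futureLogLock.lock();  // blocks concurrent appends only for the swap itself
        try {
            if (futureLog != null) {
                currentLog = futureLog;
                futureLog = null;
            }
        } finally {
            futureLogLock.unlock();
        }
    }
}
```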
**Benchmark results for Partition.makeFollower()**
Old:
```
Benchmark Mode Cnt Score Error Units
PartitionMakeFollowerBenchmark.testMakeFollower avgt 15 2046.967 ± 22.842 ns/op
```
New:
```
Benchmark Mode Cnt Score Error Units
PartitionMakeFollowerBenchmark.testMakeFollower avgt 15 1278.525 ± 5.354 ns/op
```
Author: Manikumar Reddy <manikumar.reddy@gmail.com>
Reviewers: Jun Rao <junrao@gmail.com>
Closes #8153 from omkreddy/KAFKA-9594-LAISR
This bug reproduces through the trunk stream test: the producer was closed unexpectedly when EOS is not turned on.
Will work on adding a unit test to guard this logic.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
This PR avoids generating unnecessary TopicChange events during topic validation. It does so by adding a registerWatch field to the GetChildrenRequest, which allows the watch to be skipped when topics are queried from the topic validation logic.
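At the ZooKeeper API level this boils down to the boolean watch flag on getChildren. A minimal sketch (the wrapper class is illustrative; the znode path is Kafka's topics path):
```java
import java.util.List;

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

public class TopicQuerySketch {
    // watch=false: a read-only query for validation; no watch is registered,
    // so no TopicChange event fires later for this read.
    static List<String> topicsForValidation(ZooKeeper zk)
            throws KeeperException, InterruptedException {
        return zk.getChildren("/brokers/topics", false);
    }

    // watch=true: the default watcher is registered and fires on changes,
    // which is what the change-notification path wants.
    static List<String> topicsForChangeTracking(ZooKeeper zk)
            throws KeeperException, InterruptedException {
        return zk.getChildren("/brokers/topics", true);
    }
}
```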
Reviewers: Ismael Juma <ismael@juma.me.uk>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Jason Gustafson <jason@confluent.io>, Jun Rao <junrao@gmail.com>
Right now the task assignor just blindly assigns N standby tasks per active task (where N = num.standbys) and attempts to distribute them evenly across all instances/threads. But only standby tasks that are stateful, and whose stores are changelog-enabled, will ever actually be created (see the sketch below).
This can result in a less-balanced assignment, and should be cleaned up, in particular before implementing KIP-441, to remove the noise of ghost standbys.
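A minimal sketch of the intended filter, assuming illustrative interface and method names rather than the real assignor internals:
```java
import java.util.HashSet;
import java.util.Set;

public class StandbyFilterSketch {
    interface TaskInfo {
        boolean isStateful();
        boolean hasChangelogEnabledStores();
    }

    // Only tasks that are stateful AND have changelog-enabled stores warrant
    // standby replicas; assigning standbys for anything else produces "ghost"
    // standbys that are never actually created.
    static <T extends TaskInfo> Set<T> standbyCandidates(Set<T> activeTasks) {
        Set<T> candidates = new HashSet<>();
        for (T task : activeTasks) {
            if (task.isStateful() && task.hasChangelogEnabledStores())
                candidates.add(task);
        }
        return candidates;
    }
}
```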
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Reworded the ssl part of the security documentation to fix various issues (mainly as noted by this jira, the problem that SAN extension values are not copied to certificates) and add some recommendations.
Reviewers: Mickael Maison <mickael.maison@gmail.com>
This patch improves the authorizer integration tests in the following ways:
1. We use a separate principal for inter-broker communications. This ensures that ACLs set in the test cases do not interfere with inter-broker communication. We had two test cases (`testCreateTopicAuthorizationWithClusterCreate` and `testAuthorizationWithTopicExisting`) which depended on topic creation and were timing out because of inter-broker metadata propagation failures. The timeouts were treated as successfully satisfying the expectation of authorization, so the tests passed, but not for the intended reason.
2. Previously `GroupAuthorizerIntegrationTest` was inheriting _all_ of the tests from `AuthorizerIntegrationTest`. This seemed like overkill since the ACL evaluation logic is essentially the same.
In total, this should take about 5-10 minutes off the build time and make the authorizer integration tests a little more resilient to problems with inter-broker communication.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Ismael Juma <ismael@juma.me.uk>
The "offset deletion" and "group rebalance" should not be recorded by the same sensor since they are totally different.
The code is introduced by #7276.
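A minimal sketch with Kafka's Metrics API showing the intended shape, one sensor per event type; the sensor and metric names here are illustrative:
```java
import org.apache.kafka.common.metrics.Metrics;
import org.apache.kafka.common.metrics.Sensor;
import org.apache.kafka.common.metrics.stats.Rate;

public class SeparateSensorsSketch {
    public static void main(String[] args) {
        Metrics metrics = new Metrics();

        // One sensor per event type, so the two rates never mix.
        Sensor offsetDeletionSensor = metrics.sensor("offset-deletions");
        offsetDeletionSensor.add(
            metrics.metricName("offset-deletion-rate", "group-coordinator-metrics"),
            new Rate());

        Sensor rebalanceSensor = metrics.sensor("group-rebalances");
        rebalanceSensor.add(
            metrics.metricName("group-rebalance-rate", "group-coordinator-metrics"),
            new Rate());

        offsetDeletionSensor.record();  // an OffsetDelete request
        rebalanceSensor.record();       // a group rebalance
        metrics.close();
    }
}
```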
Reviewers: David Jacot <djacot@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
The rebalance exception withholding is no longer necessary, as we now have a better mechanism for catching and wrapping these exceptions. Throwing them directly should be fine and simplifies our current error handling.
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, John Roesler <john@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
With a short timeout, a call in KafkaAdminClient may time out and the client might disconnect. Currently this can be exposed to the user as either a TimeoutException or a DisconnectException. To be consistent, rather than exposing the underlying retriable error, we handle both cases with a TimeoutException.
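A hypothetical sketch of the normalization; the helper name and deadline handling are illustrative, not the real admin-client internals:
```java
import org.apache.kafka.common.errors.DisconnectException;
import org.apache.kafka.common.errors.TimeoutException;

public class TimeoutNormalizationSketch {
    // Map a disconnect that races with an expired deadline to a TimeoutException,
    // so callers see one consistent error instead of a leaked retriable cause.
    static Throwable normalize(Throwable error, long deadlineMs) {
        if (error instanceof DisconnectException)
            return new TimeoutException(
                "Call timed out after deadline of " + deadlineMs + " ms", error);
        return error;
    }
}
```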
Reviewers: Boyang Chen <boyang@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Author: Andras Katona <akatona@cloudera.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Manikumar Reddy <manikumar.reddy@gmail.com>
Closes #7953 from akatona84/kafkarunclass-auto-scala-version
The SaslClientAuthenticator incorrectly negotiates the supported SaslHandshakeRequest version: it uses the maximum version supported by the broker, whether or not the client supports it.
This bug was exposed by a recent version bump in 0a2569e2b9.
This PR rolls back the recent SaslHandshake[Request,Response] bump, fixes the version negotiation, and adds a test to prevent anyone from accidentally bumping the version without a workaround such as a new ApiKey. The existing key will be difficult to support for clients < 2.5 due to the incorrect negotiation.
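For reference, correct negotiation picks the highest version both sides support, not the broker's maximum. A hypothetical sketch with illustrative constant names:
```java
public class HandshakeVersionSketch {
    static final short CLIENT_MAX_HANDSHAKE_VERSION = 1;  // illustrative client max

    static short negotiateHandshakeVersion(short brokerMinVersion, short brokerMaxVersion) {
        // Take the intersection of the two supported ranges.
        short version = (short) Math.min(CLIENT_MAX_HANDSHAKE_VERSION, brokerMaxVersion);
        if (version < brokerMinVersion)
            throw new IllegalStateException("No usable SaslHandshake version: client max "
                + CLIENT_MAX_HANDSHAKE_VERSION + ", broker range [" + brokerMinVersion
                + ", " + brokerMaxVersion + "]");
        return version;
    }
}
```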
Reviewers: Ron Dagostino <rdagostino@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>, Colin P. McCabe <cmccabe@apache.org>, Jason Gustafson <jason@confluent.io>
Upfront refactoring for KIP-447.
Introduces `StreamsProducer`, which allows sharing a producer across multiple tasks and tracking the transaction (TX) status.
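A hypothetical sketch of the idea (one shared producer, transaction state tracked centrally); this is not the real `StreamsProducer` API:
```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SharedProducerSketch {
    private final KafkaProducer<byte[], byte[]> producer;
    private final boolean eosEnabled;
    private boolean transactionInFlight = false;

    // For EOS, the config must include a transactional.id.
    SharedProducerSketch(Properties config, boolean eosEnabled) {
        this.producer = new KafkaProducer<>(config);
        this.eosEnabled = eosEnabled;
        if (eosEnabled)
            producer.initTransactions();
    }

    // Called by any task sharing this producer; a transaction is started
    // lazily on the first send after the previous commit.
    void send(ProducerRecord<byte[], byte[]> record) {
        if (eosEnabled && !transactionInFlight) {
            producer.beginTransaction();
            transactionInFlight = true;
        }
        producer.send(record);
    }

    void commit() {
        if (eosEnabled && transactionInFlight) {
            producer.commitTransaction();
            transactionInFlight = false;
        } else {
            producer.flush();
        }
    }
}
```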
Reviewers: Boyang Chen <boyang@confluent.io>, Guozhang Wang <guozhang@confluent.io>
Update the site documentation to include the endpoints introduced with KIP-558 and a short paragraph on how this feature is used in Connect.
Author: Konstantine Karantasis <konstantine@confluent.io>
Reviewers: Toby Drake <tobydrake7@gmail.com>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes #8148 from kkonstantine/kip-558-docs
This is a minor fix of a regression introduced in the refactoring PR: in current trunk, standbyTask#commitNeeded always returns false, which would cause standby tasks to never be committed until closed. To restore the old behavior, we return true when new data has been applied and the offsets have been updated.
Reviewers: Boyang Chen <boyang@confluent.io>, John Roesler <john@confluent.io>
If a completed fetch has an error code signifying a _corrupt message_, throw a `KafkaException` that notes the fetch offset and the topic-partition.
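A minimal sketch of constructing the enriched error; the message format is illustrative:
```java
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.TopicPartition;

public class CorruptRecordSketch {
    // Include both the fetch offset and the topic-partition so the failure
    // can be located without further digging.
    static KafkaException corruptRecordError(TopicPartition partition, long fetchOffset) {
        return new KafkaException("Encountered corrupt message when fetching offset "
                + fetchOffset + " for topic-partition " + partition);
    }
}
```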
Reviewers: Jason Gustafson <jason@confluent.io>
Author: Manikumar Reddy <manikumar.reddy@gmail.com>
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Ismael Juma <ismael@juma.me.uk>
Closes #8072 from omkreddy/release-script
This PR is the counterpart of apache/kafka-site#253.
cc/ omkreddy
Author: Lee Dongjin <dongjin@apache.org>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Closes #8149 from dongjinleekr/feature/KAFKA-9586
Author: Ron Dagostino <rdagostino@confluent.io>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Closes #8139 from rondagostino/KAFKA-9575
1. Removed the task field from TaskMigrated; the only caller that encodes a task id from StreamTask does not actually throw, so we only log it. To handle it on the StreamThread we now always enforce a rebalance (and we call onPartitionsLost to remove all tasks as dirty).
2. Added TaskCorruptedException with a set of task ids. The first scenario for this is restoreConsumer.poll throwing InvalidOffset, indicating that the logs are truncated/compacted. To handle it on the StreamThread we first close the corresponding tasks as dirty (if EOS is enabled we also wipe out the state stores), and then revive them into the CREATED state.
3. Also fixed a bug while investigating KAFKA-9572: when suspending/closing a restoring task we should not commit the new offsets but only update the checkpoint file.
4. Re-enabled the unit test.
This fixes two issues which together caused the soak cluster to crash / some tests to fail occasionally.
What happened was: in the main StreamThread loop we initialized a new task in TaskManager#checkForCompletedRestoration, which includes registering, but not initializing, its changelogs. We then completed the loop and called poll, which resulted in a rebalance that revoked the newly-initialized task. In TaskManager#handleAssignment we then closed the task cleanly and went to remove the changelogs from the StoreChangelogReader, only to get an IllegalStateException because the changelog partitions were not in the restore consumer's assignment (due to being uninitialized).
This by itself should be a recoverable error, as we catch exceptions here and retry closing the task as unclean. Of course, the task actually was successfully closed (clean), so we then get an unexpected exception: `Illegal state CLOSED while closing active task`.
The fix(es) I'd propose are:
1. Keep the restore consumer's assignment in sync with the registered changelogs, i.e. the set ChangelogReader#changelogs, but pause them until they are initialized. Edit: since the consumer still performs some actions (e.g. fetches) on paused partitions, we should avoid adding uninitialized changelogs to the restore consumer's assignment. Instead, we should just skip them when removing.
2. Move the StoreChangelogReader#remove call to before task.closeClean so that the task is only marked as closed if everything was successful (see the sketch below). We should do so regardless, as we should (attempt to) remove the changelogs even if the clean close failed and we must close the task uncleanly.
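A hypothetical sketch of fix 2's ordering, with illustrative interfaces (not the real TaskManager code):
```java
public class CloseOrderingSketch {
    interface Task {
        void closeClean();
        void closeDirty();
        Iterable<String> changelogPartitions();
    }

    interface ChangelogReader {
        void remove(Iterable<String> partitions);
    }

    static void closeTask(Task task, ChangelogReader changelogReader) {
        // Remove the changelogs BEFORE the clean close: they must be deregistered
        // whether the close succeeds or not, and doing it first means a failure
        // below never leaves a task that is already marked CLOSED.
        changelogReader.remove(task.changelogPartitions());
        try {
            task.closeClean();
        } catch (RuntimeException e) {
            // The task was never marked closed, so the unclean retry is safe.
            task.closeDirty();
        }
    }
}
```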
Reviewers: John Roesler <john@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
The current EOS example mixes fatal and non-fatal error handling. This patch fixes this problem and simplifies the example.
Reviewers: Jason Gustafson <jason@confluent.io>