Fixes case where multiple children merged from a key-changing node causes an NPE.
Reviewers: Matthias J. Sax <mjsax@apache.org>, Boyang Chen <boyang@confluent.io>
Provide custom configs to ChannelBuilders in addition to parsed `values()`
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Andrew Choi <andchoi@linkedin.com
When sun.security.jgss.native=true, we currently create a new server credential for every new client connection and add to the private credential set of the server's Subject. This is expensive and can result in an unbounded number of private credentials in the Subject used by the broker. The PR creates a credential immediately after login so that a single credential can be reused.
Reviewers: Guozhang Wang <wangguoz@gmail.com
https://issues.apache.org/jira/browse/KAFKA-9025
If a chroot is configured, ZkSecurityMigrator should prompt a confirm to user to ensure whether chroot is specified correctly.
Author: huxihx <huxi_2b@hotmail.com>
Author: huxi <huxi_2b@hotmail.com>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Closes#7618 from huxihx/KAFKA-9025
As proposed in KIP-444, the user customizable metrics shall be refactored. The refactoring consists of:
* adding methods addLatencyRateTotalSensor and addRateTotalSensor to interface StreamMetrics
* implement the newly added methods in StreamsMetricsImpl
* deprecate methods recordThroughput() and recordLatency() in StreamsMetrics
Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Brokers are supposed to force SASL clients to re-authenticate (and kill such connections in the absence of a timely and successful re-authentication) when KIP-368 SASL Re-Authentication is enabled via a positive connections.max.reauth.ms configuration value. There was a flaw in the logic that caused connections to not be killed in the absence of a timely and successful re-authentication if the client did not leverage the SaslAuthenticateRequest API (which was defined in KIP-152).
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
The two missing } in the javadoc of flatTransform() lead to the
following javadoc compile warnings:
.../KStream.java:1156: warning - End Delimiter } missing for possible See Tag in comment string
Reviewers: Bill Bejeck <bbejeck@gmail.com>
KIP-320 improved fetch semantics by adding leader epoch validation. This relies on
reliable propagation of leader epoch information from the controller. Unfortunately, we
have encountered a bug during partition reassignment in which the leader epoch in the
controller context does not get properly updated. This causes UpdateMetadata requests
to be sent with stale epoch information which results in the metadata caches on the
brokers falling out of sync.
This bug has existed for a long time, but it is only a problem due to the new epoch
validation done by the client. Because the client includes the stale leader epoch in its
requests, the leader rejects them, yet the stale metadata cache on the brokers prevents
the consumer from getting the latest epoch. Hence the consumer cannot make progress
while a reassignment is ongoing.
Although it is straightforward to fix this problem in the controller for the new releases
(which this patch does), it is not so easy to fix older brokers which means new clients
could still encounter brokers with this bug. To address this problem, this patch also
modifies the client to treat the leader epoch returned from the Metadata response as
"unreliable" if it comes from an older version of the protocol. The client in this case will
discard the returned epoch and it won't be included in any requests.
Also, note that the correct epoch is still forwarded to replicas correctly in the
LeaderAndIsr request, so this bug does not affect replication.
Reviewers: Jun Rao <junrao@gmail.com>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Ismael Juma <ismael@juma.me.uk>
[KAFKA-9267](https://issues.apache.org/jira/browse/KAFKA-9267)
ZkSecurityMigrator might create a PERSISTENT /controller node with null data, it will lead to controller can't elect.
*More detailed description of your change,
if necessary. The PR title and PR message become
the squashed commit message, so use a separate
comment to ping reviewers.*
*Summary of testing strategy (including rationale)
for the feature or bug fix. Unit and/or integration
tests are expected for any behaviour change and
system tests should be considered for larger changes.*
Author: NanerLee <nanerlee@qq.com>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Closes#7778 from NanerLee/fix-ZkSecurityMigrator
This patch fixes a race condition on reassignment completion. The previous code fetched metadata first and then fetched the reassignment state. It is possible in between those times for the reassignment to complete, which leads to spurious URPs being reported. The fix here is to change the order of these checks and to explicitly check for reassignment completion.
Note this patch fixes the flaky test `TopicCommandWithAdminClientTest.testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress`.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Remove in catch clause and move it to the callback.
Reviewers: John Roesler <john@confluent.io>, Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>
This patch eliminates some redundancy and general messiness around the usage of `BaseRequestTest` and specifically response deserialization.
Reviewers: Ismael Juma <ismael@juma.me.uk>
Check connectivity with broker coordinator in intervals and stop tasks if coordinator is unreachable by setting `assignmentSnapshot` to null and resetting rebalance delay when there are no lost tasks. And, because we're now sometimes setting `assignmentSnapshot` to null and reading it from other methods and thread, made this member volatile and used local references to ensure consistent reads.
Adapted existing unit tests to verify additional debug calls, added more specific log messages to `DistributedHerder`, and added a new integration test that verifies the behavior when the brokers are stopped and restarted only after the workers lose their heartbeats with the broker coordinator.
Author: Konstantine Karantasis <konstantine@confluent.io>
Reviewers: Greg Harris <gregh@confluent.io>, Randall Hauch <rhauch@gmail.com>
If a non-consumer group is specified in `describeConsumerGroup`, the future will hang indefinitely because the future callback is never completed. This patch fixes the problem by completing the future exceptionally with an `IllegalArgumentException`.
Reviewers: Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Jason Gustafson <jason@confluent.io>
This patch fixes an NPE in `DefaultMetadataUpdater` due to an inconsistency in event expectations. Whenever there is an authentication failure, we were treating it as a failed update even if was from a separate connection from an inflight metadata request. This patch fixes the problem by making the `MetadataUpdater` api clearer in terms of the events that are handled.
Reviewers: Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
This patch fixes a bug in `SocketServer` in the expiration of connections which have not re-authenticated quickly enough. Previously these connections were left hanging, but now they are properly closed and cleaned up. This was one cause of the flaky test failures in `EndToEndAuthorizationTest.testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl`.
Reviewers: Jason Gustafson<jason@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
KAFKA-8448 fixes problem with similar leak. The Log objects are being
held in ScheduledExecutor PeriodicProducerExpirationCheck callback. The
fix in KAFKA-8448 was to change the policy of ScheduledExecutor to
remove the scheduled task when it gets canceled (by calling
setRemoveOnCancelPolicy(true)).
This works when a log is closed using close() method. But when a log is
deleted either when the topic gets deleted or when the rebalancing
operation moves the replica away from broker, the delete() operation is
invoked. Log.delete() doesn't close the pending scheduled task and that
leaks Log instance.
Fix is to close the scheduled task in the Log.delete() method too.
Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Remove nullcheck, and add integration tests for restarting a failed task.
Authors: Cyrus Vafadari <cyrus@confluent.io>, Chris Egerton <chrise@confluent.io>
Reviewers: Arjun Satish <arjun@confluent.io>, Randall Hauch <rhauch@gmail.com>
Fix three independent causes of threads dying:
1. `ProducerFencedException` isn't properly handed while suspending a task, and leads to the thread dying.
2. `IllegalStateException`: an internal assertion is violated because a store can get orphaned when an exception is thrown during initialization, again leading to the thread dying.
3. `UnknownProducerIdException`: This exception isn't expected by the Streams code, so when we get it, the relevant thread dies. It's not clear whether we always need to catch this, and in the future, we won't expect it at all, but we are catching it now to be sure we're resilient if/when it happens. Important note: this might actually harm performance if the errors turn out to be ignorable, and we will now rebalance instead of ignoring them.
Also, add missing test coverage for the exception handling code.
Reviewers: Boyang Chen <boyang@confluent.io>, Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bill@confluent.io>
This reverts commit 90043d5f as it caused a regression in some cases:
> Caused by: java.io.IOException: Stream frame descriptor corrupted
> at org.apache.kafka.common.record.KafkaLZ4BlockInputStream.readHeader(KafkaLZ4BlockInputStream.java:132)
> at org.apache.kafka.common.record.KafkaLZ4BlockInputStream.<init>(KafkaLZ4BlockInputStream.java:78)
> at org.apache.kafka.common.record.CompressionType$4.wrapForInput(CompressionType.java:110)
I will investigate why after, but I want to get the safe fix into 2.4.0.
The reporter of KAFKA-9203 has verified that reverting this change
makes the problem go away.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Changes the ProducerMetadata to longer record a sentinel TOPIC_EXPIRY_NEEDS_UPDATE upon topic map emplacement, and instead set the expiry time directly. Previously the expiry time was being updated for all touched topics after a metadata fetch was processed, which could be seconds/minutes in the future.
Additionally propagates the current time further in the Producer, which should reduce the total number of current-time calls.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Rajini Sivaram <rajinisivaram@googlemail.com>
Race condition in concurrent `get` method invocation of lazy indexes might lead
to multiple `OffsetIndex`/`TimeIndex` objects being concurrently created. When
that happens position of `MappedByteBuffer` in `AbstractIndex` is advanced to
the last entry which in turn leads to a critical `BufferOverflowException` being
thrown whenever leader or replica tries to append a record to the segment.
Moreover, `file_=` setter is seemingly also vulnerable to the race, since multiple
threads can attempt to set a new file reference as well as create new
Time/OffsetIndex objects at the same time. This might lead to the discrepant
File references being held by e.g. LazyTimeIndex and its TimeIndex counterpart.
This patch attempts to fix the issue by making sure that index objects are
atomically constructed when loaded lazily via `get` method. Additionally, `file`
reference modifications are also made atomic and thread safe.
Note that the `Lazy*Index` mutation operations are executed with a lock held by
the callers, but `get` can be called without a lock (e.g. from `Log.read`).
Reviewers: Jun Rao <junrao@gmail.com>, Jason Gustafson <jason@confluent.io>, Shilin Lu, Ismael Juma <ismael@juma.me.uk>
https://issues.apache.org/jira/browse/KAFKA-9069
Make `getTopicMetadata` in AdminClientIntegrationTest always read metadata from controller to get a consistent view.
*More detailed description of your change,
if necessary. The PR title and PR message become
the squashed commit message, so use a separate
comment to ping reviewers.*
*Summary of testing strategy (including rationale)
for the feature or bug fix. Unit and/or integration
tests are expected for any behaviour change and
system tests should be considered for larger changes.*
Author: huxihx <huxi_2b@hotmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, José Armando García Sancio <jsancio@gmail.com>
Closes#7619 from huxihx/KAFKA-9069
This PR is a follow-up of #7087, fixing typos, styles, etc.
cc/ big-andy-coates ijuma
Author: Lee Dongjin <dongjin@apache.org>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Closes#7217 from dongjinleekr/feature/trivial-admin-javadoc
This ends up spamming the logs and doesn't seem to be providing much useful information, rather than logging the full task (which includes the entire topology description) we should just log the task id.
Reviewers: Matthias J. Sax <mjsax@apache.org>, Bruno Cadonna <bruno@confluent.io>, Bill Bejeck <bbejeck@gmail.com>