Currently, when a dynamic change is made to the broker-level default log configuration, existing topic-level log configs are recreated with empty overridden configs. As a result, when dynamic broker configs are updated a second time, the topic-level configs are lost. This can cause unexpected data loss, for example, if the cleanup policy changes from "compact" to "delete."
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Jason Gustafson <jason@confluent.io>
This ticket improves two aspects of sensor retrieval:
https://issues.apache.org/jira/browse/KAFKA-9152
Currently, when a sensor is retrieved with *Metrics.*Sensor() (e.g. ThreadMetrics.createTaskSensor()) after it was created with the same method *Metrics.*Sensor(), the sensor is added again to the corresponding queue in Sensors (e.g. threadLevelSensors) in StreamsMetricsImpl. Those queues are used to remove the sensors when removeAllLevelSensors() is called. Having the same sensor in such a queue multiple times is not an issue from a correctness point of view, but storing each sensor only once would reduce the footprint.
In addition, when a sensor is retrieved, the current code attempts to create a new sensor and to add the corresponding metrics to it again. This can be avoided.
Both aspects can be improved by calling getSensor() on the Metrics object and checking the return value to determine whether the sensor already exists.
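A minimal sketch of the proposed check, assuming a simplified registry in the spirit of StreamsMetricsImpl (class and field names here are illustrative, not the actual code):

```java
import java.util.Deque;
import java.util.LinkedList;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.kafka.common.metrics.Metrics;
import org.apache.kafka.common.metrics.Sensor;

// Illustrative registry: check for an existing sensor before creating one,
// so the sensor is neither re-queued nor re-populated with metrics.
class SensorRegistry {
    private final Metrics metrics;
    private final Map<String, Deque<String>> threadLevelSensors = new ConcurrentHashMap<>();

    SensorRegistry(final Metrics metrics) {
        this.metrics = metrics;
    }

    Sensor threadLevelSensor(final String threadId,
                             final String sensorName,
                             final Sensor.RecordingLevel recordingLevel) {
        final String fullName = "thread." + threadId + "." + sensorName;
        final Sensor existing = metrics.getSensor(fullName);
        if (existing != null) {
            // Sensor was created before: return it as-is.
            return existing;
        }
        final Sensor sensor = metrics.sensor(fullName, recordingLevel);
        threadLevelSensors.computeIfAbsent(threadId, k -> new LinkedList<>()).push(fullName);
        return sensor;
    }
}
```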
Reviewers: Bruno Cadonna <bruno@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
Also addresses KAFKA-8821
Note that we still have to fall back to using pattern subscription if the user has added any regex-based source nodes to the topology. Includes some minor cleanup on the side.
Reviewers: Bill Bejeck <bbejeck@gmail.com>
Previously the idempotent producer and transactional producer used separate logic when initializing the producerId. This patch consolidates the two paths. We also do some cleanup in `TransactionManagerTest` to eliminate brittle expectations on `Sender`.
Reviewers: Bob Barrett <bob.barrett@confluent.io>, Viktor Somogyi <viktorsomogyi@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
This patch adds a new API to the producer to implement transactional offset commit fencing through the group coordinator, as proposed in KIP-447. The changes are mainly on the producer end, keeping the old `sendOffsetsToTxn(offsets, groupId)` path compatible alongside the new `sendOffsetsToTxn(offsets, groupMetadata)` path.
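For illustration, a consume-transform-produce loop on the new path might look like the sketch below (the public method on `KafkaProducer` is `sendOffsetsToTransaction`; topic names are placeholders, and the clients are assumed to be configured for exactly-once elsewhere):

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;

// Sketch: commit offsets through the producer using the consumer's group
// metadata so the group coordinator can fence zombie producers (KIP-447).
void processBatch(KafkaConsumer<String, String> consumer,
                  KafkaProducer<String, String> producer) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    producer.beginTransaction();
    Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
    for (ConsumerRecord<String, String> record : records) {
        producer.send(new ProducerRecord<>("output-topic", record.key(), record.value()));
        offsets.put(new TopicPartition(record.topic(), record.partition()),
                    new OffsetAndMetadata(record.offset() + 1));
    }
    // New path: pass the full group metadata instead of a bare group id.
    producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
    producer.commitTransaction();
}
```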
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>
This commit makes `DistributedHerder` log that an error happened during task reconfiguration only when one actually has happened.
Author: Ivan Yurchenko <ivan0yurchenko@gmail.com>
Reviewer: Randall Hauch <rhauch@gmail.com>
The previous version of MockConsumer did not allow clients to test consecutive calls to poll while consuming from only a partial set of partitions, because it cleared all records after each call. This change makes MockConsumer clear the records only for partitions that are not paused (i.e. whose records are actually returned by the poll); paused partitions retain their records.
Unit test added accordingly.
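A minimal sketch of the scenario this enables in a test (topic and partition names are arbitrary):

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.MockConsumer;
import org.apache.kafka.clients.consumer.OffsetResetStrategy;
import org.apache.kafka.common.TopicPartition;

public class MockConsumerPauseExample {
    public static void main(String[] args) {
        MockConsumer<String, String> consumer = new MockConsumer<>(OffsetResetStrategy.EARLIEST);
        TopicPartition p0 = new TopicPartition("topic", 0);
        TopicPartition p1 = new TopicPartition("topic", 1);
        consumer.assign(Arrays.asList(p0, p1));

        Map<TopicPartition, Long> beginningOffsets = new HashMap<>();
        beginningOffsets.put(p0, 0L);
        beginningOffsets.put(p1, 0L);
        consumer.updateBeginningOffsets(beginningOffsets);

        consumer.addRecord(new ConsumerRecord<>("topic", 0, 0L, "k0", "v0"));
        consumer.addRecord(new ConsumerRecord<>("topic", 1, 0L, "k1", "v1"));

        consumer.pause(Collections.singleton(p1));
        // Returns only p0's record; p1 is paused, so its record is retained.
        consumer.poll(Duration.ofMillis(1));

        consumer.resume(Collections.singleton(p1));
        // With this change, p1's record is still available on a later poll.
        System.out.println(consumer.poll(Duration.ofMillis(1)).count()); // prints 1
    }
}
```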
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Although the APIs section of the Kafka documentation lists 5 core APIs (https://kafka.apache.org/documentation/#api), the introduction page lists only 4 of them. I've added the missing list element to fix this inconsistency.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Since the leader epoch was not maintained in the fetch session cache, no validation would be done except for the initial (full) fetch request. This patch adds the leader epoch to the session cache and addresses the testing gaps.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Colin Patrick McCabe <cmccabe@apache.org>
The `KafkaController::replicasAreValid` method currently returns a
boolean indicating if replicas are valid or not. But the failure
condition loses any context on why replicas are not valid. This change
updates the method to return the error condition if validation fails.
This allows the caller to report the error to the client.
The change also renames the `replicasAreValid` method to
`validateReplicas` to reflect updated semantics.
Reviewers: Sean Li <seanli-rallyhealth@users.noreply.github.com>, Jason Gustafson <jason@confluent.io>
The producer's BufferPool may block allocations if its memory limit has hit capacity. If the producer is closed, it's possible for the allocation waiters to wait for max.block.ms if progress cannot be made, even when force-closed (immediate), which can cause indefinite blocking if max.block.ms is particularly high.
This patch fixes the problem by adding a `close()` method to `BufferPool`, which wakes up any waiters that have pending allocations and throws an exception.
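The wake-up pattern, in a simplified form (an illustrative sketch, not the actual `BufferPool` code, which also does memory accounting per waiter):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Sketch: allocations block on per-waiter conditions; close() flips a flag
// and signals every waiter so blocked threads fail fast instead of waiting
// out max.block.ms.
class SimpleBufferPool {
    private final ReentrantLock lock = new ReentrantLock();
    private final Deque<Condition> waiters = new ArrayDeque<>();
    private boolean closed = false;
    private long availableMemory;

    SimpleBufferPool(long totalMemory) {
        this.availableMemory = totalMemory;
    }

    byte[] allocate(int size) throws InterruptedException {
        lock.lock();
        try {
            while (availableMemory < size) {
                if (closed)
                    throw new IllegalStateException("Pool closed while allocation was in progress");
                Condition moreMemory = lock.newCondition();
                waiters.addLast(moreMemory);
                try {
                    moreMemory.await(); // woken by deallocate() or close()
                } finally {
                    waiters.remove(moreMemory);
                }
            }
            availableMemory -= size;
            return new byte[size];
        } finally {
            lock.unlock();
        }
    }

    void deallocate(int size) {
        lock.lock();
        try {
            availableMemory += size;
            if (!waiters.isEmpty())
                waiters.peekFirst().signal();
        } finally {
            lock.unlock();
        }
    }

    void close() {
        lock.lock();
        try {
            closed = true;
            for (Condition waiter : waiters)
                waiter.signal();
        } finally {
            lock.unlock();
        }
    }
}
```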
Reviewers: Jason Gustafson <jason@confluent.io>
Fixed an intermittent failure in the test that was caused by a mocked instance not handling expandIsr. Also updated the test to use the offset from the batch to trigger expandIsr more often. With this change, the test may try to acquire the write lock while appends are blocked on the read lock, so the test using the write lock was separated out to make it safer.
Reviewers: Jason Gustafson <jason@confluent.io>
When the scheduled refreshTopicPartitions runs, check the existing topics in
both source and target clusters in order to compute the topic partitions to
be created on the target.
If a temporary failure to create the target topic is encountered (e.g.
insufficient number of brokers), on the next refresh the target topic
creation will be re-attempted.
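Conceptually the refresh reduces to a set difference (a hypothetical sketch; the connector computes this with its own helpers and replication policy):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

import org.apache.kafka.clients.admin.NewTopic;

// Hypothetical helper: topics present on the source but missing on the target
// are (re-)created; a failed creation is simply retried on the next refresh.
List<NewTopic> topicsToCreate(Set<String> sourceTopics,
                              Set<String> targetTopics,
                              int partitions,
                              short replicationFactor) {
    Set<String> missing = new HashSet<>(sourceTopics);
    missing.removeAll(targetTopics);
    return missing.stream()
            .map(topic -> new NewTopic(topic, partitions, replicationFactor))
            .collect(Collectors.toList());
}
```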
Co-authored-by: Edoardo Comar <ecomar@uk.ibm.com>
Co-authored-by: Mickael Maison <mickael.maison@uk.ibm.com>
Reviewers: Ryanne Dolan <ryannedolan@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
Removes references to the old Scala Acl classes from kafka.security.auth (Acl, Operation, ResourceType, Resource etc.) and replaces these with the Java API. Only the old SimpleAclAuthorizer, the AuthorizerWrapper used to wrap legacy authorizer instances, and tests using SimpleAclAuthorizer continue to use the old API. Deprecates the old Scala API.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Add a new method to KafkaStreams to return an estimate of the lags for
all partitions of all local stores.
Implements: KIP-535
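A sketch of reading the new lag estimates, assuming the API shape from KIP-535 (`allLocalStorePartitionLags()` returning per-store, per-partition `LagInfo`):

```java
import java.util.Map;

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.LagInfo;

// Sketch: log the lag estimate for every partition of every local store.
void logLocalStoreLags(KafkaStreams streams) {
    Map<String, Map<Integer, LagInfo>> lags = streams.allLocalStorePartitionLags();
    lags.forEach((store, partitionLags) ->
        partitionLags.forEach((partition, lag) ->
            System.out.printf("store=%s partition=%d current=%d end=%d lag=%d%n",
                store, partition,
                lag.currentOffsetPosition(), lag.endOffsetPosition(), lag.offsetLag())));
}
```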
Co-authored-by: Navinder Pal Singh Brar <navinder_brar@yahoo.com>
Reviewed-by: John Roesler <vvcephei@apache.org>
During a reassignment, it can happen that the current leader of a partition is demoted and removed from the replica set at the same time. In this case, we rely on the StopReplica request in order to stop replica fetchers and to clear the group coordinator cache. This patch adds similar logic to ensure that the transaction coordinator state cache also gets cleared.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Add a new overload of KafkaStreams#store that allows users
to query standby and restoring stores in addition to active ones.
Closes: #7962
Implements: KIP-535
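A sketch of the opt-in, assuming the signature proposed in KIP-535 with a boolean `includeStaleStores` flag (store name and types are placeholders):

```java
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

// Sketch: passing true opts in to querying standby and restoring stores,
// accepting possibly stale values in exchange for higher availability.
ReadOnlyKeyValueStore<String, Long> staleTolerantStore(KafkaStreams streams) {
    return streams.store("counts",
                         QueryableStoreTypes.<String, Long>keyValueStore(),
                         true); // includeStaleStores, per the KIP
}
```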
Co-authored-by: Navinder Pal Singh Brar <navinder_brar@yahoo.com>
Reviewed-by: John Roesler <vvcephei@apache.org>
Deprecate existing metadata query APIs in favor of new
ones that include standby hosts as well as partition
information.
Closes: #7960
Implements: KIP-535
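For illustration, a query against the new metadata API might look like this (assuming the KIP-535 accessors on `KeyQueryMetadata`; store name and serde are placeholders):

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyQueryMetadata;
import org.apache.kafka.streams.state.HostInfo;

// Sketch: the new API exposes the partition and the standby hosts for a key,
// so callers can route stale-tolerant lookups to standby replicas.
void describeKey(KafkaStreams streams, String key) {
    KeyQueryMetadata metadata =
        streams.queryMetadataForKey("counts", key, Serdes.String().serializer());
    HostInfo active = metadata.getActiveHost();
    System.out.printf("key=%s partition=%d active=%s:%d standbys=%s%n",
        key, metadata.getPartition(), active.host(), active.port(),
        metadata.getStandbyHosts());
}
```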
Co-authored-by: Navinder Pal Singh Brar <navinder_brar@yahoo.com>
Reviewed-by: John Roesler <vvcephei@apache.org>
To be able to correctly fence zombie producer txn commits, we propose adding (member.id, group.instance.id, generation) to the transaction commit protocol to provide the same level of correctness guarantee as consumer commits.
Major changes involve:
1. Upgrade the transaction commit protocol with (member.id, group.instance.id, generation). The client will fail if the broker does not support the new protocol.
2. Refactor group coordinator logic to handle new txn commit errors such as FENCED_INSTANCE_ID, UNKNOWN_MEMBER_ID and ILLEGAL_GENERATION. We loosen the check on transaction commit when the member.id is set to empty, because the member.id check is an add-on safety for producer commits and we need to preserve backward compatibility for old producer clients without member.id information. If the producer is configured with group.instance.id, however, it must provide a valid (non-empty) member.id, the same as a consumer commit.
Reviewers: Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
This fix makes the LogCleaner tolerant of gaps in the offset sequence. Previously, this could lead to endless loops of cleaning which required manual intervention.
Reviewers: Jun Rao <junrao@gmail.com>, David Arthur <mumrah@gmail.com>
Check for ISR updates using ISR read lock and acquire ISR write lock only if ISR needs to be updated. This avoids lock contention between request handler threads processing log appends on the leader holding the ISR read lock and request handler threads processing replica fetch requests that check/update ISR.
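The locking pattern, in a simplified form (an illustrative sketch, not the actual `Partition` code):

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch: check under the read lock and only take the write lock when the
// ISR actually needs to change, keeping the common path contention-free.
class IsrState {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Set<Integer> isr = new HashSet<>();

    void maybeExpandIsr(int replicaId) {
        lock.readLock().lock();
        try {
            if (isr.contains(replicaId))
                return; // common case: no update needed, read lock only
        } finally {
            lock.readLock().unlock();
        }
        lock.writeLock().lock();
        try {
            // Re-check under the write lock: state may have changed between
            // releasing the read lock and acquiring the write lock.
            isr.add(replicaId);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```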
Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Let the consumer back off and retry the offset fetch when the specific offset topic has pending commits.
The major change lies in the broker-side offset fetch logic: a request with the WaitTransaction flag set to true is required to back off while a pending transactional commit is ongoing. This prevents an ongoing transaction from being modified by a third party, thus guaranteeing correctness when input partition writers are shuffled.
Reviewers: Matthias J. Sax <matthias@confluent.io>, Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Fixed an issue where repeatedly calling Consumer.endOffsets after a leader change would throw `TimeoutException: Failed to get offsets by times in 30000ms`.
Reviewers: Steven Lu, Guozhang Wang <wangguoz@gmail.com>
Prior to this commit, the broker's Selector used to read all requests available on the socket when the socket was ready for reads. These were queued up as staged receives. This can result in excessive memory usage when the socket read buffer size is large. We mute the channel and stop reading any more data until all the staged requests are processed. This behaviour is slightly inconsistent: for the initial read we drain the socket buffer, allowing it to fill up again, but if data arrives slightly after the initial read, we don't read from the socket buffer until the pending requests are processed.
To avoid holding onto requests for longer than required, this commit removes staged receives and reads one request at a time even if more data is available in the socket buffer. This is especially useful for produce requests which may be large and may take long to process. Additional data read from the socket for SSL is limited to the pre-allocated intermediate SSL buffers.
Reviewers: Jun Rao <junrao@gmail.com>, Ismael Juma <ismael@juma.me.uk>
When `kafka-topics.sh` is used with the --zookeeper option, it prints a confirmation message. This patch adds the same message when --bootstrap-server is used.
Reviewers: Jason Gustafson <jason@confluent.io>