This removes the need to force object initialisation via hacks to register
the relevant Yammer metrics during start-up.
It also works around issues caused by tests that delete JVM-wide singleton
metrics (like `MetricsDuringTopicCreationDeletionTest`). Without this
change, they would never be registered again. After this change, they will
be registered again during KafkaServer start-up.
It would be even better not to rely on JVM-wide singleton metrics (as we
already do for Kafka Metrics), but that's a bigger change that should be
considered separately.
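Below is a minimal Java sketch of the pattern, with a hypothetical `MeterRegistry` standing in for the Yammer registry the real code uses via `KafkaMetricsGroup`: the meters become members of an instance created during server start-up instead of a JVM-wide singleton, so anything a test deletes is re-registered the next time a server is constructed.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for a metrics registry; the real code registers Yammer meters.
class MeterRegistry {
    private final ConcurrentHashMap<String, LongAdder> meters = new ConcurrentHashMap<>();
    LongAdder meter(String name) { return meters.computeIfAbsent(name, n -> new LongAdder()); }
    void removeAll() { meters.clear(); }
}

// Instead of a JVM-wide singleton object, the stats live on an instance owned by the
// server: created (and therefore registered) on start-up, removed on shutdown.
class BrokerTopicStatsSketch implements AutoCloseable {
    private final MeterRegistry registry;
    final LongAdder messagesInRate;

    BrokerTopicStatsSketch(MeterRegistry registry) {
        this.registry = registry;
        // Registration happens in the constructor, i.e. during KafkaServer start-up,
        // so no object-initialisation hacks are needed and deleted metrics reappear
        // the next time a server instance is created.
        this.messagesInRate = registry.meter("MessagesInPerSec");
    }

    @Override
    public void close() {
        registry.removeAll();
    }
}
```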
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Closes #3059 from ijuma/kafka-5244-broker-static-stats-and-controller-stats-as-classes
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Apurva Mehta <apurva@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes #3058 from hachikuji/KAFKA-5248
CachingSessionStore wasn't properly using the default keySerde if no Serde was supplied. I saw an error in the logs for one of my test cases because of this.
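For illustration, a sketch of the usual fallback that such a fix relies on, assuming the standard `ProcessorContext#keySerde()` accessor (the actual store code wires this up differently):

```java
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.streams.processor.ProcessorContext;

// Sketch of the common Streams pattern: when a store is built without an explicit
// key Serde, fall back to the default key serde configured on the context.
final class SerdeFallback {
    private SerdeFallback() { }

    @SuppressWarnings("unchecked")
    static <K> Serde<K> keySerdeOrDefault(final Serde<K> provided, final ProcessorContext context) {
        return provided != null ? provided : (Serde<K>) context.keySerde();
    }
}
```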
Author: Kyle Winkelman <kyle.winkelman@optum.com>
Reviewers: Damian Guy, Guozhang Wang
Closes #2963 from KyleWinkelman/CachingSessionStore-fix-keySerde-use
The term `producer id` is used instead of `pid`.
Also refactored TransactionLog schema code to follow
our naming convention and to have better structure.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>
Closes #3041 from ijuma/eliminate-pid-terminology
RocksDBSessionStore wasn't properly using the default aggSerde if no Serde was supplied.
Author: Kyle Winkelman <kyle.winkelman@optum.com>
Reviewers: Damian Guy, Guozhang Wang
Closes #2971 from KyleWinkelman/RocksDBSessionStore-fix-aggSerde-use
The `AddOffsetsToTxnResponse` parse method was incorrectly referring to `ApiKeys.ADD_PARTITIONS_TO_TXN`.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Closes #3056 from dguy/hotfix-add-offsets-to-txn-response
Without the histogram cleanup, the percentiles are calculated
incorrectly after one or more samples are purged: event counts
go out of sync with the counts in the histogram buckets, and a bucket
with a lower value gets chosen for the given quantile.
This change adds the necessary histogram cleanup.
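A self-contained sketch (hypothetical names, not the actual `Percentiles`/`Histogram` classes) of why the cleanup matters: a purge has to reset the histogram buckets together with the event count, otherwise the quantile lookup walks stale bucket counts.

```java
import java.util.Arrays;

// Each sample keeps an event count plus a bucketed histogram; the percentile lookup
// walks the buckets until the cumulative count reaches quantile * eventCount.
class PercentileSampleSketch {
    private long eventCount;                        // events recorded in this sample window
    private final float[] buckets = new float[100]; // simplified fixed-width histogram

    void record(int bucket) {
        eventCount++;
        buckets[bucket]++;
    }

    // Before the fix, only the event count was reset on purge, leaving stale bucket
    // counts behind; the added histogram cleanup keeps the two in sync.
    void purge() {
        eventCount = 0;
        Arrays.fill(buckets, 0.0f);
    }

    double percentile(double quantile) {
        float needed = (float) (quantile * eventCount);
        float seen = 0;
        for (int i = 0; i < buckets.length; i++) {
            seen += buckets[i];
            if (seen >= needed)
                return i; // the bucket index stands in for the bucket's value
        }
        return buckets.length - 1;
    }
}
```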
Author: Ivan A. Melnikov <iv@altlinux.org>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Closes #3002 from iv-m/kafka-5203-percentiles-fix
* Fix flakiness of `testBrokerTopicMetricsUnregisteredAfterDeletingTopic` by not
consuming messages. Filed KAFKA-5238 to track the issue that metrics for a deleted
topic may be re-created if there are fetch requests in the purgatory.
* Check the log size in `testBrokerTopicMetricsBytesInOut` before attempting to read
the `replicationBytesIn` metric. This helps us understand where things have gone wrong
if the metric has not increased (i.e. whether it was a replication issue or a metrics issue).
* Only remove the replication bytes in/out metrics if they are defined. This should not
affect the behaviour due to the tags, but it makes the code clearer. We've seen some
cases in Jenkins where the metric does not exist and it's still unclear how that can
happen.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Closes #3042 from ijuma/more-informative-assertion-for-flaky-metrics-test
This issue would only be triggered if a broker was restarted while
deletion was still taking place.
Included a few minor improvements to that method and its tests.
Author: Jaikiran Pai <jaikiran.pai@gmail.com>
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes #3050 from jaikiran/KAFKA-5232-trunk
During the 0.11.0.0 cycle, a Java version of the class
was introduced so that Streams could use it. Given that
it includes the bulk of the functionality of the Scala
version of the class, it makes sense to consolidate them.
While doing this, I noticed that one of the tests for
the Java class (`shouldThrowOnInvalidTopicNames`) was
broken, as it only checked whether the first topic name in
the list was invalid.
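A hedged sketch of the test shape (hypothetical validator, not the real `Topic` class): the broken form exits on the first invalid name, while the fixed form asserts the exception for every name in the list.

```java
import java.util.Arrays;
import java.util.List;

public class InvalidTopicNamesTestSketch {
    // Hypothetical stand-in for the real validation logic.
    static void validate(String topic) {
        if (topic.isEmpty() || topic.contains(" "))
            throw new IllegalArgumentException("invalid topic name: " + topic);
    }

    static void fail(String message) {
        throw new AssertionError(message);
    }

    // Broken shape: the first invalid name throws, the loop exits, and the remaining
    // names are never exercised.
    static void brokenShouldThrowOnInvalidTopicNames(List<String> invalidNames) {
        try {
            for (String name : invalidNames)
                validate(name);
            fail("expected an exception");
        } catch (IllegalArgumentException expected) {
            // only ever reached for the first name
        }
    }

    // Fixed shape: check each name individually.
    static void shouldThrowOnInvalidTopicNames(List<String> invalidNames) {
        for (String name : invalidNames) {
            try {
                validate(name);
                fail("expected an exception for: " + name);
            } catch (IllegalArgumentException expected) {
                // ok, move on to the next name
            }
        }
    }

    public static void main(String[] args) {
        shouldThrowOnInvalidTopicNames(Arrays.asList("", "has space"));
    }
}
```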
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes #3046 from ijuma/consolidate-topic-classes
`testBrokerTopicMetricsUnregisteredAfterDeletingTopic` seemed completely broken,
as it was creating a topic but producing/consuming to another one.
Authored with mpburg
Author: Mickael Maison <mickael.maison@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes #3034 from mimaison/Fix-testBrokerTopicMetricsBytesInOut
Author: Jeyhun Karimov <je.karimov@gmail.com>
Reviewers: Damian Guy, Eno Thereska, Matthias J. Sax, Guozhang Wang
Closes #2466 from jeyhunkarimov/KAFKA-4144
The tests previously ignored the fact that the controller does not atomically
create the /controller znode and create/increment the /controller_epoch znode.
Author: Onur Karaman <okaraman@linkedin.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes #3038 from onurkaraman/KAFKA-5180
…it is accessed from multiple threads
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes #3000 from cmccabe/KAFKA-5198
This patch adds support for the `TxnOffsetCommitRequest` added in KIP-98. Desired handling for this request is [described here](https://docs.google.com/document/d/11Jqy_GjUGtdXJK94XGsEIK7CP1SnQGdp2eF0wSw9ra8/edit#bookmark=id.55yzhvkppi6m) .
The functionality includes handling the stable state of receiving `TxnOffsetCommitRequests` and materializing results only when the commit marker for the transaction is received. It also handles partition emigration and immigration and rebuilds the required data structures on these events.
Tests are included for all the functionality.
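A conceptual sketch of that buffering (illustrative Java with hypothetical names; the broker-side code is Scala and tracks considerably more state): offsets are parked per producer id and only become visible once the commit marker arrives, while an abort marker discards them.

```java
import java.util.HashMap;
import java.util.Map;

class PendingTxnOffsetsSketch {
    // producerId -> (topic-partition -> offset), buffered until the txn marker arrives
    private final Map<Long, Map<String, Long>> pendingByProducer = new HashMap<>();
    // offsets that have been materialized and are visible to offset fetches
    private final Map<String, Long> committedOffsets = new HashMap<>();

    // Stable path: a TxnOffsetCommit buffers the offset without materializing it.
    void onTxnOffsetCommit(long producerId, String topicPartition, long offset) {
        pendingByProducer.computeIfAbsent(producerId, id -> new HashMap<>())
                         .put(topicPartition, offset);
    }

    // Commit marker: materialize everything the producer wrote in this transaction.
    void onCommitMarker(long producerId) {
        Map<String, Long> pending = pendingByProducer.remove(producerId);
        if (pending != null)
            committedOffsets.putAll(pending);
    }

    // Abort marker: drop the buffered offsets without exposing them.
    void onAbortMarker(long producerId) {
        pendingByProducer.remove(producerId);
    }

    Long fetchCommitted(String topicPartition) {
        return committedOffsets.get(topicPartition);
    }
}
```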
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes #2970 from apurvam/KAFKA-5160-broker-side-support-for-txnoffsetcommitrequest
1. Collapsed the `ownedPartitions`, `pendingTxnMap` and the `transactionMetadataCache` into a single in-memory structure, which is a two-layered map: keyed first by the transaction log partition, with the value holding the current coordinatorEpoch of that partition plus another map keyed by the transactional id (see the sketch after this list).
2. Use `transactionalId` across the modules of the transaction coordinator, and attach this id to the transactional marker entries.
3. Use two keys, `transactionalId` and `txnLogPartitionId`, in the writeMarkerPurgatory, and pass them along with the TxnMarkerEntry so that `TransactionMarkerRequestCompletionHandler` can use them to access the two-layered map when handling responses.
4. Use one queue per `broker-id` and `txnLogPartitionId`. When there is a possible update to the endpoint associated with the `broker-id`, update the Node without clearing the queue, relying on the requests to retry in the next round.
5. Centralize the error handling callback for appending-metadata-to-log and sending-markers-to-brokers in `TransactionStateManager#appendTransactionToLog` and `TransactionMarkerChannelManager#addTxnMarkersToSend`.
6. Always update the in-memory transaction metadata AFTER the txn log has been appended to and replicated, and then double-check the cache to make sure nothing has changed since the log append. The only exception is when initializing the pid for the first time, in which case we put a dummy entry into the cache but set its pendingState to `Empty` (so it is valid to transition from `Empty` to `Empty`), so that it can be updated after the log append has completed.
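The two-layered map from point 1, as a small illustrative Java sketch (hypothetical names; the coordinator itself is written in Scala):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class TxnMetadataCacheSketch {
    static final class TransactionMetadata {
        final long producerId;
        volatile String pendingState; // e.g. "Empty", "Ongoing", "PrepareCommit"
        TransactionMetadata(long producerId, String pendingState) {
            this.producerId = producerId;
            this.pendingState = pendingState;
        }
    }

    // Second layer: the coordinator epoch for one txn log partition plus its
    // transactionalId -> metadata map.
    static final class TxnPartitionEntry {
        final int coordinatorEpoch;
        final Map<String, TransactionMetadata> metadataByTransactionalId = new ConcurrentHashMap<>();
        TxnPartitionEntry(int coordinatorEpoch) {
            this.coordinatorEpoch = coordinatorEpoch;
        }
    }

    // First layer: txn log partition id -> (coordinator epoch, per-transactionalId metadata).
    private final Map<Integer, TxnPartitionEntry> cache = new ConcurrentHashMap<>();

    // Partition immigration installs a fresh entry with its coordinator epoch;
    // emigration removes the entry and everything under it.
    void loadPartition(int txnLogPartition, int coordinatorEpoch) {
        cache.put(txnLogPartition, new TxnPartitionEntry(coordinatorEpoch));
    }

    void removePartition(int txnLogPartition) {
        cache.remove(txnLogPartition);
    }

    TransactionMetadata get(int txnLogPartition, String transactionalId) {
        TxnPartitionEntry entry = cache.get(txnLogPartition);
        return entry == null ? null : entry.metadataByTransactionalId.get(transactionalId);
    }
}
```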
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Ismael Juma, Damian Guy, Jason Gustafson, Jun Rao
Closes #2964 from guozhangwang/K5130-refactor-tc-inmemory-cache
Also add tests and a few clean-ups.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Eno Thereska <eno.thereska@gmail.com>, Jason Gustafson <jason@confluent.io>
Closes #2937 from ijuma/metrics-recording-level-producer
Abort any ongoing transactions that haven't been touched for longer than the transaction timeout
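A minimal sketch of the expiration rule, assuming illustrative names rather than the coordinator's actual code: a periodic sweep collects ongoing transactions whose last update is older than their timeout, and the coordinator then initiates an abort for each of them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class TxnExpirationSketch {
    static final class OngoingTxn {
        final String transactionalId;
        final int transactionTimeoutMs;
        volatile long lastUpdateTimestampMs;

        OngoingTxn(String transactionalId, int transactionTimeoutMs, long lastUpdateTimestampMs) {
            this.transactionalId = transactionalId;
            this.transactionTimeoutMs = transactionTimeoutMs;
            this.lastUpdateTimestampMs = lastUpdateTimestampMs;
        }
    }

    private final Map<String, OngoingTxn> ongoing = new ConcurrentHashMap<>();

    void track(OngoingTxn txn) {
        ongoing.put(txn.transactionalId, txn);
    }

    // Run periodically: any ongoing transaction untouched for longer than its timeout
    // is a candidate for abort on this sweep.
    List<OngoingTxn> timedOutTransactions(long nowMs) {
        List<OngoingTxn> expired = new ArrayList<>();
        for (OngoingTxn txn : ongoing.values()) {
            if (nowMs - txn.lastUpdateTimestampMs > txn.transactionTimeoutMs)
                expired.add(txn);
        }
        return expired;
    }
}
```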
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Jason Gustafson, Apurva Mehta, Ismael Juma, Guozhang Wang
Closes #2957 from dguy/kafka-5132
Also added 2 new metrics to account for incoming/outgoing traffic due to internal replication
- ReplicationBytesInPerSec
- ReplicationBytesOutPerSec
Author: Mickael Maison <mickael.maison@gmail.com>
Reviewers: Jun Rao <junrao@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes #3003 from mimaison/KAFKA-5194
Suppose there are two valid records followed by one invalid record in `FetchResponse.PartitionData`. In the current implementation, `PartitionRecords.fetchRecords(...)` will throw an exception without returning the two valid records. The next call to `PartitionRecords.fetchRecords(...)` will not return those two valid records either, because the iterator has already moved past them.
We can fix this problem by deferring the exception to the next call of `PartitionRecords.fetchRecords(...)` if the iterator has already moved past any valid record.
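A sketch of the deferral technique (simplified; the real `Fetcher`/`PartitionRecords` tracks additional state such as fetch positions): parse until the bad record is hit, hand back the valid records, and only throw the remembered error on the next call.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class DeferredErrorRecordsSketch<T> {
    private final Iterator<T> records;           // may throw while decoding a corrupt record
    private RuntimeException deferredException;  // raised on the *next* fetchRecords call

    DeferredErrorRecordsSketch(Iterator<T> records) {
        this.records = records;
    }

    List<T> fetchRecords(int maxRecords) {
        if (deferredException != null) {
            RuntimeException e = deferredException;
            deferredException = null;
            throw e; // the valid records were already returned by the previous call
        }
        List<T> out = new ArrayList<>();
        try {
            while (out.size() < maxRecords && records.hasNext())
                out.add(records.next());
        } catch (RuntimeException e) {
            if (out.isEmpty())
                throw e;           // nothing useful to return, surface the error now
            deferredException = e; // defer: return what we have, throw next time
        }
        return out;
    }
}
```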
Author: Dong Lin <lindong28@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>, Jiangjie Qin <becket.qin@gmail.com>
Closes #2864 from lindong28/KAFKA-5078
Add broker shutdown for `LeaderEpochIntegrationTest`.
Move broker shutdown in other tests to `tearDown` to
ensure brokers are shut down even if tests fail.
Also added an assertion to `ZooKeeperTestHarness` to
verify that the controller event thread is not running,
since this thread may load JAAS configuration if ZK
ports are reused.
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes #3026 from rajinisivaram/KAFKA-5173
A JoinGroupRequest V0 built with the Builder had
a rebalance timeout of -1 rather than one equal to the session timeout,
as it would have been if it had come from the wire and been deserialized
from a V0 Struct.
Fix developed with mimaison
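A simplified sketch of the builder behaviour being fixed (field handling only, not the actual `JoinGroupRequest` class): for version 0 the rebalance timeout should fall back to the session timeout instead of staying at -1, matching what deserializing a V0 Struct produces.

```java
class JoinGroupRequestSketch {
    final short version;
    final int sessionTimeoutMs;
    final int rebalanceTimeoutMs;

    private JoinGroupRequestSketch(short version, int sessionTimeoutMs, int rebalanceTimeoutMs) {
        this.version = version;
        this.sessionTimeoutMs = sessionTimeoutMs;
        this.rebalanceTimeoutMs = rebalanceTimeoutMs;
    }

    static JoinGroupRequestSketch build(short version, int sessionTimeoutMs, int rebalanceTimeoutMs) {
        // V0 has no rebalance timeout on the wire. The bug left it at -1 when building
        // a v0 request; the fix falls back to the session timeout, as the wire path does.
        int effectiveRebalanceTimeout = version == 0 ? sessionTimeoutMs : rebalanceTimeoutMs;
        return new JoinGroupRequestSketch(version, sessionTimeoutMs, effectiveRebalanceTimeout);
    }
}
```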
Author: Edoardo Comar <ecomar@uk.ibm.com>
Reviewers: Rajini Sivaram
Closes #2936 from edoardocomar/MINOR-JoinGroupRequestV0
Change the compression codec used for the transaction log to UncompressedCodec, as it fails during creation when the codec is set to NoCompressionCodec.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes #3022 from dguy/hotfix-tsm
Shut down the delayed delete purgatory thread, transaction marker purgatory thread and
send thread in `TransactionMarkerChannelManager` during broker shutdown.
Made `InterBrokerSendThread` interruptible so that it can be shut down.
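An illustrative sketch of an interruptible, shutdownable worker thread (generic Java; Kafka's `InterBrokerSendThread` builds on its own `ShutdownableThread` base class): shutdown flips a flag and interrupts the thread so that blocking work cannot stall broker shutdown.

```java
class InterruptibleSendThreadSketch extends Thread {
    private volatile boolean running = true;

    @Override
    public void run() {
        while (running) {
            try {
                doWork(); // e.g. drain a queue of markers and send them
            } catch (InterruptedException e) {
                // interrupted by shutdown(): fall through and re-check the flag
            }
        }
    }

    void doWork() throws InterruptedException {
        Thread.sleep(100); // placeholder for blocking queue/network work
    }

    void shutdown() throws InterruptedException {
        running = false;
        interrupt(); // unblock the thread if it is waiting
        join();
    }
}
```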
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes #3014 from rajinisivaram/KAFKA-5182
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Closes #3015 from apurvam/KAFKA-5213-illegalstateexception-in-ensureOpenForAppend
1. Call `closeSasl` in `MultipleListenersWithSameSecurityProtocolBaseTest`
2. Refactor the code to make it easier to reason about
3. Add an assert that may possibly help us narrow down how KAFKA-5184
can happen (it seems impossible).
4. Remove SaslTestHarness to make it easier to reason about setUp
and tearDown methods.
5. Fix *AdminClientIntegrationTest to have a single `tearDown`
6. Remove the *ReplicaFetcherTest and *TopicMetadataTest secure variants.
They are redundant from a security perspective given the consumer and
producer tests.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Closes #3010 from ijuma/kafka-5184-kafka-5173-sasl-issues
Replica deletion regressed from KIP-101. Replica deletion happens when a broker receives a StopReplicaRequest with delete=true. Ever since KAFKA-1911, replica deletion has been async, meaning the broker responds with a StopReplicaResponse simply after marking the replica directory as staged for deletion. This marking happens by moving a data log directory and its contents such as /tmp/kafka-logs1/t1-0 to a marked directory like /tmp/kafka-logs1/t1-0.8c9c4c0c61c44cc59ebeb00075a2a07f-delete, acting as a soft-delete. A scheduled thread later actually deletes the data. It appears that the regression occurs when the scheduled thread actually tries to delete the data, by which point the controller already considers operations such as partition reassignment and topic deletion complete. But if you look at the log4j logs and data logs, you'll find that the soft-deleted data logs actually won't get deleted.
The bug is that upon log deletion, we attempt to flush the LeaderEpochFileCache to the original file location instead of the moved file location. Restarting the broker actually allows for the soft-deleted directories to get deleted.
This patch avoids the issue by simply not flushing the LeaderEpochFileCache upon log deletion.
Author: Onur Karaman <okaraman@linkedin.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Closes #2986 from onurkaraman/KAFKA-5099
Author: Ewen Cheslack-Postava <me@ewencp.org>
Reviewers: Gwen Shapira
Closes #2795 from ewencp/release-script and squashes the following commits:
f1d0590 [Ewen Cheslack-Postava] Don't expose promotion to the user since it is not implemented yet.
1a6947a [Ewen Cheslack-Postava] Handle cleanup if there's a failure during generation of release notes.
fa58401 [Ewen Cheslack-Postava] Fix hard-coded uses of trunk
639bcca [Ewen Cheslack-Postava] Try to cleanup after most failures.
a3a7245 [Ewen Cheslack-Postava] Fix SCRIPT_DIR to be an absolute path so git clones against the REPO work when it is also your cwd
de54c97 [Ewen Cheslack-Postava] Load/save preferences in a .release-settings.json file so you don't have to keep entering the same info repeatedly
b559a61 [Ewen Cheslack-Postava] Check that the user doesn't have any outstanding diffs before starting the rest of the script
ff0b330 [Ewen Cheslack-Postava] Store original starting branch to switch back to instead of using a default.
b793562 [Ewen Cheslack-Postava] Use 2.12 instead of specific Scala version so we use the default 2.12 version.
382b7f9 [Ewen Cheslack-Postava] MINOR: Add a release script that helps generate release candidates.
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Damian Guy, Eno Thereska, Guozhang Wang
Closes #2931 from mjsax/kafka-5140-flaky-reset-integration-test
A KStream.to() sink is also a topic,
allowing the KStreamTestDriver to fetch it when required.
Author: Wim Van Leuven <wim.vanleuven@bigboards.io>
Author: Wim Van Leuven <wim.vanleuven@highestpoint.biz>
Reviewers: Eno Thereska, Matthias J. Sax, Guozhang Wang
Closes #2716 from wimvanleuven/KAFKA-4927
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Damian Guy, Guozhang Wang
Closes #2951 from mjsax/kafka-5126-add-transactions-to-mock-producer
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: dan norwood <norwood@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes #2977 from cmccabe/KAFKA-5176
This only removes deprecated methods,
fields and constructors in a small number of classes.
Removal of deprecated producer configs is tracked via KAFKA-3353,
and the old clients and related code (tools, etc.) won't
be removed in 0.11.0.0.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes #2995 from ijuma/kafka-3763-remove-deprecated-0.11
These configs have been deprecated since 0.9.0.0:
block.on.buffer.full, metadata.fetch.timeout.ms and timeout.ms
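For anyone migrating, a sketch of the usual replacements, assuming the configs introduced in 0.9.0.0 alongside these deprecations (`max.block.ms` covering the waiting behaviour of `block.on.buffer.full`/`metadata.fetch.timeout.ms`, and `request.timeout.ms` replacing `timeout.ms`):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

class ProducerConfigMigrationSketch {
    static Properties migratedConfig() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Instead of block.on.buffer.full / metadata.fetch.timeout.ms:
        props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, "60000");
        // Instead of timeout.ms:
        props.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, "30000");
        return props;
    }
}
```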
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes #2987 from ijuma/kafka-3353-remove-deprecated-producer-configs