This patch fixes a regression in the `StopReplica` response handling. We should only send the event on receiving the `StopReplica` response if we had requested deletion in the request.
Reviewers: Lucas Bradstreet <lucas@confluent.io>, Jason Gustafson <jason@confluent.io>
* `MockAdminClient` should behave the same way as `Admin` for `createTopics()`
* Changed from throwing an `IllegalArgumentException` to `InvalidReplicationFactorException` when `brokers.size() < replicationFactor`
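As a hedged illustration of the aligned behavior (topic name and broker/replica counts are made up), a caller of either `Admin` or `MockAdminClient` should now see the error surface the same way through the returned future:

```java
import java.util.Collections;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.errors.InvalidReplicationFactorException;

public class CreateTopicsContract {
    // Works against either a real Admin client or a MockAdminClient backed by a single broker.
    static void expectInvalidReplicationFactor(Admin admin) throws InterruptedException {
        NewTopic topic = new NewTopic("demo-topic", 1, (short) 3);   // 3 replicas, only 1 broker
        try {
            admin.createTopics(Collections.singleton(topic)).all().get();
            throw new AssertionError("expected InvalidReplicationFactorException");
        } catch (ExecutionException e) {
            if (!(e.getCause() instanceof InvalidReplicationFactorException)) {
                throw new AssertionError("unexpected cause: " + e.getCause());
            }
        }
    }
}
```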
Author: jeff kim <jeff.kim@confluent.io>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Closes #8617 from jeffkbkim/MockAdminClient-InvalidReplicationFactorException
In the case described in the JIRA, there was a 50%+ increase in the total fetch request rate in
2.4.0 due to this change.
I included a few additional clean-ups:
* Simplify `findPreferredReadReplica` and avoid unnecessary collection copies.
* Use `LongSupplier` instead of `Supplier<Long>` in `SubscriptionState` to avoid unnecessary boxing.
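To illustrate the second clean-up, here is a minimal, self-contained sketch of the boxing difference between `Supplier<Long>` and `LongSupplier` (the names are illustrative, not the `SubscriptionState` code):

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

public class BoxingSketch {
    // Supplier<Long> autoboxes the primitive into a Long object on every call.
    static Supplier<Long> boxed(long fixed) {
        return () -> fixed;
    }

    // LongSupplier returns the primitive directly, with no per-call allocation.
    static LongSupplier unboxed(long fixed) {
        return () -> fixed;
    }

    public static void main(String[] args) {
        System.out.println(boxed(5L).get() + unboxed(5L).getAsLong());
    }
}
```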
Added a unit test to ReplicaManagerTest and cleaned up the test class a bit including
consistent usage of Time in MockTimer and other components.
Reviewers: Gwen Shapira <gwen@confluent.io>, David Arthur <mumrah@gmail.com>, Jason Gustafson <jason@confluent.io>
Avoid calling into ConfigCommand and TopicCommand from tests that are not related
to these commands. It's better to just invoke the admin APIs.
Change a few cases where we were testing the deprecated --zookeeper flag to testing
the --bootstrap-server flag instead. Unless we're explicitly testing the deprecated code
path, we should be using the non-deprecated flags.
Move testCreateWithUnspecifiedReplicationFactorAndPartitionsWithZkClient from
TopicCommandWithAdminClientTest.scala into TopicCommandWithZKClientTest.scala,
since it makes more sense in the latter.
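As a hedged sketch of what "just invoke the admin APIs" looks like in a test (topic name and settings are illustrative):

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class AdminBackedSetup {
    // Instead of going through TopicCommand, create the topic directly with the Admin API.
    static void createTestTopic(String bootstrapServers) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        try (Admin admin = Admin.create(props)) {
            admin.createTopics(Collections.singleton(new NewTopic("test-topic", 2, (short) 1)))
                 .all().get();
        }
    }
}
```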
Reviewers: Colin P. McCabe <cmccabe@apache.org>
This patch ensures that `SslEngineFactory` is closed. The default implementation (`DefaultSslEngineFactory`) does not hold any releasable objects, so we didn't notice this issue. However, it is better to fix it for custom engine factories, which may hold resources that need releasing.
Reviewers: Jason Gustafson <jason@confluent.io>
Compared to all other test cases, shouldAllowConcurrentAccesses starts an async producer that keeps sending records throughout the test, rather than synchronously sending and acking a few records before the streams application starts. Right after the streams app is started, we check that at least one record has been sent to the output topic (i.e. completed processing). However, since only this test starts the producer asynchronously and does not wait for it to complete, the async producer may take too long to produce some records, causing the test to fail.
To follow what the other tests do, I let this test first send one round of records synchronously before starting the async producer.
I also fixed some new Scala warnings encountered along with this PR.
Reviewers: Matthias J. Sax <matthias@confluent.io>
KafkaAdminClientTest.testAlterClientQuotas() is never run: it is clearly intended to be a test method, but lacks the `@Test` annotation.
Author: Tom Bentley <tbentley@redhat.com>
Reviewers: Gwen Shapira, Brian Byrne
Closes #8456 from tombentley/MINOR-annotate-test-method
Adjust `checkLogAppendTimeNonCompressed` to assert
`shallowOffsetOfMaxTimestamp` correctly for message format 2.
Reviewers: Ismael Juma <ismael@juma.me.uk>
1. Added a recordInternal function that all other public record functions delegate to, so that shouldRecord is only checked once (see the sketch below).
2. In Streams, pass the current wall-clock time along inside InternalProcessorContext during process / punctuate, so it can be passed into the record function and reduce how often SystemTime.milliseconds() is called.
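A hypothetical sketch of the single-gate pattern from item 1; class and method names are illustrative and do not mirror the actual Streams metrics classes:

```java
public class SensorLikeRecorder {
    private final boolean shouldRecord;   // e.g. derived from the metrics recording level

    public SensorLikeRecorder(boolean shouldRecord) {
        this.shouldRecord = shouldRecord;
    }

    public void recordThrottleTime(double value, long wallClockTimeMs) {
        recordInternal(value, wallClockTimeMs);
    }

    public void recordLatency(double value, long wallClockTimeMs) {
        recordInternal(value, wallClockTimeMs);
    }

    // Single gate: every public record path funnels here, so the recording-level check
    // happens once, and the caller supplies the wall-clock time instead of querying the
    // system clock for every metric update.
    private void recordInternal(double value, long wallClockTimeMs) {
        if (!shouldRecord) {
            return;
        }
        // ... update the underlying metric with (value, wallClockTimeMs) ...
    }
}
```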
Reviewers: John Roesler <vvcephei@apache.org>
Fetches which hit purgatory are currently counted twice in fetch request rate metrics. This patch moves the metric update into `fetchMessages` so that they are only counted once.
Reviewers: Ismael Juma <ismael@juma.me.uk>
The docs use the term `baseTimestamp`, whereas the on-disk format and the implementation define `firstTimestamp`.
Reviewers: Jason Gustafson <jason@confluent.io>
This reverts commit 29e08fd2c2.
There turned out to be more problems than expected with adding the generic parameters.
Reviewers: Matthias J. Sax <matthias@confluent.io>
Since we cannot guarantee to reassign the correct number of
stand-by tasks when reusing the previous assignment and the
reassignment is rather a micro-optimization, it is removed
to keep the algorithm correct and simple.
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, John Roesler <vvcephei@apache.org>
ConfigProvider extends Closeable, but provider instances were not closed in the following contexts:
* AbstractConfig
* WorkerConfigTransformer
* Worker
This commit ensures that ConfigProviders are closed in the above contexts.
It also adds MockFileConfigProvider.assertClosed().
Gradle executes test classes concurrently, so MockFileConfigProvider
can't simply use a static field to hold its closure state.
Instead use a protocol whereby the MockFileConfigProvider is configured
with a unique key identifying the test, which is also used when calling
assertClosed().
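A hypothetical sketch of such a close-tracking provider (this is not the actual MockFileConfigProvider; the `testId` config key and class name are made up for illustration):

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.kafka.common.config.ConfigData;
import org.apache.kafka.common.config.provider.ConfigProvider;

public class CloseTrackingConfigProvider implements ConfigProvider {
    // Keyed by test id so concurrently running test classes don't clobber each other's state.
    private static final Map<String, Boolean> CLOSED_BY_KEY = new ConcurrentHashMap<>();
    private String key;

    @Override
    public void configure(Map<String, ?> configs) {
        key = String.valueOf(configs.get("testId"));   // unique per test instance
        CLOSED_BY_KEY.put(key, false);
    }

    @Override
    public ConfigData get(String path) {
        return new ConfigData(Collections.emptyMap());
    }

    @Override
    public ConfigData get(String path, Set<String> keys) {
        return new ConfigData(Collections.emptyMap());
    }

    @Override
    public void close() {
        CLOSED_BY_KEY.put(key, true);
    }

    public static void assertClosed(String key) {
        if (!Boolean.TRUE.equals(CLOSED_BY_KEY.get(key))) {
            throw new AssertionError("provider for " + key + " was not closed");
        }
    }
}
```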
Reviewers: Konstantine Karantasis <konstantine@confluent.io>
This allows Kafka tools to work on Cygwin, where $JAVA_HOME typically contains a space (e.g. "C:\Program Files\Java\jdkXXX").
Author: sebwills <sw@sebwills.com>
Reviewers: Ron Dagostino <rdagostino@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
Closes #8481 from sebwills/patch-1
1. fix typo: `atleast` -> `at least`
2. add the missing `--` to the `--bootstrap-servers` argument to be consistent
3. rephrase a sentence, to make it more clear:
before: `LinkedIn is currently running JDK 1.8 u5 (looking to upgrade to a newer version) with the G1 collector`
This could mislead users into running JDK 1.8 u5, while JDK 1.8 u251, which includes many important bug fixes, has already been released. I rephrased it as below:
after: `At the time this is written, LinkedIn is running JDK 1.8 u5 (looking to upgrade to a newer version) with the G1 collector`
Reviewers: Konstantine Karantasis <konstantine@confluent.io>
Simple logging additions at TRACE level that should help when the worker can't get caught up to the end of an internal topic.
Reviewers: Gwen Shapira <cshapi@gmail.com>, Aakash Shah <ashah@confluent.io>, Konstantine Karantasis <konstantine@confluent.io>
The href attribute was missing its opening double quote, so the hyperlink was interpreted as https://docs.oracle.com/.../agent.html", with a redundant trailing double quote.
Add the missing starting double quote back to fix this issue.
Reviewers: Konstantine Karantasis <konstantine@confluent.io>
After KIP-219, responses are sent immediately and we rely on a combination
of clients and muting of the channel to throttle. The result of this is that
we need to track `apiThrottleTimeMs` as an explicit value instead of
inferring it. On the other hand, we no longer need
`apiRemoteCompleteTimeNanos`.
Extend `BaseQuotaTest` to verify that throttle time in the request channel
metrics is being set. Given the nature of the throttling numbers, the test
is not particularly precise.
I included a few clean-ups:
* Pass KafkaMetric to QuotaViolationException so that the caller doesn't
have to retrieve it from the metrics registry.
* Inline Supplier in SocketServer (use SAM).
* Reduce redundant `time.milliseconds` and `time.nanoseconds` calls.
* Use monotonic clock in ThrottledChannel and simplify `compareTo` method.
* Simplify `TimerTaskList.compareTo`.
* Consolidate the number of places where we update `apiLocalCompleteTimeNanos`
and `responseCompleteTimeNanos`.
* Added `toString` to `ByteBufferSend` and `MultiRecordsSend`.
* Restrict access to methods in `QuotaTestClients` to expose only what we need
to.
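Relating to the monotonic-clock ThrottledChannel item above, a hedged sketch of a delayed element ordered by a `System.nanoTime()`-based deadline (names are illustrative, not the actual class):

```java
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

public class ThrottledEntry implements Delayed {
    private final long endTimeNanos;   // monotonic deadline, not wall-clock time

    public ThrottledEntry(long nowNanos, long throttleTimeMs) {
        this.endTimeNanos = nowNanos + TimeUnit.MILLISECONDS.toNanos(throttleTimeMs);
    }

    @Override
    public long getDelay(TimeUnit unit) {
        return unit.convert(endTimeNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
        // Long.compare keeps the ordering simple and avoids subtraction-overflow pitfalls.
        return Long.compare(getDelay(TimeUnit.NANOSECONDS), other.getDelay(TimeUnit.NANOSECONDS));
    }
}
```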
Reviewers: Jun Rao <junrao@gmail.com>
Class kafka.examples.SimpleConsumerDemo was removed, but java-simple-consumer-demo.sh was not removed and the README was not updated.
This commit removes java-simple-consumer-demo.sh and updates the demo instructions in the examples README.
Author: Jiamei Xie <jiamei.xie@arm.com>
Reviewers: Konstantine Karantasis <konstantine@confluent.io>
* The DeadLetterQueueReporter has a KafkaProducer that it must close to clean up resources
* Currently, the producer and its threads are leaked every time a task is stopped
* Responsibility for cleaning up ErrorReporters is transitively assigned to the
ProcessingContext, RetryWithToleranceOperator, and WorkerSinkTask/WorkerSourceTask classes
* One new unit test in ErrorReporterTest asserts that the producer is closed by the dlq reporter
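A hedged sketch of the ownership pattern described in the first two bullets: the reporter owns its producer and exposes a close path so task teardown can release the producer's threads (class and method names are illustrative, not the Connect classes):

```java
import java.time.Duration;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class OwnedProducerReporter implements AutoCloseable {
    private final KafkaProducer<byte[], byte[]> producer;
    private final String dlqTopic;

    // producerProps must include bootstrap servers and byte-array serializers.
    public OwnedProducerReporter(Properties producerProps, String dlqTopic) {
        this.producer = new KafkaProducer<>(producerProps);
        this.dlqTopic = dlqTopic;
    }

    public void report(byte[] key, byte[] value) {
        producer.send(new ProducerRecord<>(dlqTopic, key, value));
    }

    @Override
    public void close() {
        // Without this, the producer's sender thread is leaked every time a task stops.
        producer.close(Duration.ofSeconds(30));
    }
}
```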
Reviewers: Arjun Satish <arjun@confluent.io>, Chris Egerton <chrise@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>, Konstantine Karantasis <konstantine@confluent.io>
If a broker contains 8k replicas, we would previously issue 8k ZK calls to retrieve topic
configs when processing the first LeaderAndIsr request. That should translate to 0 after
these changes.
Credit to @junrao for identifying the problem.
Reviewers: Jun Rao <junrao@gmail.com>
The downgrade test does not currently support 2.4 and 2.5. When you enable them, it fails as a result of consumer group static membership. This PR makes the downgrade test work with all of our released versions again.
Author: Lucas Bradstreet <lucas@confluent.io>
Reviewers: Boyang Chen, Gwen Shapira
Closes #8518 from lbradstreet/downgrade-test-2.4-2.5
This PR fixes and improves two major issues:
1. When calling KafkaStreams#store we can still get an InvalidStateStoreException, and even waiting for the Streams state to become RUNNING is not sufficient (this is also how OptimizedKTableIntegrationTest failed). So I wrapped all such calls with a util wrapper that captures and retries on that exception (see the retry sketch below).
2. While troubleshooting this issue, I also found a potential bug in test-util's produceKeyValuesSynchronously, which creates a new producer for each record to send in a batch --- i.e. if you are sending N records with a single call, within that call it will create N producers that each send one record, which is very slow and costly.
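A hedged sketch of the retry wrapper idea from item 1 (the helper name, timeout, and backoff are illustrative):

```java
import java.time.Duration;
import java.util.function.Supplier;
import org.apache.kafka.streams.errors.InvalidStateStoreException;

public final class StoreRetry {
    // Keep calling the store supplier until the store is queryable or the deadline passes.
    public static <T> T retryOnInvalidStateStore(Supplier<T> storeSupplier, Duration timeout)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeout.toMillis();
        while (true) {
            try {
                return storeSupplier.get();
            } catch (InvalidStateStoreException e) {
                if (System.currentTimeMillis() > deadline) {
                    throw e;
                }
                Thread.sleep(100L);   // store not ready yet (e.g. rebalancing); retry
            }
        }
    }
}
```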
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, John Roesler <john@confluent.io>
* add a config to set the TaskAssignor
* set the default assignor to HighAvailabilityTaskAssignor
* fix broken tests (with some TODOs in the system tests)
Implements: KIP-441
Reviewers: Bruno Cadonna <bruno@confluent.io>, A. Sophie Blee-Goldman <sophie@confluent.io>
These two options are essentially incompatible, as caching will do nothing to reduce downstream traffic and writes when it has to allow non-unique keys (skipping records where the value is also the same is a separate issue, see KIP-557). But enabling caching on a store that's configured to retain duplicates is actually more than just ineffective, and currently causes incorrect results.
We should just log a warning and disable caching whenever a store is retaining duplicates to avoid introducing a regression. Maybe when 3.0 comes around we should consider throwing an exception instead to alert the user more aggressively.
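For users defining such stores explicitly, a sketch of the safe combination using the public Stores API: retain duplicates and disable caching up front rather than relying on the warning (store name, retention, and window size are illustrative):

```java
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;
import org.apache.kafka.streams.state.WindowStore;

public class DuplicateRetainingStore {
    public static StoreBuilder<WindowStore<String, String>> build() {
        return Stores.windowStoreBuilder(
                Stores.persistentWindowStore(
                        "join-window-store",
                        Duration.ofHours(1),     // retention
                        Duration.ofMinutes(5),   // window size
                        true),                   // retainDuplicates
                Serdes.String(),
                Serdes.String())
            .withCachingDisabled();              // caching would be ineffective here anyway
    }
}
```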
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>, John Roesler <john@confluent.io>
Fixed typos in two MM2 configs that define the replication factor for internal Connect topics.
Only a single test was affected.
Reviewers: Ryanne Dolan <ryannedolan@gmail.com>, Konstantine Karantasis <konstantine@confluent.io>
Makes the main thread wait for workers to be ready to test the
desired functionality before proceeding.
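A hedged sketch of the general pattern, assuming a plain `CountDownLatch`-based readiness gate (the worker count and timeout are illustrative, not the actual test code):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ReadyGate {
    public static void main(String[] args) throws InterruptedException {
        int workers = 3;
        CountDownLatch ready = new CountDownLatch(workers);
        for (int i = 0; i < workers; i++) {
            new Thread(() -> {
                // ... perform per-worker setup here ...
                ready.countDown();          // signal this worker is ready
                // ... worker main loop ...
            }).start();
        }
        if (!ready.await(30, TimeUnit.SECONDS)) {
            throw new IllegalStateException("workers did not become ready in time");
        }
        // proceed to exercise the functionality under test
    }
}
```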
Reviewers: Ted Yu <yuzhihong@gmail.com>, Ismael Juma <ismael@juma.me.uk>
This PR updates the algorithm which limits the number of members within a group (`group.max.size`) to fix the following two issues:
1. As described in KAFKA-9885, multiple members of a group can be evicted if the leader of the consumer offset partition changes before the group is persisted. This happens because the current eviction logic always evicts the first member rejoining the group.
2. We also found that dynamic members, when required to have a known member id, are not always limited. The caveat is that the current logic only considers unknown members and uses the group size, which does not include the so-called pending members, to accept or reject a member. In this case, when they rejoin, they are no longer unknown members and can thus bypass the limit. See `testDynamicMembersJoinGroupWithMaxSizeAndRequiredKnownMember` for the whole scenario.
This PR changes the logic to address the above two issues and extends the tests coverage to cover all the member types.
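A hypothetical sketch of the size check described above, counting pending members toward the limit while letting already-known members continue (names are illustrative, not the group coordinator's API):

```java
import java.util.Set;

public class GroupSizeGate {
    static boolean acceptJoiningMember(Set<String> members,
                                       Set<String> pendingMembers,
                                       String memberId,
                                       int groupMaxSize) {
        // A member already counted (known or pending) can always continue its join.
        if (members.contains(memberId) || pendingMembers.contains(memberId)) {
            return true;
        }
        // New members are admitted only if known plus pending members stay under the limit.
        return members.size() + pendingMembers.size() < groupMaxSize;
    }
}
```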
Reviewers: Jason Gustafson <jason@confluent.io>
In this commit we made sure that the auto leader election only happens after the newly starter broker is in the isr.
No accompany tests are added due to the fact that:
this is a change to the private method and no public facing change is made
it is hard to create tests for this change without considerable effort
Reviewers: Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Jun Rao <junrao@gmail.com>
A broker throws IllegalStateException if the broker epoch in the LeaderAndIsr/UpdateMetadataRequest/StopReplicaRequest is larger than its current broker epoch. However, there is no guarantee that the broker will receive the latest broker epoch before the controller does: when the broker registers with ZK, there are a few more instructions to process before this broker "knows" about its epoch, while the controller may already have been notified and sent an UPDATE_METADATA request (as an example) with the new epoch. This results in clients getting stale metadata from this broker.
With this PR, a broker accepts LeaderAndIsr/UpdateMetadataRequest/StopReplicaRequest if the broker epoch is newer than the current epoch.
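A hedged sketch of the relaxed validation (the sentinel value and method name are assumptions for illustration):

```java
public class BrokerEpochCheck {
    static final long NO_EPOCH = -1L;   // assumed sentinel for requests carrying no epoch

    // Reject a control request only when its broker epoch is *older* than the epoch this
    // broker currently knows; a newer epoch is accepted because the controller can learn
    // the new epoch from ZooKeeper before the broker itself does.
    static boolean isStaleControlRequest(long requestBrokerEpoch, long currentBrokerEpoch) {
        return requestBrokerEpoch != NO_EPOCH && requestBrokerEpoch < currentBrokerEpoch;
    }
}
```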
Reviewers: David Jacot <djacot@confluent.io>, Jason Gustafson <jason@confluent.io>
Add an option to kafka-configs.sh `--add-config-file` that adds the configs from a properties file.
Testing: Added new tests to ConfigCommandTest.scala
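As a rough Java equivalent of what the new flag does, under the assumption of a topic target (the real command supports more entity types; the file path and names are illustrative):

```java
import java.io.FileReader;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class AddConfigFileSketch {
    // Read key=value pairs from a properties file and apply each as a SET operation.
    static void applyConfigFile(Admin admin, String topic, String path) throws Exception {
        Properties props = new Properties();
        try (FileReader reader = new FileReader(path)) {
            props.load(reader);
        }
        Collection<AlterConfigOp> ops = new ArrayList<>();
        for (String name : props.stringPropertyNames()) {
            ops.add(new AlterConfigOp(new ConfigEntry(name, props.getProperty(name)),
                                      AlterConfigOp.OpType.SET));
        }
        ConfigResource resource = new ConfigResource(ConfigResource.Type.TOPIC, topic);
        admin.incrementalAlterConfigs(Collections.singletonMap(resource, ops)).all().get();
    }
}
```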
Author: Aneel Nazareth <aneel@confluent.io>
Reviewers: David Jacot <djacot@confluent.io>, Manikumar Reddy <manikumar.reddy@gmail.com>
Closes #8184 from WanderingStar/KAFKA-9612