When choosing the next record to process, we should look at the head record's timestamp of each partition and choose the lowest rather than choosing the lowest of the partition's streamtime.
This change effectively makes RecordQueue return the timestamp of the head record rather than its streamtime. Streamtime is removed (replaced) from RecordQueue as it was only being tracked in order to choose the next partition to poll from.
Reviewers: Matthias J. Sax <mjsax@apache.org>, Bill Bejeck <bbejeck@gmail.com>
To make static consumer group members more persistent, we want to avoid kicking out unjoined members through rebalance timeout. Essentially we allow static members to participate in a rebalance using their old subscription without sending a JoinGroup. The only catch is that an unjoined static member might be the current group leader, and we may need to elect a different leader.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Jason Gustafson <jason@confluent.io>
See https://cwiki.apache.org/confluence/display/KAFKA/KIP-449%3A+Add+connector+contexts+to+Connect+worker+logs
Added LoggingContext as a simple mechanism to set and unset Mapped Diagnostic Contexts (MDC) in the loggers to provide for each thread useful parameters that can be used within the logging configuration. MDC avoids having to modify lots of log statements, since the parameters are available to all log statements issued by the thread, no matter what class makes those calls.
The design intentionally minimizes the number of changes to any existing classes, and does not use Java 8 features so it can be easily backported if desired, although per this KIP it will be applied initially only in AK 2.3 and later and must be enabled via the Log4J configuration.
Reviewers: Jason Gustafson <jason@conflent.io>, Guozhang Wang <wangguoz@gmail.com>
WorkerSourceTask is catching the exception from wrong package org.apache.kafka.common.errors. It is not clear from the API standpoint as to which package the connect framework supports - the one from common or connect. The safest thing would be to support both the packages even though it's less desirable.
Author: Magesh Nandakumar <magesh.n.kumar@gmail.com>
Reviewers: Arjun Satish <arjun@confluent.io>, Randall Hauch <rhauch@gmail.com>
SslFactory: split the part of SslFactory that creates SSLEngine instances into SslEngineBuilder. When (re)configuring, we simply create a new SslEngineBuilder. This allows us to make all the builder fields immutable. It also simplifies the logic for reconfiguring. Because we sometimes need to test old SslEngine instances against new ones, being able to use both the old and the new builder at once is useful.
Create an enum named SslClientAuth which encodes the possible values for ssl.client.auth. This will simplify the handling of this configuration.
SslTransportLayer#maybeProcessHandshakeFailure should treat an SSLHandshakeException with a "Received fatal alert" message as a handshake error (and therefore an authentication error.)
SslFactoryTest: add some line breaks for very long lines.
ConfigCommand#main: when terminating the command due to an uncaught exception, log the exception using debug level in slf4j, in addition to printing it to stderr. This makes it easier to debug failing junit tests, where stderr may not be kept, or may be reordered with respect to other slf4j messages. The use of debug level is consistent with how we handle other types of exceptions in ConfigCommand#main.
StateChangeLogMerger#main: spell out the full name of scala.io.Source rather than abbreviating it as io.Source. This makes it clearer that it is part of the Scala standard library. It also avoids compiler errors when other libraries whose groupId starts with "io" are used in the broker.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Enable reconfiguration of SSL keystores and truststores in client-side channel builders used by brokers for controller, transaction coordinator and replica fetchers. This enables brokers using TLS mutual authentication for inter-broker listener to use short-lived certs that may be updated before expiry without restarting brokers.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Updated AbstractConfig to be able to resolve variables in config values when the configuration includes config provider properties.
Author: Tejal Adsul <tejal@confluent.io>
Reviewers: Rajini Sivaram <rajinisivaram@gmail.com>, Randall Hauch <rhauch@gmail.com>
`ConsumerTopicCreationTest` relied on `KafkaConsumer#poll` to send a `MetadataRequest` within 100ms to verify if a topic is auto created or not. This is brittle and does not guarantee if the request made it to the broker or was processed successfully. This PR fixes the flaky test by adding another topic; we wait until we consume a previously produced record to this topic. This ensures MetadataRequest was processed and we could then check if the topic we're interested in was created or not.
Reviewers: Boyang Chen <bchen11@outlook.com>, Jason Gustafson <jason@confluent.io>
MINOR: update documentation for the log cleaner max compaction lag feature (KIP-354) implemented in KAFKA-7321
Author: Xiongqi Wu <xiowu@linkedin.com>
Reviewer: Joel Koshy <jjkoshy@gmail.com>
The main problem we are trying to solve here is the batching of StopReplica requests and the lack of test coverage for `ControllerChannelManager`. Addressing the first problem was straightforward, but the second problem required quite a bit of work because of the dependence on `KafkaController` for all of the events. It seemed to make sense to separate the events from the processing of events so that we could remove this dependence and improve testability. With the refactoring, I was able to add test cases covering most of the logic in `ControllerChannelManager` including the generation of requests and the expected response handling logic. Note that I have not actually changed any of the event handling logic in `KafkaController`.
While refactoring this logic, I found that the event queue time metric was not being correctly computed. The problem is that many of the controller events were singleton objects which inherited the `enqueueTimeMs` field from the `ControllerEvent` trait. This would never get updated, so queue time would be skewed.
Reviewers: Jun Rao <junrao@gmail.com>
Currently in maybeFailWithError, we always wrap the lastError as a KafkaException. For ProducerFencedException however, we should just throw the exception itself; however we throw a new instance since the previous book-kept one's call trace is not from this call, and hence could be confusing.
Reviewers: Boyang Chen <boyang@confluent.io>, Jason Gustafson <jason@conflent.io>
KAFKA-7321: Add a Maximum Log Compaction Lag (KIP-354)
Records become eligible for compaction after the specified time interval.
Author: Xiongqi Wu <xiowu@linkedin.com>
Reviewer: Joel Koshy <jjkoshy@gmail.com>
The log cleaner attempts to preserve the last entry for each producerId in order to ensure that sequence/epoch state is not lost. The current validation checks only the last sequence number for each producerId in order to decide whether a batch should be retained. There are two problems with this:
1. Sequence numbers are not unique alone. It is the tuple of sequence number and epoch which is uniquely defined.
2. The group coordinator always writes batches beginning with sequence number 0, which means there could be many batches which have the same sequence number.
The complete fix for the second issue would probably add proper sequence number bookkeeping in the coordinator. For now, we have left the coordinator implementation unchanged and changed the cleaner logic to use the last offset written by a producer instead of the last sequence number.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
For session-windows, the result record should have the window-end timestamp as record timestamp.
Rebased to resolve merge conflicts. Removed unused classes TupleForwarder and ForwardingCacheFlushListener (replace with TimestampedTupleForwarder, SessionTupleForwarder, TimestampedCacheFlushListerner, and SessionCacheFlushListener)
Reviewers: John Roesler <john@confluent.io>, Bruno Cadonna <bruno@confluent.io>, Boyang Chen <boyang@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Consolidated the unit tests by having {RocksDB/InMemory}{Window/Session}StoreTest extend {Window/Session}BytesStoreTest. Besides some implementation-specific tests (eg involving segment maintenance) all tests were moved to the abstract XXXBytesStoreTest class. The test coverage now is a superset of the original test coverage for each store type.
The only difference made to existing tests (besides moving them) was to switch from list-based equality comparison to set based, in order to reflect that the stores make no guarantees regarding the ordering of records returned from a range fetch.
There are some implementation-specific tests that were left in the corresponding test class. The RocksDBWindowStoreTest, for example, had several tests pertaining to segments and/or the underlying filesystem. Another key difference is that the in-memory versions should delete expired records aggressively, while the RocksDB versions should only remove entirely expired segments.
Reviewers: John Roesler <john@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
When Kafka Connect does not have cluster ACLs to create topics,
it fails to even access its internal topics which already exist.
This was originally fixed in KAFKA-6250 by ignoring the cluster
authorization error, but now Kafka 2.0 returns a different response
code that corresponds to a different error. Add a patch to ignore this
new error as well.
Reviewers: Jason Gustafson <jason@confluent.io>
Due to the lack of a blank line, a code section in TROGDOR.md
is not properly rendered. This PR fixes it.
Reviewers: Jason Gustafson <jason@confluent.io>
Corrects the system tests to check for either a 404 or a 409 error and sleeping until the Connect REST API becomes available. This corrects a previous change to how REST extensions are initialized (#6651), which added the ability of Connect throwing a 404 if the resources are not yet started. The integration tests were already looking for 409.
Author: Magesh Nandakumar <magesh.n.kumar@gmail.com>
Reviewer: Randall Hauch <rhauch@gmail.com>
Contrary to the previous explanation, a command example in
vagrant/README.md lacks the option to specify the aws provider.
Author: Kengo Seki <sekikn@apache.org>
Reviewers: Gwen Shapira
Closes#6702 from sekikn/add-missing-option
For now, `vagrant/vagrant-up.sh --aws` fails because
the `vagrant hostmanager` command in that script lacks
the `--aws` option. This PR adds it.
I ran `vagrant/vagrant-up.sh --aws` with and without
`--no-parallel` option and confirmed both worked
as expected.
Author: Kengo Seki <sekikn@apache.org>
Reviewers: Gwen Shapira
Closes#6703 from sekikn/KAFKA-8344
this PR will fix a typo related to docs:
http://kafka.apache.org/21/documentation.html#rep-throttle
```bash
$ bin/kafka-reassign-partitions.sh --zookeeper myhost:2181--execute --reassignment-json-file bigger-cluster.json —throttle 50000000
```
I think `myhost:2181` should be `localhost:2181` and followed by a `space`
Author: opera443399 <pc@pcswo.com>
Reviewers: Gwen Shapira
Closes#6704 from opera443399/docs-ops-typo
The debug log lines in the `Plugins` class that log header and key/value converter configurations should be altered as the configurations for these converters may contain secrets that should not be logged in plaintext. Instead, only the keys for these configs are safe to expose.
Author: Chris Egerton <cegerton@oberlin.edu>
Reviewer: Randall Hauch <rhauch@gmail.com>
Expand ConnectClusterState interface and implementation with methods that provide the immutable cluster details and the connector configuration. This includes unit tests for the new methods.
Author: Chris Egerton <cegerton@oberlin.edu>
Reviews: Arjun Satish <arjun@confluent.io>, Konstantine Karantasis <konstantine@confluent.io>, Randall Hauch <rhauch@gmail.com>
Following KIP-453, this PR adds a default close() method to the RocksDBConfigSetter interface and calls it when closing a store.
Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <mjsax@apache.org>, John Roesler <john@confluent.io>, Bruno Cadonna <bruno@confluent.io>
This patch adds support to retry all group operations after COORDINATOR_LOAD_IN_PROGRESS and COORDINATOR_NOT_AVAILABLE in AdminClient group operations. Previously we only had logic to retry after FindCoordinator failures.
Reviewers: Yishun Guan <gyishun@gmail.com>, Viktor Somogyi <viktorsomogyi@gmail.com>, Jason Gustafson <jason@confluent.io>
Because of how conversions between Java collections and Scala collections work, ImplicitLinkedHashMultiSet objects were being treated as unordered in some contexts where they shouldn't be. This broke JOIN_GROUP handling.
This patch renames ImplicitLinkedHashMultiSet to ImplicitLinkedHashMultCollection. The order of Collection objects will be preserved when converting to scala. Adding Set and List "views" to the Collection gives us a more elegant way of accessing that functionality when needed.
Reviewers: Colin P. McCabe <cmccabe@apache.org>
Reviewers: Bruno Cadonna <bruno@confluent.io>, John Roesler <john@confluent.io>, A. Sophie Blee-Goldman <sophie@confluent.io>, Bill Bejeck <bill@confluent.io>
* Fixed bug in Struct.equals where we returned prematurely and added tests
* Update RequestResponseTest to check that `equals` and `hashCode` of
the struct is the same after serialization/deserialization only when possible.
* Use `Objects.equals` and `Long.hashCode` to simplify code
* Removed deprecated usages of `JUnitTestSuite`
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Implements KIP-361 to provide a consumer configuration to specify whether subscribing or assigning a non-existent topic would result in it being automatically created or not.
Reviewers: Jason Gustafson <jason@confluent.io>
Adds recording of close of a stand-by task to the task-closed metric
Adds unit tests to verify the recording
Reviewers: Guozhang Wang <wangguoz@gmail.com>, John Roesler <john@confluent.io>
KAFKA-7903: automatically generate OffsetCommitRequest (#6583) introduced a change that cause consumer breakage when OffsetCommitResponse versions < 3 are parsed, as they do not include a throttle_time_ms field. This PR fixes the parsing by supplying the correct version to the OffsetCommitResponse constructor in AbstractResponse.parseResponse.
I have tested this change against many of the compatibility system tests, and it has fixed all the failures that I have tested thus far.
Author: Lucas Bradstreet <lucas@confluent.io>
Reviewers: Gwen Shapira, Boyang Chen
Closes#6698 from lbradstreet/offset-commit-response-throttle-field
Part of KIP-345 effort. The strategy is to extract user passed in group.instance.id config and pass it in with given thread-id (because consumer is currently per-thread level).
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Fix registration of Connect REST extensions to prevent deadlocks when extensions get the list of connectors before the herder is available. Added integration test to check the behavior.
Author: Chris Egerton <cegerton@oberlin.edu>
Reviewers: Arjun Satish <arjun@confluent.io>, Randall Hauch <rhauch@gmail.com>
If a node is currently throttled, we should take it out of the running for `leastLoadedNode`. Additionally, current logic seems to favor connecting to new nodes rather than using existing connections which have one or more in flight requests. The javadoc is slightly vague about whether this is expected, but it seems not.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>