Dynamic metrics reporter updates described in KIP-226. This includes:
- Addition and removal of metrics reporters
- Reconfiguration of custom metrics reporter configs
- Tests for metrics reporter updates both as a default cluster-level config and as a per-broker config
Reviewers: Jason Gustafson <jason@confluent.io>
This keeps a separate count of the number of in-flight requests so that sensor threads will not need to deal with ConcurrentModificationException.
This would probably still be correct with volatile rather than AtomicInteger, but FindBugs flags the use of volatile because the count is both incremented and decremented, a non-atomic read-modify-write.
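A minimal sketch of the approach (class and method names are hypothetical, not the actual patch): an AtomicInteger lets sensor threads read a consistent count without iterating the mutable in-flight collection:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class InFlightRequestCounter {
    private final AtomicInteger inFlightCount = new AtomicInteger(0);

    // Called by the network thread when a request is sent.
    public void incrementInFlight() {
        inFlightCount.incrementAndGet();
    }

    // Called by the network thread when a response (or disconnect) arrives.
    public void decrementInFlight() {
        inFlightCount.decrementAndGet();
    }

    // Called by sensor/metrics threads; reads a consistent value without
    // touching the underlying request collection.
    public int count() {
        return inFlightCount.get();
    }
}
```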
Author: Sean McCauliff <smccauliff@linkedin.com>
Reviewers: Jiangjie (Becket) Qin <becket.qin@gmail.com>
Closes#4460 from smccauliff/KAFKA-6345
The org.apache.kafka.common.utils.Base64 class defers Base64 encoding/decoding to the java.util.Base64 class beginning with JRE 1.8, but leverages javax.xml.bind.DatatypeConverter under JRE 1.7. The implementation of the encodeToString(byte[]) method returned under JRE 1.7 by Base64.urlEncoderNoPadding() blindly removed the last two trailing characters of the Base64 encoding under the assumption that they would always be the string "==", but that is incorrect: the padding can be "=", "==", or absent entirely. This commit fixes that problem.
The commit also adds a Base64.urlDecoder() method that defers to java.util.Base64 under JRE 1.8+ but leverages javax.xml.bind.DatatypeConverter under JRE 1.7.
Finally, there is a unit test to confirm that encode/decode are inverses in both the Base64 and Base64URL cases.
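For illustration, a hedged sketch of the padding bug and its fix (a hypothetical helper, not the actual Base64 class): since padding may be empty, "=", or "==", only trailing '=' characters should be stripped:

```java
public final class UrlSafeBase64Sketch {
    private UrlSafeBase64Sketch() {}

    // Buggy version: assumes the encoding always ends in "==".
    static String stripPaddingBuggy(String base64) {
        return base64.substring(0, base64.length() - 2);
    }

    // Fixed version: remove only trailing '=' characters, however many exist.
    static String stripPadding(String base64) {
        int end = base64.length();
        while (end > 0 && base64.charAt(end - 1) == '=')
            end--;
        return base64.substring(0, end);
    }
}
```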
* KAFKA-3625: Add public test utils for Kafka Streams
- add new artifact test-utils
- add TopologyTestDriver
- add MockTime, TestRecord, and TestRecordFactory
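A minimal usage sketch of the new artifact, assuming the test-utils API as it eventually shipped (ConsumerRecordFactory rather than the TestRecordFactory named above; the topology and topic names are illustrative):

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.TopologyTestDriver;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.test.ConsumerRecordFactory;

public class TopologyTestDriverExample {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> input = builder.stream("input-topic");
        input.mapValues(v -> v.toUpperCase()).to("output-topic");

        Properties config = new Properties();
        config.put(StreamsConfig.APPLICATION_ID_CONFIG, "test-app");
        config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234"); // never contacted
        config.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        config.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        TopologyTestDriver driver = new TopologyTestDriver(builder.build(), config);
        try {
            ConsumerRecordFactory<String, String> factory = new ConsumerRecordFactory<>(
                "input-topic", new StringSerializer(), new StringSerializer());
            driver.pipeInput(factory.create("input-topic", "key", "hello"));
            // prints HELLO
            System.out.println(driver.readOutput("output-topic",
                new StringDeserializer(), new StringDeserializer()).value());
        } finally {
            driver.close();
        }
    }
}
```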
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Damian Guy <damian.guy@gmail.com>, Bill Bejeck <bill@confluent.io>
In MetadataResponse deserialization, if the partition leader key is set
to -1, the leader is set to null. The MetadataResponse#toStruct code
should handle this correctly as well.
Also fix a case in KafkaApis where we were not taking into account the
possibility of the leader being null.
RequestResponseTest covers this case as well.
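A small sketch of the convention being handled (hypothetical helpers): the protocol encodes "no leader" as broker id -1, so both directions of the conversion must account for it:

```java
public class LeaderIdSketch {
    static final int NO_LEADER_ID = -1;

    // Deserialization: -1 on the wire becomes a null leader in memory.
    static Integer leaderFromWire(int leaderId) {
        return leaderId == NO_LEADER_ID ? null : leaderId;
    }

    // Serialization: a null leader must map back to -1, not throw an NPE.
    static int leaderToWire(Integer leader) {
        return leader == null ? NO_LEADER_ID : leader;
    }
}
```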
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
This is something I did outside my working hours; I would ask anyone reviewing this to do the same and not take time for it during work hours.
I have kept the PR as limited as possible, for clarity of reading.
==========
Using an empty string concatenation to obtain the String representation of a value is bad for two reasons, as explained here: https://stackoverflow.com/questions/1572708/is-conversion-to-string-using-int-value-bad-practice
- Readability: String.valueOf(value) shows explicitly what you are trying to do.
- Performance: depending on your compiler, "" + value may be implemented by first creating a StringBuffer, appending your value to it, and then calling .toString() on that. Which is inefficient.
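For illustration, the two styles side by side:

```java
public class StringConversionExample {
    public static void main(String[] args) {
        int value = 42;

        // Implicit: the compiler may emit the equivalent of
        // new StringBuilder().append("").append(value).toString()
        String viaConcat = "" + value;

        // Explicit: states the intent and avoids the intermediate builder.
        String viaValueOf = String.valueOf(value);

        System.out.println(viaConcat.equals(viaValueOf)); // prints true
    }
}
```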
Also, the Metrics.java file had an empty string being added for the sole reason that the page width forced a string literal to continue on a new line. Removed that.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Enable dynamic broker configuration (see KIP-226 for details). Includes
- Base implementation to allow specific broker configs and custom configs to be dynamically updated
- Extend DescribeConfigsRequest/Response to return all synonym configs and their sources in the order of precedence
- Extend AdminClient to alter dynamic broker configs
- Dynamic update of SSL keystores
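A usage sketch of the AdminClient side (broker id 0 and `log.cleaner.threads` are purely illustrative):

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigsResult;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class DynamicBrokerConfigExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // Per-broker resource for broker id 0; an empty resource name
            // targets the cluster-wide default instead.
            ConfigResource broker0 = new ConfigResource(ConfigResource.Type.BROKER, "0");
            Config update = new Config(Collections.singletonList(
                new ConfigEntry("log.cleaner.threads", "2")));
            AlterConfigsResult result = admin.alterConfigs(Collections.singletonMap(broker0, update));
            result.all().get(); // propagate any error
        }
    }
}
```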
Reviewers: Ted Yu <yuzhihong@gmail.com>, Jason Gustafson <jason@confluent.io>
When the coordinator is marked unknown, we explicitly disconnect its connection and cancel pending requests. Currently the disconnect happens before the coordinator state is set to null, which means that callbacks which inspect the coordinator state will see it still as active. If there are offset commit requests which need to be cancelled, each request callback will inspect the coordinator state and attempt to mark the coordinator dead again. In the worst case, if there are many pending offset commit requests that need to be cancelled, this can cause a stack overflow. To fix the problem, we need to set the coordinator to null prior to cancelling pending requests.
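A minimal sketch of the ordering fix, with hypothetical names standing in for the coordinator internals:

```java
public class CoordinatorStateSketch {
    private volatile String coordinator; // stand-in for the coordinator Node

    void markCoordinatorUnknown(NetworkStub client) {
        String oldCoordinator = coordinator;
        coordinator = null;                    // 1. clear the state first
        if (oldCoordinator != null)
            client.disconnect(oldCoordinator); // 2. then fail pending requests;
                                               //    their callbacks now observe
                                               //    coordinatorUnknown() == true
    }                                          //    and do not recurse

    boolean coordinatorUnknown() {
        return coordinator == null;
    }

    interface NetworkStub {
        // Cancels in-flight requests, invoking their callbacks.
        void disconnect(String node);
    }
}
```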
I have added a test case which reproduced the stack overflow with many pending offset commits. I have also added a basic test case to verify that callbacks for in-flight or unsent requests see the coordinator as unknown which prevents them from attempting to resend.
This patch also includes some minor cleanups which were noticed along the way.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
- Add capability to create delegation token
- Add authentication based on delegation token.
- Add capability to renew/expire delegation tokens.
- Add unit tests and integration tests
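A usage sketch of the token lifecycle; note that the AdminClient methods below shipped in a later release (KIP-249) and are shown here only to illustrate the create/renew/expire flow:

```java
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.CreateDelegationTokenResult;
import org.apache.kafka.common.security.token.delegation.DelegationToken;

public class DelegationTokenExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Assumes SASL is configured: a token must be created by an
        // authenticated, non-token principal.
        try (AdminClient admin = AdminClient.create(props)) {
            CreateDelegationTokenResult created = admin.createDelegationToken();
            DelegationToken token = created.delegationToken().get();
            System.out.println("tokenId: " + token.tokenInfo().tokenId());

            // Renew before expiry; returns the new expiry timestamp.
            admin.renewDelegationToken(token.hmac()).expiryTimestamp().get();

            // Expire (invalidate) the token immediately.
            admin.expireDelegationToken(token.hmac()).expiryTimestamp().get();
        }
    }
}
```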
Author: Manikumar Reddy <manikumar.reddy@gmail.com>
Reviewers: Jun Rao <junrao@gmail.com>
Closes#3616 from omkreddy/KAFKA-4541
This is the implementation of KIP-225.
It marks the previous metrics as deprecated in the documentation and adds new metrics using tags.
Testing verifies that both the new and the old metric report the same value.
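For illustration, a hedged sketch of the difference using the common Metrics API (the group and metric names are illustrative, not the exact ones from this patch):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.apache.kafka.common.MetricName;
import org.apache.kafka.common.metrics.Metrics;
import org.apache.kafka.common.metrics.Sensor;
import org.apache.kafka.common.metrics.stats.Value;

public class TaggedMetricExample {
    public static void main(String[] args) {
        Metrics metrics = new Metrics();

        // Deprecated style: topic/partition baked into the metric name.
        MetricName flat = metrics.metricName(
            "mytopic-0.records-lag", "consumer-fetch-manager-metrics");

        // New style: a stable name, with topic/partition carried as tags.
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put("topic", "mytopic");
        tags.put("partition", "0");
        MetricName tagged = metrics.metricName(
            "records-lag", "consumer-fetch-manager-metrics", tags);

        // Record through one sensor so both report the same value.
        Sensor sensor = metrics.sensor("records-lag-sensor");
        sensor.add(flat, new Value());
        sensor.add(tagged, new Value());
        sensor.record(42.0);
        metrics.close();
    }
}
```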
Author: cmolter <cmolter@apple.com>
Reviewers: Jiangjie (Becket) Qin <becket.qin@gmail.com>
Closes#4362 from lahabana/kafka-5890
* Implement MockAdminClient.deleteTopics
* Use MockAdminClient instead of MockKafkaAdminClientEnv in StreamsResetterTest
* Rename MockKafkaAdminClientEnv to AdminClientUnitTestEnv
* Use MockAdminClient instead of MockKafkaAdminClientEnv in TopicAdminTest
* Rename MockKafkaAdminClientEnv to AdminClientUnitTestEnv in KafkaAdminClientTest.java
* Migrate StreamThreadTest to MockAdminClient
* Fix style errors
* Address review comments
* Fix MockAdminClient call
Reviewers: Matthias J. Sax <matthias@confluent.io>, Konstantine Karantasis <konstantine@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
LogContext now has two different implementations of Logger; one is picked based on the availability of the LocationAwareLogger API.
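A minimal sketch of the selection logic, assuming an SLF4J binding; the FQCN argument tells a location-aware binding which stack frame to skip, so the caller's location is reported instead of the wrapper's:

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.spi.LocationAwareLogger;

public class LoggerSelectionExample {
    public static void main(String[] args) {
        Logger logger = LoggerFactory.getLogger(LoggerSelectionExample.class);
        if (logger instanceof LocationAwareLogger) {
            LocationAwareLogger la = (LocationAwareLogger) logger;
            // The FQCN identifies the wrapper class whose frames the binding
            // should skip when computing the caller's class/method/line.
            la.log(null, LoggerSelectionExample.class.getName(),
                   LocationAwareLogger.INFO_INT, "prefixed message", null, null);
        } else {
            // Fallback: plain delegation, caller location may be the wrapper.
            logger.info("prefixed message");
        }
    }
}
```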
Reviewers: Jason Gustafson <jason@confluent.io>
* Return the offset of the next record remaining after restore has completed
* Changed check for restoring partition to remove the "+1" in the guard condition
Reviewers: Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>
We should remove the map entry from mbeans if it becomes
empty during metric removal.
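A hedged sketch of the leak and the fix (hypothetical structure, not the actual JmxReporter code):

```java
import java.util.HashMap;
import java.util.Map;

public class MbeanCleanupSketch {
    private final Map<String, Map<String, Object>> mbeans = new HashMap<>();

    public synchronized void removeAttribute(String mbeanName, String attributeName) {
        Map<String, Object> attributes = mbeans.get(mbeanName);
        if (attributes != null) {
            attributes.remove(attributeName);
            // The fix: drop the entry once its last attribute is gone,
            // otherwise the outer map grows forever as metrics come and go.
            if (attributes.isEmpty())
                mbeans.remove(mbeanName);
        }
    }
}
```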
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Satish Duggana <satish.duggana@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Utils with static methods should not be instantiated, hence marking the classes `final` and adding a `private` constructor.
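The pattern, shown on a hypothetical utility class:

```java
public final class StringUtils {          // final: cannot be subclassed
    private StringUtils() {}              // private constructor: cannot be instantiated

    public static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }
}
```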
Reviewers: Randall Hauch <rhauch@gmail.com>, Jason Gustafson <jason@confluent.io>
1. Create default internal topic configs in StreamsConfig; in particular, for repartition topics, change the segment size and segment time to smaller values.
2. Consolidate the default internal topic settings into InternalTopicManager and simplify InternalTopicConfig correspondingly.
3. Add an integration test for purging data.
4. MINOR: change TopologyBuilderException to IllegalStateException in StreamPartitionAssignor (part of https://issues.apache.org/jira/browse/KAFKA-5660).
Here are a few public facing APIs that get added:
1. AbstractConfig#originalsWithPrefix(String prefix, boolean strip): this simplifies the logic of passing admin- and topic-prefixed configs to consumer properties.
2. KafkaStreams constructor with a Time object, for convenient mocking in tests.
Will update KIP-204 accordingly if people re-vote these changes.
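A usage sketch for the first of these (the `admin.` prefix and the overridden config are illustrative):

```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class PrefixedConfigExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // A prefixed override intended for the embedded admin client.
        props.put("admin.request.timeout.ms", "30000");

        StreamsConfig config = new StreamsConfig(props);
        // strip = true yields {request.timeout.ms=30000}, ready to hand to
        // the admin client; strip = false would keep the "admin." prefix.
        System.out.println(config.originalsWithPrefix("admin.", true));
    }
}
```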
Author: Guozhang Wang <wangguoz@gmail.com>
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>, Damian Guy <damian.guy@gmail.com>
Closes#4315 from guozhangwang/K6150-segment-size
When the consumer uses plaintext and there is remaining data in the consumer's buffer, consumer.poll() will read all data available from the socket buffer into the consumer buffer. However, if the consumer uses SSL and there is remaining data, consumer.poll() may only read 16 KB (the size of SslTransportLayer.appReadBuffer) from the socket buffer. This reduces the efficiency of consumer.poll() by requiring the user to call poll() more times to get the same amount of data.
Furthermore, we observe that for users who naively sleep a constant time after each consumer.poll(), some partitions will lag behind after the switch from plaintext to SSL. Here is why this can happen.
Say there is 1 partition producing 1 MB/sec and 9 partitions producing 32 KB/sec. The leaders of these partitions are all different, and one consumer is consuming all 10 partitions. Let's also assume that the socket read buffer is large enough and that the consumer sleeps 1 sec between calls to consumer.poll(); 1 sec is long enough for the consumer to receive the FetchResponses back from the brokers.
When the consumer uses plaintext, each consumer.poll() reads all data available in the socket buffers, so the full second's worth of data is read for every partition, including the 1 MB one.
When the consumer uses SSL, each consumer.poll() is likely to find that some data is already available in memory. In that case the consumer reads only 16 KB from each of the other sockets, in particular the socket for the broker hosting the large partition. The throughput of the large partition is then limited to 16 KB/sec.
Arguably the user should not sleep a fixed 1 sec if the consumer is lagging behind, but on the Kafka dev side it is nice to keep the previous behavior and optimize consumer.poll() to read as much data from the socket as possible.
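For illustration, a sketch of the naive pattern described above (config values are illustrative): with a fixed sleep per iteration, per-partition throughput is capped at (bytes returned per poll) / (sleep interval), i.e. roughly 16 KB/sec if each poll drains only one appReadBuffer:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class NaiveSleepLoop {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9093");
        props.put("group.id", "example");
        props.put("security.protocol", "SSL");
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));
            while (true) {
                ConsumerRecords<byte[], byte[]> records = consumer.poll(1000L);
                // ... process records ...
                // Fixed sleep: per-partition throughput is capped at
                // (bytes per poll) / 1 sec, e.g. ~16 KB/sec if each poll
                // returns only one 16 KB appReadBuffer worth of data.
                Thread.sleep(1000L);
            }
        }
    }
}
```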
Author: Dong Lin <lindong28@gmail.com>
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Jason Gustafson <jason@confluent.io>
Closes#4248 from lindong28/KAFKA-6258
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Apurva Mehta <apurva@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#4335 from mjsax/minor-improve-KafkaStreams-javadocs
This is required for ACLs where SSL principals contain
special characters (e.g. comma) that are escaped using
backslash. The strings need to be quoted for JSON to
ensure that the JSON stored in ZK is valid.
Also converted `SslEndToEndAuthorizationTest` to use a
principal with special characters to ensure that this
path is tested.
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#4303 from rajinisivaram/KAFKA-6319
- Rename `delete()` to `deleteIfExists()` in `LogSegment`, `AbstractIndex`
and `TxnIndex`. An exception is now thrown in case of IO errors, for more informative
errors and to make it less likely that errors are ignored, while a `boolean` return value is used
for the case where the file does not exist (like `Files.deleteIfExists()`; see the sketch after this list).
- Fix an instance of delete while open (should fix KAFKA-6322 and
KAFKA-6075).
- `LogSegment.deleteIfExists` no longer throws an exception if any of
the files it tries to delete does not exist (fixes KAFKA-6194).
- Remove unnecessary `FileChannel.force(true)` when deleting file.
- Introduce `LogSegment.open()` and use it to improve encapsulation
and reduce duplication.
- Expand functionality of `LogSegment.onBecomeInactiveSegment()`
to reduce duplication and improve encapsulation.
- Use `AbstractIndex.deleteIfExists()` instead of deleting files manually.
- Improve logging when deleting swap files.
- Use CorruptIndexException instead of IllegalArgumentException.
- Simplify `LogCleaner.cleanSegments()` to reduce duplication and
improve encapsulation.
- A few other clean-ups in Log, LogSegment, etc.
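The `deleteIfExists` contract referred to above, sketched on a hypothetical helper that mirrors `java.nio.file.Files.deleteIfExists`:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class DeleteIfExistsSketch {
    /**
     * @return true if the file was deleted, false if it did not exist
     * @throws IOException if deletion fails for any other reason, so real
     *         I/O errors surface instead of being silently ignored
     */
    public static boolean deleteIfExists(File file) throws IOException {
        return Files.deleteIfExists(file.toPath());
    }
}
```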
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jun Rao <junrao@gmail.com>, Ted Yu <yuzhihong@gmail.com>
Closes#4040 from ijuma/kafka-5829-follow-up
- set auto.offset.reset to "none" for restore and global consumer
- handle InvalidOffsetException for restore and global consumer
- add corresponding tests
- some minor cleanup
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Damian Guy <damian.guy@gmail.com>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#4215 from mjsax/kafka-6121-restore-global-consumer-handle-reset
The NetworkClient internally sends ApiVersions requests to each broker following connection establishment. If this request happens to fail (perhaps due to an incompatible broker), the NetworkClient includes the response in the result of poll(). Applications will generally not be expecting this response, which may lead to failed assertions (or, in the case of AdminClient, an obscure log message).
I've added test cases which wait for the ApiVersions request sent by NetworkClient to be in flight, then disconnect the connection and verify that the response is not included in the result of poll().
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Closes#4280 from hachikuji/KAFKA-6289
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Bill Bejeck <bbejeck@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
Closes#4242 from mjsax/kafka-4857-admit-client
- Ensure that `partitionsBeingReassigned` is fully populated before
`removePartitionFromReassignedPartitions` is invoked. This is
necessary to avoid premature deletion of the `reassign_partitions`
znode.
- Modify and add tests to verify the fixes.
- Add documentation.
- Use `info` log message if assignedReplicas == newReplicas and
remove control flow based on exceptions.
- General logging improvements.
- Simplify `initializePartitionAssignment` by relying on logic already
present in `maybeTriggerPartitionReassignment`.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jun Rao <junrao@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
Closes#4283 from ijuma/kafka-6193-flaky-shouldPerformMultipleReassignmentOperationsOverVariousTopics
From 0.11 to 1.0, we moved `DescribeClusterOptions timeoutMs(Integer timeoutMs)` from
DescribeClusterOptions to AbstractOptions (and similarly for the other Options classes). This can
cause code compiled against 0.11.0.x to fail when it is executed with the 1.0 kafka-clients jar.
This patch adds back these methods to restore binary compatibility with 0.11.
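A simplified sketch of the restored shape: moving the method up to the superclass is source-compatible, but bytecode compiled against 0.11.0.x refers to `DescribeClusterOptions.timeoutMs` directly, so the subclass re-declares it with a covariant return type:

```java
abstract class AbstractOptions<T extends AbstractOptions<T>> {
    protected Integer timeoutMs = null;

    @SuppressWarnings("unchecked")
    public T timeoutMs(Integer timeoutMs) {
        this.timeoutMs = timeoutMs;
        return (T) this;
    }
}

class DescribeClusterOptions extends AbstractOptions<DescribeClusterOptions> {
    // Re-declared so 0.11-era call sites, whose method descriptor names
    // DescribeClusterOptions directly, still link at runtime.
    @Override
    public DescribeClusterOptions timeoutMs(Integer timeoutMs) {
        super.timeoutMs(timeoutMs);
        return this;
    }
}
```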
Author: Dong Lin <lindong28@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#4257 from lindong28/KAFKA-6174
1. Add the repartition topics information into ProcessorTopology: personally I do not like leaking this information into the topology, but there seems to be no simpler way around it.
2. StreamTask: added one more function to expose the consumed offsets from repartition topics only.
3. TaskManager: use the AdminClient to send the gathered offsets for deletion only if a) the previous call has completed (the client intentionally ignores-and-logs any errors), or b) no request has ever been sent before.
NOTE that this code depends on the assumption that purge is only called right after the commit has succeeded, hence we presume all consumed offsets are committed.
4. MINOR: Added a few more constructors to ProcessorTopology for cleaner unit tests.
5. MINOR: Extracted MockStateStore out of the deprecated class.
6. MINOR: Made a pass over some unit test classes for clean ups.
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Damian Guy <damian.guy@gmail.com>
Closes#4270 from guozhangwang/K6150-purge-repartition-topics
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#4284 from mjsax/minor-improve-eos-docs
When a socket is closed, we must remove the corresponding selection keys from internal collections. This fixes an NPE caused by attempting to access the selection key's attached channel after it has been cleared on disconnect.
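A hedged sketch of the cleanup ordering (hypothetical structure, not the actual Selector code):

```java
import java.nio.channels.SelectionKey;
import java.util.HashMap;
import java.util.Map;

public class ChannelCloseSketch {
    private final Map<String, SelectionKey> keys = new HashMap<>();

    void close(String channelId) {
        // Remove from internal collections first, so no later lookup can
        // reach a key whose attached channel has already been cleared.
        SelectionKey key = keys.remove(channelId);
        if (key != null) {
            key.attach(null);  // then clear the attachment
            key.cancel();      // and cancel the key
        }
    }
}
```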
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Rajini Sivaram <rajinisivaram@googlemail.com>
Closes#4276 from hachikuji/KAFKA-6260
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Apurva Mehta <apurva@confluent.io>, Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>
Closes#3912 from mjsax/kafka-5936-producer-close
Only expect responseAsString to be set if request logging is
enabled _and_ responseSend is defined.
Also fixed a couple of issues that would manifest themselves
if trace logging is enabled:
- `MemoryRecords.toString` should not throw exception if data is corrupted
- Generate `responseString` correctly if unsupported api versions request is
received.
Unit tests were added for every issue fixed. Also changed
SocketServerTest to run with trace logging enabled as
request logging breakage has been a common issue.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Closes#4250 from ijuma/fix-issues-when-trace-logging-is-enabled
1. Add the AdminClient into Kafka Streams, which is shared among all the threads.
2. Add ADMIN_PREFIX to StreamsConfig.
3. Also made a few tweaks to the metrics of the AdminClient, which are slightly different from the StreamsKafkaClient's (note these changes will not be reflected in this PR but only take place when we eventually replace StreamsKafkaClient):
3.1. "clientId" tag will be set as "clientId-admin": in StreamsKafkaClient it is whatever user sets, and hence could even be null.
3.2. "groupPrefix" will be set as "admin-client": in StreamsKafkaClient it will be "kafka-client".
So the metrics from `StreamsKafkaClient` to `AdminClient` would be changed from
`kafka.admin.client:type=kafka-client-metrics,client-id=`
to
`kafka.admin.client:type=admin-client-metrics,client-id=myApp-UUID-admin`
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Damian Guy <damian.guy@gmail.com>, Matthias J. Sax <matthias@confluent.io>, Ted Yu <yuzhihong@gmail.com>
Closes#4211 from guozhangwang/K6170-admin-client
Implements KIP-224:
- adds a new StreamsConfig `retries`
- uses `retries` and `retry.backoff.ms` to handle TimeoutException in GlobalStateManager (see the sketch below)
- adds two new tests to trigger TimeoutException in the global consumer
- some minor code cleanup to reduce the number of parameters we need to pass around
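A minimal sketch of the retry policy (method names are hypothetical; the real code wraps the global store's offset lookups):

```java
import org.apache.kafka.common.errors.TimeoutException;

public class RetrySketch {
    // Retry up to `retries` times, sleeping `retryBackoffMs` between attempts,
    // instead of dying on the first TimeoutException.
    static long endOffsetWithRetries(int retries, long retryBackoffMs)
            throws InterruptedException {
        for (int attempt = 0; ; attempt++) {
            try {
                return fetchEndOffset();
            } catch (TimeoutException e) {
                if (attempt >= retries)
                    throw e;
                Thread.sleep(retryBackoffMs);
            }
        }
    }

    private static long fetchEndOffset() {
        // stand-in for consumer.endOffsets(...) against the global store topic
        return 0L;
    }
}
```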
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Damian Guy <damian.guy@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
Closes#4206 from mjsax/kafka-6122-global-consumer-timeout-exception
This is the PR related to the [KIP-204](https://cwiki.apache.org/confluence/display/KAFKA/KIP-204+%3A+Adding+records+deletion+operation+to+the+new+Admin+Client+API) in order to add the `deleteRecords` operation to the new Admin Client (it's already available in the "legacy" one).
Other than that, unit tests and integration tests are added as well (the integration tests are derived from the "legacy" integration tests, so the new addition is tested in the same way as the "legacy" one).
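A usage sketch of the new operation (the topic, partition, and offset are illustrative):

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.DeleteRecordsResult;
import org.apache.kafka.clients.admin.RecordsToDelete;
import org.apache.kafka.common.TopicPartition;

public class DeleteRecordsExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            TopicPartition tp = new TopicPartition("my-topic", 0);
            // Delete all records before offset 100 in partition 0.
            DeleteRecordsResult result = admin.deleteRecords(
                Collections.singletonMap(tp, RecordsToDelete.beforeOffset(100L)));
            System.out.println("low watermark is now: "
                + result.lowWatermarks().get(tp).get().lowWatermark());
        }
    }
}
```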
Author: Paolo Patierno <ppatierno@live.com>
Reviewers: Colin P. Mccabe <cmccabe@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#4132 from ppatierno/kafka-5925
Added a unit test for ApiVersion and moved testApiVersions from
Scala to Java.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Closes#4220 from ijuma/kafka-6210-iae-if-1.0.0-inter-broker-protocol-version