Reviewers: John Roesler <john@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <guozhang@confluent.io>, Damian Guy <damian@confluent.io>, Matthias J. Sax <matthias@confluent.io>
Changed WorkerSinkTaskContext to only resume the consumer topic partitions when the connector/task is not in the paused state.
The context tracks the set of topic partitions that are explicitly paused/resumed by the connector, and when the WorkerSinkTask resumes the tasks it currently resumes all topic partitions *except* those that are still explicitly paused in the context. Therefore, the change above should result in the desired behavior.
Several debug statements were added to record when the context is called by the connector.
This can be backported to older releases, since this bug goes back to 0.10 or 0.9.
Author: Randall Hauch <rhauch@gmail.com>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes#4716 from rhauch/kafka-6661
Some changes required to get the Streams system tests working via Docker
To test:
TC_PATHS="tests/kafkatest/tests/streams" bash tests/docker/run_tests.sh
That command will take about 3.5 hours, and should pass. Note there are a couple of ignored tests.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bill@confluent.io>
In dsi-api.html, when reading from Kafka to a KTable, the doc says "In the case of a KStream.." where `Kstream` here is not correct. They should be `KTable`.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Prior to this patch, we caught some exceptions when executing the command, which meant that it would return with status code zero. This patch fixes this and makes the expected exit behavior explicit. Test cases have been added to verify the change.
Reviewers: Ismael Juma <ismael@juma.me.uk>
If there is lock contention while multiple threads check if a delayed operation may be completed (e.g. a produce request with acks=-1), the threads perform completion only if the lock is free, to avoid deadlocks. This leaves a timing window when an operation becomes ready to complete after another thread has acquired the lock and performed the check for completion, but not yet released the lock. The PR adds an additional flag to ensure that the operation is completed in this case.
Currently in KafkaAdminClient.describeTopics(), for each topic in the request, a complete map of cluster and errors will be constructed for every topic and partition. This unnecessarily increases the complexity of describeTopics() to O(n^2). This patch improves the complexity to O(n).
Reviewers: Ismael Juma <ismael@juma.me.uk>, Colin Patrick McCabe <colin@cmccabe.xyz>, Jason Gustafson <jason@confluent.io>
AgentClient and CoordinatorClient should have the option of logging failures to custom log4j objects. There should also be builders for these objects, to make them easier to extend in the future.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Partition high watermark may become -1 if the initial value is out of range. This situation can occur during partition reassignment, for example. The bug was fixed and validated with unit test in this patch.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
As titled, not starting new transaction since during restoration producer would have not activity and hence may cause txn expiration. Also delay starting new txn in resuming until initializing topology.
Reviewers: Matthias J. Sax <mjsax@apache.org>, Bill Bejeck <bill@confluent.io>
To build jar you need to specify `scalaVersion` instead of `scala_version`.
Reviewers: Manikumar Reddy O <manikumar.reddy@gmail.com>, Jason Gustafson <jason@confluent.io>
It's a critical bug that only affects the server, but we
don't have an easy way to use 3.4.11 for the client
only.
Reviewers: Jun Rao <junrao@gmail.com>, Damian Guy <damian.guy@gmail.com>
Ensures that ZK watch is set for each live broker for listener update notifications in the controller. Also avoids reading all brokers from ZooKeeper when a broker metadata is modified by passing in brokerId to BrokerModifications and reading only the updated broker.
The existing listener update test verifies both these changes. Earlier, the test did not detect missing watch for the last broker since metadata of all brokers were read from ZK (adding a watch for all) when any broker was updated.
Reviewers: Jun Rao <junrao@gmail.com>
This will allow us to trace leaked instances back to the job,
so that we can figure out what happened and fix the leak.
Reviewers: Ismael Juma <ismael@juma.me.uk>
`batch.baseOffset` is an expensive operation (even says so in its javadoc), and yet was called for every single record in a batch when loading offsets. This means that for N records in a gzipped batch, the entire batch will be unzipped N times. The fix is to compute and cache the base offset once as we decompress and process the batch.
Reviewers: Dong Lin <lindong28@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
It generates the producer payload (key and value) and makes sure that the values are
populated to target a realistic compression rate (0.3 - 0.4) if compression is used.
The generated payload is deterministic and can be replayed from a given position.
For now, all generated values are constant size, and key types can be configured
to be either null or 8 bytes.
Added messageSize parameter to producer spec, that specifies produced
key + message size.
Now that we have augmented WindowSerde with non-arg parameters, extract it out as part of the public APIs so that users who want to I/O windowed streams can use it. This is originally introduced by @vitaly-pushkar
This PR grows out to be a much larger one, as I found a few tech debts and bugs while working on it. Here is a summary of the PR:
Public API changes (I will propose a KIP after a first round of reviews):
Add TimeWindowedSerializer, TimeWindowedDeserializer, SessionWindowedSerializer, SessionWindowedDeserializer into o.a.k.streams.kstream. The serializers would implemented an internal WindowedSerializer interface for the serializeBaseKey function used in 3) below.
Add WindowedSerdes into o.a.k.streams.kstream. The reason to now add them into o.a.k.clients's Serdes is that it then needs dependency of streams.
Add "default.windowed.key.serde.inner" and "default.windowed.value.serde.inner" into StreamsConfig, used when "default.key.serde" is specified to use time or session windowed serde. Note this requires the serde class, not the type class.
Consolidated serde format from multiple classes, including SessionKeySerde.java for session, and WindowStoreUtils for time window, into SessionKeySchema and WindowKeySchema.
Bug fix: WindowedStreamPartitioner needs to consider both time window and session window serdes.
Removed RocksDBWindowBytesStore etc optimization since after KIP-182 all the serde know happens on metered store, hence this optimization is not worth.
Bug fix: for time window, the serdes used for store and the serdes used for piping (source and sink node) are different: the former needs to append sequence number but not for the later.
Other minor cleanups: remove unnecessary throws, etc.
Authors: Guozhang Wang <wangguoz@gmail.com>, Vitaly Pushkar <vitaly.pushkar@gmail.com>
Reviewers: Matthias J. Sax <mjsax@apache.org>, Bill Bejeck <bill@confluent.io>, Xi Hu
Author: John Roeler <john@confluent.io>
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <guozhang@confluent.io>
If the result of a fetch from a Window Store results in a null byte array we should return null rather than passing it to the serde to deserialize.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Logging can get spammy during the reconnect blackout period because any requests we send to ConsumerNetworkClient will immediately be failed when poll() returns. This patch checks for connection failures prior to sending fetches and offset lookups and skips sending to any failed nodes. Test cases added for both.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Sorts TaskIds on first assignment evenly distributing tasks by topicGroupId should help with evening the load of work across topologies. This PR is an initial "strawman" approach which will be followed up (at a later date YTBD) by scoring or assigning weight to processing nodes to ensure even processing distribution.
Added a new test to existing unit test.
NetworkClient should use FIFO order when completing inflight requests following a disconnect.
I've added new unit tests for `InFlightRequests` and `NetworkClient` which verify completion order.
Reviewers: Jun Rao <junrao@gmail.com>
Author: Bill Bejeck <bill@confluent.io>
Reviewers: Damian Guy <damian@confluent.io>, Guozhang Wang <guozhang@confluent.io>, Matthias J. Sax <matthias@confluent.io>
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Bill Bejeck <bill@confluent.io>, John Roesler <john@confluent.io>, Guozhang Wang <guozhang@confluent.io>
We need to reset the auto-commit deadline after sending the offset commit request so that we do not resend it while the request is still inflight.
Added unit tests ensuring this behavior and proper backoff in the case of a failure.
Reviewers: Guozhang Wang <wangguoz@gmail.com>