While using an iterator from IQ, it's possible to get an InvalidStateStoreException if the StreamThread closes the store during a range query.
Added a unit test to SegmentIteratorTest for this condition.
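For reference, a minimal sketch of how an Interactive Queries (IQ) caller can guard against this condition (store name, types, and retry policy are illustrative, not part of this change):
```java
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.errors.InvalidStateStoreException;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class RangeQueryExample {

    // Counts the entries in [from, to], retrying if the store is closed or
    // migrated underneath us (e.g. during a rebalance) mid-iteration.
    static long countRange(final KafkaStreams streams, final String from, final String to)
            throws InterruptedException {
        while (true) {
            try {
                final ReadOnlyKeyValueStore<String, Long> store =
                    streams.store("counts-store", QueryableStoreTypes.<String, Long>keyValueStore());
                long count = 0;
                try (final KeyValueIterator<String, Long> iter = store.range(from, to)) {
                    while (iter.hasNext()) {
                        iter.next();
                        count++;
                    }
                }
                return count;
            } catch (final InvalidStateStoreException e) {
                Thread.sleep(100); // back off briefly, then retry the whole range query
            }
        }
    }
}
```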
Reviewers: John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
#4919 unintentionally changed the topology naming scheme. This change returns to the prior scheme.
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Changes to keep the operation name as is and make the sensor name unique.
Reviewers: John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
In #4919 we propagate the SerDes for each of these aggregation operators.
As @guozhangwang mentioned in that PR:
```
reduce: inherit the key and value serdes from the parent XXImpl class.
count: inherit the key serdes, enforce setting the Serdes.Long() for value serdes.
aggregate: inherit the key serdes, do not set for value serdes internally.
```
Although this is all fine for reduce and count, it is quite unsafe to call aggregate without providing a Materialized. In fact, I don't see why we would not provide a Materialized for aggregate, since the result type will always be different (otherwise reduce would be used) and the value Serde is simply not propagated.
This has been discussed previously in a broader PR, but I believe that for aggregate we could implicitly pass a Materialized the same way we pass a Joined, just to avoid this error-prone case. Then, if users want to specialize, they can provide their own Materialized.
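To illustrate, a hedged sketch of what supplying the Materialized (and thus the value serde) looks like at the call site; the topic, store name, and types are made up, and the input stream is assumed to use the configured default serdes:
```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

public class AggregateExample {
    public static void main(final String[] args) {
        final StreamsBuilder builder = new StreamsBuilder();
        final KStream<String, String> words = builder.stream("words"); // default serdes assumed

        // aggregate() changes the value type (String -> Long here), so the value
        // serde cannot be inherited from upstream; pass it explicitly via
        // Materialized instead of relying on the default serde.
        final KTable<String, Long> lengths = words
            .groupByKey()
            .aggregate(
                () -> 0L,                                   // initializer
                (key, value, agg) -> agg + value.length(),  // aggregator
                Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("length-store")
                    .withKeySerde(Serdes.String())
                    .withValueSerde(Serdes.Long()));

        // ... build and start the topology as usual ...
    }
}
```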
Reviewers: Debasish Ghosh <dghosh@acm.org>, Guozhang Wang <guozhang@confluent.io>
Currently, the AbstractStream class defines a copy-constructor that allows extending the KStream and KTable APIs with new methods without impacting the public interface.
However, adding a new processor and/or store to the topology is done through the internalTopologyBuilder, which is not accessible from AbstractStream subclasses defined outside of the package (package visibility).
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <guozhang@confluent.io>
Add the new stricter-timeout version of `poll` proposed in KIP-266.
The pre-existing variant `poll(long timeout)` would block indefinitely for metadata
updates if they were needed, then it would issue a fetch and poll for `timeout` ms
for new records. The initial indefinite metadata block caused applications to become
stuck when the brokers became unavailable. The existence of the timeout parameter
made the indefinite block especially unintuitive.
This PR adds `poll(Duration timeout)` with the semantics:
1. iff a metadata update is needed:
    1. send (asynchronous) metadata requests
    2. poll for metadata responses (counts against timeout)
        - if no response within timeout, **return an empty collection immediately**
2. if there is fetch data available, **return it immediately**
3. if there is no fetch request in flight, send fetch requests
4. poll for fetch responses (counts against timeout)
    - if no response within timeout, **return an empty collection** (leaving async fetch request for the next poll)
    - if we get a response, **return the response**
The old method, `poll(long timeout)`, is deprecated, but we do not change its semantics, so it remains:
1. iff a metadata update is needed:
    1. send (asynchronous) metadata requests
    2. poll for metadata responses *indefinitely until we get it*
2. if there is fetch data available, **return it immediately**
3. if there is no fetch request in flight, send fetch requests
4. poll for fetch responses (counts against timeout)
    - if no response within timeout, **return an empty collection** (leaving async fetch request for the next poll)
    - if we get a response, **return the response**
One notable usage is prohibited by the new `poll`: previously, you could call `poll(0)` to block for metadata updates, for example to initialize the client, supposedly without fetching records. Note, though, that this behavior is not according to any contract, and there is no guarantee that `poll(0)` won't return records the first time it's called. Therefore, it has always been unsafe to ignore the response.
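As a usage sketch (topic, group id, and timeout are illustrative), the new variant bounds the total time spent in each call and may return an empty batch that callers must be prepared to handle:
```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PollExample {
    public static void main(final String[] args) {
        final Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "example-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (final KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("example-topic"));
            while (true) {
                // poll(Duration) returns within (roughly) the given timeout even if a
                // metadata update is still outstanding; it may return an empty batch.
                final ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (final ConsumerRecord<String, String> record : records) {
                    System.out.printf("%s -> %s%n", record.key(), record.value());
                }
            }
        }
    }
}
```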
* Removed Scala producers, request classes, kafka.tools.ProducerPerformance, encoders,
tests.
* Updated ConsoleProducer to remove Scala producer support (removed `BaseProducer`
and several options that are not used by the Java producer).
* Updated a few Scala consumer tests to use the new producer (including a minor
refactor of `produceMessages` methods in `TestUtils`).
* Updated `ClientUtils.fetchTopicMetadata` to use `SimpleConsumer` instead of
`SyncProducer`.
* Removed `TestKafkaAppender` as it looks useless and it defined an `Encoder`.
* Minor import clean-ups
No new tests added since behaviour should remain the same after these changes.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Manikumar Reddy O <manikumar.reddy@gmail.com>, Dong Lin <lindong28@gmail.com>
Closes #5045 from ijuma/kafka-6921-remove-old-producer
test_broker_type_bounce_at_start tries to validate that when the controller is down, the streams client will always fail trying to create the topic; with the current behavior of the admin client this is actually not always true: the actual behavior depends on the admin client internals as well as on when the controller becomes unavailable during the leader's partition-assignment phase. I'd suggest at least ignoring this test for now until the admin client is more stable (personally I'd even suggest removing this test, as its coverage benefit seems smaller than the issues it introduces).
Also adding a few more log4j entries as a result of investigating this issue.
Reviewers: Matthias J. Sax <matthias@confluent.io>
Type inference doesn't currently work for the join functions in Scala, as the compiler doesn't yet know the types of the given KStream[K, V] or KTable[K, V].
The fix here is to curry the joiner function. I personally prefer this notation, but it also means the API differs more from the Java API.
I believe the divergence from the Java API is worth it in this case, as it not only solves the type inference but also better fits the Scala way of coding (e.g. fold).
Moreover, any Scala developer would otherwise stumble on these functions, spend time trying to understand why type inference is not working, and then get frustrated at being forced to be explicit where inference is harmless.
Reviewers: Debasish Ghosh <dghosh@acm.org>, Guozhang Wang <guozhang@confluent.io>, Ismael Juma <ismael@juma.me.uk>
This is a follow-up to #5022 which added documentation to the Processor
interface. This commit adds similar documentation to Transformer and
ValueTransformer.
Also, s/processor/transformer/ in the close() docs.
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
The wakeup-based strategy caused more problems than it
solved, so we'll instead focus on KIP-266.
Revert commit 2d8049b.
Keep the metrics addition and the new test util.
Also keep the tests for shutdown, although they must be ignored until
poll(Duration) is done in the scope of KIP-266.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
1. In InternalTopologyBuilder#topicGroups, which is used in StreamsPartitionAssignor, look up the book-kept storeToChangelogTopic map before creating a new internal changelog topic. This way, if the source KTable is created, the source topic recorded in storeToChangelogTopic will be used.
2. Added unit test (confirmed that without 1) it will fail).
3. MINOR: removed TODOs that are related to removed KStreamBuilder.
4. MINOR: removed TODOs in StreamsBuilderTest util functions and replaced with TopologyWrapper.
5. MINOR: removed StreamsBuilderTest#testFrom as it is already covered by TopologyTest#shouldNotAllowToAddSourcesWithSameName; in addition, it requires KStreamImpl.SOURCE_NAME, which should be a package-private field of KStreamImpl.
Reviewers: John Roesler <john@confluent.io>, Bill Bejeck <bill@confluent.io>, Matthias
J. Sax <matthias@confluent.io>
We would like to also export the producer metrics from StreamThread, just like the consumer metrics, so that we gain more visibility into the streams application. The approach is to pass the threadProducer into the StreamThread so that its metrics can be exported dynamically.
Note that this is a purely internal change that doesn't require a KIP; in the future we also want to export admin client metrics. A follow-up KIP for the admin client will be created once this is merged.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
1. Remove TopologyBuilder, TopologyBuilderException, KStreamBuilder,
2. Completed the leftover work of https://issues.apache.org/jira/browse/KAFKA-5660 now that we remove TopologyBuilderException.
3. Added MockStoreBuilder to replace MockStateStoreSupplier; removed all XXStoreSupplier classes except StateStoreSupplier, as it is still referenced in the logical streams graph.
4. Minor: rename KStreamsFineGrainedAutoResetIntegrationTest.java to FineGrainedAutoResetIntegrationTest.java.
Reviewers: Matthias J. Sax <matthias@confluent.io>
Also removed InternalValueTransformerWithKey / Supplier, which was used to mock away the deprecated punctuate function.
Reviewers: Matthias J. Sax <matthias@confluent.io>
1. Remove the deprecated StateStoreSuppliers, and the corresponding Stores.create() functions and factories: only the base StateStoreSupplier and MockStoreSupplier were still preserved as they are needed by the deprecated TopologyBuilder and KStreamBuilder. Will remove them in a follow-up PR.
2. Add TopologyWrapper.java as the original InternalTopologyBuilderAccessor was removed, but I realized it is still needed as of now.
3. Minor: removed StateStoreTestUtils.java and inline its logic in its callers since now with StoreBuilder it is just a one-liner.
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
This is a continuation of #4978.
From Guozhang:
I think to fix this issue, in init we could consider switching steps 1 and 2, i.e. `initInternal(context);` followed by `underlying.init(context, root);`. Since the flag is declared as `volatile boolean open = false;`, that should be sufficient: in this case the check in step 3 will fail if `underlying.init` has not completed, and we will throw an InvalidStateStoreException.
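For context, a generic sketch of the pattern being relied on here (illustrative names, not the actual Kafka store classes): a volatile open flag that readers check, so that calls racing with init/close throw InvalidStateStoreException instead of observing a half-initialized store.
```java
import org.apache.kafka.streams.errors.InvalidStateStoreException;

// Generic illustration only: publish "open" after initialization completes,
// and clear it before tearing down, so concurrent readers fail fast.
class GuardedStore {

    private volatile boolean open = false;

    void init() {
        // ... initialize the underlying store first ...
        open = true;              // publish only once init has completed
    }

    void close() {
        open = false;             // readers fail fast from now on
        // ... release underlying resources ...
    }

    private void validateStoreOpen() {
        if (!open) {
            throw new InvalidStateStoreException("Store is not open");
        }
    }

    String get(final String key) {
        validateStoreOpen();      // the "step 3" check referenced in the quote above
        // ... read from the underlying store ...
        return null;
    }
}
```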
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Serdes are confusing in the Scala wrapper:
* We have wrappers around Serializer, Deserializer and Serde which are not very useful.
* We have Serdes in two places, org.apache.kafka.common.serialization.Serde and DefaultSerdes; instead, we should have only one place to find all the Serdes.
I wanted to do this PR before the release as this is a breaking change.
This shouldn't add anything new, so the current tests should be enough.
Reviewers: Debasish Ghosh <dghosh@acm.org>, Guozhang Wang <guozhang@confluent.io>
I'm breaking KAFKA-6813 into a couple of "smaller" PRs and this is the first one. It focuses on:
Remove deprecated APIs in KStream, KTable, KGroupedStream, KGroupedTable, SessionWindowedKStream, TimeWindowedKStream.
Also found a couple of overlooked bugs while working on them:
2.a) In KTable.filter / mapValues without the additional parameter indicating the materialized store, we originally would not materialize the store. After KIP-182 we mistakenly diverged the semantics: for KTable.mapValues this is still the case, but for KTable.filter we now always materialize.
2.b) In XXStream/Table.reduce/count, we used to reuse the serdes since their types are known up front (for reduce the key and value types are unchanged; for count the key type is unchanged and the value type is Long). This was somehow lost in past refactoring.
2.c) We were force-casting a Serde<V> to Serde<VR> for XXStream / Table.aggregate, for which the returned value type is NOT known; such a cast should not be applied, and we should instead require users to provide the value serde if they believe the default one is not applicable (see the sketch after this list).
2.d) Whenever we create a new MaterializedInternal we effectively increment the suffix index for the store / processor-node names. However, in some places this MaterializedInternal is only used for validation, so the resulting processor-node / store suffixes are not monotonic.
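To make 2.b / 2.c concrete, a hedged user-level sketch (topic, names, and types illustrative; default serdes assumed for the input): count can fix its value serde internally because the result type is always Long, whereas aggregate's result type is caller-chosen and the value serde should be supplied explicitly.
```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KGroupedStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;

public class CountVsAggregateExample {
    public static void main(final String[] args) {
        final StreamsBuilder builder = new StreamsBuilder();
        final KGroupedStream<String, String> grouped =
            builder.<String, String>stream("input").groupByKey(); // default serdes assumed

        // count(): the value type is known to be Long, so the library can pick
        // Serdes.Long() for the value and reuse the key serde from upstream.
        final KTable<String, Long> counts = grouped.count();

        // aggregate(): the result type is chosen by the caller, so the value
        // serde cannot be inferred and is supplied explicitly here.
        final KTable<String, Integer> maxLength = grouped.aggregate(
            () -> 0,
            (key, value, agg) -> Math.max(agg, value.length()),
            Materialized.with(Serdes.String(), Serdes.Integer()));
    }
}
```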
Reviewers: Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>
Updated RocksDBSegmentedBytesStoreTest class to include time window serdes.
Reviewers: Guozhang Wang <guozhang@confluent.io>, Bill Bejeck <bill@confluent.io>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Debasish Ghosh <dghosh@acm.org>, Guozhang Wang <guozhang@confluent.io>, Bill Bejeck <bill@confluent.io>
Updated the upgrade doc as well, since we did not previously have an overloaded function without the deprecated parameter. Also renamed the 1.2 release version to 2.0.
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>