The backing store for offsets, status, and configs now attempts to use the new AdminClient to look up the internal topics and create them if they don’t yet exist. If the necessary APIs are not available in the connected broker, the stores fall back to the old behavior of relying upon auto-created topics. Kafka Connect requires a minimum of Apache Kafka 0.10.0.1-cp1, and the AdminClient can work with all versions since 0.10.0.0.
All three of Connect's internal topics are created as compacted topics, and new distributed worker configuration properties control the replication factor for all three topics and the number of partitions for the offsets and status topics; the config topic always has a single partition, which cannot be changed via configuration. All of these new configuration properties have sensible defaults, so users can upgrade without changing any existing configuration. In most situations, existing Connect deployments will have already created the storage topics prior to upgrading.
The replication factor defaults to 3, so anyone running a Kafka cluster with fewer than 3 brokers will receive an error unless they explicitly set the replication factor for the three internal topics. This is deliberate: it signals to users that they are not using sufficient replication for production use.
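For example, a distributed worker pointed at a single-broker development cluster might set the new properties roughly as follows (a sketch; the property names and values shown are my reading of this change and should be checked against the released worker documentation):

```java
import java.util.Properties;

// Sketch of the new internal-topic settings for a distributed worker running against a
// single-broker cluster, where the default replication factor of 3 cannot be satisfied.
public class WorkerInternalTopicSettings {
    public static void main(String[] args) {
        Properties workerProps = new Properties();
        workerProps.setProperty("config.storage.replication.factor", "1");
        workerProps.setProperty("offset.storage.replication.factor", "1");
        workerProps.setProperty("status.storage.replication.factor", "1");
        // Partition counts apply only to the offsets and status topics; the config topic
        // always has exactly one partition.
        workerProps.setProperty("offset.storage.partitions", "25");
        workerProps.setProperty("status.storage.partitions", "5");
        workerProps.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```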
The integration tests use a cluster with a single broker, so they were changed to explicitly specify a replication factor of 1 and a single partition.
The `KafkaAdminClientTest` was refactored to extract a utility for setting up a `KafkaAdminClient` with a `MockClient` for unit tests.
Author: Randall Hauch <rhauch@gmail.com>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes #2984 from rhauch/kafka-4667
Author: Damian Guy <damian.guy@gmail.com>
Author: Guozhang Wang <wangguoz@gmail.com>
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Guozhang Wang, Jason Gustafson, Apurva Mehta, Jun Rao
Closes #2849 from dguy/exactly-once-tc
This PR replaces https://github.com/apache/kafka/pull/2743 (just raising from Confluent repo)
This PR adds Partition Level Leader Epochs to messages in Kafka as a mechanism for fixing some known issues in the replication protocol. Full details can be found here:
[KIP-101 Reference](https://cwiki.apache.org/confluence/display/KAFKA/KIP-101+-+Alter+Replication+Protocol+to+use+Leader+Epoch+rather+than+High+Watermark+for+Truncation)
*The key elements are*:
- Epochs are stamped on messages as they enter the leader.
- Epochs are tracked in both leader and follower in a new checkpoint file.
- A new API allows followers to retrieve the leader's latest offset for a particular epoch.
- The logic for truncating the log when a replica becomes a follower has been moved from Partition into the ReplicaFetcherThread.
- When partitions are added to the ReplicaFetcherThread, they are added in an initialising state. Initialising partitions request leader epochs and then truncate their logs appropriately (see the sketch after this list).
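A minimal sketch of that truncation step, assuming a simplified epoch-to-end-offset map as the response of the new API (the real request/response classes and the actual ReplicaFetcherThread logic differ):

```java
import java.util.Map;

// Hypothetical, simplified illustration of how an initialising follower could choose a
// truncation point from the leader's answer to a leader-epoch query. Names and shapes
// here are placeholders, not the real Kafka classes.
public class EpochTruncationSketch {

    static long truncationOffset(int latestLocalEpoch,
                                 long localLogEndOffset,
                                 Map<Integer, Long> leaderEpochEndOffsets) {
        // The leader reports the end offset of the follower's latest epoch; any local
        // messages beyond that offset are not part of the leader's log and must be removed.
        long leaderEndOffsetForEpoch =
                leaderEpochEndOffsets.getOrDefault(latestLocalEpoch, 0L);
        return Math.min(localLogEndOffset, leaderEndOffsetForEpoch);
    }

    public static void main(String[] args) {
        // The follower wrote up to offset 120 under epoch 5, but the leader's epoch-5 data
        // ends at offset 100: the follower truncates to 100 before it starts fetching.
        System.out.println(truncationOffset(5, 120L, Map.of(5, 100L))); // 100
    }
}
```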
The test `EpochDrivenReplicationProtocolAcceptanceTest.shouldFollowLeaderEpochBasicWorkflow()` provides a good overview of the workflow.
The corrupted-log use case is covered by the test `EpochDrivenReplicationProtocolAcceptanceTest.offsetsShouldNotGoBackwards()`.
Remaining work: there is a to-do list here: https://docs.google.com/document/d/1edmMo70MfHEZH9x38OQfTWsHr7UGTvg-NOxeFhOeRew/edit?usp=sharing
Author: Ben Stopford <benstopford@gmail.com>
Author: Jun Rao <junrao@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Closes #2808 from benstopford/kip-101-v2
Kafka brokers have a config called "offsets.topic.replication.factor" that specifies the replication factor for the "__consumer_offsets" topic. The problem is that this config isn't being enforced. If an attempt to create the internal topic is made when there are fewer brokers than "offsets.topic.replication.factor", the topic ends up getting created anyway with the current number of live brokers. This behavior is surprising when clients or tooling run while the cluster is still being set up: even if the cluster eventually becomes huge, you find out much later that __consumer_offsets was set up with no replication.
A cluster that does not meet the "offsets.topic.replication.factor" requirement on the internal topic is, in effect, a cluster that isn't fully set up yet.
The right behavior is to enforce "offsets.topic.replication.factor": creation of the internal topic should fail with GROUP_COORDINATOR_NOT_AVAILABLE until the requirement is met. This closely resembles the behavior of regular topic creation when the requested replication factor exceeds the current size of the cluster: the request fails with INVALID_REPLICATION_FACTOR.
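A minimal sketch of the rule being proposed (illustrative only, not the broker's actual code path):

```java
// Illustrative only: before the fix, the offsets topic was created with however many live
// brokers happened to exist; with the fix, creation is refused (and consumers see
// GROUP_COORDINATOR_NOT_AVAILABLE) until the configured replication factor can be honored.
public class OffsetsTopicCreationRule {
    static boolean canCreateOffsetsTopic(int liveBrokers, int offsetsTopicReplicationFactor) {
        return liveBrokers >= offsetsTopicReplicationFactor;
    }

    public static void main(String[] args) {
        // With the default offsets.topic.replication.factor=3, a two-broker cluster is not
        // yet "fully set up", so the internal topic is not created.
        System.out.println(canCreateOffsetsTopic(2, 3)); // false
        System.out.println(canCreateOffsetsTopic(3, 3)); // true
    }
}
```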
Author: Onur Karaman <okaraman@linkedin.com>
Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes #2177 from onurkaraman/KAFKA-3959
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Gwen Shapira <cshapi@gmail.com>, Jason Gustafson <jason@confluent.io>
Closes #2354 from ijuma/kafka-4565-separation-of-internal-and-external-traffic
I spent a bit of time tracking down why files were being deleted before they reached log.retention.hours of age. It turns out that the time-based and size-based log retention schemes operate independently, not as the original comment, "The minimum age of a log file to be eligible for deletion", might suggest to a new user.
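A simplified sketch of the behavior being documented, namely that either scheme alone can make data eligible for deletion (not the broker's actual cleanup code, which deletes whole oldest segments until the partition is back under its limits):

```java
// Simplified illustration of independent time- and size-based retention.
public class RetentionSketch {
    static boolean eligibleForDeletion(long segmentAgeMs, long retentionMs,
                                       long bytesOverSizeLimit) {
        boolean tooOld = retentionMs > 0 && segmentAgeMs > retentionMs; // time-based scheme
        boolean overSize = bytesOverSizeLimit > 0;                      // size-based scheme
        return tooOld || overSize; // whichever limit is hit first wins
    }

    public static void main(String[] args) {
        // A segment only a minute old can still be deleted if the partition has exceeded
        // its size limit, regardless of log.retention.hours.
        System.out.println(eligibleForDeletion(60_000L, 604_800_000L, 1_048_576L)); // true
    }
}
```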
Author: Mark Rose <markrose@markrose.ca>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes #28 from MarkRose/fix_misleading_configuration_file_for_trunk
Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Damian Guy <damian.guy@gmail.com>, Matthias J. Sax <matthias@confluent.io>, Michael G. Noll <michael@confluent.io>, Greg Fodor <gfodor@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes #1530 from guozhangwang/K3769-per-thread-metrics
As discussed in https://github.com/apache/kafka/pull/1645, this patch removes an extraneous line from several `__init__.py` files, and a few others as well.
Author: Geoff Anderson <geoff@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes #1659 from granders/minor-cleanup-init-files
This is a follow-up to KAFKA-3421 to update the Connect developer guide to include the configuration validation. Also includes a couple of minor fixes.
Author: Liquan Pei <liquanpei@gmail.com>
Reviewers: Jason Gustafson <jason@confluent.io>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes #1366 from Ishiihara/connect-dev-doc
The statement is not true in practice; perhaps the implied feature was never implemented or has since been removed.
These lines can be quite misleading.
Author: gaob13 <gaob13@mails.tsinghua.edu.cn>
Reviewers: Ismael Juma, Ewen Cheslack-Postava
Closes #793 from gaob13/trunk
Author: Grant Henke <granthenke@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Closes #686 from granthenke/compaction
Also include some trivial clean-ups in `ProducerConfig` and `BaseProducer`.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Gwen Shapira
Closes #710 from ijuma/use-new-producer-properties-in-config
This adds coordination between DistributedHerders using the generalized consumer
support, allowing automatic balancing of connectors and tasks across workers. A
few pieces that require interaction between workers (resolving config
inconsistencies, forwarding of configuration changes to the leader worker) are
incomplete because they require REST API support to implement properly.
Author: Ewen Cheslack-Postava <me@ewencp.org>
Reviewers: Jason Gustafson, Gwen Shapira
Closes #321 from ewencp/kafka-2371-distributed-herder
This also adds some other needed infrastructure for distributed Copycat, most
importantly the DistributedHerder, and refactors some code for handling
Kafka-backed logs into KafkaBasedLog since this is shared between offset and
config storage.
Author: Ewen Cheslack-Postava <me@ewencp.org>
Reviewers: Gwen Shapira, James Cheng
Closes #241 from ewencp/kafka-2372-copycat-distributed-config
Author: Parth Brahmbhatt <brahmbhatt.parth@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Closes #195 from Parth-Brahmbhatt/KAFKA-2211
The Converter class now translates directly between byte[] and Copycat's data
API instead of requiring an intermediate runtime type like Avro's GenericRecord
or Jackson's JsonNode.
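Roughly, the resulting API has this shape (an illustrative sketch only; the method names and signatures below are placeholders, not the actual Copycat interfaces):

```java
// Illustrative sketch: a single class converts byte[] <-> the Copycat data API directly,
// with no intermediate runtime type such as GenericRecord or JsonNode in between.
public interface ConverterSketch {
    // Serialize a value expressed in the Copycat data API straight to bytes.
    byte[] fromCopycatData(String topic, Object copycatSchema, Object value);

    // Deserialize bytes straight back into a value in the Copycat data API.
    Object toCopycatData(String topic, byte[] bytes);
}
```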
Author: Ewen Cheslack-Postava <me@ewencp.org>
Reviewers: Gwen Shapira
Closes #172 from ewencp/kafka-2475-unified-serializer-converter and squashes the following commits:
566c52f [Ewen Cheslack-Postava] Checkstyle fixes
320d0df [Ewen Cheslack-Postava] Restrict offset format.
85797e7 [Ewen Cheslack-Postava] Add StringConverter for using Copycat with raw strings.
698d65c [Ewen Cheslack-Postava] Move and update outdated comment about handing of types for BYTES type in Copycat.
4bed051 [Ewen Cheslack-Postava] KAFKA-2475: Make Copycat only have a Converter class instead of Serializer, Deserializer, and Converter.
This is an initial patch implementing the basics of Copycat for KIP-26.
The intent here is to start a review of the key pieces of the core API and get a reasonably functional, baseline, non-distributed implementation of Copycat in place to get things rolling. The current patch has a number of known issues that need to be addressed before a final version:
* Some build-related issues. Specifically, requires some locally-installed dependencies (see below), ignores checkstyle for the runtime data library because it's lifted from Avro currently and likely won't last in its current form, and some Gradle task dependencies aren't quite right because I haven't gotten rid of the dependency on `core` (which should now be an easy patch since new consumer groups are in a much better state).
* This patch currently depends on some Confluent trunk code because I prototyped with our Avro serializers w/ schema-registry support. We need to figure out what we want to provide as an example built-in set of serializers. Unlike core Kafka where we could ignore the issue, providing only ByteArray or String serializers, this is pretty central to how Copycat works.
* This patch uses a hacked up version of Avro as its runtime data format. Not sure if we want to go through the entire API discussion just to get some basic code committed, so I filed KAFKA-2367 to handle that separately. The core connector APIs and the runtime data APIs are entirely orthogonal.
* This patch needs some updates to get aligned with recent new consumer changes (specifically, I'm aware of the ConcurrentModificationException issue on exit). More generally, the new consumer is in flux but Copycat depends on it, so there are likely to be some negative interactions.
* The layout feels a bit awkward to me right now because I ported it from a Maven layout. We don't have nearly the same level of granularity in Kafka currently (core and clients, plus the mostly ignored examples, log4j-appender, and a couple of contribs). We might want to reorganize, although keeping data+api separate from runtime and connector plugins is useful for minimizing dependencies.
* There are a variety of other things (e.g., I'm not happy with the exception hierarchy/how they are currently handled, TopicPartition doesn't really need to be duplicated unless we want Copycat entirely isolated from the Kafka APIs, etc), but I expect those we'll cover in the review.
Before commenting on the patch, it's probably worth reviewing https://issues.apache.org/jira/browse/KAFKA-2365 and https://issues.apache.org/jira/browse/KAFKA-2366 to get an idea of what I had in mind for a) what we ultimately want with all the Copycat patches and b) what we aim to cover in this initial patch. My hope is that we can use a WIP patch (after the current obvious deficiencies are addressed) while recognizing that we want to make iterative progress with a bunch of subsequent PRs.
Author: Ewen Cheslack-Postava <me@ewencp.org>
Reviewers: Ismael Juma, Gwen Shapira
Closes #99 from ewencp/copycat and squashes the following commits:
a3a47a6 [Ewen Cheslack-Postava] Simplify Copycat exceptions, make them a subclass of KafkaException.
8c108b0 [Ewen Cheslack-Postava] Rename Coordinator to Herder to avoid confusion with the consumer coordinator.
7bf8075 [Ewen Cheslack-Postava] Make Copycat CLI specific to standalone mode, clean up some config and get rid of config storage in standalone mode.
656a003 [Ewen Cheslack-Postava] Clarify and expand the explanation of the Copycat Coordinator interface.
c0e5fdc [Ewen Cheslack-Postava] Merge remote-tracking branch 'origin/trunk' into copycat
0fa7a36 [Ewen Cheslack-Postava] Mark Copycat classes as unstable and reduce visibility of some classes where possible.
d55d31e [Ewen Cheslack-Postava] Reorganize Copycat code to put it all under one top-level directory.
b29cb2c [Ewen Cheslack-Postava] Merge remote-tracking branch 'origin/trunk' into copycat
d713a21 [Ewen Cheslack-Postava] Address Gwen's review comments.
6787a85 [Ewen Cheslack-Postava] Make Converter generic to match serializers since some serialization formats do not require a base class of Object; update many other classes to have generic key and value class type parameters to match this change.
b194c73 [Ewen Cheslack-Postava] Split Copycat converter option into two options for key and value.
0b5a1a0 [Ewen Cheslack-Postava] Normalize naming to use partition for both source and Kafka, adjusting naming in CopycatRecord classes to clearly differentiate.
e345142 [Ewen Cheslack-Postava] Remove Copycat reflection utils, use existing Utils and ConfigDef functionality from clients package.
be5c387 [Ewen Cheslack-Postava] Minor cleanup
122423e [Ewen Cheslack-Postava] Style cleanup
6ba87de [Ewen Cheslack-Postava] Remove most of the Avro-based mock runtime data API, only preserving enough schema functionality to support basic primitive types for an initial patch.
4674d13 [Ewen Cheslack-Postava] Address review comments, clean up some code styling.
25b5739 [Ewen Cheslack-Postava] Fix sink task offset commit concurrency issue by moving it to the worker thread and waking up the consumer to ensure it exits promptly.
0aefe21 [Ewen Cheslack-Postava] Add log4j settings for Copycat.
220e42d [Ewen Cheslack-Postava] Replace Avro serializer with JSON serializer.
1243a7c [Ewen Cheslack-Postava] Merge remote-tracking branch 'origin/trunk' into copycat
5a618c6 [Ewen Cheslack-Postava] Remove offset serializers, instead reusing the existing serializers and removing schema projection support.
e849e10 [Ewen Cheslack-Postava] Remove duplicated TopicPartition implementation.
dec1379 [Ewen Cheslack-Postava] Switch to using new consumer coordinator instead of manually assigning partitions. Remove dependency of copycat-runtime on core.
4a9b4f3 [Ewen Cheslack-Postava] Add some helpful Copycat-specific build and test targets that cover all Copycat packages.
31cd1ca [Ewen Cheslack-Postava] Add CLI tools for Copycat.
e14942c [Ewen Cheslack-Postava] Add Copycat file connector.
0233456 [Ewen Cheslack-Postava] Add copycat-avro and copycat-runtime
11981d2 [Ewen Cheslack-Postava] Add copycat-data and copycat-api