
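Autohome Senior Engineer, Real-Time Computing Engine
Experienced hire · Full-time · 5-10 years · Technology · Location: Beijing · Status: Hiring
Requirements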
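1. Education and major: Bachelor's degree or above in computer science, information technology, mathematics, or a related field.
2. Work experience: more than 6 years of experience building big data platforms.
3. Technical skills:
   Programming: expert in Java or Golang and familiar with Python; deep understanding of core language features, memory models, and concurrent programming patterns; able to tune code for high performance, with experience troubleshooting complex production issues.
   Architecture: deep understanding of distributed systems principles; able to apply fault-tolerance mechanisms, sharding strategies, load balancing, and high-availability design in system design.
   Big data technology: deep command of the big data ecosystem; expert in at least one of Hadoop/Flink/Paimon/Iceberg, with custom development and optimization experience; proficient in Kafka/Pulsar.
4. Innovation: sharp technical insight and strong capacity for technical innovation; able to drive the introduction and adoption of new technologies.
5. Communication and collaboration: good communication and teamwork skills; able to effectively drive goals to completion.
6. Bonus points:
   1) Experience deeply integrating Data + AI and landing it successfully (multimodal data lake, AI-assisted data development, AI-assisted data governance, data agents, etc.).
   2) Experience in algorithm development and model training.
   3) Significant contributions to open-source communities.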
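Responsibilities
1. Develop and maintain the real-time lakehouse engine (Flink, Paimon), build platform capabilities, solve key technical challenges in high concurrency, low latency, and massive-scale data processing, and keep the company's business running stably and efficiently.
2. Plan and build a unified Data + AI computing platform that deeply integrates data and AI, improving data development efficiency and data value.
3. In light of technology trends and business needs, lead technology selection and roadmap planning, and drive the big data platform's capabilities to an industry-leading level.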
Includes English-language materials
Big Data
https://www.youtube.com/watch?v=bAyrObl7TYE
https://www.youtube.com/watch?v=H4bf_uuMC-g
With all this talk of Big Data, we got Rebecca Tickle to explain just what makes data into Big Data.
Java
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
Go
https://www.youtube.com/watch?v=8uiZC0l4Ajw
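A complete tutorial for learning Golang! Under an hour from start to finish, including a full demonstration of how to build an API in Go. No filler, just what you need to know.
Python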
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, starts from zero, with complete examples, based on the latest Python 3.
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
Distributed Systems
https://www.distributedsystemscourse.com/
The home page of a free online class in distributed systems.
https://www.youtube.com/watch?v=7VbL89mKK3M&list=PLOE1GTZ5ouRPbpTnrZ3Wqjamfwn_Q5Y9A
High Availability
https://redis.io/blog/high-availability-architecture/
A high available architecture is when there are a number of different components, modules, or services that work together to maintain optimal performance, irrespective of peak-time loads.
https://www.ibm.com/think/topics/high-availability
High availability (HA) is a term that refers to a system’s ability to be accessible and reliable close to 100% of the time.
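Hadoop
https://www.runoob.com/w3cnote/hadoop-tutorial.html
Hadoop provides reliable, scalable application-layer computing and storage support for very large computer clusters. It allows large data sets to be processed in a distributed fashion across clusters of computers using simple programming models, and scales from a single machine to several thousand machines.
[English] Hadoop Tutorial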
https://www.tutorialspoint.com/hadoop/index.htm
Hadoop is an open-source framework that allows to store and process big data in a distributed environment across clusters of computers using simple programming models.
Flink
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
Iceberg
https://iceberg.apache.org/spark-quickstart/
This guide will get you up and running with Apache Iceberg™ using Apache Spark™, including sample code to highlight some powerful features.
https://www.baeldung.com/apache-iceberg-intro
This tutorial will discuss Apache Iceberg, a popular open table format in today’s big data landscape.
https://www.youtube.com/watch?v=TsmhRZElPvM
You’ve probably heard about Apache Iceberg™—after all, it’s been getting a lot of buzz.
Kafka
https://developer.confluent.io/what-is-apache-kafka/
https://www.youtube.com/watch?v=CU44hKLMg7k
https://www.youtube.com/watch?v=j4bqyAMMb7o&list=PLa7VYi0yPIH0KbnJQcMv5N9iW8HkZHztH
In this Apache Kafka fundamentals course, we introduce you to the basic Apache Kafka elements and APIs, as well as the broader Kafka ecosystem.
Pulsar
https://pulsar.apache.org/docs/next/functions-develop-tutorial/
Write a function for word count.
https://www.baeldung.com/apache-pulsar
Apache Pulsar is a distributed open source Publication/Subscription based messaging system developed at Yahoo.
https://www.youtube.com/watch?v=TKs5T6N78Tc
Discover the seven key features of Apache Pulsar that make it perfect for providing a centralized messaging & data streaming service for an Enterprise.
Data Governance
https://www.ibm.com/think/topics/data-governance
Data governance is the data management discipline that focuses on the quality, security and availability of an organization’s data.
https://www.youtube.com/watch?v=uPsUjKLHLAg
Building data fabric eliminates the technological complexities of data governance so users can connect to the right data at the right time, regardless of where it resides.
AI Agents
https://learn.microsoft.com/en-us/shows/ai-agents-for-beginners/
In this 10-lesson course we take you from concept to code while covering the fundamentals of building AI agents.
https://www.ibm.com/think/ai-agents
Your one-stop resource for gaining in-depth knowledge and hands-on applications of AI agents.
Algorithms
https://roadmap.sh/datastructures-and-algorithms
Step by step guide to learn Data Structures and Algorithms in 2025
https://www.hellointerview.com/learn/code
A visual guide to the most important patterns and approaches for the coding interview.
https://www.w3schools.com/dsa/
Related positions
Experienced hire · 6+ years · 核心本地商业-基
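1. Build a cloud-native distributed computing platform on Kubernetes that supports offline computing, real-time stream processing, and other scenarios, with elastic scaling and efficient scheduling of compute resources.
2. Solve the engine scalability problems that business growth creates in data warehouse production: by optimizing the Shuffle service, the vectorized execution engine, and the engine kernel, support millions of Spark jobs per day, EB-scale data processing, and hundreds of TB of shuffle in a single job, continuously improving production engine stability and scalability and ensuring the stable delivery of core data.
3. Continuously improve compute efficiency and reduce compute cost by combining scheduling, engine-kernel modification and optimization, and other technical approaches.
4. Design and implement elastic scale-out and scale-in strategies for cloud-native environments, combining K8s orchestration capabilities with compute-engine characteristics to handle traffic peaks and resource fragmentation.
Updated 2025-08-08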
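Experienced hire · Technology
1. Improve and optimize Didi's Flink engine and platform, handling architecture design and implementation across stability, performance, and functionality.
2. Develop a deep understanding of the business, identify users' pain points around real-time needs, and, while helping the business solve problems, distill general and latent requirements to guide planning.
3. Plan the product and technical direction for real-time computing, the real-time data lake, and related products, and develop the technical team.
Updated 2025-08-05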
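Experienced hire · 3+ years · Technology-Algorithms
● Use algorithmic capabilities including deep learning, machine learning, and large models to tackle hard technical problems in the international quality-risk domain.
● Combine large models with Agent technology to build an intelligent risk platform (e.g., intelligent attack and defense, reconciliation-rule mining, ticket Q&A) and improve quality-risk capabilities.
● Build a highly scalable framework for large-scale data processing and feature engineering that supports the real-time stream processing, multimodal data fusion, and low-latency inference required by the Agent system.
● Have a strong drive to dig into and learn technology, be able to understand the principles of open-source technologies and existing systems in depth, and resolve problems effectively through strong technical skills when they arise.
Updated 2025-07-31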