
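Youzan Big Data Development Engineer - Data Applications Track
Experienced hire | Full-time | Location: Hangzhou | Status: Hiring
Requirements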
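1. Expert in Java, with a solid programming foundation and good coding standards; familiar with development frameworks such as Spring and MyBatis; able to independently design and develop complex platforms or services.
2. Familiar with distributed computing technologies such as Hadoop, Spark, and Flink, including their execution mechanisms and architecture.
3. Familiar with data storage and usage in storage engines such as MySQL, Doris, ClickHouse, and Kylin, and with OLAP technology selection for different scenarios.
4. Strong system design skills, with some understanding of high-concurrency and high-availability architectures.
5. Excellent communication and cross-team collaboration skills; clear logical thinking, proactive, and willing to take on challenges.
6. Experience designing or developing a semantic layer, a metrics platform, or a unified data service platform is a plus.
7. Business-savvy and thoughtful, with a good sense for data; hands-on experience driving business with data is a plus.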
Responsibilities
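We are Youzan's core data team, primarily responsible for building and operating the data center in the merchant admin console. We are committed to building an industry-leading unified data service platform (OneService) based on a semantic layer, along with a complete metrics management system, to support merchants' fine-grained operations and make data-driven decision-making more efficient. Join us and you will have the chance to help build enterprise-grade core data service infrastructure from 0 to 1 or from 1 to N.

Job responsibilities:
1. Design, develop, and continuously optimize the unified data service platform (based on a semantic layer), creating a company-wide entry point for data consumption that is standardized, efficient, and easy to use.
2. Plan and build the metrics center / metrics management system; establish standardized processes and supporting tools for metric production, maintenance, and management, ensuring that metric definitions are consistent and accurate.
3. Improve the platform's data service capabilities to support diverse data consumption scenarios (e.g., BI reports, self-service analytics, data APIs, operational campaigns) and enable data-driven business, including but not limited to designing and delivering data visualization products and data-empowered operations.
4. Continuously optimize the data platform's technical architecture to improve system stability, query performance, and user experience, and to reduce the cost of onboarding and using data.
5. Develop and maintain data products; work closely with data analysts, product managers, and business stakeholders to understand data needs and turn them into standardized data platform capabilities.

Work content:
1. Design, develop, and continuously iterate on the merchant data center (built on offline and real-time data warehouses).
2. Take a deep part in designing and developing the OneService platform, including but not limited to core modules such as semantic layer modeling, query engine integration, API service development, and permission management.
3. Design and implement the metric and dimension management system, supporting metric definition, lineage, change management, and lifecycle management.
4. Process, transform, and analyze data using compute engines such as Spark and Flink, building efficient and stable data pipelines.
5. Apply OLAP technologies (e.g., Doris, ClickHouse, Kylin) to optimize query performance over massive data sets, and select and apply technologies according to business scenarios.
6. Participate in the architecture design and development of online systems for data services, ensuring high availability and high performance.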
Includes English-language materials
Java
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
Spring
https://liaoxuefeng.com/books/java/spring/index.html
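Spring is a framework that supports rapid development of Java EE applications. It provides a set of underlying containers and infrastructure and integrates seamlessly with many commonly used open-source frameworks, making it practically essential for developing Java EE applications.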
https://spring.io/guides/gs/rest-service
https://spring.io/quickstart
Level up your Java code and explore what Spring can do for you.
MyBatis
https://mybatis.org/mybatis-3/getting-started.html
https://www.baeldung.com/mybatis
MyBatis is an open source persistence framework which simplifies the implementation of database access in Java applications.
Development frameworks
[English] Understanding Modern Development Frameworks: A Guide for Developers and Technical Decision-makers
https://www.freecodecamp.org/news/understanding-modern-development-frameworks-guide-for-devs/
Hadoop
https://www.runoob.com/w3cnote/hadoop-tutorial.html
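Hadoop provides reliable, scalable application-layer computing and storage support for very large computer clusters. It allows large data sets to be processed in a distributed manner across clusters of computers using simple programming models, and it scales from a single machine to several thousand machines.
[English] Hadoop Tutorial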
https://www.tutorialspoint.com/hadoop/index.htm
Hadoop is an open-source framework that allows to store and process big data in a distributed environment across clusters of computers using simple programming models.
Spark
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Flink
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
MySQL
https://juejin.cn/post/7190306988939542585
A hands-on, one-stop MySQL learning roadmap covering database fundamentals, SQL statement usage, database constraints, schema design, and more.
[English] MySQL Tutorial
https://www.mysqltutorial.org/
Your go-to resource for mastering MySQL in a fast, easy, and enjoyable way.
https://www.youtube.com/watch?v=5OdVJbNCSso
MySQL SQL tutorial for beginners
https://www.youtube.com/watch?v=7S_tz1z_5bA
This beginner-friendly course teaches you SQL from scratch.
Doris
https://doris.apache.org/docs/gettingStarted/what-is-apache-doris
ClickHouse
[English] Advanced Tutorial
https://clickhouse.com/docs/tutorial
Learn how to ingest and query data in ClickHouse using the New York City taxi example dataset.
https://www.youtube.com/watch?v=FtoWGT7kS-c
ClickHouse is an open-source column-oriented DBMS for online analytical processing that allows users to generate analytical reports using SQL queries in real-time.
https://www.youtube.com/watch?v=Rhe-kUyrFUE&list=PL0Z2YDlm0b3gcY5R_MUo4fT5bPqUQ66ep
OLAP
https://www.youtube.com/watch?v=iw-5kFzIdgY
OLAP (for online analytical processing) is software for performing multidimensional analysis at high speeds on large volumes of data from a data warehouse, data mart, or some other unified, centralized data store.
System design
https://roadmap.sh/system-design
Everything you need to know about designing large scale systems.
https://www.youtube.com/watch?v=F2FmTdLtb_4
This complete system design tutorial covers scalability, reliability, data handling, and high-level architecture with clear explanations, real-world examples, and practical strategies.
High concurrency
https://www.baeldung.com/concurrency-principles-patterns
In this tutorial, we’ll discuss some of the design principles and patterns that have been established over time to build highly concurrent applications.
https://www.baeldung.com/java-concurrency
Handling concurrency in an application can be a tricky process with many potential pitfalls. A solid grasp of the fundamentals will go a long way to help minimize these issues.
https://www.oreilly.com/library/view/concurrency-in-go/9781491941294/
You’ll understand how Go chooses to model concurrency, what issues arise from this model, and how you can compose primitives within this model to solve problems.
https://www.oreilly.com/library/view/modern-concurrency-in/9781098165406/
With this book, you'll explore the transformative world of Java 21's key feature: virtual threads.
https://www.youtube.com/watch?v=qyM8Pi1KiiM
https://www.youtube.com/watch?v=wEsPL50Uiyo
High availability
https://redis.io/blog/high-availability-architecture/
A high available architecture is when there are a number of different components, modules, or services that work together to maintain optimal performance, irrespective of peak-time loads.
https://www.ibm.com/think/topics/high-availability
High availability (HA) is a term that refers to a system’s ability to be accessible and reliable close to 100% of the time.