
Youzan Big Data Development Engineer - Data Intelligence Track
Experienced hire | Full-time | Location: Hangzhou | Status: Hiring
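Requirements
1. Expert-level Java, with solid coding skills and good code style; familiar with mainstream frameworks such as Spring and MyBatis; experience independently developing large-scale systems;
2. Familiar with mainstream distributed computing frameworks such as Hadoop, Spark, and Flink, including their core mechanisms and tuning techniques;
3. Familiar with the characteristics and use cases of databases such as MySQL, Doris, ClickHouse, and Kylin; strong technology selection and performance optimization skills;
4. Strong system design and architecture skills, with hands-on experience in high-concurrency, high-availability, scalable systems;
5. Good communication and collaboration skills and product thinking; able to drive cross-team, multi-role collaboration;
6. Experience with semantic layer construction, metrics platforms, or unified data service platforms is a plus;
7. Experience with knowledge base systems, document structuring, or RAG (retrieval-augmented generation) is a plus;
8. LLM / AIGC project experience, or an understanding of and interest in intelligent data products or the data intelligence field, is a plus;
9. Strong business understanding; able to abstract general, service-ready capabilities from data and drive data-driven business outcomes.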
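Responsibilities
1. Own the architecture design, development, and continuous optimization of the unified data service platform (OneService), building a company-wide standard, efficient, and intelligent entry point for data consumption that supports multiple data access modes such as natural language queries and report generation;
2. Plan and build the metrics management system, including modules for metric definition, lineage, versioning, and lifecycle; create a standardized, tool-supported process for metric production and management that ensures consistent definitions and accurate results;
3. Continuously improve the platform's data service capabilities to support diverse consumption scenarios such as BI reports, self-service analytics, API calls, and operational campaigns, driving intelligent use of data in business analysis and operational decision-making;
4. Explore and integrate large language model (LLM) and semantic modeling capabilities to improve the experience and efficiency of interacting with data through natural language and to lower the barrier to using data;
5. Contribute to building the integration with the knowledge base system, merging structured data capabilities with unstructured knowledge services to create an integrated "data + knowledge" intelligent service for merchant operations;
6. Optimize the overall data platform architecture, improving system stability and query performance, reducing onboarding and maintenance costs, and continuously iterating on user experience;
7. Work closely with data analysts, product managers, and business teams to abstract common requirements, build platform-level data product capabilities, and drive efficient reuse of data assets.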
Includes English-language materials
Java
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
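Spring
https://liaoxuefeng.com/books/java/spring/index.html
Spring is a framework that supports rapid development of Java EE applications. It provides a set of underlying containers and infrastructure and integrates seamlessly with many popular open-source frameworks, making it arguably essential for Java EE development.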
https://spring.io/guides/gs/rest-service
https://spring.io/quickstart
Level up your Java code and explore what Spring can do for you.
MyBatis
https://mybatis.org/mybatis-3/getting-started.html
https://www.baeldung.com/mybatis
MyBatis is an open source persistence framework which simplifies the implementation of database access in Java applications.
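Hadoop
https://www.runoob.com/w3cnote/hadoop-tutorial.html
Hadoop provides reliable, scalable application-layer computing and storage support for very large computer clusters. It allows distributed processing of large data sets across clusters of computers using simple programming models, and scales from a single machine to thousands of machines.
[English] Hadoop Tutorial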
https://www.tutorialspoint.com/hadoop/index.htm
Hadoop is an open-source framework for storing and processing big data in a distributed environment across clusters of computers using simple programming models.
Spark
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Flink
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
MySQL
https://juejin.cn/post/7190306988939542585
A hands-on, all-in-one MySQL learning roadmap covering database fundamentals, SQL statement usage, database constraints, database design, and more.
[English] MySQL Tutorial
https://www.mysqltutorial.org/
Your go-to resource for mastering MySQL in a fast, easy, and enjoyable way.
https://www.youtube.com/watch?v=5OdVJbNCSso
MySQL SQL tutorial for beginners
https://www.youtube.com/watch?v=7S_tz1z_5bA
This beginner-friendly course teaches you SQL from scratch.
Doris
https://doris.apache.org/docs/gettingStarted/what-is-apache-doris
ClickHouse
[English] Advanced Tutorial
https://clickhouse.com/docs/tutorial
Learn how to ingest and query data in ClickHouse using the New York City taxi example dataset.
https://www.youtube.com/watch?v=FtoWGT7kS-c
ClickHouse is an open-source column-oriented DBMS for online analytical processing that allows users to generate analytical reports using SQL queries in real-time.
https://www.youtube.com/watch?v=Rhe-kUyrFUE&list=PL0Z2YDlm0b3gcY5R_MUo4fT5bPqUQ66ep
System design
https://roadmap.sh/system-design
Everything you need to know about designing large scale systems.
https://www.youtube.com/watch?v=F2FmTdLtb_4
This complete system design tutorial covers scalability, reliability, data handling, and high-level architecture with clear explanations, real-world examples, and practical strategies.
High concurrency
https://www.baeldung.com/concurrency-principles-patterns
In this tutorial, we’ll discuss some of the design principles and patterns that have been established over time to build highly concurrent applications.
https://www.baeldung.com/java-concurrency
Handling concurrency in an application can be a tricky process with many potential pitfalls. A solid grasp of the fundamentals will go a long way to help minimize these issues.
https://www.oreilly.com/library/view/concurrency-in-go/9781491941294/
You’ll understand how Go chooses to model concurrency, what issues arise from this model, and how you can compose primitives within this model to solve problems.
https://www.oreilly.com/library/view/modern-concurrency-in/9781098165406/
With this book, you'll explore the transformative world of Java 21's key feature: virtual threads.
https://www.youtube.com/watch?v=qyM8Pi1KiiM
https://www.youtube.com/watch?v=wEsPL50Uiyo
High availability
https://redis.io/blog/high-availability-architecture/
A highly available architecture is when there are a number of different components, modules, or services that work together to maintain optimal performance, irrespective of peak-time loads.
https://www.ibm.com/think/topics/high-availability
High availability (HA) is a term that refers to a system’s ability to be accessible and reliable close to 100% of the time.
RAG
https://www.youtube.com/watch?v=sVcwVQRHIc8
Learn how to implement RAG (Retrieval Augmented Generation) from scratch, straight from a LangChain software engineer.
Large language models
https://www.youtube.com/watch?v=xZDB1naRUlk
You will build projects with LLMs that will enable you to create dynamic interfaces, interact with vast amounts of text data, and even empower LLMs with the capability to browse the internet for research papers.
https://www.youtube.com/watch?v=zjkBMFhNj_g