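Xiaohongshu [2026 Campus Recruitment] Java Development Engineer - Data Platform
Campus recruitment | Full-time | Backend development | Location: Shanghai, Hangzhou | Status: Hiring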
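Requirements
1. Bachelor's degree or above; majors in computer science, software engineering, or related fields preferred.
2. Proficient in Java; familiar with SQL and Hive optimization.
3. Experience designing highly available systems, with the ability to handle high concurrency and massive data volumes.
4. In-depth understanding of Hadoop ecosystem components (HDFS/YARN/Hive/Spark/Flink, etc.), with hands-on experience delivering real projects.
5. Strong interest in and curiosity about large models and an eagerness to embrace AI; experience using and tuning large models is a plus.
6. Familiar with Linux and shell scripting; experience developing task scheduling systems.
Responsibilities
1. Upgrade the production platform Dataverse from a Data For BI platform to a Data For AI+BI platform, including building Notebooks and personal development environments that support efficient development and debugging of code-based tasks (PySpark, Scala Spark, UDFs, Ray, Python, etc.).
2. Build Data+AI data lineage for Dataverse across online, nearline, and offline pipelines, covering lineage for algorithm-pipeline features, indexes, models, vocabularies, samples, and more, to support end-to-end troubleshooting of algorithm pipelines, content understanding, and data governance.
3. Build a DataEngineer Agent and a DataScience Agent that help data engineers and data scientists with day-to-day data processing, analysis, and modeling.
4. Handle day-to-day feature iteration, stability assurance, and troubleshooting for Dataverse, covering data synchronization, task development, data testing, data publishing, task operations, and the scheduling system.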
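Responsibilities 2 and 4 both rest on a dependency graph: lineage asks what sits downstream of a broken table, and a scheduler needs an order in which every task runs after its inputs. The following is a minimal sketch assuming an in-memory graph; the class `DependencyGraph` and the table names are hypothetical and say nothing about how Dataverse is actually built. It uses Kahn's algorithm (topological sort) for the run order and breadth-first search for downstream impact.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical dependency graph of tables/tasks: addEdge(a, b) means b reads a's output. */
final class DependencyGraph {
    // TreeMap keeps node iteration deterministic; LinkedHashSet keeps consumers in insertion order.
    private final Map<String, Set<String>> downstream = new TreeMap<>();

    void addEdge(String upstream, String consumer) {
        downstream.computeIfAbsent(upstream, k -> new LinkedHashSet<>()).add(consumer);
        downstream.computeIfAbsent(consumer, k -> new LinkedHashSet<>());
    }

    /** Kahn's algorithm: an order in which a scheduler could run every node after its inputs. */
    List<String> runOrder() {
        Map<String, Integer> inDegree = new HashMap<>();
        for (String node : downstream.keySet()) inDegree.putIfAbsent(node, 0);
        for (Set<String> consumers : downstream.values()) {
            for (String c : consumers) inDegree.merge(c, 1, Integer::sum);
        }
        Deque<String> ready = new ArrayDeque<>();
        for (String node : downstream.keySet()) {
            if (inDegree.get(node) == 0) ready.add(node); // nodes with no unfinished inputs
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String node = ready.poll();
            order.add(node);
            for (String c : downstream.get(node)) {
                // A consumer becomes ready once all of its inputs have been emitted.
                if (inDegree.merge(c, -1, Integer::sum) == 0) ready.add(c);
            }
        }
        if (order.size() != downstream.size()) {
            throw new IllegalStateException("dependency cycle detected");
        }
        return order;
    }

    /** Breadth-first search over downstream edges: everything affected if `node` is bad. */
    Set<String> impactedBy(String node) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>(downstream.getOrDefault(node, Collections.emptySet()));
        while (!queue.isEmpty()) {
            String next = queue.poll();
            if (seen.add(next)) queue.addAll(downstream.get(next));
        }
        return seen;
    }

    public static void main(String[] args) {
        DependencyGraph g = new DependencyGraph();
        g.addEdge("ods_events", "dwd_events");
        g.addEdge("dwd_events", "dws_user_daily");
        g.addEdge("dws_user_daily", "feature_user");
        g.addEdge("dwd_events", "sample_ctr");
        g.addEdge("feature_user", "sample_ctr");
        g.addEdge("sample_ctr", "model_ctr");
        // prints [ods_events, dwd_events, dws_user_daily, feature_user, sample_ctr, model_ctr]
        System.out.println(g.runOrder());
        // prints [feature_user, sample_ctr, model_ctr]
        System.out.println(g.impactedBy("dws_user_daily"));
    }
}
```

The same graph serves both purposes. Walking it forward gives a schedule, and a traversal from one node gives the blast radius of a bad upstream table. A production lineage system would also record column-level edges and version history, which this sketch omits.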
Includes English-language materials
Education+
Java+
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
SQL+
https://liaoxuefeng.com/books/sql/introduction/index.html
What is SQL? Simply put, SQL is the standard computer language for accessing and processing relational databases.
https://sqlbolt.com/
Learn SQL with simple, interactive exercises.
https://www.youtube.com/watch?v=p3qvj9hO_Bo
In this video we will cover everything you need to know about SQL in only 60 minutes.
Hive+
[English] Hive Tutorial
https://www.tutorialspoint.com/hive/index.htm
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
https://www.youtube.com/watch?v=D4HqQ8-Ja9Y
High availability+
https://redis.io/blog/high-availability-architecture/
A highly available architecture is one in which a number of different components, modules, or services work together to maintain optimal performance, irrespective of peak-time loads.
https://www.ibm.com/think/topics/high-availability
High availability (HA) is a term that refers to a system’s ability to be accessible and reliable close to 100% of the time.
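"Close to 100% of the time" is usually expressed as a number of nines. The sketch below, a hypothetical helper class `Availability`, converts an availability target into allowed downtime per year and shows why redundancy raises availability while chaining dependencies lowers it. It assumes a 365-day year and independent failures.

```java
/** Back-of-the-envelope availability math (illustrative helper, not from the linked articles). */
final class Availability {
    static final double MINUTES_PER_YEAR = 365 * 24 * 60; // 525,600; leap years ignored

    /** Allowed downtime per year for a target such as 0.999 ("three nines"). */
    static double downtimeMinutesPerYear(double availability) {
        return (1.0 - availability) * MINUTES_PER_YEAR;
    }

    /** Components in series: a request fails if ANY of them is down. */
    static double series(double... parts) {
        double a = 1.0;
        for (double p : parts) a *= p;
        return a;
    }

    /** n redundant replicas: down only if ALL replicas are down (assumes independent failures). */
    static double redundant(double replica, int n) {
        return 1.0 - Math.pow(1.0 - replica, n);
    }

    public static void main(String[] args) {
        System.out.printf("99.9%%  -> %.1f min/year%n", downtimeMinutesPerYear(0.999));
        System.out.printf("99.99%% -> %.2f min/year%n", downtimeMinutesPerYear(0.9999));
        System.out.printf("two 99%% replicas -> %.4f%n", redundant(0.99, 2));
        System.out.printf("two 99.9%% services in series -> %.6f%n", series(0.999, 0.999));
    }
}
```

For example, 99.9% allows about 525.6 minutes (roughly 8.8 hours) of downtime per year and 99.99% about 52.6 minutes, while two independent 99% replicas together reach about 99.99%. Real failures are often correlated (shared racks, configuration, deployments), so the redundancy figure is an optimistic estimate.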
High concurrency+
https://www.baeldung.com/concurrency-principles-patterns
In this tutorial, we’ll discuss some of the design principles and patterns that have been established over time to build highly concurrent applications.
https://www.baeldung.com/java-concurrency
Handling concurrency in an application can be a tricky process with many potential pitfalls. A solid grasp of the fundamentals will go a long way to help minimize these issues.
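A classic pitfall is several threads updating one shared map: with a plain `HashMap`, concurrent read-modify-write updates can be lost or can corrupt the map. A minimal sketch, assuming events arrive as in-memory batches (the class `ConcurrentCounter` is hypothetical), counts events per key with a fixed thread pool, `ConcurrentHashMap.computeIfAbsent`, and `LongAdder`:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.LongAdder;

/** Thread-safe per-key event counting with a fixed worker pool (illustrative sketch). */
final class ConcurrentCounter {
    static Map<String, Long> countByKey(List<List<String>> batches, int threads) {
        // Shared state: ConcurrentHashMap + LongAdder avoids the lost updates a plain HashMap would suffer.
        ConcurrentHashMap<String, LongAdder> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (List<String> batch : batches) {
                futures.add(pool.submit(() -> {
                    for (String key : batch) {
                        // computeIfAbsent is atomic, so each key gets exactly one LongAdder.
                        counts.computeIfAbsent(key, k -> new LongAdder()).increment();
                    }
                }));
            }
            for (Future<?> f : futures) f.get(); // wait for every batch; surfaces worker exceptions
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while counting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a worker task failed", e.getCause());
        } finally {
            pool.shutdown(); // accept no new tasks; idle threads then exit
        }
        Map<String, Long> result = new TreeMap<>();
        counts.forEach((key, adder) -> result.put(key, adder.sum()));
        return result;
    }

    public static void main(String[] args) {
        List<String> batch = Arrays.asList("click", "view", "click");
        // prints {click=8, view=4}
        System.out.println(countByKey(Arrays.asList(batch, batch, batch, batch), 2));
    }
}
```

`LongAdder` spreads contended increments across internal cells and is summed once at the end, which suits write-heavy counters; an `AtomicLong` per key would also be correct, just more contended under heavy load.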
https://www.oreilly.com/library/view/concurrency-in-go/9781491941294/
You’ll understand how Go chooses to model concurrency, what issues arise from this model, and how you can compose primitives within this model to solve problems.
https://www.oreilly.com/library/view/modern-concurrency-in/9781098165406/
With this book, you'll explore the transformative world of Java 21's key feature: virtual threads.
https://www.youtube.com/watch?v=qyM8Pi1KiiM
https://www.youtube.com/watch?v=wEsPL50Uiyo
Hadoop+
https://www.runoob.com/w3cnote/hadoop-tutorial.html
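Hadoop provides reliable, scalable application-level compute and storage for large computer clusters. It lets large data sets be processed in a distributed way across clusters of computers using simple programming models, and it scales from a single machine to thousands of machines.
[English] Hadoop Tutorial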
https://www.tutorialspoint.com/hadoop/index.htm
Hadoop is an open-source framework that allows users to store and process big data in a distributed environment across clusters of computers using simple programming models.
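The "simple programming model" here is MapReduce: a map step emits key/value pairs, the framework groups them by key (the shuffle), and a reduce step combines each group. The following single-process Java sketch of word count only imitates those three phases; it does not use the Hadoop `Mapper`/`Reducer` API, and the class name `WordCountSketch` is hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Single-JVM simulation of the MapReduce programming model; NOT the Hadoop MapReduce API. */
final class WordCountSketch {
    // Map phase: one input line -> (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) pairs.add(new SimpleEntry<>(word, 1));
        }
        return pairs;
    }

    // Shuffle phase: group all values by key (Hadoop does this across the network, sorted by key).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: combine all values collected for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) pairs.addAll(map(line)); // Hadoop runs mappers in parallel per input split
        Map<String, Integer> counts = new TreeMap<>();
        shuffle(pairs).forEach((word, ones) -> counts.put(word, reduce(ones)));
        return counts;
    }

    public static void main(String[] args) {
        // prints {and=1, big=2, data=2, hadoop=2, process=1, spark=1, stores=1}
        System.out.println(wordCount(Arrays.asList(
                "Hadoop stores big data", "Spark and Hadoop process big data")));
    }
}
```

On a real cluster the map and reduce functions stay this small; the framework supplies parallelism, fault tolerance, and the shuffle, which moves data between machines and is usually the most expensive part of a job.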
HDFS+
https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html
The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware.
https://www.ibm.com/cn-zh/think/topics/hdfs
The Hadoop Distributed File System (HDFS) is a file system for managing large data sets that runs on commodity hardware.
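HDFS stores each file as a sequence of fixed-size blocks and replicates every block across DataNodes. The hypothetical helper below does the sizing arithmetic, assuming the Hadoop 2.x/3.x default block size of 128 MB (`dfs.blocksize`; the 1.x design document linked above describes 64 MB) and the default replication factor of 3 (`dfs.replication`).

```java
/** Back-of-the-envelope HDFS sizing (illustrative helper, not an HDFS API). */
final class HdfsBlockMath {
    static final long MB = 1024L * 1024L;

    /** Blocks a file occupies; the last block may be only partially filled. */
    static long blockCount(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes; // ceiling division
    }

    /** Raw disk used across DataNodes: each replica stores the actual bytes, not a full block. */
    static long rawBytes(long fileBytes, int replication) {
        return fileBytes * replication;
    }

    public static void main(String[] args) {
        long blockSize = 128 * MB; // assumed dfs.blocksize (Hadoop 2.x/3.x default)
        int replication = 3;       // assumed dfs.replication (default)
        long file = 300 * MB;
        // prints "3 blocks, 9 block replicas, 900 MB raw"
        System.out.println(blockCount(file, blockSize) + " blocks, "
                + blockCount(file, blockSize) * replication + " block replicas, "
                + rawBytes(file, replication) / MB + " MB raw");
    }
}
```

A 300 MB file thus splits into blocks of 128 + 128 + 44 MB, and the last block only uses the space it needs. Many small files are costly for a different reason: the NameNode keeps metadata for every file and block in memory.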
Yarn+
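[English] Introduction
https://yarnpkg.com/getting-started
Yarn is an established open-source package manager used to manage dependencies in JavaScript projects. Note that this is the JavaScript package manager Yarn, not Hadoop YARN, the cluster resource manager named in the requirements.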
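Hadoop YARN, the resource manager named in the requirements, hands out cluster memory and vcores as containers. As a rough capacity check, the hypothetical helper below estimates how many containers fit on one NodeManager. It assumes the scheduler rounds each memory request up to a multiple of `yarn.scheduler.minimum-allocation-mb` and accounts for both memory and vcores (as the CapacityScheduler does when configured with the `DominantResourceCalculator`); the node sizes stand for `yarn.nodemanager.resource.memory-mb` and `yarn.nodemanager.resource.cpu-vcores`. Real schedulers apply further rules (queues, maximum allocations, overheads) that this sketch ignores.

```java
/** Simplified YARN capacity estimate (illustrative helper, not a YARN API). */
final class YarnContainerMath {
    /** Rounds a memory request up to the next multiple of the minimum allocation (assumed behavior). */
    static long normalizeMb(long requestMb, long minAllocationMb) {
        return ((requestMb + minAllocationMb - 1) / minAllocationMb) * minAllocationMb;
    }

    /** Containers that fit on one node when both memory and vcores are enforced. */
    static long containersPerNode(long nodeMb, int nodeVcores,
                                  long requestMb, int requestVcores, long minAllocationMb) {
        long byMemory = nodeMb / normalizeMb(requestMb, minAllocationMb);
        long byCpu = nodeVcores / requestVcores;
        return Math.min(byMemory, byCpu); // the scarcer resource decides
    }

    public static void main(String[] args) {
        // Hypothetical node: 64 GB and 16 vcores offered to YARN; requests of 3000 MB / 1 vcore.
        System.out.println(containersPerNode(65536, 16, 3000, 1, 1024)); // prints 16 (CPU-bound)
        System.out.println(containersPerNode(65536, 32, 3000, 1, 1024)); // prints 21 (memory-bound)
    }
}
```

Each 3000 MB request is normalized to 3072 MB, so memory alone would allow 21 containers per node; with only 16 vcores, CPU becomes the limit. This is one reason to size Spark executors so that memory and cores run out at about the same time.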
Spark+
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Flink+
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
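A common way to react to streaming events as they occur is to aggregate them in fixed, non-overlapping time windows (tumbling windows; Flink offers these as `TumblingEventTimeWindows`). The plain-Java sketch below only illustrates the window-assignment arithmetic: it is not the Flink DataStream API, the `Event` type is hypothetical, and it ignores watermarks and late data, which a real Flink job uses to decide when a window is complete.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Tumbling event-time windows in plain Java; NOT the Flink DataStream API. */
final class TumblingWindowSketch {
    /** Hypothetical event: an event-time timestamp in milliseconds plus a key. */
    static final class Event {
        final long timestampMs;
        final String key;

        Event(long timestampMs, String key) {
            this.timestampMs = timestampMs;
            this.key = key;
        }
    }

    /** Start of the fixed-size window containing the timestamp (windows aligned to epoch 0). */
    static long windowStart(long timestampMs, long sizeMs) {
        return timestampMs - Math.floorMod(timestampMs, sizeMs);
    }

    /** Counts events per window and key; every event lands in exactly one window. */
    static Map<Long, Map<String, Integer>> countPerWindow(List<Event> events, long sizeMs) {
        Map<Long, Map<String, Integer>> counts = new TreeMap<>();
        for (Event e : events) {
            counts.computeIfAbsent(windowStart(e.timestampMs, sizeMs), start -> new TreeMap<>())
                  .merge(e.key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                new Event(1_000, "click"), new Event(4_000, "view"),
                new Event(59_999, "click"), new Event(61_000, "click"), new Event(65_000, "click"));
        // one-minute windows; prints {0={click=2, view=1}, 60000={click=2}}
        System.out.println(countPerWindow(events, 60_000));
    }
}
```

With one-minute windows, events at 1 s, 4 s, and 59.999 s fall into the window starting at 0, and events at 61 s and 65 s into the window starting at 60000 ms. A streaming engine computes the same assignment incrementally and emits each window's result once it knows no more events for that window are expected.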
Large models+
https://www.youtube.com/watch?v=xZDB1naRUlk
You will build projects with LLMs that will enable you to create dynamic interfaces, interact with vast amounts of text data, and even empower LLMs with the capability to browse the internet for research papers.
https://www.youtube.com/watch?v=zjkBMFhNj_g
Linux+
https://ryanstutorials.net/linuxtutorial/
Ok, so you want to learn how to use the Bash command line interface (terminal) on Unix/Linux.
https://ubuntu.com/tutorials/command-line-for-beginners
The Linux command line is a text interface to your computer.
https://www.youtube.com/watch?v=6WatcfENsOU
In this Linux crash course, you will learn the fundamental skills and tools you need to become a proficient Linux system administrator.
https://www.youtube.com/watch?v=v392lEyM29A
Never fear the command line again, make it fear you.
https://www.youtube.com/watch?v=ZtqBQ68cfJc
Bash+
[English] The Bash Guide
https://guide.bash.academy/
A quality-driven guide through the shell's many features.
https://www.youtube.com/watch?v=tK9Oc6AEnR4
Understanding how to use bash scripting will enhance your productivity by automating tasks, streamlining processes, and making your workflow more efficient.
Scripting+
[English] Scripting language
https://en.wikipedia.org/wiki/Scripting_language
https://zhuanlan.zhihu.com/p/571097954
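A script is usually interpreted rather than compiled. Scripting languages are generally simple, easy to learn, and easy to use, so that programmers can write programs quickly.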
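Related positions

Campus recruitment
1. Take part in business iteration for search, recommendation, and user growth; own the corresponding technical architecture design and deliver high-quality code and unit tests.
2. Take part in requirement reviews and contribute ideas and suggestions on product proposals.
3. Push the performance of online systems to the limit, resolve potential technical risks, and keep systems secure, stable, and fast.
Updated 2025-08-07
Campus recruitment | Backend development
1. Take part in backend development for the company's e-commerce advertising business, covering scenarios including but not limited to e-commerce, advertising, and local services.
2. Continuously improve personal and team development efficiency through code reuse and engineering/architecture upgrades.
3. Focus on the experience and quality of online products, optimize product performance and interaction, and deliver a smooth purchase flow for users.
Updated 2025-09-06
Campus recruitment
1. Take part in architecture design, detailed design, development, and testing of software projects; strictly control code quality to ensure system stability and scalability.
2. Take part in locating and analyzing system issues, quickly identifying and resolving problems in running systems and improving the problem-resolution process.
3. Design and develop interfaces following RESTful API conventions, ensuring efficient and secure data exchange and enabling efficient system integration.
Updated 2025-06-22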