Xiaohongshu [2026 Campus Recruitment] Observability R&D Engineer
Campus recruitment · Full-time · Infrastructure backend
Location: Shanghai | Hangzhou
Status: Hiring
Requirements
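1. Bachelor's degree or above; a major in computer science, software engineering, or a related field is preferred.
2. Proficient in Java or Go, familiar with concurrent programming, distributed systems, and performance optimization, with solid programming fundamentals.
3. Familiar with the products and components of the cloud-native observability stack, including but not limited to OpenTelemetry, CAT, SkyWalking, Prometheus, VictoriaMetrics, ELK, ClickHouse, and eBPF; understands the fundamentals of Kubernetes and can apply them.
4. Familiar with foundational open-source technologies such as Linux, networking, storage, and message queues (MQ); in-depth knowledge of their internals and implementation principles is a plus.
Preferred qualifications:
1. Familiar with and experienced in AI-related technologies, including but not limited to PyTorch, LLaMA-Factory, Spring AI, Langfuse, and wandb.
2. Good at finding and solving problems, reflecting on and summarizing lessons learned, and collaborating across teams; hardworking.
3. Keeps up with new industry technologies, is curious and eager to learn, and has a strong sense of responsibility and the ability to work under pressure.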
Responsibilities
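1. Develop the observability system around its four pillars (Metrics, Logging, Tracing, and Profiling), building core observability capabilities across the full stack.
2. Own the technical architecture and product design of observability components, including the monitoring platform, end-to-end tracing, the logging service, compute engines (stream analytics, real-time alerting, time-series detection, etc.), alerting, and eBPF.
3. Ensure the high performance and high availability of core observability services under high concurrency, and drive continuous optimization and iteration of the technology and products.
4. Deliver AI Infra observability, AI application observability, and "Observability AI+" technologies, improving the stability of AI workloads as well as the user experience and efficiency of traditional observability products.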
Learning resources (including English-language materials)
Java
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
Go
https://www.youtube.com/watch?v=8uiZC0l4Ajw
A complete Golang tutorial, from start to finish in under an hour, including a full demo of building an API in Go. No filler, just what you need to know.
Distributed systems
https://www.distributedsystemscourse.com/
The home page of a free online class in distributed systems.
https://www.youtube.com/watch?v=7VbL89mKK3M&list=PLOE1GTZ5ouRPbpTnrZ3Wqjamfwn_Q5Y9A
OpenTelemetry
https://logz.io/learn/opentelemetry-guide/#overview
Every journey in Observability begins with instrumenting an application to emit telemetry data – primarily logs, metrics and traces – from each service as it executes.
[English] Getting Started
https://opentelemetry.io/docs/languages/go/getting-started/
This page will show you how to get started with OpenTelemetry in Go.
https://www.youtube.com/watch?v=hLvwoow3XTk
OpenTelemetry can help, with its powerful capabilities for monitoring and analyzing hybrid applications, including collecting and analyzing telemetry data, metrics, and traces.
https://www.youtube.com/watch?v=Txe4ji4EDUA
In the observability space, the project making this possible is OpenTelemetry.
Prometheus
https://grafana.com/docs/grafana/latest/getting-started/get-started-grafana-prometheus/
Prometheus is an open source monitoring system for which Grafana provides out-of-the-box support.
https://prometheus.io/docs/tutorials/getting_started/
Prometheus is a systems monitoring and alerting toolkit.
ELK
https://logz.io/learn/complete-guide-elk-stack/
With millions of downloads for its various components since first being introduced, the ELK Stack is the world’s most popular log management platform.
https://www.baeldung.com/ops/elk
In this tutorial, we’ll learn about the basics of the ELK stack.
https://www.youtube.com/watch?v=jk4RoEYCZTo
Explains how to install and configure the ELK (Elasticsearch, Logstash, Kibana) Stack, a log management solution for analyzing and visualizing your data.
ClickHouse
[English] Advanced Tutorial
https://clickhouse.com/docs/tutorial
Learn how to ingest and query data in ClickHouse using the New York City taxi example dataset.
https://www.youtube.com/watch?v=FtoWGT7kS-c
ClickHouse is an open-source column-oriented DBMS for online analytical processing that allows users to generate analytical reports using SQL queries in real-time.
https://www.youtube.com/watch?v=Rhe-kUyrFUE&list=PL0Z2YDlm0b3gcY5R_MUo4fT5bPqUQ66ep
eBPF
https://ebpf.io/get-started/
eBPF is a revolutionary technology that can run sandboxed programs in the Linux kernel without changing kernel source code or loading a kernel module.
Kubernetes
https://kubernetes.io/docs/tutorials/kubernetes-basics/
This tutorial provides a walkthrough of the basics of the Kubernetes cluster orchestration system.
https://kubernetes.io/zh-cn/docs/tutorials/kubernetes-basics/
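Chinese-language version of the same tutorial. It introduces the basics of the Kubernetes cluster orchestration system; each module contains background information on major Kubernetes features and concepts, plus an online tutorial to work through.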
https://www.youtube.com/watch?v=s_o8dwzRlu4
Hands-On Kubernetes Tutorial | Learn Kubernetes in 1 Hour - Kubernetes Course for Beginners
https://www.youtube.com/watch?v=X48VuDVv0do
Full Kubernetes Tutorial | Kubernetes Course | Hands-on course with a lot of demos
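Kubernetes can check a container's health by sending HTTP GET requests to it: a failing liveness probe gets the container restarted, while a failing readiness probe removes the Pod from Service endpoints, and a status code from 200 to 399 counts as success. A minimal Go sketch of the two endpoints follows, using only the standard library (atomic.Bool needs Go 1.19+). The paths /healthz and /readyz and the ready flag are assumptions for illustration; the real paths are whatever the Pod spec's httpGet probes point at, and httptest is used only to exercise the handlers in-process.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// ready flips to true once startup work (cache warm-up, connections, ...) is done.
var ready atomic.Bool

func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	// Liveness: the process is up and able to serve HTTP.
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
		fmt.Fprintln(w, "ok")
	})
	// Readiness: answer 503 until ready, so no Service traffic is routed here yet.
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
		if !ready.Load() {
			http.Error(w, "not ready", http.StatusServiceUnavailable)
			return
		}
		fmt.Fprintln(w, "ready")
	})
	return mux
}

// probe sends a GET to the handler in-process and returns the status code,
// standing in for the kubelet's HTTP probe.
func probe(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	mux := newMux()
	fmt.Println(probe(mux, "/healthz"), probe(mux, "/readyz")) // 200 503
	ready.Store(true)
	fmt.Println(probe(mux, "/readyz")) // 200
}
```

Keeping the liveness check cheap and independent of downstream dependencies avoids restart loops when, for example, a database is briefly unavailable; readiness is the place to signal that the Pod should not receive traffic yet.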
Linux
https://ryanstutorials.net/linuxtutorial/
Ok, so you want to learn how to use the Bash command line interface (terminal) on Unix/Linux.
https://ubuntu.com/tutorials/command-line-for-beginners
The Linux command line is a text interface to your computer.
https://www.youtube.com/watch?v=6WatcfENsOU
In this Linux crash course, you will learn the fundamental skills and tools you need to become a proficient Linux system administrator.
https://www.youtube.com/watch?v=v392lEyM29A
Never fear the command line again, make it fear you.
https://www.youtube.com/watch?v=ZtqBQ68cfJc
Message queues
https://www.youtube.com/watch?v=xErwDaOc-Gs
PyTorch
https://datawhalechina.github.io/thorough-pytorch/
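PyTorch is an important tool for data science research with deep learning. It offers considerable advantages in flexibility, readability, and performance, and in recent years it has become the most widely used framework in academia for implementing deep learning algorithms.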
https://www.youtube.com/watch?v=V_xro1bcAuA
Learn PyTorch for deep learning in this comprehensive course for beginners. PyTorch is a machine learning framework written in Python.
LLaMA-Factory
https://llamafactory.readthedocs.io/en/latest/
LLaMA Factory is an easy-to-use and efficient platform for training and fine-tuning large language models.
Spring
https://liaoxuefeng.com/books/java/spring/index.html
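Spring is a framework for rapidly developing Java EE applications. It provides a set of underlying containers and infrastructure and integrates seamlessly with many common open-source frameworks, making it practically essential for Java EE development.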
https://spring.io/guides/gs/rest-service
https://spring.io/quickstart
Level up your Java code and explore what Spring can do for you.
Related positions
Campus recruitment · Infrastructure backend
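Network Engineering R&D: build Xiaohongshu's infrastructure network platform, providing comprehensive monitoring, management, and operational optimization of the network to improve its overall stability and efficiency and to reduce network costs.
1. Build the network observability system, developing platform features such as link utilization monitoring, congestion monitoring, traffic scheduling, and stability analysis.
2. Develop network automation tools and systems, including automatic topology generation, automated configuration push, change automation, and anomaly detection with self-healing.
3. Work with network engineers to turn low-level network capabilities into platforms and services, improving overall operational efficiency and stability.
4. Take part in network data modeling and analysis to support capacity planning, risk identification, cost optimization, and policy-making.
Network Control Plane R&D:
1. Participate in the design, development, and optimization of the network SDN control system.
2. Participate in control-plane development for network products, including but not limited to DNS, NAT, load balancing, and IPAM.
3. Participate in performance and architecture optimization of large-scale distributed systems.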
Updated 2025-09-13

Internship · R&D
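Backend R&D Engineer (Engineering Track)
Job description:
1. Handle system design and backend development for the company's algorithm engineering, supporting its performance and stability.
2. Assist in developing and optimizing the feed recommendation system, building the ability to handle high-concurrency, high-throughput scenarios through hands-on projects.
3. Under a mentor's guidance, take part in designing and developing high-concurrency, high-availability backend systems, and learn how to ensure performance, stability, scalability, and observability.
4. Follow advances in industry technology; with the company's guidance, research and analyze how mainstream industry products are implemented, and contribute to optimizing the current system architecture.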
Updated 2025-03-21

Campus recruitment · R&D
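Job description:
1. Handle the company's backend system design and development, supporting the performance and stability of algorithm engineering.
2. Assist in developing and optimizing the feed recommendation system, building the ability to handle high-concurrency, high-throughput scenarios through hands-on projects.
3. Under a mentor's guidance, take part in designing and developing high-concurrency, high-availability backend systems, and learn how to ensure performance, stability, scalability, and observability.
4. Follow advances in industry technology; with the company's guidance, research and analyze how mainstream industry products are implemented, and contribute to optimizing the current system architecture.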
Updated 2025-03-25