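Tencent Hunyuan Large Model SRE Operations Engineer (Beijing)
Experienced hire | Full-time | 3+ years | AI Technology | Location: Shenzhen | Status: Hiring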
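Requirements
1. A bachelor's degree or above in computer science or a related field, with 3+ years of industry operations experience.
2. Familiarity with the Linux operating system, with solid system administration and networking knowledge.
3. Proficiency in at least one programming language (e.g., Python, Go, Shell), with experience developing automated operations tools.
4. Familiarity with large models, cloud-native technologies, and LLMOps deployment workflows, with experience in containerization and microservice architecture.
5. Familiarity with common hardware benchmarking techniques, with experience in resource planning and cost control, and the ability to draw up and carry out resource optimization and cost control plans.
6. Strong troubleshooting and problem-solving skills, with the ability to respond to and resolve system issues quickly under pressure.
7. Good communication skills and a team spirit, with the ability to work effectively with cross-functional teams.
8. The ability to learn and adapt to new technologies quickly; enjoys challenges and continuous self-improvement.
Pluses
1. Operations experience with large AI models, advertising, search, or recommender systems is preferred.
2. Experience managing large-scale GPU clusters is preferred.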
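Responsibilities
1. Own the stability and high availability of large-model services so that the platform runs reliably under high concurrency and heavy traffic. Design and build monitoring, alerting, and an automated operations platform so that problems are detected and resolved promptly.
2. Locate and fix faults quickly, and draw up and carry out emergency plans to ensure business continuity. Take part in incident postmortems: analyze root causes and propose improvements that prevent similar problems from recurring.
3. Develop and maintain automated operations platforms and tools to raise operational efficiency and reduce human error. Optimize resource usage to raise utilization and improve system performance.
4. Analyze existing systems in depth to uncover their shortcomings, use data to find weak points, and drive system optimizations through to implementation.
5. Own resource planning and management so that resources are allocated sensibly and used efficiently. Analyze resource costs, monitor and assess resource usage, and propose cost optimization plans. Drawing on the industry's hardware roadmap and the technology platform's needs, continually drive the selection and iteration of optimal configurations.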
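To make the monitoring and alerting duty in item 1 concrete, here is a minimal Python sketch of a sliding-window error-rate alert. The class name `ErrorRateAlert`, the 60-second window, the 5% threshold, and the 20-request minimum are illustrative assumptions. They are not Tencent's actual alerting rules.

```python
from collections import deque


class ErrorRateAlert:
    """Hypothetical sliding-window error-rate check (not a real library API)."""

    def __init__(self, window_s=60.0, threshold=0.05, min_requests=20):
        self.window_s = window_s          # look-back window in seconds (assumed value)
        self.threshold = threshold        # fire when the error ratio exceeds this (assumed value)
        self.min_requests = min_requests  # don't alert on tiny samples (assumed value)
        self.events = deque()             # (timestamp, is_error) pairs, oldest first

    def record(self, ts, is_error):
        self.events.append((ts, is_error))
        self._evict(ts)

    def _evict(self, now):
        # Drop events that have fallen out of the window.
        while self.events and self.events[0][0] <= now - self.window_s:
            self.events.popleft()

    def should_fire(self, now):
        self._evict(now)
        total = len(self.events)
        if total < self.min_requests:
            return False
        errors = sum(1 for _, err in self.events if err)
        return errors / total > self.threshold


if __name__ == "__main__":
    alert = ErrorRateAlert()
    for i in range(100):
        # Every 10th request fails: a 10% error rate.
        alert.record(ts=i * 0.5, is_error=(i % 10 == 0))
    print(alert.should_fire(now=50.0))
```

The minimum-request guard keeps a single failure at low traffic from paging anyone. A real deployment would more likely encode this rule in its monitoring system than in application code.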
Includes English-language materials
Education
C
https://www.freecodecamp.org/chinese/news/the-c-beginners-handbook/
This handbook follows the 80/20 rule: you will learn 80% of the C programming language in 20% of the time.
https://www.youtube.com/watch?v=87SH2Cn0s9A
https://www.youtube.com/watch?v=KJgsSFOSQv0
This course will give you a full introduction into all of the core concepts in the C programming language.
https://www.youtube.com/watch?v=PaPN51Mm5qQ
In this complete C programming course, Dr. Charles Severance (aka Dr. Chuck) will help you understand computer architecture and low-level programming with the help of the classic book The C Programming Language by Brian Kernighan and Dennis Ritchie.
TCP/IP
[English] What is TCP/IP?
https://www.techtarget.com/searchnetworking/definition/TCP-IP
TCP/IP stands for Transmission Control Protocol/Internet Protocol and is a suite of communication protocols used to interconnect network devices on the internet.
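The following Python sketch uses only the standard `socket` module to show TCP's connection-oriented byte stream over the loopback interface. IP handles addressing (127.0.0.1 here), and TCP provides a reliable, ordered stream on top of it. The client connects, sends bytes, half-closes its side, and reads back the echo. The function names are hypothetical.

```python
import socket
import threading


def echo_once(server_sock):
    # Accept one connection and echo bytes back until the peer half-closes.
    conn, _addr = server_sock.accept()
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:  # b"" means the client sent FIN
                break
            conn.sendall(data)


def tcp_round_trip(payload: bytes) -> bytes:
    # Port 0 asks the OS for any free port on the loopback interface.
    with socket.create_server(("127.0.0.1", 0)) as server:
        port = server.getsockname()[1]
        t = threading.Thread(target=echo_once, args=(server,))
        t.start()
        # create_connection() returns once the TCP three-way handshake completes.
        with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
            client.sendall(payload)
            client.shutdown(socket.SHUT_WR)  # half-close: no more data from us
            chunks = []
            while True:
                chunk = client.recv(4096)
                if not chunk:  # server closed its side
                    break
                chunks.append(chunk)
        t.join()
        return b"".join(chunks)


if __name__ == "__main__":
    print(tcp_round_trip(b"hello"))
```

The client sends everything before it reads anything. Very large payloads could therefore stall once both sides' socket buffers fill, so this sketch is meant for small messages.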
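HTTP
https://developer.mozilla.org/zh-CN/docs/Web/HTTP
Hypertext Transfer Protocol (HTTP) is an application-layer protocol for transmitting hypermedia documents such as HTML. It was designed for communication between web browsers and web servers, but it can also be used for other purposes.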
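To see an HTTP exchange end to end, here is a small Python sketch built on the standard library's `http.server` and `urllib.request`. It starts a throwaway local server on a free port and makes one GET request. The server writes a status line, headers, a blank line, and then the body. The handler, the `/health` path, and the response format are hypothetical.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener


class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = f"path={self.path}".encode()
        self.send_response(200)                          # status line
        self.send_header("Content-Type", "text/plain")   # response headers
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()                               # blank line ends the header block
        self.wfile.write(body)                           # message body

    def log_message(self, format, *args):
        pass  # silence per-request logging for the demo


def fetch(path: str):
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    opener = build_opener(ProxyHandler({}))  # bypass any proxy env vars for localhost
    try:
        with opener.open(f"http://127.0.0.1:{port}{path}", timeout=5) as resp:
            return resp.status, resp.headers["Content-Type"], resp.read().decode()
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch("/health"))
```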
Bash
[English] The Bash Guide
https://guide.bash.academy/
A quality-driven guide through the shell's many features.
https://www.youtube.com/watch?v=tK9Oc6AEnR4
Understanding how to use bash scripting will enhance your productivity by automating tasks, streamlining processes, and making your workflow more efficient.
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, and suitable for complete beginners, with full examples; based on the latest Python 3.
https://www.learnpython.org/
A free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff, just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
Go
https://www.youtube.com/watch?v=8uiZC0l4Ajw
A complete Golang tutorial, start to finish in under an hour, including a full demo of building an API in Go. No filler, just what you need to know.
Scripting
[English] Scripting language
https://en.wikipedia.org/wiki/Scripting_language
https://zhuanlan.zhihu.com/p/571097954
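A script is usually interpreted rather than compiled. Scripting languages are generally simple, easy to learn, and easy to use, so that programmers can write programs quickly.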
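A typical operations script is a small, quickly written check like the one below. This hypothetical Python script uses the standard library's `shutil.disk_usage` to report how full a filesystem is. The function names, the 85% warning threshold, and the output format are assumptions for illustration.

```python
import shutil


def disk_usage_percent(path: str = "/") -> float:
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.used / usage.total * 100


def check_disk(path: str = "/", warn_at: float = 85.0) -> str:
    # Hypothetical check: warn at or above the (assumed) threshold.
    pct = disk_usage_percent(path)
    status = "WARN" if pct >= warn_at else "OK"
    return f"{status} {path} {pct:.1f}% used"


if __name__ == "__main__":
    print(check_disk("/"))
```

Run from cron or a CI job, a script like this needs no build step. That is the quick turnaround the paragraph above describes.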
NoSQL
https://nosql-database.org/
Everything about NoSQL Systems – Types, Benefits, and Real-World Uses
https://piaosanlang.gitbooks.io/mongodb/content/section1.1.html
NoSQL ("Not Only SQL") refers to non-relational databases. It is an umbrella term for database systems that differ from traditional relational database management systems.
https://www.youtube.com/watch?v=0buKQHokLK8
NoSQL databases can operate in multiple modes: as key-value store, document store or wide column store.
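The three modes named above differ mainly in how much of the data's structure the store understands. The Python sketch below models made-up user records in each shape using plain dicts, purely as an illustration. It does not use any real database client. Real systems add persistence, indexing, and distribution on top.

```python
import json

# Key-value: the store sees only an opaque value under a key (made-up data).
kv = {"user:42": json.dumps({"name": "Ana", "city": "Shenzhen"})}

# Document: the store understands nested fields and can query them.
docs = [
    {"_id": 42, "name": "Ana", "address": {"city": "Shenzhen"}},
    {"_id": 43, "name": "Bo", "address": {"city": "Beijing"}},
]

# Wide column: row key -> column family -> column -> value; rows may have different columns.
wide = {
    "42": {"profile": {"name": "Ana"}, "geo": {"city": "Shenzhen"}},
    "43": {"profile": {"name": "Bo"}},
}


def kv_get_city(key):
    # Key-value access: fetch the blob by key, then decode it client-side.
    return json.loads(kv[key])["city"]


def docs_where_city(city):
    # Document access: filter on a nested field.
    return [d["_id"] for d in docs if d["address"]["city"] == city]


def wide_get(row, family, column):
    # Wide-column access: an absent column is simply missing (returns None here).
    return wide.get(row, {}).get(family, {}).get(column)
```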
Nginx
[English] Beginner’s Guide
https://nginx.org/en/docs/beginners_guide.html
This guide gives a basic introduction to nginx and describes some simple tasks that can be done with it.
https://www.youtube.com/watch?v=9t9Mp0BGnyI
NGINX is open-source web server software used for reverse proxy, load balancing, and caching. It's important to understand, especially if you are a backend developer.
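By default, nginx distributes requests across the servers in an `upstream` group with weighted round-robin. The Python sketch below implements smooth weighted round-robin, which is the variant nginx is generally described as using. Treat the nginx source as authoritative. The backend names `a`, `b`, and `c` and the weights 5/1/1 are hypothetical.

```python
class SmoothWeightedRoundRobin:
    """Sketch of smooth weighted round-robin selection (hypothetical helper)."""

    def __init__(self, weights):
        self.weights = dict(weights)                  # backend -> configured weight
        self.current = {b: 0 for b in self.weights}   # running "current weight" per backend
        self.total = sum(self.weights.values())

    def pick(self):
        # 1. Every backend's current weight grows by its configured weight.
        for backend, weight in self.weights.items():
            self.current[backend] += weight
        # 2. Pick the largest current weight (ties go to the first in insertion order).
        best = max(self.current, key=self.current.get)
        # 3. Penalize the winner by the total weight so others catch up.
        self.current[best] -= self.total
        return best


if __name__ == "__main__":
    lb = SmoothWeightedRoundRobin({"a": 5, "b": 1, "c": 1})
    print("".join(lb.pick() for _ in range(7)))
```

With weights 5/1/1, each cycle of seven picks is `a a b a c a a`. Backend `a` still gets five of the seven requests, but they are interleaved with the others instead of arriving as a burst of five in a row.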
Apache
https://www.apache.org/
The Apache® Software Foundation (ASF) provides software for the public good, guided by community over code.
Redis
[English] Developer Hub
https://redis.io/dev/
Get all the tutorials, learning paths, and more you need to start building—fast.
https://www.runoob.com/redis/redis-tutorial.html
REmote DIctionary Server (Redis) is a key-value store written by Salvatore Sanfilippo. It is a cross-platform non-relational database.
https://www.youtube.com/watch?v=jgpVdJB2sKQ
In this video I will be covering Redis in depth from how to install it, what commands you can use, all the way to how to use it in a real world project.
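Redis's core model is a key-value map with optional per-key expiry (for example, `SET key value EX seconds`). The Python sketch below imitates only that idea in memory. `TinyKV` and its methods are hypothetical: they are not the Redis API or a Redis client. Redis itself expires keys both lazily on access and through periodic active sampling, but this sketch implements only the lazy part.

```python
import time


class TinyKV:
    """Hypothetical in-memory key-value store with per-key TTL (not the Redis API)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}        # key -> (value, expires_at or None)
        self._clock = clock    # injectable clock makes expiry testable

    def set(self, key, value, ex=None):
        # ex: time-to-live in seconds, loosely mirroring the idea of SET ... EX.
        expires_at = self._clock() + ex if ex is not None else None
        self._data[key] = (value, expires_at)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazy expiry: remove the key when it is read after its deadline
            return None
        return value


if __name__ == "__main__":
    store = TinyKV()
    store.set("session:1", "alice", ex=30)
    print(store.get("session:1"))
```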
DevOps
https://roadmap.sh/devops
Step by step guide for DevOps, SRE or any other Operations Role in 2025
https://zhuanlan.zhihu.com/p/562036793
In DevOps, "Dev" stands for Development and "Ops" stands for Operations. In a sentence, DevOps breaks down the barrier between development and operations so that the two work as one.
Docker
https://www.youtube.com/watch?v=GFgJkfScVNU
Master Docker in one course; learn about images and containers on Docker Hub, running multiple containers with Docker Compose, automating workflows with Docker Compose Watch, and much more. 🐳
https://www.youtube.com/watch?v=kTp5xUtcalw
Learn how to use Docker and Kubernetes in this complete hands-on course for beginners.
Kubernetes
https://kubernetes.io/docs/tutorials/kubernetes-basics/
This tutorial provides a walkthrough of the basics of the Kubernetes cluster orchestration system.
https://kubernetes.io/zh-cn/docs/tutorials/kubernetes-basics/
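This tutorial introduces the basics of the Kubernetes cluster orchestration system. Each module contains background information on major Kubernetes features and concepts, plus an interactive online tutorial.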
https://www.youtube.com/watch?v=s_o8dwzRlu4
Hands-On Kubernetes Tutorial | Learn Kubernetes in 1 Hour - Kubernetes Course for Beginners
https://www.youtube.com/watch?v=X48VuDVv0do
Full Kubernetes Tutorial | Kubernetes Course | Hands-on course with a lot of demos
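Kubernetes controllers repeatedly compare a desired state, such as a replica count, with the observed state and act to close the gap. This Python sketch imitates that reconcile loop for a hypothetical ReplicaSet-like object. The function names, the pod naming scheme, and the "delete the surplus pods at the end of the list" policy are illustrative assumptions. They do not reflect the real controller's pod-selection logic.

```python
import itertools

_pod_ids = itertools.count()  # hypothetical pod-name generator


def reconcile(desired_replicas, running_pods):
    """Compute the actions needed to move the observed pods toward the desired count."""
    missing = desired_replicas - len(running_pods)
    if missing > 0:
        return [("create", None)] * missing
    # Surplus or exact match: delete pods beyond the desired count ([] when equal).
    return [("delete", name) for name in running_pods[desired_replicas:]]


def run_until_converged(desired_replicas, running_pods, max_rounds=10):
    """Apply reconcile() repeatedly until no actions remain (the control loop)."""
    pods = list(running_pods)
    for _ in range(max_rounds):
        actions = reconcile(desired_replicas, pods)
        if not actions:
            break  # observed state matches desired state
        for verb, name in actions:
            if verb == "create":
                pods.append(f"pod-{next(_pod_ids)}")
            else:
                pods.remove(name)
    return pods


if __name__ == "__main__":
    print(run_until_converged(3, ["pod-a"]))
```

Because the loop recomputes actions from the current state on every round, it also recovers if a pod disappears between rounds. This level-triggered style is the property the reconcile pattern is valued for.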
Linux
https://ryanstutorials.net/linuxtutorial/
Ok, so you want to learn how to use the Bash command line interface (terminal) on Unix/Linux.
https://ubuntu.com/tutorials/command-line-for-beginners
The Linux command line is a text interface to your computer.
https://www.youtube.com/watch?v=6WatcfENsOU
In this Linux crash course, you will learn the fundamental skills and tools you need to become a proficient Linux system administrator.
https://www.youtube.com/watch?v=v392lEyM29A
Never fear the command line again, make it fear you.
https://www.youtube.com/watch?v=ZtqBQ68cfJc
Large Models
https://www.youtube.com/watch?v=xZDB1naRUlk
You will build projects with LLMs that will enable you to create dynamic interfaces, interact with vast amounts of text data, and even empower LLMs with the capability to browse the internet for research papers.
https://www.youtube.com/watch?v=zjkBMFhNj_g
Microservices
https://learn.microsoft.com/en-us/training/modules/dotnet-microservices/
Microservice applications are composed of small, independently versioned, and scalable customer-focused services that communicate with each other by using standard protocols and well-defined interfaces.
https://microservices.io/
Microservices - also known as the microservice architecture - is an architectural style that structures an application as a collection of two or more services.
https://spring.io/microservices
Building small, self-contained, ready to run applications can bring great flexibility and added resilience to your code.
https://www.ibm.com/think/topics/microservices
Microservices, or microservices architecture, is a cloud-native architectural approach in which a single application is composed of many loosely coupled and independently deployable smaller components or services.
https://www.youtube.com/watch?v=CqCDOosvZIk
https://www.youtube.com/watch?v=hmkF77F9TLw
Learn about software system design and microservices.
Recommender Systems
[English] Recommender Systems
https://www.d2l.ai/chapter_recommender-systems/index.html
Recommender systems are widely employed in industry and are ubiquitous in our daily lives.
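The d2l.ai chapter covers a range of recommender models. As a much smaller illustration of the underlying idea, here is a Python sketch of user-based collaborative filtering with cosine similarity: items a user has not seen are scored by the ratings of similar users. The ratings data and function names are made up for illustration, and this is not the d2l implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse rating vectors stored as dicts."""
    dot = sum(u[i] * v[i] for i in set(u) & set(v))
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def recommend(ratings, user, k=2):
    """Score items the user hasn't rated by similarity-weighted ratings of other users."""
    me = ratings[user]
    scores = {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = cosine(me, theirs)
        for item, r in theirs.items():
            if item not in me:
                scores[item] = scores.get(item, 0.0) + sim * r
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    ratings = {  # made-up example data: user -> {item: rating}
        "u1": {"a": 5, "b": 4},
        "u2": {"a": 5, "b": 4, "c": 5},
        "u3": {"d": 5, "e": 1},
    }
    print(recommend(ratings, "u1", k=1))
```

Here `u2` rates the same items as `u1` and also likes `c`, so `c` ranks first for `u1`. Users like `u3` who share no rated items contribute a similarity of zero.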
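Related positions
Experienced hire | 3+ years | TEG | Technology
1. Carry out R&D on the Hunyuan large model, covering specialized areas such as text creation, text understanding, math, translation, and Agent FunctionCalls.
2. Deploy Hunyuan in business scenarios across the company, optimizing the model for business needs to improve business results.
3. Track and explore frontier problems in large language models, provide comprehensive technical solutions grounded in real scenarios, and take part in research on cutting-edge algorithms and applications.
Updated 2025-06-19
Experienced hire | TEG | Technology
1. Crawl internet data to meet the data needs of large-model training, and clean the corpora supplied for large-model training, search, and similar scenarios to improve corpus purity.
2. Build large-model training datasets and data-cleaning capabilities on par with the industry frontier, improve data quality and diversity, and validate the value and impact of the data.
Updated 2025-06-18
Experienced hire | 3+ years | TEG | Technology
1. Own performance optimization, process building, stability, and R&D efficiency for the large language model backend systems.
2. Build the backend of the Hunyuan large model R&D system.
3. Research new technologies and put them into practice in real business scenarios, continually improving business metrics.
Updated 2025-06-17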