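Xiaohongshu [Ace Top Intern] Research on Distributed Reinforcement Learning Training Techniques for Multi-Agent Architectures
Campus recruitment | Full-time | Machine Learning Platform
Location: Beijing | Shanghai | Hangzhou
Status: Hiring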
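Qualifications
1. Open to students of any year who are currently enrolled in a bachelor's program or above; majors in computer science, artificial intelligence, software engineering, or related fields are preferred; 2. Familiar with C++ programming on Linux/Unix platforms, network programming, and multithreaded programming, with good coding habits; 3. Familiar with one of the mainstream deep learning training or inference frameworks (TensorFlow / PyTorc…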
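Responsibilities
The goal of this research topic is to build an RL framework for multi-agent collaboration scenarios, based on curriculum learning and hierarchical reinforcement learning. It approaches the problem from three angles: prioritized experience replay (PER), distributed experience reuse, and optimization of asynchronous Actor-Critic computation, in order to overcome low sample efficiency under conflicting objectives. The technique aims to break through the slow convergence and high resource consumption of traditional RL training on complex tasks (such as the Xiaohongshu community 点点 (Diandian) RL training task), improve training efficiency by more than 3x, and support the need to iterate on and launch Agent services quickly.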
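As an illustration of the prioritized experience replay (PER) idea named above, here is a minimal single-process sketch in Python. It is not the team's implementation: the class name `PrioritizedReplayBuffer`, the parameters `alpha`, `beta`, and `eps`, and the list-based O(n) sampling are all assumptions made for this example. A production system would more likely use a sum-tree and a distributed store.

```python
import random


class PrioritizedReplayBuffer:
    """Proportional prioritized replay (hypothetical sketch, O(n) sampling)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha  # how strongly priorities skew sampling (0 = uniform)
        self.eps = eps      # keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0        # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions receive the current maximum priority so that
        # they are likely to be replayed before their TD error is known.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        # Sampling probability P(i) = p_i^alpha / sum_k p_k^alpha.
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idxs = random.choices(range(len(self.data)), weights=probs, k=batch_size)
        # Importance-sampling weights (N * P(i))^-beta correct the bias
        # introduced by non-uniform sampling; here they are normalized by
        # the largest weight in the batch so that all weights are <= 1.
        n = len(self.data)
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idxs, [self.data[i] for i in idxs], weights

    def update_priorities(self, idxs, td_errors):
        # After a learning step, priority becomes |TD error| + eps.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a training loop, the learner would call `sample`, scale each transition's loss by the returned weight, and then pass the new TD errors to `update_priorities`, so that transitions with large errors are replayed more often.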
Includes English-language materials
Linux
https://ryanstutorials.net/linuxtutorial/
Ok, so you want to learn how to use the Bash command line interface (terminal) on Unix/Linux.
https://ubuntu.com/tutorials/command-line-for-beginners
The Linux command line is a text interface to your computer.
https://www.youtube.com/watch?v=6WatcfENsOU
In this Linux crash course, you will learn the fundamental skills and tools you need to become a proficient Linux system administrator.
https://www.youtube.com/watch?v=v392lEyM29A
Never fear the command line again, make it fear you.
https://www.youtube.com/watch?v=ZtqBQ68cfJc
Unix
[English] The UNIX® Standard
https://www.opengroup.org/membership/forums/platform/unix
https://www.youtube.com/watch?v=IrDUcdpPmdI
UNIX is an operating system which was first developed in the 1970s, and has been under constant development ever since.
C++
https://www.learncpp.com/
LearnCpp.com is a free website devoted to teaching you how to program in modern C++.
https://www.youtube.com/watch?v=ZzaPdXTrSb8
Network programming
https://www.youtube.com/watch?v=2HrYIl6GpYg
I will make a simple HTTP web server with the C Programming Language.
https://www.youtube.com/watch?v=8z6okCgdREo
This tutorial is for Gophers who have written a command line or an API application, but have little to no experience in lower-level concepts like reading and writing to sockets, working with channels, and managing multiple goroutines.
https://www.youtube.com/watch?v=bdIiTxtMaKA&list=PL9IEJIKnBJjH_zM5LnovnoaKlXML5qh17
https://www.youtube.com/watch?v=bzja9fQWzdA
Implement the ubiquitous TCP protocol that underlies much of the traffic on the internet!
[English] 📺 Network Programming with Python Course (build a port scanner, mailing client, chat room, DDOS)
https://www.youtube.com/watch?v=FGdiSJakIS4
Learn network programming in Python by building four projects. You will learn to build a mailing client, a DDOS script, a port scanner, and a TCP Chat Room.
https://www.youtube.com/watch?v=gntyAFoZp-E
https://www.youtube.com/watch?v=JiuouCJQzSQ
Explore the fundamentals of networking in Rust by building a simple TCP server.
https://www.youtube.com/watch?v=JRTLSxGf_6w
https://www.youtube.com/watch?v=sFizpxHkIlI
In this video we'll cover SOCKET PROGRAMMING in JAVA.
https://www.youtube.com/watch?v=sXW_sNGvqcU
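Multithreading
https://liaoxuefeng.com/books/java/threading/basic/index.html
Compared with single-threaded programming, multithreaded programming is distinctive in that threads frequently read and write shared data and therefore need synchronization.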
https://www.youtube.com/watch?v=_uQgGS_VIXM&list=PLsc-VaxfZl4do3Etp_xQ0aQBoC-x5BIgJ
https://www.youtube.com/watch?v=IEEhzQoKtQU
https://www.youtube.com/watch?v=mTGdtC9f4EU&list=PLL8woMHwr36EDxjUoCzboZjedsnhLP1j4
https://www.youtube.com/watch?v=TPVH_coGAQs&list=PLk6CEY9XxSIAeK-EAh3hB4fgNvYkYmghp
https://www.youtube.com/watch?v=xPqnoB2hjjA
This video is an introduction to multithreading in modern C++.
https://www.youtube.com/watch?v=YKBwKy5PrpQ
Rust threading is easy to implement and improves the efficiency of your applications on multi-core systems!
Coding standards
[English] Google Style Guides
https://google.github.io/styleguide/
Every major open-source project has its own style guide: a set of conventions (sometimes arbitrary) about how to write code for that project. It is much easier to understand a large codebase when all the code in it is in a consistent style.