ByteDance Machine Learning System Engineer | Data AML | Jindouyun Talent Program (筋斗云人才计划)

Campus recruitment | Full-time | Job ID: A158012A | 2025-05-26 | Location: Singapore | Status: Hiring

Requirements

1. Hold a doctoral degree (PhD), with priority given to candidates majoring in Computer Science, Software Engineering, or related fields;
2. Proficiency in one or more programming languages such as C/C++/Go/Python/Java in a Linux environment;
3. Deep understanding of distributed system principles, with experience in designing, developing, and maintaining large-scale distributed systems;
4. Possess excellent logical analysis skills, able to abstract and decompose complex business logic effectively, with a collaborative team spirit;
5. Strong sense of responsibility, with good learning ability, communication skills, and self-motivation;
6. Good habits in technical documentation, including timely writing and updating of work processes and technical docs as required.

Bonus Qualifications:
1. Familiarity with Kubernetes architecture and extensive experience in cloud-native system development;
2. Experience with at least one mainstream machine learning framework (e.g., TensorFlow, PyTorch, MXNet);
…

Responsibilities

Team Introduction:
Data AML is ByteDance's machine learning middle platform, providing training and inference systems for recommendation, advertising, CV (computer vision), speech, and NLP (natural language processing) across businesses such as Douyin, Toutiao, and Xigua Video.
AML provides powerful machine learning computing capabilities to internal business units and conducts research on general and innovative algorithms to solve key business challenges. Additionally, through Volcano Engine, it delivers core machine learning and recommendation system capabilities to external enterprise clients.
Beyond business applications, AML is also engaged in cutting-edge research in areas such as AI for Science and scientific computing.

Research Project Introduction:
Large-scale recommendation systems are increasingly applied to short-video, text-community, image, and other products, and modality information plays an increasingly prominent role in them. ByteDance has found in practice that modality information works well as a generalization feature for business scenarios such as recommendation, and research on end-to-end, ultra-large-scale multimodal recommendation systems has enormous potential. Building on algorithm-engineering co-design, the project aims to further explore directions such as multimodal co-training, large models with 7B/13B parameters, and end-to-end modeling of longer sequences.

Engineering research directions include:
Representation of multimodal samples
Construction of high-performance multimodal inference engines based on the PyTorch framework
Development of high-performance multimodal training frameworks
Application of heterogeneous hardware in multimodal recommendation systems

Algorithmic research directions include:
Design of sound architectures for recommendation-advertising and multimodal co-training
Sparse Mixture of Experts (Sparse MoE)
Memory Network
Mixed-precision techniques

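1. Design and develop machine learning system architecture, and tune system performance;
2. Solve technical challenges in high concurrency, high reliability, and high scalability;
3. Work across multiple sub-areas of machine learning systems, including resource scheduling, task orchestration, model training, model inference, model management, dataset management, workflow orchestration, and ML for System;
4. Research and introduce forward-looking technologies for machine learning systems, such as the latest hardware architectures, heterogeneous computing systems, and GPU optimization techniques;
5. Research machine-learning-based methods for analyzing and optimizing the resource usage of clusters and services.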

Related Positions

Large Model Application Algorithm Research Engineer | International Content Safety Algorithm Research | Jindouyun Talent Program (筋斗云人才计划)

Campus recruitment | Job ID: A238623

Team Introduction:
The International Content Safety Algorithm Research Team (TikTok Content Security Algorithm Research Team) is dedicated to maintaining a safe and trustworthy environment for users of ByteDance's international products. We develop and iterate on machine learning models and information systems to identify risks earlier, respond to incidents faster, and monitor potential threats more effectively. The team also leads the development of foundational large models for products. In the R&D process, we tackle key challenges such as data compliance, model reasoning capability, and multilingual performance optimization. Our goal is to build secure, compliant, and high-performance models that empower various business scenarios across the platform, including content moderation, search, and recommendation.

Research Project Background:
In recent years, Large Language Models (LLMs) have achieved remarkable progress across various domains of natural language processing (NLP) and artificial intelligence. These models have demonstrated impressive capabilities in tasks such as language generation, question answering, and text translation. Reasoning, however, remains a key area for further improvement. Current approaches to enhancing reasoning abilities often rely on large amounts of Supervised Fine-Tuning (SFT) data, but acquiring such high-quality SFT data is expensive and poses a significant barrier to scalable model development and deployment. OpenAI's o1 series of models has made progress by increasing the length of the Chain-of-Thought (CoT) reasoning process. While this technique has proven effective, how to scale it efficiently at test time remains an open question. Recent research has explored alternative methods such as Process-based Reward Models (PRM), Reinforcement Learning (RL), and Monte Carlo Tree Search (MCTS) to improve reasoning, but these approaches still fall short of the general reasoning performance achieved by OpenAI's o1 series. Notably, the recent DeepSeek R1 paper suggests that pure RL methods can enable LLMs to autonomously develop reasoning skills without relying on expensive SFT data, revealing the substantial potential of RL in advancing LLM capabilities.

Research Project Challenges:
1. Reward model design: In reinforcement learning, designing a suitable reward model is key. The reward model must accurately reflect the quality of the reasoning process and guide the model to improve its reasoning step by step. This requires precise evaluation criteria for different tasks, as well as a reward model that can adjust dynamically during training as the model's performance changes and improves.
2. Stable training: Without high-quality SFT data, keeping reinforcement learning stable is a major challenge. RL typically involves extensive exploration and trial and error, which can make training unstable or even degrade model performance. Robust training methods are needed to keep the model stable and effective throughout training.
3. Extending from math and code tasks to natural language tasks: Existing reasoning-enhancement methods are mainly applied to math and code, where CoT data is relatively abundant. Natural language tasks are more open-ended and complex. Extending successful RL strategies from these relatively simple tasks to natural language processing tasks requires in-depth research and innovation in data processing and RL methods in order to achieve general reasoning ability across tasks.
4. Improving reasoning efficiency: Raising reasoning efficiency while preserving reasoning performance is also an important challenge, because efficiency directly affects a model's usability and cost-effectiveness in real applications. One option is knowledge distillation, which transfers knowledge from a complex model to a smaller one to reduce compute consumption. Another potential approach is to use Long Chain-of-Thought (Long-CoT) techniques to improve Short Chain-of-Thought (Short-CoT) models, increasing reasoning speed while maintaining reasoning quality.

Updated 2025-05-26 | Singapore

Recommendation Large Model Algorithm Engineer | TikTok Algorithm | Jindouyun Talent Program (筋斗云人才计划)

Campus recruitment | Job ID: A177421

Updated 2025-05-26 | Singapore

Recommendation System Architecture Engineer | Jindouyun Talent Program (筋斗云人才计划)

Campus recruitment | Job ID: A218205

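Team Introduction:
The ByteDance Recommendation Architecture Team is responsible for the design and development of the recommendation system architecture for ByteDance's related products. It ensures the stability and high availability of the system, optimizes the performance of online services and offline data streams, resolves system bottlenecks, and reduces cost overheads. The team also abstracts the common components and services of the system and builds the recommendation middle platform and data middle platform to support the rapid incubation of new products and enable ToB services.

Research Project Background:
In today's digital era, recommendation systems have become a key technology for delivering personalized services and improving user experience and competitiveness in many fields, such as e-commerce and news and information. However, as technology advances and business scenarios grow more complex, recommendation systems face serious challenges.

On one hand, the complexity of recommendation systems themselves has increased sharply. Large numbers of recommendation strategies keep evolving and the system state changes dynamically, yet there is no effective way to automatically track and evaluate strategy effectiveness and retire low-ROI strategies, so many inefficient strategies remain in the system. Recommendation systems also depend on many infrastructure components, and their complex load patterns make parameter configuration and performance tuning of the underlying components very difficult. Troubleshooting during day-to-day development consumes substantial engineering effort, so there is an urgent need to raise development efficiency and reduce labor costs.

On the other hand, with fierce competition in e-commerce and other fields, the shortcomings of traditional recommendation systems in diversity, novelty, and personalization are increasingly evident, and they struggle to meet users' growing and varied needs. Generative AI brings new breakthroughs, but in practice it faces difficulties in cost efficiency, coordination of data across all domains, data privacy and security, and adapting to technological change.

In addition, with the rapid development of large models, recommendation systems place ever higher demands on the storage and quality of user behavior sequence data, and the effect of data quality on model performance is increasingly critical. Meanwhile, growing model scale and the emergence of multimodal data mean that the data processing stage suffers from lengthy pipelines, inefficient resource utilization, and traditional data processing frameworks that cannot meet multimodal processing needs.

Research Project Challenges:
Strategy management and optimization: Build an intelligent system that supports standardized definition of recommendation strategies, long-term and offline evaluation, automatic identification and retirement of ineffective strategies, and removal of the related code and configuration.
Adaptive tuning and fault diagnosis: For the diverse business workloads of recommendation systems, use large-model capabilities to tune the parameters and configuration of the system and its underlying components, and explore adaptive fault diagnosis that provides fault tracing, localization, and analysis from a global perspective.
Balancing cost and efficiency: When applying generative techniques in recommendation systems, address the high cost of model training and serving, balance cost against efficiency, and deliver efficient recommendation under limited resources.
All-domain data processing: Handle massive heterogeneous data in horizontal, all-domain scenarios such as e-commerce; improve and safeguard data quality and accuracy; supply data to all-domain recommendation models in a standardized way; and provide low-cost cross-device services, while ensuring data privacy and security and compliant use of data.
Data storage and quality improvement: Develop a low-cost, high-performance storage engine; design a flexible Schema Evolution mechanism; achieve high-concurrency real-time writes and training-inference consistency; study in depth the quantitative relationship between data quality and model prediction performance; and build DCAI-based tools for analyzing the correlation between data and models, together with an automated training-data processing pipeline.
Multimodal data and heterogeneous computing: Build a heterogeneous computing framework for multimodal data processing suited to recommendation systems; solve problems in data reading, framework integration, and high-performance operator orchestration; improve data processing and model training efficiency; and establish a Python-centered developer ecosystem.
Efficiency optimization for compute-intensive recommendation models: As large models continue to make breakthroughs in CV, NLP, multimodal learning, and even AGI, greater compute in recommendation scenarios can help models understand user preferences more fully and deeply, better understand user needs, and uncover latent interests, leading to a better user experience. Larger recommendation models require more compute, and balancing compute cost against quality gains requires deep co-design between architecture and algorithm engineers.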

Updated 2025-05-26 | Singapore

Recommendation Algorithm Engineer | TikTok Algorithm | Jindouyun Talent Program (筋斗云人才计划)

Campus recruitment | Job ID: A54374

Team Introduction:
TikTok is a global short-video platform available in 150 countries and regions. Our mission is to inspire creativity and bring joy by helping users discover real and interesting moments that make life better. TikTok's global headquarters are in Los Angeles and Singapore, and we also have offices in New York City, London, Dublin, Paris, Berlin, Dubai, Jakarta, Seoul, and Tokyo.

TikTok Research & Development (R&D) Team:
The TikTok R&D team is dedicated to building and maintaining industry-leading products that drive the success of TikTok's global business. By joining us, you'll work on core scenarios such as user growth, social features, live streaming, e-commerce consumer side, content creation, and content consumption, helping our products scale rapidly across global markets. You'll also face deep technical challenges in areas like service architecture and infrastructure engineering, ensuring our systems operate with high quality, efficiency, and security. Meanwhile, our team also provides comprehensive technical solutions across diverse business needs, continuously optimizing product metrics and improving user experience. Here, you'll collaborate with leading experts in exploring cutting-edge technologies and pushing the boundaries of what's possible. Every line of your code will serve hundreds of millions of users. Our team is professional and goal-oriented, with an egalitarian and easy-going collaborative environment.

Research Project Introduction:
As the world's leading short-video platform, TikTok faces multiple challenges in its recommendation systems, including data sparsity for new users leading to insufficient personalization, high timeliness requirements for live-streaming recommendations, difficulty in maintaining user interest diversity, and complex e-commerce recommendation system chains. Traditional recommendation methods rely heavily on historical behavior modeling, which struggles with the cold-start problem for new users. Live-streaming recommendations demand real-time responsiveness to rapidly changing content dynamics (e.g., host interactions, traffic fluctuations) within extremely short time windows (typically within 30 minutes), posing higher demands on the system's real-time perception and decision-making capabilities. Additionally, the immersive single-feed format amplifies the challenge of maintaining content diversity, requiring a careful balance between multi-interest learning and the risk of content drift caused by exploratory recommendations. The current e-commerce recommendation system follows a multi-stage funnel architecture (recall, ranking, re-ranking), which often leads to inconsistent chains, high maintenance costs, and an overreliance on short-term value prediction. As a result, users tend to experience fatigue from homogeneous content.

To address these pain points, this project proposes leveraging large language models (LLMs) and large model technologies to achieve significant breakthroughs. On one hand, LLMs, with their vast knowledge base and few-shot reasoning capabilities, can infer new users' potential intentions from registration data and external knowledge, thereby alleviating cold-start issues. On the other hand, by integrating graph neural networks (GNNs) and full-lifecycle user behavior sequences for modeling social preferences, we aim to improve the accuracy of interest prediction. Additionally, the project explores the generalization capabilities, long-context awareness, and end-to-end modeling strengths of large models to simplify the e-commerce recommendation chains, enhance adaptability to real-time changes, and improve exploratory recommendation effectiveness. The ultimate goal is to build a more streamlined system with more accurate recommendations, enhancing user experience and retention while driving sustainable business growth.

Updated 2025-05-26 | Singapore