ByteDance Machine Learning System Engineer (R&D) - Data AML - 筋斗云 (Jindouyun) Talent Program
Qualifications
1. Doctoral degree (PhD), with priority given to candidates majoring in Computer Science, Software Engineering, or related fields;
2. Proficiency in one or more programming languages such as C/C++, Go, Python, or Java in a Linux environment;
3. Deep understanding of distributed system principles, with experience designing, developing, and maintaining large-scale distributed systems;
4. Excellent logical analysis skills, with the ability to abstract and decompose complex business logic effectively, and a collaborative team spirit;
5. Strong sense of responsibility, with good learning ability, communication skills, and self-motivation;
6. Good technical documentation habits, including writing and updating work processes and technical docs promptly as required.
Bonus Qualifications:
1. Familiarity with Kubernetes architecture and extensive experience in cloud-native system development;
2. Experience with at least one mainstream machine learning framework (e.g., TensorFlow, PyTorch, MXNet); …
Responsibilities
Team Introduction: Data AML is ByteDance's machine learning middle platform, providing training and inference systems for recommendation, advertising, CV (computer vision), speech, and NLP (natural language processing) across businesses such as Douyin, Toutiao, and Xigua Video. AML provides machine learning computing power to internal business units and researches general-purpose, innovative algorithms for the problems those businesses face. Through Volcano Engine, it also delivers core machine learning and recommendation system capabilities to external enterprise clients. In addition, AML conducts frontier research in areas such as AI for Science and scientific computing.

Research Project Introduction: Large-scale recommendation systems are increasingly applied to short-video, text-community, image, and other products, and modality information plays a growing role in them. ByteDance has found in practice that modality information works well as a generalization feature for business scenarios such as recommendation, and that research on end-to-end, ultra-large-scale multimodal recommendation systems has great potential. Building on algorithm-engineering co-design, the project expects to further explore directions such as multimodal co-training, large models with 7B/13B parameters, and end-to-end modeling of longer sequences.

Engineering research directions include:
1. Representation of multimodal samples
2. Construction of high-performance multimodal inference engines based on the PyTorch framework
3. Development of high-performance multimodal training frameworks
4. Application of heterogeneous hardware in multimodal recommendation systems

Algorithmic research directions include:
1. Design of sound recommendation-advertising and multimodal co-training architectures
2. Sparse Mixture of Experts (Sparse MoE)
3. Memory Network
4. Mixed-precision techniques

Specific duties:
1. Design and develop machine learning system architecture, and tune system performance;
2. Solve technical challenges in high concurrency, high reliability, and high scalability;
3. Work across multiple sub-areas of machine learning systems, including resource scheduling, task orchestration, model training, model inference, model management, dataset management, workflow orchestration, and ML for System;
4. Investigate and introduce forward-looking technologies for machine learning systems, such as adopting and deploying the latest hardware architectures, heterogeneous computing systems, and GPU optimization techniques;
5. Research machine-learning-based methods to analyze and optimize the resource usage of clusters and services.
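1. Build Kuaishou's NLP technology stack, including but not limited to text classification, knowledge graphs, translation, and dialogue; 2. Communicate and collaborate with business teams to deliver core algorithm models and capabilities that meet product needs.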
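1. Research and develop the AI 小快 (AI Xiaokuai) intelligent assistant bot; 2. Optimize the base models and use LLM-derived frameworks such as RAG and Agent to improve the relevant business metrics; 3. Continuously follow and study in depth frontier LLM technologies and open-source solutions, track the latest industry progress in the LLM field and advance related research, and explore ways to apply the latest technologies to AI 小快.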
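1. Model development and optimization: Build and iterate machine learning/deep learning models from scratch (e.g., anomaly detection, graph neural networks, natural language processing, time series analysis) and apply them to concrete scenarios such as malicious code classification, network intrusion detection, user behavior analysis, and phishing website identification; 2. Threat hunting and research: Use machine learning models to discover unknown threats and attack patterns, participate in the analysis of and response to security incidents, and provide data-driven insights for security policy making; 3. Deployment of LLM agents: Explore applications of large models in information security, such as automated handling of attack alerts; 4. Data exploration and feature engineering: Analyze massive security data in depth (e.g., logs, traffic, malicious samples, threat intelligence), and perform data cleaning, feature extraction, and feature engineering to provide a high-quality data foundation for model training; 5. Frontier technology exploration: Track the latest progress in AI security across academia and industry, evaluate promising new technologies (e.g., federated learning, adversarial machine learning, self-supervised learning), and apply them to real business to address challenges such as sample scarcity and adversarial attacks.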
• Lead the product planning and execution of the platform’s recommendation system to improve the accuracy and effectiveness of personalized recommendations. • Analyze user behavior and purchase data to identify needs and preferences, and optimize recommendation algorithms accordingly. • Collaborate with data scientists and engineering teams to drive the development and enhancement of recommendation algorithms. • Develop and manage the product roadmap, ensuring timely delivery of projects that meet quality standards. • Monitor key performance indicators of the recommendation system, provide optimization suggestions, and implement improvement plans. • Work with marketing and user experience teams to ensure that recommendation product features align with the overall user experience.