腾讯混元大模型语音算法工程师(北京/上海)
社招全职3年以上AI技术地点:深圳状态:招聘
任职要求
1.有语音对话、语音合成、语音识别、音视频多模态、大语言模型(预训练、微调、强化学习)等相关经验者优先; 2.优秀的代码能力、数据结构和算法功底,熟练掌握Python或C/C++,熟悉Pytorch/Megatron/DeepSpeed等模型训练框架,有ACM/ICPC、NOI/IOI、Top Coder、K…
登录查看完整任职要求
微信扫码,1秒登录
工作职责
1.负责语音/音频大模型研发,包括语音对话(语音交互/音视频对话)、音频理解(ASR/音频caption)、音频生成(TTS/视频配音)等模型研发; 2.负责语音/音频大模型的预训练、后训练、强化学习(文本和音频强化)相关的数据和算法工作; 3.负责语音对话/音频理解/音频生成的模型开源以及产品落地(比如语音对话产品全链路端到端优化、音频理解在噪音/口音/远场/音效音乐场景的优化、语音合成在播报/闲聊/游戏/社交等场景的优化)。
包括英文材料
语音合成+
https://www.ibm.com/think/topics/text-to-speech
Text to speech (TTS) is a type of technology that converts text on a digital interface into natural-sounding audio.
语音识别+
https://developer.nvidia.com/blog/essential-guide-to-automatic-speech-recognition-technology/
Over the past decade, AI-powered speech recognition systems have slowly become part of our everyday lives, from voice search to virtual assistants in contact centers, cars, hospitals, and restaurants.
强化学习+
https://cloud.google.com/discover/what-is-reinforcement-learning?hl=en
Reinforcement learning (RL) is a type of machine learning where an "agent" learns optimal behavior through interaction with its environment.
https://huggingface.co/learn/deep-rl-course/unit0/introduction
This course will teach you about Deep Reinforcement Learning from beginner to expert. It’s completely free and open-source!
https://www.kaggle.com/learn/intro-to-game-ai-and-reinforcement-learning
Build your own video game bots, using classic and cutting-edge algorithms.
数据结构+
https://www.youtube.com/watch?v=8hly31xKli0
In this course you will learn about algorithms and data structures, two of the fundamental topics in computer science.
https://www.youtube.com/watch?v=B31LgI4Y4DQ
Learn about data structures in this comprehensive course. We will be implementing these data structures in C or C++.
https://www.youtube.com/watch?v=CBYHwZcbD-s
Data Structures and Algorithms full course tutorial java
算法+
https://roadmap.sh/datastructures-and-algorithms
Step by step guide to learn Data Structures and Algorithms in 2025
https://www.hellointerview.com/learn/code
A visual guide to the most important patterns and approaches for the coding interview.
https://www.w3schools.com/dsa/
Python+
https://liaoxuefeng.com/books/python/introduction/index.html
中文,免费,零起点,完整示例,基于最新的Python 3版本。
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
C+
https://www.freecodecamp.org/chinese/news/the-c-beginners-handbook/
本手册遵循二八定律。你将在 20% 的时间内学习 80% 的 C 编程语言。
https://www.youtube.com/watch?v=87SH2Cn0s9A
https://www.youtube.com/watch?v=KJgsSFOSQv0
This course will give you a full introduction into all of the core concepts in the C programming language.
https://www.youtube.com/watch?v=PaPN51Mm5qQ
In this complete C programming course, Dr. Charles Severance (aka Dr. Chuck) will help you understand computer architecture and low-level programming with the help of the classic C Programming language book written by Brian Kernighan and Dennis Ritchie.
C+++
https://www.learncpp.com/
LearnCpp.com is a free website devoted to teaching you how to program in modern C++.
https://www.youtube.com/watch?v=ZzaPdXTrSb8
PyTorch+
https://datawhalechina.github.io/thorough-pytorch/
PyTorch是利用深度学习进行数据科学研究的重要工具,在灵活性、可读性和性能上都具备相当的优势,近年来已成为学术界实现深度学习算法最常用的框架。
https://www.youtube.com/watch?v=V_xro1bcAuA
Learn PyTorch for deep learning in this comprehensive course for beginners. PyTorch is a machine learning framework written in Python.
还有更多 •••
相关职位
社招3年以上AI技术
1.负责大语言模型在人机语音交互、音视频对话能力上的算法设计优化; 2.调研业界前沿算法,追踪最前沿的技术动态,并应用在相关的项目中; 3.参与产品讨论,基于技术对产品提出改进建议。
更新于 2025-11-11深圳
社招TEG技术
1.负责大模型语音模态的设计、开发和优化,包括但不限于语音/音频数据清洗、模型设计、训练策略等方面的研究与应用; 2.参与语音识别、语音合成、声音克隆等相关大模型语音模态能力的建设,提高跨模态整体效果。
更新于 2025-06-10北京
社招TEG技术
1.多模态驱动引擎开发,通过对文本/语音/视觉等信息,构建虚拟人表情、动作的驱动大模型; 2.设计多模态条件生成框架,实现语音、表情、镜头、肢体动作的联合优化; 3.开发多模态特征同步技术:语音-表情时序对齐、文本语义-镜头运动关联建模。
更新于 2025-05-30深圳