腾讯混元数据算法工程师(北京)
社招全职2年以上AI技术地点:深圳状态:招聘
任职要求
1.硕士及以上学历,计算机视觉、自然语言处理或多模态方向背景优先; 2.熟练掌握深度学习框架(如PyTorch、TensorFlow),熟悉模型训练及数据处理经验,具备优秀的独立开发与分析调研能力; 3.具备良好的团队协作能力,具备强烈的自我驱动力,能够独立完成系统分析与优化,落地大模型应用; 4.加分项:对多模态生成大模型有深入理解,对数据敏感具备较强的数据洞察与分析能力。 加分项 1.熟练掌握HiveSQL、Spark、Ray等至少两种数据分析及处理工具; 2.有大模型训练或数据相关工作经验者优先。
工作职责
1.数据特征算法:负责海量文本&多模态数据(图像,视频,音频,3D)的内容理解(如分类标签体系、embedding表征、Caption生成等),质量检测(低质识别检测、优质美学评价等),去重/聚类分析,数据合成等算法; 2.数据pipeline建设:负责数据采集、筛选清洗、标注与质量评估pipeline的建设。与模型业务团队紧密配合,充分分析挖掘数据资源,建立自动化数据处理流程与机制,支持模型持续迭代; 3.数据实验分析:对模型训练数据进行详细分析,建立科学数据实验机制,识别样本不足、质量问题、配比不均衡等潜在问题,驱动数据优化提升数据覆盖、质量、多样性需求,最终带来大模型生成效果的持续提升。
包括英文材料
NLP+
https://www.youtube.com/watch?v=fNxaJsNG3-s&list=PLQY2H8rRoyvzDbLUZkbudP-MFQZwNmU4S
Welcome to Zero to Hero for Natural Language Processing using TensorFlow!
https://www.youtube.com/watch?v=R-AG4-qZs1A&list=PLeo1K3hjS3uuvuAXhYjV2lMEShq2UYSwX
Natural Language Processing tutorial for beginners series in Python.
https://www.youtube.com/watch?v=rmVRLeJRkl4&list=PLoROMvodv4rMFqRtEuo6SGjY4XbRIVRd4
The foundations of the effective modern methods for deep learning applied to NLP.
Python+
https://liaoxuefeng.com/books/python/introduction/index.html
中文,免费,零起点,完整示例,基于最新的Python 3版本。
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
机器学习+
https://www.youtube.com/watch?v=0oyDqO8PjIg
Learn about machine learning and AI with this comprehensive 11-hour course from @LunarTech_ai.
https://www.youtube.com/watch?v=i_LwzRVP7bg
Learn Machine Learning in a way that is accessible to absolute beginners.
https://www.youtube.com/watch?v=NWONeJKn6kc
Learn the theory and practical application of machine learning concepts in this comprehensive course for beginners.
https://www.youtube.com/watch?v=PcbuKRNtCUc
Learn about all the most important concepts and terms related to machine learning and AI.
数据科学+
https://roadmap.sh/ai-data-scientist
Step by step roadmap guide to becoming an AI and Data Scientist
学历+
Spark+
[英文] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Ray+
https://github.com/ray-project/ray
Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://www.youtube.com/watch?v=FhXfEXUUQp0
In this video, I'll teach you everything you need to know about Apache Ray!
https://www.youtube.com/watch?v=fMiAyj2kgac
Using powerful machine learning algorithms is easy using Ray.io and Python.
https://www.youtube.com/watch?v=q_aTbb7XeL4
Parallel and Distributed computing sounds scary until you try this fantastic Python library.
数据分析+
[英文] Data Analyst Roadmap
https://roadmap.sh/data-analyst
Step by step guide to becoming an Data Analyst in 2025
大模型+
https://www.youtube.com/watch?v=xZDB1naRUlk
You will build projects with LLMs that will enable you to create dynamic interfaces, interact with vast amounts of text data, and even empower LLMs with the capability to browse the internet for research papers.
https://www.youtube.com/watch?v=zjkBMFhNj_g
OpenCV+
https://learnopencv.com/getting-started-with-opencv/
At LearnOpenCV we are on a mission to educate the global workforce in computer vision and AI.
https://opencv.org/university/free-opencv-course/
This free OpenCV course will teach you how to manipulate images and videos, and detect objects and faces, among other exciting topics in just about 3 hours.
深度学习+
https://d2l.ai/
Interactive deep learning book with code, math, and discussions.
PyTorch+
https://datawhalechina.github.io/thorough-pytorch/
PyTorch是利用深度学习进行数据科学研究的重要工具,在灵活性、可读性和性能上都具备相当的优势,近年来已成为学术界实现深度学习算法最常用的框架。
https://www.youtube.com/watch?v=V_xro1bcAuA
Learn PyTorch for deep learning in this comprehensive course for beginners. PyTorch is a machine learning framework written in Python.
TensorFlow+
https://www.youtube.com/watch?v=tpCFfeUEGs8
Ready to learn the fundamentals of TensorFlow and deep learning with Python? Well, you’ve come to the right place.
https://www.youtube.com/watch?v=ZUKz4125WNI
This part continues right where part one left off so get that Google Colab window open and get ready to write plenty more TensorFlow code.
相关职位
社招TEG技术
1.基于大模型训练对于数据的需求进行互联网数据抓取,对提供给大模型训练/搜索等场景的语料进行清洗,提升语料纯度; 2.建设对标业内前沿的大模型训练数据集和数据清洗能力,提升数据质量和多样性,并验证数据价值和效果。
更新于 2025-06-18
社招TEG技术
1.参与LLM、多模态大模型压缩加速方案研究,包括投机采样、稀疏化、量化和蒸馏等方法; 2.设计可落地的大模型压缩算法及成本优化方案,助力大模型的性能加速; 3.分析业务性能瓶颈和模型特点,定制化开发大模型压缩优化工具,实现高速推理方案。
更新于 2025-05-26
社招3年以上AI技术
1.负责TTS、ASR、声学前处理、自然语言处理、多模态大模型等AI系统的工程开发(包括训练工具和推理引擎的开发、优化、交付等); 2.负责AI系统最新算法的集成、工程化、实际场景效果验证、优化、上线; 3.负责AI相关业务、产品的工程支持,在效果和性能上更好的落地。
更新于 2025-09-12