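Tencent WeChat - Foundation LLM Data Algorithm Engineer
Experienced hire | Full-time | 2+ years of experience | WXG | Technology | Location: Guangzhou | Status: Open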
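Requirements
1. Understanding of large language models (LLMs), familiarity with the principles of building LLM data, and some hands-on experience tuning and applying LLMs.
2. Strong engineering skills, with proficiency in at least one of C/C++, Java, or Python.
3. Familiarity with big-data processing and analysis tools and frameworks, including but not limited to Spark and Hive; relevant hands-on experience is a plus.
4. Strong data analysis skills: able to draw valuable business insights from datasets and propose optimizations.
5. Strong communication and coordination skills and a team-oriented mindset; able to think independently, learn quickly, and solve problems.
6. Experience in LLM evaluation or LLM data management is a plus.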
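Responsibilities
1. Design a full-lifecycle management scheme for training data, covering metadata and lineage management, quality monitoring (anomaly detection, confidence calibration), and an automated evaluation system, so that model training is supplied with stable, reliable, high-quality data.
2. Explore synthesis approaches for LLM reinforcement learning data and SFT data, and drive the development and adoption of a methodology for validating the value of data in LLMs.
3. Abstract and build an efficient, reliable data processing framework that manages data end to end and provides visualization and observability for training data, improving the engineering efficiency of training-data governance.
4. Keep up with state-of-the-art data algorithms in the industry and put them into production, improving their effectiveness and efficiency and building a reserve of high-quality data for LLMs.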
Learning resources (including English-language materials)
Large language models (LLMs)
https://www.youtube.com/watch?v=xZDB1naRUlk
You will build projects with LLMs that will enable you to create dynamic interfaces, interact with vast amounts of text data, and even empower LLMs with the capability to browse the internet for research papers.
https://www.youtube.com/watch?v=zjkBMFhNj_g
C
https://www.freecodecamp.org/chinese/news/the-c-beginners-handbook/
This handbook follows the 80/20 rule: you will learn 80% of the C programming language in 20% of the time.
https://www.youtube.com/watch?v=87SH2Cn0s9A
https://www.youtube.com/watch?v=KJgsSFOSQv0
This course will give you a full introduction into all of the core concepts in the C programming language.
https://www.youtube.com/watch?v=PaPN51Mm5qQ
In this complete C programming course, Dr. Charles Severance (aka Dr. Chuck) will help you understand computer architecture and low-level programming with the help of the classic C Programming language book written by Brian Kernighan and Dennis Ritchie.
C++
https://www.learncpp.com/
LearnCpp.com is a free website devoted to teaching you how to program in modern C++.
https://www.youtube.com/watch?v=ZzaPdXTrSb8
Java
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, beginner-friendly, with complete examples, based on the latest Python 3 release.
https://www.learnpython.org/
A free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
Spark
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Hive
[English] Hive Tutorial
https://www.tutorialspoint.com/hive/index.htm
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
https://www.youtube.com/watch?v=D4HqQ8-Ja9Y
Data analysis
[English] Data Analyst Roadmap
https://roadmap.sh/data-analyst
Step-by-step guide to becoming a Data Analyst in 2025
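Related positions
Experienced hire | WeChat Foundational AI Exploration
1. Drive algorithmic breakthroughs for social LLMs in areas such as memory retrieval, agent function calling, and stylized base models.
2. Work closely with the business, using post-training (SFT and RL) to improve the model's ability to solve specialized problems.
3. Deliver technical solutions based on WeChat scenario data, and explore how state-of-the-art industry techniques can be deployed in the business and used to improve metrics.
Updated 2025-10-07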
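Experienced hire | WXG | Technology
1. Lead application-level technical optimization of WeChat's LLMs, covering agents, retrieval-augmented generation (RAG), data synthesis, and model tuning and performance improvement for vertical scenarios.
2. Design efficient technical solutions using scenario data from the WeChat ecosystem, and drive the business deployment of frontier AI techniques (such as multimodal reasoning, human-like audio generation, and long-text modeling) and the optimization of core metrics.
3. Follow technical progress in AI academia and industry closely (such as agent collaboration frameworks and lightweight fine-tuning methods), and identify potential applications and innovative value within the WeChat ecosystem.
Updated 2025-06-12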
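Experienced hire | 2+ years of experience | WeChat Foundational AI Exploration
1. Develop general-purpose end-to-end speech LLMs, covering multilingual speech recognition, speech synthesis, voiceprint (speaker) recognition, and paralinguistic understanding.
2. Drive deep integration of these speech technologies with the team's in-house large language model (LLM), and take part in designing and implementing the architecture of intelligent voice interaction systems.
3. Within WeChat's AI exploration business, deliver technical solutions based on WeChat scenario data, and explore how state-of-the-art industry techniques can be deployed in the business and used to improve metrics.
Updated 2025-08-12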