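Kuaishou [Return-Offer Internship] Multimodal Large Model Data Processing Algorithm Engineer
Internship | Part-time | J1001 | Location: Beijing | Status: Hiring
Requirements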
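1. Master's degree or above in computer science, statistics, mathematics, or a related field; extensive project experience in machine learning, large model training, and data processing; a solid foundation in data analysis and modeling.
2. Proficient with analysis tools such as Python and SQL; familiar with common data analysis and visualization tools (e.g., Pandas, Tableau, Matplotlib).
3. Good communication skills and team spirit; able to complete analyses independently and propose actionable optimization suggestions.
Bonus points
1. In-depth understanding of video generation, computer vision, or multimodal generation techniques; strong interest in generative AI, following industry trends and able to offer innovative suggestions.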
Responsibilities
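1. Data feature algorithm design and optimization: design automated filtering schemes for data across different modalities and categories; tune the state-of-the-art feature algorithms used on multimodal data (e.g., object tracking, ID re-identification, audio separation) for specific scenarios. Work with algorithm engineers to define data adjustment and expansion strategies that improve the model's generation ability in real-world scenarios.
2. Data pipeline construction: build and manage training data for multimodal large models, and take part in data filtering, annotation, and quality assessment. Analyze and mine existing data resources and design effective data distribution strategies to support continuous model iteration.
3. Data distribution analysis: analyze the training data distribution in detail to identify bias, imbalance, and potential issues. Provide visual reports and improvement suggestions so that the training data covers the target scenarios and meets diversity requirements, ultimately improving the video generation large model through data-driven methods.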
Includes English-language materials
Machine learning
https://www.youtube.com/watch?v=0oyDqO8PjIg
Learn about machine learning and AI with this comprehensive 11-hour course from @LunarTech_ai.
https://www.youtube.com/watch?v=i_LwzRVP7bg
Learn Machine Learning in a way that is accessible to absolute beginners.
https://www.youtube.com/watch?v=NWONeJKn6kc
Learn the theory and practical application of machine learning concepts in this comprehensive course for beginners.
https://www.youtube.com/watch?v=PcbuKRNtCUc
Learn about all the most important concepts and terms related to machine learning and AI.
Large models
https://www.youtube.com/watch?v=xZDB1naRUlk
You will build projects with LLMs that will enable you to create dynamic interfaces, interact with vast amounts of text data, and even empower LLMs with the capability to browse the internet for research papers.
https://www.youtube.com/watch?v=zjkBMFhNj_g
Data analysis
[English] Data Analyst Roadmap
https://roadmap.sh/data-analyst
Step by step guide to becoming a Data Analyst in 2025
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, starts from zero, with complete examples, based on the latest Python 3 version.
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
SQL
https://liaoxuefeng.com/books/sql/introduction/index.html
What is SQL? Simply put, SQL is the standard computer language for accessing and processing relational databases.
https://sqlbolt.com/
Learn SQL with simple, interactive exercises.
https://www.youtube.com/watch?v=p3qvj9hO_Bo
In this video we will cover everything you need to know about SQL in only 60 minutes.
Pandas
[English] 10 minutes to pandas
https://pandas.pydata.org/docs/user_guide/10min.html
This is a short introduction to pandas, geared mainly for new users.
[English] Cookbook - pandas
https://pandas.pydata.org/docs/user_guide/cookbook.html#cookbook
This is a repository for short and sweet examples and links for useful pandas recipes.
https://www.kaggle.com/learn/pandas
Solve short hands-on challenges to perfect your data manipulation skills.
https://www.youtube.com/watch?v=2uvysYbKdjM
I'm super excited for this one. We're doing another complete Python Pandas tutorial walkthrough.
https://www.youtube.com/watch?v=Mdq1WWSdUtw
Filtering, Joins, Indexing, Data Cleaning, Visualizations
Tableau
https://help.tableau.com/current/guides/get-started-tutorial/zh-cn/get-started-tutorial-home.htm
Learn how to connect to data, create data visualizations, present your findings, and share your insights with others.
https://www.youtube.com/watch?v=K3pXnbniUcM
Spent 2 years creating a 21-hour, high-quality course that covers everything about Tableau.
Matplotlib
https://matplotlib.org/stable/tutorials/index.html
This page contains a few tutorials for using Matplotlib.
https://www.youtube.com/watch?v=c9vhHUGdav0
This video serves as an introduction to the Matplotlib Python library.
https://www.youtube.com/watch?v=OZOOLe2imFo
In this video we do a complete Matplotlib crash course in Python.
OpenCV
https://learnopencv.com/getting-started-with-opencv/
At LearnOpenCV we are on a mission to educate the global workforce in computer vision and AI.
https://opencv.org/university/free-opencv-course/
This free OpenCV course will teach you how to manipulate images and videos, and detect objects and faces, among other exciting topics in just about 3 hours.
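Related positions
Internship J1010
1. Research and develop speech multimodal large models, including Pretrain, SFT, and RLHF.
2. Research and develop speech processing algorithms to meet the data needs of large model training.
3. Bring large model technology into Kuaishou's business and explore new use cases or business innovations.
4. Track the development and practice of cutting-edge technology at home and abroad to keep the team technically sharp.
Updated 2025-05-08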
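Internship J1005
1. Explore next-generation recommender system techniques that combine large models with recommendation algorithms, making full use of large models' domain knowledge and learning paradigms to energize recommender systems, including but not limited to text/ID generative recommendation, model Scaling Law, and end-to-end modeling of ultra-long user sequences.
2. Explore efficient ways to process multimodal signals such as video, text, and speech, and to align them with the recommender system, so that it can see, hear, and understand the world.
3. Explore techniques such as mixture of experts, distillation, and pruning that balance model performance and effectiveness.
4. Keep up with industry and large model developments, combining cutting-edge techniques with business needs to build best practices for large model applications.
Updated 2025-05-14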
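Internship J1006
1. Work on deploying large models in advertising scenarios: drawing on large models' generation and understanding capabilities, apply techniques such as prompt tuning, RAG, alignment fine-tuning, and RLHF to core advertising business scenarios, improving the matching efficiency of advertising models and driving rapid business growth.
2. Work on deploying multimodal techniques in advertising scenarios: using multimodal representation learning and generative modeling methods such as diffusion, improve the cross-domain understanding of advertising models.
3. Track developments in the AI industry and large model technology, and combine cutting-edge techniques with business needs to keep upgrading advertising algorithm design.
4. Understand the business, work closely with the company's technical teams, communicate requirements and goals efficiently with product, operations, and other roles, take initiative in designing technical solutions, and develop good business sense and well-rounded abilities.
Updated 2025-04-28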