Pinduoduo AIGC Algorithm Engineer
Experienced hire | Full-time | Technical | Location: Shanghai | Status: Hiring
Requirements
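1. Familiar with programming languages such as Python and Java;
2. Familiar with PyTorch or TensorFlow, with hands-on experience in deep learning models such as CNNs and RNNs, and familiar with network architectures such as U-Net and GAN;
3. Familiar with computer graphics, audio and video processing, and related technologies; knowledgeable about common image formats and common audio and video codecs; familiar with using ffmpeg;
4. Familiar with multimodal large models, multimodal data processing, and related technologies;
5. Strong programming skills and innovative thinking, with the ability to independently design and implement technical solutions.
Preferred qualifications
1. Understanding of the technical principles of Stable Diffusion; familiarity with ComfyUI is preferred;
2. Experience with digital human projects is preferred;
3. Experience with TTS or related projects is preferred;
4. Project experience or research results related to AIGC, or published papers or patents in the AIGC field;
5. Frontend development experience; familiarity with WebGL is preferred.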
Responsibilities
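1. Explore AIGC-related technologies, including image generation, video generation, speech synthesis, digital humans, and intelligent dialogue;
2. Design and implement AIGC algorithms and models tailored to business scenarios;
3. Contribute to performance optimization and engineering of AIGC technologies;
4. Continuously learn and track the latest advances in AIGC, and provide technical support and guidance to the team.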
Includes English-language materials
Python+
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, for absolute beginners, with complete examples, based on the latest Python 3 release.
https://www.learnpython.org/
A free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
Java+
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
PyTorch+
https://datawhalechina.github.io/thorough-pytorch/
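PyTorch is an important tool for data science research with deep learning. It offers considerable advantages in flexibility, readability, and performance, and in recent years it has become the framework most commonly used in academia to implement deep learning algorithms.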
https://www.youtube.com/watch?v=V_xro1bcAuA
Learn PyTorch for deep learning in this comprehensive course for beginners. PyTorch is a machine learning framework written in Python.
TensorFlow+
https://www.youtube.com/watch?v=tpCFfeUEGs8
Ready to learn the fundamentals of TensorFlow and deep learning with Python? Well, you’ve come to the right place.
https://www.youtube.com/watch?v=ZUKz4125WNI
This part continues right where part one left off so get that Google Colab window open and get ready to write plenty more TensorFlow code.
CNN+
https://learnopencv.com/understanding-convolutional-neural-networks-cnn/
Convolutional Neural Network (CNN) forms the basis of computer vision and image processing.
[English] CNN Explainer
https://poloclub.github.io/cnn-explainer/
Learn Convolutional Neural Network (CNN) in your browser!
https://www.deeplearningbook.org/contents/convnets.html
Convolutional networks (LeCun, 1989), also known as convolutional neural networks, or CNNs, are a specialized kind of neural network for processing data.
https://www.youtube.com/watch?v=2xqkSUhmmXU
MIT Introduction to Deep Learning 6.S191: Lecture 3 Convolutional Neural Networks for Computer Vision
RNN+
https://d2l.ai/chapter_recurrent-neural-networks/rnn.html
A neural network that uses recurrent computation for hidden states is called a recurrent neural network (RNN).
https://www.deeplearningbook.org/contents/rnn.html
Recurrent neural networks, or RNNs (Rumelhart et al., 1986a), are a family of neural networks for processing sequential data.
https://www.ibm.com/think/topics/recurrent-neural-networks
A recurrent neural network or RNN is a deep neural network trained on sequential or time series data to create a machine learning (ML) model that can make sequential predictions or conclusions based on sequential inputs.
Deep learning+
https://d2l.ai/
Interactive deep learning book with code, math, and discussions.
Large models+
https://www.youtube.com/watch?v=xZDB1naRUlk
You will build projects with LLMs that will enable you to create dynamic interfaces, interact with vast amounts of text data, and even empower LLMs with the capability to browse the internet for research papers.
https://www.youtube.com/watch?v=zjkBMFhNj_g
Stable Diffusion+
https://course.fast.ai/Lessons/lesson9.html
This lesson starts with a tutorial on how to use pipelines in the Diffusers library to generate images.
https://www.youtube.com/watch?v=dMkiOex_cKU
Learn how to use Stable Diffusion to create art and images in this full course.
Digital humans+
https://www.youtube.com/watch?v=42_lCOayS6s
Taking chatbots to the next level, with emotion recognition and gesture control.
https://www.youtube.com/watch?v=DFHuV7nOgsI&list=PL05umP7R6ij13it8Rptqo7lycHozvzCJn
This lecture covers the history of virtual humans, from early models from the 80s to the more recent ones.
Speech synthesis+
https://www.ibm.com/think/topics/text-to-speech
Text to speech (TTS) is a type of technology that converts text on a digital interface into natural-sounding audio.
Frontend development+
https://roadmap.sh/frontend
Step by step guide to becoming a modern frontend developer
WebGL+
[English] Learn WebGL
https://learnwebgl.brown37.net/
The traditional approach to learning a subject is to divide the topic into sub-topics, study each sub-topic, and then show how the sub-topics relate to each other.
https://www.youtube.com/watch?v=bP7_FeP9kU4
Ever want to know how 3D games and simulations are made?
https://www.youtube.com/watch?v=y2UsQB3WSvo
I'm finally getting around to updating my WebGL series! The old series used some fairly outdated JavaScript.
Related positions

Experienced hire
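1. Lead the design and optimization of multimodal generation algorithms (image / video / 3D, etc.), focusing on improving generation quality, broadening diversity, strengthening controllability, and enabling editing features, and overcome technical bottlenecks;
2. Work closely with game development and publishing business needs to provide general-purpose technical frameworks or customized algorithm solutions, solve the adaptability, efficiency, and quality problems AIGC faces in real-world deployment, and drive technology transfer;
3. Closely track frontier developments in multimodal and Generative AI (such as model architectures and training strategies), design innovative algorithmic approaches around business pain points, and maintain technical competitiveness.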
Experienced hire
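Develop cutting-edge video generation and processing algorithms for the AIGC field; carry out systematic algorithm design for concrete business scenarios such as short video, e-commerce, and brand creative; and bring capabilities such as automated editing, video generation, motion transfer, and semantics-driven generation into production;
Address current pain points in large-model video generation (such as frame consistency, spatiotemporal modeling, long-video coherence, and cross-modal alignment) by optimizing diffusion/generative architectures and designing sparse, efficient inference strategies to improve generation quality and response speed;
Develop the underlying algorithms and toolchains for video creation, including capability modules such as storyboard generation, keyframe completion, text-driven editing, and shot segmentation with structured editing;
Continuously track the industry frontier (such as Sora, Runway, Kling, and Veo), and quickly complete benchmarks and port results into production;
Develop a deep understanding of the full pipeline from video content production to multi-channel distribution, and collaborate with product, operations, and creative teams to build AI video engines and application prototypes suited to the business.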
Updated 2024-09-27
Experienced hire, MEG
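- Responsible for R&D of LLM-related algorithms for e-commerce search scenarios, including but not limited to base model training, SFT, and preference alignment
- Use LLM algorithms to improve business outcomes, including but not limited to need understanding, intelligent Q&A, high-quality content generation, and optimization of recall and ranking modules
- Follow frontier technologies in NLP, LLMs, recommendation, search, and related fields
- Drive the adoption of frontier technologies in line with product requirements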
Updated 2025-02-19