Alibaba Customer Experience Department - Multimodal Algorithm Engineer - Hangzhou - Multimodal Algorithms
Experienced hire | Full-time | 3+ years of experience | Technical: Algorithms | Location: Hangzhou | Status: Hiring
Qualifications
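1. Master's degree or above in computer science, artificial intelligence, statistics, or a related field, with 3+ years of algorithm R&D experience;
2. Solid foundation in computer vision; familiar with object detection, image classification, OCR, image quality assessment, and related techniques, with hands-on experience delivering real projects;
3. Familiar with mainstream multimodal model architectures (e.g., CLIP, BLIP, Flamingo, Qwen-VL), with experience fine-tuning and applying them in real business settings; familiar with alignment techniques such as SFT/RLHF/DPO, with experience in reward model construction and policy optimization;
4. Experience with VLLM (Vision-Language Large Models) inference optimization is preferred; familiar with vLLM, TensorRT-LLM, HuggingFace…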
Responsibilities
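1. Recognize and analyze the multimodal information uploaded by users, merchants, and logistics providers on the international e-commerce platform, and build and optimize a vision-centric multimodal understanding system;
2. For core business scenarios such as product consistency, logistics compliance checks, quantity recognition, and damage detection, design and deploy end-to-end multimodal models to improve the accuracy of automated review;
3. Lead the pre-training, fine-tuning, distillation, and deployment optimization of large vision models in e-commerce scenarios, with an AI-native system design mindset;
4. Participate in VLLM (Vision-Language Large Models) inference performance optimization, including but not limited to model quantization, caching mechanisms, batch scheduling, and asynchronous inference pipeline design, to improve response speed and resource utilization under high concurrency;
5. Work closely with the product, operations, and data teams to drive closed-loop validation and continuous iteration of algorithm capabilities in real business use.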
Includes English-language materials
Education
Algorithms
https://roadmap.sh/datastructures-and-algorithms
Step by step guide to learn Data Structures and Algorithms in 2025
https://www.hellointerview.com/learn/code
A visual guide to the most important patterns and approaches for the coding interview.
https://www.w3schools.com/dsa/
OpenCV+
https://learnopencv.com/getting-started-with-opencv/
At LearnOpenCV we are on a mission to educate the global workforce in computer vision and AI.
https://opencv.org/university/free-opencv-course/
This free OpenCV course will teach you how to manipulate images and videos, and detect objects and faces, among other exciting topics in just about 3 hours.
OCR+
https://www.ibm.com/think/topics/optical-character-recognition
Optical character recognition (OCR) is a technology that uses automated data extraction to quickly convert images of text into a machine-readable format.
https://www.youtube.com/watch?v=or8AcS6y1xg
Optical character recognition (OCR) is sometimes referred to as text recognition.
SFT+
https://cameronrwolfe.substack.com/p/understanding-and-using-supervised
Understanding how SFT works from the idea to a working implementation...
RLHF+
[English] What is RLHF?
https://aws.amazon.com/what-is/reinforcement-learning-from-human-feedback/
Reinforcement learning from human feedback (RLHF) is a machine learning (ML) technique that uses human feedback to optimize ML models to self-learn more efficiently.
https://www.ibm.com/think/topics/rlhf
Reinforcement learning from human feedback (RLHF) is a machine learning technique in which a “reward model” is trained with direct human feedback, then used to optimize the performance of an artificial intelligence agent through reinforcement learning.
Large Models
https://www.youtube.com/watch?v=xZDB1naRUlk
You will build projects with LLMs that will enable you to create dynamic interfaces, interact with vast amounts of text data, and even empower LLMs with the capability to browse the internet for research papers.
https://www.youtube.com/watch?v=zjkBMFhNj_g
vLLM+
https://www.newline.co/@zaoyang/ultimate-guide-to-vllm--aad8b65d
vLLM is a framework designed to make large language models faster, more efficient, and better suited for production environments.
https://www.youtube.com/watch?v=Ju2FrqIrdx0
vLLM is a cutting-edge serving engine designed for large language models (LLMs), offering unparalleled performance and efficiency for AI-driven applications.
TensorRT+
https://docs.nvidia.com/deeplearning/tensorrt/latest/getting-started/quick-start-guide.html
This TensorRT Quick Start Guide is a starting point for developers who want to try out the TensorRT SDK; specifically, it demonstrates how to quickly construct an application to run inference on a TensorRT engine.