Alibaba Data Technology and Products Division - Large Model Data Processing Optimization Expert - Hangzhou/Beijing
Experienced hire | Full-time | 5+ years | Technical - Development | Location: Beijing, Hangzhou | Status: Hiring
Requirements
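1. Proficient in Python/Java; familiar with frameworks such as PyTorch and vLLM for inference; experienced in processing multimodal data (images, text, audio, video, etc.). Hands-on experience accelerating inference for mainstream large language models is a plus.
2. Familiar with common big data processing frameworks such as Flink/Spark/Hadoop.
3. Regarding Daft, Ra…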
Responsibilities
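1. Own inference optimization for the multimodal data processing pipeline: analyze performance bottlenecks and apply combined hardware and software optimizations to reach maximum performance across inference workloads, including language recognition and computer vision.
2. Build observability tooling for the multimodal data processing pipeline, including monitoring and alerting for both data processing and model inference.
3. Optimize scheduling of heterogeneous resources (GPUs, CPUs, and other hardware) to make the best use of tidal resources (capacity that sits idle off-peak), co-located (mixed-deployment) resources, and multi-cloud resources.
4. Own stability governance and resource-utilization improvement for clusters and business services, systematically raising the efficiency of GPU, CPU, and other hardware.
5. Help design high-throughput, low-latency data processing pipelines. For large-model data processing scenarios (e.g., LLM and multimodal), optimize data cleaning, prefetching, caching, and asynchronous loading strategies so that data can be produced at scale.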
Study materials (including English-language resources)
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, aimed at complete beginners, with complete examples, and based on the latest version of Python 3.
https://www.learnpython.org/
A free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
Java
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
PyTorch
https://datawhalechina.github.io/thorough-pytorch/
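PyTorch is an important tool for data science research with deep learning. It offers strong advantages in flexibility, readability, and performance, and in recent years it has become the most commonly used framework in academia for implementing deep learning algorithms.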
https://www.youtube.com/watch?v=V_xro1bcAuA
Learn PyTorch for deep learning in this comprehensive course for beginners. PyTorch is a machine learning framework written in Python.
vLLM
https://www.newline.co/@zaoyang/ultimate-guide-to-vllm--aad8b65d
vLLM is a framework designed to make large language models faster, more efficient, and better suited for production environments.
https://www.youtube.com/watch?v=Ju2FrqIrdx0
vLLM is a cutting-edge serving engine designed for large language models (LLMs), offering unparalleled performance and efficiency for AI-driven applications.
Flink
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
Spark
[English] Learning Spark (book)
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.