Baidu Large Model Inference Engineer (J101025)
Experienced hire, full-time, 1-3 years, ACG. Location: Beijing. Status: Hiring
Qualifications
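- Bachelor's degree or above in computer science, software engineering, artificial intelligence, or a related field; 1-3 years of hands-on experience putting large model inference into production; familiar with the principles of LLM inference
- Proficient in Python, Linux, and Shell; familiar with networking, multiprocessing/multithreading, and asynchronous concurrent programming
- Expert in large model inference optimization techniques; proficient with at least one mainstream inference engine among vLLM, TGI, and TensorRT; command of quantization (INT4/INT8), KV Cache, PagedAttention, dynamic batching, etc.…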
Responsibilities
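- Own inference deployment, performance optimization, and productionization of large language models and multimodal models, supporting the company's MaaS model service platform in providing stable, low-cost inference capability to external users
- Optimize the model inference pipeline, including inference acceleration, batch scheduling, and KV Cache optimization, to continuously raise throughput, lower latency, and reduce GPU resource cost
- Build and optimize the large model inference serving architecture, implementing model hot loading, dynamic scaling, traffic isolation, load balancing, timeout retries, and circuit breaking with degradation, to ensure high concurrency, high availability, and low jitter in production
- Troubleshoot online inference issues, analyze performance bottlenecks, and drive stability improvements; continuously improve compute utilization, sell-through rate, and service SLA to support large-scale commercial use
- Work with business, algorithm, and platform teams on new model onboarding, version iteration, canary releases, and load-test acceptance; deliver standardized deployment, monitoring, and operations guidelines
- Contribute to building the MaaS platform's inference scheduling, resource management, billing statistics, and compute operations systems, helping commercialize model services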
Includes English-language materials
Education+
Large models+
https://www.youtube.com/watch?v=xZDB1naRUlk
You will build projects with LLMs that will enable you to create dynamic interfaces, interact with vast amounts of text data, and even empower LLMs with the capability to browse the internet for research papers.
https://www.youtube.com/watch?v=zjkBMFhNj_g
Python+
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, starts from zero, with complete examples, based on the latest Python 3 release.
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
Linux+
https://ryanstutorials.net/linuxtutorial/
Ok, so you want to learn how to use the Bash command line interface (terminal) on Unix/Linux.
https://ubuntu.com/tutorials/command-line-for-beginners
The Linux command line is a text interface to your computer.
https://www.youtube.com/watch?v=6WatcfENsOU
In this Linux crash course, you will learn the fundamental skills and tools you need to become a proficient Linux system administrator.
https://www.youtube.com/watch?v=v392lEyM29A
Never fear the command line again, make it fear you.
https://www.youtube.com/watch?v=ZtqBQ68cfJc
Bash+
[English] The Bash Guide
https://guide.bash.academy/
A quality-driven guide through the shell's many features.
https://www.youtube.com/watch?v=tK9Oc6AEnR4
Understanding how to use bash scripting will enhance your productivity by automating tasks, streamlining processes, and making your workflow more efficient.
Multithreading+
https://liaoxuefeng.com/books/java/threading/basic/index.html
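Compared with single-threaded programming, the defining feature of multithreaded programming is that threads frequently read and write shared data, and that access must be synchronized.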
https://www.youtube.com/watch?v=_uQgGS_VIXM&list=PLsc-VaxfZl4do3Etp_xQ0aQBoC-x5BIgJ
https://www.youtube.com/watch?v=IEEhzQoKtQU
https://www.youtube.com/watch?v=mTGdtC9f4EU&list=PLL8woMHwr36EDxjUoCzboZjedsnhLP1j4
https://www.youtube.com/watch?v=TPVH_coGAQs&list=PLk6CEY9XxSIAeK-EAh3hB4fgNvYkYmghp
https://www.youtube.com/watch?v=xPqnoB2hjjA
This video is an introduction to multithreading in modern C++.
https://www.youtube.com/watch?v=YKBwKy5PrpQ
Rust threading is easy to implement and improves the efficiency of your applications on multi-core systems!
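The point above about shared data and synchronization can be illustrated with a minimal Python sketch. It is not taken from any of the linked tutorials; the function name `run_counter` and its parameters are hypothetical, chosen only for illustration. Several threads increment one shared counter, and a `threading.Lock` guards the read-modify-write step.

```python
import threading


def run_counter(num_threads: int, increments: int) -> int:
    """Increment a shared counter from several threads, guarded by a lock."""
    total = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal total
        for _ in range(increments):
            # "total += 1" is a read-modify-write and is not atomic;
            # holding the lock ensures only one thread updates it at a time.
            with lock:
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every worker before reading the result
    return total


if __name__ == "__main__":
    print(run_counter(num_threads=4, increments=10_000))
```

With the lock in place the result is always `num_threads * increments`. Without it, concurrent updates may be lost, depending on the interpreter and timing.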
vLLM+
https://www.newline.co/@zaoyang/ultimate-guide-to-vllm--aad8b65d
vLLM is a framework designed to make large language models faster, more efficient, and better suited for production environments.
https://www.youtube.com/watch?v=Ju2FrqIrdx0
vLLM is a cutting-edge serving engine designed for large language models (LLMs), offering unparalleled performance and efficiency for AI-driven applications.
TGI+
https://huggingface.co/docs/text-generation-inference/en/index
Text Generation Inference (TGI) is a toolkit for deploying and serving Large Language Models (LLMs).
https://learn.ritual.net/examples/tgi_inference_with_mistral_7b
In this tutorial, we will use Hugging Face's TGI (Text Generation Inference) API to query a Large Language Model (LLM) and enable users to request jobs from it, both on-chain and off-chain.
https://www.sandgarden.com/learn/text-generation-inference-tgi
Text Generation Inference (TGI) is the process by which a trained AI model generates new text based on an input prompt, focusing on producing this text efficiently in terms of speed and computational resources.
TensorRT+
https://docs.nvidia.com/deeplearning/tensorrt/latest/getting-started/quick-start-guide.html
This TensorRT Quick Start Guide is a starting point for developers who want to try out the TensorRT SDK; specifically, it demonstrates how to quickly construct an application to run inference on a TensorRT engine.