
Horizon Robotics [D-Robotics] Large Model Technology Expert
Experienced hire, full-time, algorithm track. Location: Beijing. Status: Hiring
Qualifications
1. Deep expertise and hands-on experience in on-device large-model inference acceleration; familiar with KV cache, low-bit quantization, FlashAttention, speculative decoding, sparsification, and related techniques (a minimal KV-cache sketch follows this list)
2. Familiar with common open-source LLM inference frameworks such as vLLM, TensorRT-LLM, MLC-LLM, and llama.cpp
3. Up to date on the latest developments in the large-model field, with the ability to quickly assess the capability boundaries of new techniques
4. Strong complex-problem-solving skills; able to dig in hands-on to analyze and resolve issues in on-device large-model deployment
5. Good communication and collaboration skills; able to explore new technologies with the team and drive technical progress
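Since KV caching anchors the skill list above, here is a toy sketch (NumPy, all names and shapes illustrative, not taken from any framework) of why caching keys and values turns each decode step into an append-and-attend over the prefix instead of recomputing attention from scratch:

```python
import numpy as np

def attention_step(q, K_cache, V_cache, k_new, v_new):
    """One decode step with a KV cache: append the new token's key/value,
    then attend the single new query against the full cached sequence.
    Shapes: q, k_new, v_new are (d,); caches are (t, d)."""
    K = np.vstack([K_cache, k_new])        # (t+1, d)
    V = np.vstack([V_cache, v_new])        # (t+1, d)
    scores = K @ q / np.sqrt(q.shape[0])   # (t+1,) attention logits
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()               # softmax over past + current token
    return weights @ V, K, V               # output (d,), updated caches

# Decode loop: each step costs O(t*d) thanks to the cache, rather than
# re-running attention over the whole prefix from scratch every step.
d, steps = 64, 5
K_cache, V_cache = np.empty((0, d)), np.empty((0, d))
for _ in range(steps):
    q, k, v = (np.random.randn(d) for _ in range(3))
    out, K_cache, V_cache = attention_step(q, K_cache, V_cache, k, v)
```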
Responsibilities
Own the research, development, and application of on-device large models in the robotics domain, and assess where large models are heading in order to inform NPU planning for future chips. Main work directions: 1. Explore the performance and accuracy limits of LLM, VLM, and VLA models on device 2. Track and assess large-model trends and provide input for NPU planning on future chips
Includes English-language materials
Large models
https://www.youtube.com/watch?v=xZDB1naRUlk
You will build projects with LLMs that will enable you to create dynamic interfaces, interact with vast amounts of text data, and even empower LLMs with the capability to browse the internet for research papers.
https://www.youtube.com/watch?v=zjkBMFhNj_g
Caching
https://hackernoon.com/the-system-design-cheat-sheet-cache
The cache is a layer that stores a subset of data, typically the most frequently accessed or essential information, somewhere quicker to access than its primary storage.
https://www.youtube.com/watch?v=bP4BeUjNkXc
Caching strategies, Distributed Caching, Eviction Policies, Write-Through Cache and Least Recently Used (LRU) cache are all important terms when it comes to designing an efficient system with a caching layer.
https://www.youtube.com/watch?v=dGAgxozNWFE
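The resources above lean on eviction policies; as a concrete anchor, here is a minimal LRU cache built on Python's `collections.OrderedDict` (a standard textbook illustration, not code from any of the linked materials):

```python
from collections import OrderedDict

class LRUCache:
    """Least-recently-used cache: reads refresh recency, writes evict the
    stalest entry once capacity is exceeded."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._store = OrderedDict()

    def get(self, key):
        if key not in self._store:
            return None
        self._store.move_to_end(key)        # mark as most recently used
        return self._store[key]

    def put(self, key, value):
        if key in self._store:
            self._store.move_to_end(key)
        self._store[key] = value
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("a", 1); cache.put("b", 2)
cache.get("a")       # "a" is now most recent
cache.put("c", 3)    # capacity exceeded: evicts "b"
assert cache.get("b") is None and cache.get("a") == 1
```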
vLLM
https://www.newline.co/@zaoyang/ultimate-guide-to-vllm--aad8b65d
vLLM is a framework designed to make large language models faster, more efficient, and better suited for production environments.
https://www.youtube.com/watch?v=Ju2FrqIrdx0
vLLM is a cutting-edge serving engine designed for large language models (LLMs), offering unparalleled performance and efficiency for AI-driven applications.
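For orientation alongside these materials, vLLM's documented offline entry point is the `LLM`/`SamplingParams` API; a minimal sketch is below (the model id is only an example):

```python
from vllm import LLM, SamplingParams

# PagedAttention-backed engine: the KV cache is managed in fixed-size
# blocks, which is what enables vLLM's high-throughput batching.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # example model id
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

outputs = llm.generate(["Explain KV caching in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)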
TensorRT
https://docs.nvidia.com/deeplearning/tensorrt/latest/getting-started/quick-start-guide.html
This TensorRT Quick Start Guide is a starting point for developers who want to try out the TensorRT SDK; specifically, it demonstrates how to quickly construct an application to run inference on a TensorRT engine.
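The quick-start flow the guide describes boils down to deserializing a prebuilt engine and creating an execution context. A skeletal sketch with the TensorRT Python API is below; buffer allocation and the exact execute call vary by TensorRT version, so this stops at context creation, and "model.engine" is a placeholder path:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)

# Deserialize an engine previously built (e.g., with trtexec or the
# builder API). "model.engine" is a placeholder for your engine file.
with open("model.engine", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

# One execution context per concurrent stream of inference.
context = engine.create_execution_context()
# From here, allocate device buffers and launch inference; the exact call
# differs across TensorRT versions -- see the Quick Start Guide above.
```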
C++
https://www.learncpp.com/
LearnCpp.com is a free website devoted to teaching you how to program in modern C++.
https://www.youtube.com/watch?v=ZzaPdXTrSb8
llama.cpp
https://blog.steelph0enix.dev/posts/llama-cpp-guide/
No LLMs were harmed during creation of this post.
https://github.com/ggml-org/llama.cpp/discussions/15396
This is a detailed guide for running the new gpt-oss models locally with the best performance using llama.cpp.
https://www.youtube.com/watch?v=EPYsP-l6z2s
In this guide, you'll learn how to run local LLMs using llama.cpp.
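As a minimal starting point to pair with these guides: llama.cpp is usually driven from its CLI or through the `llama-cpp-python` bindings. A sketch with the bindings follows; the GGUF path is a placeholder, and `n_gpu_layers` only takes effect in a GPU-enabled build:

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Path is a placeholder; any quantized GGUF model works here.
llm = Llama(
    model_path="./models/model-q4_k_m.gguf",
    n_ctx=4096,        # context window size
    n_gpu_layers=-1,   # offload all layers to GPU if the build supports it
)

out = llm(
    "Q: What does low-bit quantization trade off? A:",
    max_tokens=64,
    stop=["Q:"],
)
print(out["choices"][0]["text"])
```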