小鹏汽车 (XPeng Motors) — GPU Tools Senior / Staff / Expert Engineer

1. Architect and implement agentic workflows that plan, reason, call tools/APIs, and coordinate with humans or other agents.
2. Select, extend, or build frameworks (e.g., LangChain, AutoGen, CrewAI, MetaGPT, LangGraph) to accelerate delivery while avoiding vendor lock-in.
3. Own the MLOps lifecycle: data collection, evaluation harnesses, safety filters, CI/CD, and observability for deployed agents.
4. Integrate enterprise systems and data sources (REST/GraphQL, Kafka, vector databases, Kubernetes) so agents can act on real business objects.
5. Mentor junior engineers and review their code; drive best practices in prompt engineering, evaluation, and secure coding.
6. Research emerging techniques (Toolformer, self-reflection, role specialization) and translate findings into the product roadmap.
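The plan–act–observe loop behind agentic workflows like those in item 1 can be sketched minimally. All names below (`Tool`, `run_agent`, the `calc` tool, the pre-computed `plan`) are illustrative, not tied to any of the frameworks listed; in a real system the plan would come from an LLM rather than being hard-coded.

```python
# Minimal sketch of an agent executing a plan of tool calls.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    description: str
    fn: Callable[[str], str]

def run_agent(task: str, tools: dict[str, Tool],
              plan: list[tuple[str, str]]) -> list[str]:
    """Execute a pre-computed plan: a list of (tool_name, argument) steps.
    A production agent would instead ask an LLM to choose each step
    from the tool descriptions, feeding observations back in."""
    observations = []
    for tool_name, arg in plan:
        tool = tools[tool_name]
        observations.append(tool.fn(arg))
    return observations

# Example: a single arithmetic tool driven by a two-step plan.
calc = Tool("calc", "evaluate an arithmetic expression",
            lambda expr: str(eval(expr)))
result = run_agent("add, then square the sum", {"calc": calc},
                   plan=[("calc", "2+3"), ("calc", "5*5")])
print(result)  # ['5', '25']
```

The key design point this illustrates is separating planning (choosing steps) from acting (invoking tools), which is what makes planners and routers swappable.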
• Lead software development in C/C++, Python, and GPU languages such as CUDA, ROCm, or Triton.
• Analyze metrics and identify opportunities from offline and online testing; develop and deliver robust, scalable solutions.
• Work with cutting-edge hardware and a fast-moving software stack to deliver best-in-class inference at optimal cost.
• Engage with key partners to understand and implement inference and training optimizations for state-of-the-art LLMs and other models.
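A first step in the offline metric analysis mentioned above is usually comparing latency percentiles across configurations. The sketch below uses the nearest-rank percentile definition on made-up sample timings; the function name and data are illustrative only.

```python
# Hedged sketch: nearest-rank percentiles over recorded request latencies.
def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile (no interpolation): the value at rank
    ceil-like round(p/100 * n), clamped to valid indices."""
    ordered = sorted(samples)
    k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[k]

# Hypothetical per-request latencies (ms) from an offline run;
# one slow outlier dominates the tail.
latencies_ms = [12.0, 15.0, 11.0, 90.0, 14.0, 13.0, 16.0, 12.5, 13.5, 14.5]
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
print(f"p50={p50} ms, p95={p95} ms")
```

Comparing p50 against p95/p99 rather than the mean is what surfaces tail-latency regressions that averages hide.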
• Architect Performance Tooling: Develop infrastructure tools and libraries for GPU performance analysis, visualization, and automated workflows used across the GPU SW/HW development life cycle.
• Unlock Architectural Insights: Analyze GPU workloads to identify bottlenecks and define new hardware profiling features that enhance performance-debug and profiling capabilities.
• AI-Powered Automation: Build AI/ML-driven tools to automate performance analysis, generate performance-optimization guidance, and improve the user experience of the profiling infrastructure.
• Cross-Stack Collaboration: Partner with kernel developers, system software teams, and hardware architects to support performance studies, improve the CUDA software stack, and co-design performance-centric solutions for current and next-generation GPU architectures.
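The bottleneck identification described above often starts from the roofline model: a kernel whose arithmetic intensity (FLOPs per byte of DRAM traffic) falls below the device's ridge point is bandwidth-limited. The sketch below applies that rule; the peak numbers are illustrative defaults (roughly A100-class: 312 TFLOP/s tensor throughput, 2 TB/s HBM bandwidth), and the function names are made up for this example.

```python
# Hedged sketch: roofline-model bottleneck classification for a GPU kernel.
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs executed per byte of DRAM traffic."""
    return flops / bytes_moved

def classify_kernel(flops: float, bytes_moved: float,
                    peak_flops: float = 312e12, peak_bw: float = 2e12) -> str:
    """Compare the kernel's intensity to the device ridge point
    (peak_flops / peak_bw). Below the ridge, memory bandwidth caps
    throughput; above it, compute does."""
    ridge = peak_flops / peak_bw  # FLOP/byte where the roofline bends (156 here)
    ai = arithmetic_intensity(flops, bytes_moved)
    return "memory-bound" if ai < ridge else "compute-bound"

# A large GEMM reuses data heavily; an elementwise op touches each byte once.
print(classify_kernel(flops=1e12, bytes_moved=1e9))  # compute-bound (AI = 1000)
print(classify_kernel(flops=1e9, bytes_moved=1e9))   # memory-bound (AI = 1)
```

An automated tool built on this rule can pull FLOP and byte counters from a profiler and rank kernels by distance from the roofline to prioritize optimization work.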
We are seeking a skilled developer to build production-grade AI applications, focusing on LLM-based agents and tool-using systems. You will integrate large language models (LLMs), retrieval-augmented generation (RAG), and external tools/APIs on GPU-accelerated stacks, enhancing agent frameworks for reliability, scalability, and safety.

What You’ll Be Doing:
• Design, implement, and deploy AI-powered features using LLMs, including autonomous and multi-agent workflows.
• Build agent toolchains, including planning, tool/function calling, memory management, RAG integration, and enterprise API connectivity.
• Enhance agent frameworks with custom planners, routers, concurrency control, state management, and retry mechanisms.
• Develop evaluation and observability systems to monitor agent performance (success rates, tool-call accuracy, latency, cost, traces).
• Implement safety and compliance measures, including content filtering, PII handling, and policy enforcement using guardrail frameworks.
• Optimize inference pipelines for GPU performance, latency, and cost; deploy via microservices and APIs.
• Manage CI/CD, containerization, and deployment; maintain monitoring, logging, and alerting; and produce clear documentation.
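The retry mechanisms mentioned for agent frameworks are commonly exponential backoff around flaky tool calls. A minimal sketch, assuming a generic callable tool (all names here are hypothetical; `base_delay` is set to zero so the example runs instantly, whereas production code would use a nonzero delay plus jitter):

```python
import time

def call_with_retry(fn, max_attempts: int = 3, base_delay: float = 0.0):
    """Retry a tool call on any exception, doubling the wait each attempt.
    Re-raises the last exception once attempts are exhausted."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Simulate a tool that fails twice with a transient error, then succeeds.
calls = {"n": 0}
def flaky_tool():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

outcome = call_with_retry(flaky_tool)
print(outcome, "after", calls["n"], "attempts")  # ok after 3 attempts
```

In a real agent, retries are usually restricted to transient error classes (timeouts, rate limits) so that deterministic failures surface to the planner instead of being retried blindly.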