NVIDIA Senior Application Engineer - Big Data
Qualifications
• BS, MS, or PhD in Computer Science, Computer Engineering, or a closely related field
• 12+ years of work or research experience in software development
• Excellent programming skills for manipulating data frames in Python, Scala, Java, or SQL
• Strong problem-solving skills coupled with customer-facing communication skills
• Knowledge of the open source big data ecosystem (Apache Hadoop, Spark, Hive, Presto, Airflow, Kafka, etc.)
• Able to work successfully with multi-functional teams across organizational boundarie…
Responsibilities
• Serve as a lead application architect for RAPIDS Accelerator for Apache Spark.
• Define reference architectures of accelerated Apache Spark applications for major industry verticals.
• Lead the technical engagement with select customers and partners to accelerate Apache Spark applications with GPUs.
• Work closely with NVIDIA Spark engineering teams on architecture design and system implementation.
• Partner with Solution Architects to understand customers' existing big data and ML/DL solution architectures.
• Conduct regular technical customer meetings covering project/product roadmap, feature discussions, customer issue resolution, and performance tuning.
• Build and work on PoCs for solutions that address customers' critical business needs.
• Develop applications to promote best practices for accelerated data analytics and machine/deep learning in various industry verticals.
• Build tools to analyze data processing workloads to find opportunities for acceleration and cost savings.
• Work with major cloud service providers and Apache Spark vendors globally.
• Engage open source communities, including Apache Spark and RAPIDS, for technical discussions and contributions.
• Drive reliability engineering initiatives, including infrastructure automation, service monitoring, incident response, and capacity planning.
• Lead and participate in technical design discussions across cross-functional teams.
• Collaborate with application teams to define and enforce architectural best practices, CI/CD standards, and cloud-native patterns.
• Diagnose complex production issues through in-depth troubleshooting and implement resilient solutions to prevent recurrence.
• Contribute to the development of internal tools that improve observability, system health, and operational transparency.
• Analyze and optimize existing systems, providing enhancements and ongoing support as needed.
• Stay current with new technologies and proactively recommend improvements to existing cloud architectures and processes.
• Develop and maintain server-side logic, data processing, and application workflows.
• Mentor junior engineers and promote a culture of knowledge sharing and continuous improvement.
• Design and implement end-to-end data pipelines (ETL) to ensure efficient data collection, cleansing, transformation, and storage, supporting both real-time and offline analytics needs.
• Develop automated data monitoring tools and interactive dashboards to enhance business teams' insights into core metrics (e.g., user behavior, AI model performance).
• Collaborate with cross-functional teams (e.g., Product, Operations, Tech) to align data logic, integrate multi-source data (e.g., user behavior, transaction logs, AI outputs), and build a unified data layer.
• Establish data standardization and governance policies to ensure consistency, accuracy, and compliance.
• Provide structured data inputs for AI model training and inference (e.g., LLM applications, recommendation systems), optimizing feature engineering workflows.
• Explore innovative AI-data integration use cases (e.g., embedding AI-generated insights into BI tools).
• Provide technical guidance and best practices on data architecture that meets both traditional reporting purposes and modern AI agent requirements.
Build or improve experiment platforms for new scenarios. Build data pipelines on multiple computation platforms for reporting, analysis, and metrics pre-computation with stable SLAs and good quality. Build agents for productivity improvement.
AI Agent Engineering
• Design, develop, and deploy production-grade AI agent systems, including multi-agent orchestration, tool-use frameworks, memory management, and API integration, ensuring reliability, scalability, and maintainability
• Build and optimize Retrieval-Augmented Generation (RAG) pipelines: document ingestion, chunking strategy, embedding, vector search, and re-ranking to maximize LLM grounding quality
• Support LLM adaptation to WWGS business domains through prompt engineering, context injection, fine-tuning signal curation, and systematic prompt evaluation frameworks
• Develop automated knowledge base construction and real-time data access capabilities (Data Agent, MCP server/client) to connect AI agents with live business data
• Design and implement LLM evaluation pipelines to systematically assess agent output quality, hallucination risk, and business impact
Data Engineering
• Design and implement end-to-end data pipelines (batch and streaming) for data collection, transformation, and storage, supporting both AI application and analytics use cases
• Build and maintain integration layer data models that serve as a unified, AI-ready data foundation across WWGS domains
• Develop automated data quality monitoring, alerting, and observability tooling to ensure pipeline reliability and data trustworthiness
• Integrate multi-source data (seller behavior, transaction logs, off-platform signals, AI outputs) into a coherent, governed data layer
• Establish data standardization and governance policies ensuring consistency, accuracy, and compliance across AI and BI consumption layers
Technical Leadership
• Provide technical guidance on AI-data architecture decisions; define best practices for the team's AI agent and data engineering stack
• Collaborate cross-functionally with Product, Operations, and Science teams to translate business requirements into scalable technical solutions
• Mentor junior engineers and conduct design reviews; raise the technical bar across the team