SAP China iXp Intern - System Reliability Engineer Intern - Shanghai
Requirements
We are looking for a highly motivated and enthusiastic intern to join our Site Reliability Engineering (SRE) team, specializing in cloud-native technologies like Docker and Kubernetes. As an intern, you will assist the SRE team in maintaining and scaling our cloud infrastructure, ensuring high availability and performance of our services. You will work with various teams, including development, support, and security, to ensure that our cloud-based applications are resilient, scalable, and secure.
• Currently pursuing a degree in Computer Science, Information Technology, or a related field
• Knowledge of cloud-native technologies such as Docker, Kubernetes
• Familiarity with infrastructure as code and automation tools like Terraform
• Proficiency in one or more programming languages like Python, Go, or Java
• Understanding of Linux operating sy…
Responsibilities
• Assist in designing, building, and maintaining a scalable and reliable cloud infrastructure
• Collaborate with developers, operations, and security teams to ensure that the infrastructure is performing optimally and securely
• Monitoring and alarm systems for our cloud infrastructure, applications, and services
• Monitor system performance, identify and resolve issues proactively, and troubleshoot incidents when they arise
• Develop and implement automation tools to streamline processes and improve operational efficiency
• Participate in the development of disaster recovery and business continuity plans
• Document infrastructure and processes to ensure knowledge transfer and institutional memory
• Stay up-to-date with emerging trends and technologies in cloud-native computing and SRE practices
• Develop and optimize the control stack, including locomotion, manipulation, and whole-body control algorithms;
• Deploy and evaluate neural network models in physics simulation and on real humanoid hardware;
• Design and maintain teleoperation software for controlling humanoid robots with low latency and high precision;
• Implement tools and processes for regular robot maintenance, diagnostics, and troubleshooting to ensure system reliability;
• Monitor teleoperators at the lab and develop quality assurance workflows to ensure high-quality data collection;
• Collaborate with researchers on model training, data processing, and MLOps lifecycle.
NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It's a unique legacy of innovation that's fueled by great technology and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing, an era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what's never been done before takes vision, innovation, and the world's best talent. As an NVIDIAN, you'll be immersed in a diverse, supportive environment where everyone is inspired to do their best work. Come join the team and see how you can make a lasting impact on the world.
What you'll be doing:
• Use AI-powered tools to enhance QA efficiency, including automating test case generation, defect detection, and regression testing.
• Implement AI-driven solutions to optimize test coverage and identify high-risk areas in software systems.
• Collaborate with cross-functional teams to adopt AI tools that improve workflow automation and reduce manual effort.
• Review product requirements and collaborate with cross-functional teams to define test requirements and strategies.
• Build test plans, design test cases, execute them, and report test progress, bugs, and results to management.
• Perform functional, performance, fault-injection, and reliability testing.
• Automate test cases and assist in architecting, designing, and implementing test frameworks.
• Manage the bug lifecycle and work with other groups to drive solutions.
• Reproduce customer issues in-house and verify fixes.
• Leverage AI-powered tools to automate repetitive testing tasks, optimize test coverage, and detect flaky tests.