Tencent Senior Site Reliability Engineer
Requirements
• Bachelor’s degree or above in Computer Science or related field
• 5+ years of experience in SRE, DevOps, or related field
• In-depth knowledge of Linux, databases, networking, security, and Kubernetes operations
• Experienced with AWS, Tencent Cloud, GCP, Azure; capable of selecting optimal cloud solutions based on needs
• Familiar with Python, Shell, and SQL scripting
• Experience managing and optimizing Hadoop/Spark/Flink is a strong plus
• Fluent in both Chinese and English, with excellent cross-team communication skills
Responsibilities
• Take ownership of internal system SRE practices including CI/CD, observability, and system reliability
• Manage and ensure the reliability of big data platforms (e.g., Hadoop, Spark, Flink) in cloud environments
• Design highly available architectures tailored to business needs and define ops standards and incident playbooks
• Lead technology choices, performance tuning, and stability enhancements for core infrastructure
Work Location: China-Shenzhen
Whom we are looking for:
• A quick learner.
• A positive, self-motivated, and passionate person.
• Independent, persistent, and open-minded.
• A great team player who is both dependable and autonomous.
• Customer-oriented and able to work at a very fast pace.
Work Location: China-Shanghai
THE ROLE
We're the small, expert team creating the next-generation server-side infrastructure to support the manufacturing and functionality of fleets of Tesla products, and we're looking for seasoned SREs with domain expertise in one or more of: containers, public clouds and cloud-native apps.
Today, Tesla owners rely on our services to safely and securely summon their cars with a tap on their mobile phones -- a feature enabled by one of the many over-the-air updates we've delivered to the Tesla vehicle fleet. Tesla engineering relies on our data and analytics platform to make Tesla products better and safer. And, when an owner needs assistance, Tesla service and support rely on our applications to understand and respond to the situation.
Tomorrow, we will apply fleet learning to dispatch and deliver real-time road conditions to millions of autonomous vehicles and manage distributed energy generation & storage at grid scale.
Join us and you will work alongside world-class software and data engineers on some of the newest and most challenging IoT, manufacturing and service engineering problems in the world today. The platform you help us build and automate will be used daily by millions of Tesla owners (and tens of thousands of Tesla employees) to improve and enhance the functionality of our cars, chargers, and batteries worldwide.
RESPONSIBILITIES
• Design and write software that enables rapid prototyping by development teams, while ensuring the highest levels of reliability and availability.
• Work directly with our factory firmware team to provide highly available factory-facing services.
• Drive the migration of large-scale, distributed fleet applications towards cloud-native microservices.
• Influence architectural decisions with a focus on security, scalability and high performance.
• Automate the build and deployment of infrastructure using Docker, Kubernetes & other orchestration technologies in a hybrid-cloud environment.
• Set up and maintain monitoring, metrics & reporting systems for fine-grained observability and actionable alerting.
About Ant International
With headquarters in Singapore and main operations across Asia, Europe, the Middle East and Latin America, Ant International is a leading global digital payment, digitisation and financial technology provider. Through collaboration across the private and public sectors, our unified techfin platform supports financial institutions and merchants of all sizes to achieve inclusive growth through a comprehensive range of cutting-edge digital payment and financial services solutions.
We are seeking Senior and Junior SRE Engineers for our Malaysia Tech Center to work on end-to-end solutions for cross-border payments for our global merchants and globalization business.
1. Collaborate with global teams to complete daily ops and alarm handling.
2. Identify and implement solutions for the stability, scalability and security of business infrastructure using frameworks and industry best practices.
3. Drive and manage technical and solution architecture discussions between global teams and partners to ensure timely delivery that meets customer needs.
4. Plan and execute a roadmap for strategic infrastructure improvement, incorporating initiatives that align with the company goals.
• Own a product space that provides LOL and TFT game developers with features they need that are specific to Tencent Regions.
• Manage essential customer and partner relationships, including game teams and central platform teams that support your product space.
• Create a product roadmap and manage a prioritized backlog for your product space, ensuring your team is working on the highest value work.
• Actively manage project scope as a result of customer and partner feedback and ensure the product development is aligned with objectives.
• Partner with your Engineering Manager peer to successfully execute projects through the development lifecycle, from discovery to release.
• Develop and uphold service-level objectives and agreements for your product space, in collaboration with all core customers and supporting organizations.
• Partnering with the other Shanghai Tech Team leads, foster a high-velocity, high-performance organization.