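Ant Financial: Ant Group - AI Business High Availability Engineer - Hangzhou Z
Experienced hire, full-time, 3+ years, Technical - Development. Location: Hangzhou. Status: Hiring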
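Requirements
1. At least three years of Java development experience and a solid foundation in Java programming; proficient in core frameworks such as I/O, multithreading, and collections; proficient in Java EE, SOA, OSGi, and related technologies.
2. Familiar with K8S development, with some cloud-native operations experience; able to handle routine cloud-native application issues such as scaling out and in and incident response.
3. Experience developing and architecting large distributed systems; familiar with caching, messaging, service governance, disaster recovery, distributed consistency, and similar mechanisms.
4. Good at identifying and designing general-purpose frameworks and modules; familiar with UML.
5. Strong communication skills; careful, rigorous, and dedicated, with near-exacting standards for system quality. Strong analytical and problem-solving ability and a strong sense of responsibility.
6. Experience building stability or high-availability systems, or work experience in the large-model field, is a plus.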
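Responsibilities
1. Own stability work for Ant's AI domain, including building foundational stability capabilities for the various models and engines, and handling incident response and operations.
2. Own stability assurance for the underlying engines behind Ant Tab3, search, recommendation, and related businesses, including defining and tracking SLOs and driving improvement action items.
3. Own the team's high-availability architecture: do medium- and long-term planning for business stability and the platform, lead work on hard technical problems, and keep improving the system's high-concurrency capacity in large-scale distributed environments so that it runs securely, stably, and fast.
4. Build the stability platform and components for the team's AI scenarios (short video; search, recommendation, and advertising; large-model services), including operations platforms, emergency-response tools, and efficiency tooling, solving stability problems through technical means.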
Includes English-language materials
Java
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
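Multithreading
https://liaoxuefeng.com/books/java/threading/basic/index.html
Compared with single-threaded programming, multithreaded programming is distinguished by the fact that threads often need to read and write shared data, and that access must be synchronized.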
https://www.youtube.com/watch?v=_uQgGS_VIXM&list=PLsc-VaxfZl4do3Etp_xQ0aQBoC-x5BIgJ
https://www.youtube.com/watch?v=IEEhzQoKtQU
https://www.youtube.com/watch?v=mTGdtC9f4EU&list=PLL8woMHwr36EDxjUoCzboZjedsnhLP1j4
https://www.youtube.com/watch?v=TPVH_coGAQs&list=PLk6CEY9XxSIAeK-EAh3hB4fgNvYkYmghp
https://www.youtube.com/watch?v=xPqnoB2hjjA
This video is an introduction to multithreading in modern C++.
https://www.youtube.com/watch?v=YKBwKy5PrpQ
Rust threading is easy to implement and improves the efficiency of your applications on multi-core systems!
SOA
https://www.ibm.com/think/topics/soa
SOA, or service-oriented architecture, defines a way to make software components reusable and interoperable through service interfaces.
[English] SOA Tutorial
https://www.tutorialspoint.com/soa/index.htm
Service-Oriented Architecture is an architectural design that comprises a collection of services in a network which communicate with each other.
Kubernetes
https://kubernetes.io/docs/tutorials/kubernetes-basics/
This tutorial provides a walkthrough of the basics of the Kubernetes cluster orchestration system.
https://kubernetes.io/zh-cn/docs/tutorials/kubernetes-basics/
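This tutorial covers the basics of the Kubernetes cluster orchestration system. Each module contains some background information on major Kubernetes features and concepts, and includes an online tutorial for you to follow.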
https://www.youtube.com/watch?v=s_o8dwzRlu4
Hands-On Kubernetes Tutorial | Learn Kubernetes in 1 Hour - Kubernetes Course for Beginners
https://www.youtube.com/watch?v=X48VuDVv0do
Full Kubernetes Tutorial | Kubernetes Course | Hands-on course with a lot of demos
Distributed systems
https://www.distributedsystemscourse.com/
The home page of a free online class in distributed systems.
https://www.youtube.com/watch?v=7VbL89mKK3M&list=PLOE1GTZ5ouRPbpTnrZ3Wqjamfwn_Q5Y9A
Caching
https://hackernoon.com/the-system-design-cheat-sheet-cache
The cache is a layer that stores a subset of data, typically the most frequently accessed or essential information, in a location quicker to access than its primary storage location.
https://www.youtube.com/watch?v=bP4BeUjNkXc
Caching strategies, Distributed Caching, Eviction Policies, Write-Through Cache and Least Recently Used (LRU) cache are all important terms when it comes to designing an efficient system with a caching layer.
https://www.youtube.com/watch?v=dGAgxozNWFE
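Service governance
https://cloudnativecn.com/blog/istio-traffic-management-series-service-management-concept-theory/
By reading this article, readers can gain an initial understanding of Istio traffic management concepts and the related knowledge framework.
https://juejin.cn/post/6844904006033080334
Service governance mainly covers service discovery, load balancing, rate limiting, circuit breaking, timeouts, retries, and service tracing. The topic of this article is service discovery.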
UML
https://www.youtube.com/watch?v=WnMQ8HlmeXc
Learn about how to use UML diagrams to visualize the design of databases or systems.
High availability
https://redis.io/blog/high-availability-architecture/
A highly available architecture is one in which a number of different components, modules, or services work together to maintain optimal performance, irrespective of peak-time loads.
https://www.ibm.com/think/topics/high-availability
High availability (HA) is a term that refers to a system’s ability to be accessible and reliable close to 100% of the time.
Large models
https://www.youtube.com/watch?v=xZDB1naRUlk
You will build projects with LLMs that will enable you to create dynamic interfaces, interact with vast amounts of text data, and even empower LLMs with the capability to browse the internet for research papers.
https://www.youtube.com/watch?v=zjkBMFhNj_g
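Related positions
Experienced hire, Technical - Development
1. Own high-availability work for AI training and inference services, such as inference frameworks, online inference services, training platforms, and training frameworks.
2. Develop platforms for resource governance and digital management of the clusters behind the platforms above.
3. Through hardware-software co-optimization and technical innovation, safeguard the business during major events such as the Double 11 and Double 12 promotions and the Spring Festival red envelope campaign.
Updated 2025-04-23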
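Experienced hire, 5+ years, NetEase Cloud Music
1. Design and develop middleware for NetEase Cloud Music's microservices, observability, and storage.
2. Solve the difficult problems that business teams run into in practice from the user's point of view; uncover their real needs and propose middleware best practices and evolution plans.
3. Refine middleware design with a product mindset, building stable, easy-to-use microservice and middleware solutions that enable the business to innovate efficiently.
Updated 2025-07-17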
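Experienced hire, 3+ years, Technical - Frontend
1. Play a core role in building Huabei's series of innovative AI products, including architecture design, development, and optimization.
2. Focus on intelligent interaction, using AIGC technology together with user-behavior sensing and proactive service design to drive more personalized experiences.
3. Work with product, design, and algorithm teams to validate innovative ideas quickly and build an outstanding next-generation consumer finance product.
4. Follow trends in the convergence of AI and frontend technology, share new knowledge with the team, and drive innovative applications of the technology in financial scenarios.
Updated 2025-08-20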