vivo Big Data Engineer (Real-Time Computing)
Experienced hire | Full-time | 5+ years | R&D | Location: Shenzhen | Status: Hiring
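Requirements
1. 5+ years of work experience; bachelor's degree or above in computer science or a related field.
2. Familiar with the principles of parallel or distributed computing, and with the characteristics and technical approaches of high-concurrency, high-stability, linearly scalable systems that handle massive data volumes.
3. Familiar with developing on and using real-time computing engines and components such as Kafka, Flink, Druid, HBase, Doris, ClickHouse, Redis, and Spark.
4. Familiar with Java, Scala, or other languages used by big data components; good coding standards and system design skills.
5. Strong communicator with good business sense; able to quickly understand the business context and combine technology with business needs.
Other:
1. Experience processing and tuning PB-scale data is preferred.
2. Deep understanding of stream computing systems and experience developing large-scale real-time computing systems; experience reading Flink source code or contributing to the open-source community is preferred.
3. Experience delivering large-scale real-time data warehouses or lakehouse architectures is preferred.
4. Experience with real-time attribution for advertising or games is preferred.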
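Responsibilities
1. Design and develop highly available, scalable real-time data processing systems that reliably process massive volumes of user behavior data from smartphones.
2. Own data modeling, architecture design, and development for real-time metric computation, including data ingestion, data processing, and OLAP analysis; choose the best-fit solution for massive data volumes while meeting performance requirements.
3. Own anomaly detection and data quality assurance for real-time systems, improving data accuracy, consistency, and stability.
4. Work with PMs and business stakeholders to understand the business in depth and solve problems.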
Includes English-language materials
Education+
High concurrency+
https://www.baeldung.com/concurrency-principles-patterns
In this tutorial, we’ll discuss some of the design principles and patterns that have been established over time to build highly concurrent applications.
https://www.baeldung.com/java-concurrency
Handling concurrency in an application can be a tricky process with many potential pitfalls. A solid grasp of the fundamentals will go a long way to help minimize these issues.
https://www.oreilly.com/library/view/concurrency-in-go/9781491941294/
You’ll understand how Go chooses to model concurrency, what issues arise from this model, and how you can compose primitives within this model to solve problems.
https://www.oreilly.com/library/view/modern-concurrency-in/9781098165406/
With this book, you'll explore the transformative world of Java 21's key feature: virtual threads.
https://www.youtube.com/watch?v=qyM8Pi1KiiM
https://www.youtube.com/watch?v=wEsPL50Uiyo
Kafka+
https://developer.confluent.io/what-is-apache-kafka/
https://www.youtube.com/watch?v=CU44hKLMg7k
https://www.youtube.com/watch?v=j4bqyAMMb7o&list=PLa7VYi0yPIH0KbnJQcMv5N9iW8HkZHztH
In this Apache Kafka fundamentals course, we introduce you to the basic Apache Kafka elements and APIs, as well as the broader Kafka ecosystem.
Flink+
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
HBase+
[English] HBase Tutorial
https://www.tutorialspoint.com/hbase/index.htm
HBase is a data model that is similar to Google's big table designed to provide quick random access to huge amounts of structured data. This tutorial provides an introduction to HBase, the procedures to set up HBase on Hadoop File Systems, and ways to interact with HBase shell.
Doris+
https://doris.apache.org/docs/gettingStarted/what-is-apache-doris
ClickHouse+
[English] Advanced Tutorial
https://clickhouse.com/docs/tutorial
Learn how to ingest and query data in ClickHouse using the New York City taxi example dataset.
https://www.youtube.com/watch?v=FtoWGT7kS-c
ClickHouse is an open-source column-oriented DBMS for online analytical processing that allows users to generate analytical reports using SQL queries in real-time.
https://www.youtube.com/watch?v=Rhe-kUyrFUE&list=PL0Z2YDlm0b3gcY5R_MUo4fT5bPqUQ66ep
Redis+
[English] Developer Hub
https://redis.io/dev/
Get all the tutorials, learning paths, and more you need to start building—fast.
https://www.runoob.com/redis/redis-tutorial.html
REmote DIctionary Server (Redis) is a key-value storage system written by Salvatore Sanfilippo; it is a cross-platform, non-relational database.
https://www.youtube.com/watch?v=jgpVdJB2sKQ
In this video I will be covering Redis in depth from how to install it, what commands you can use, all the way to how to use it in a real world project.
Spark+
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Java+
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
Scala+
Big data+
https://www.youtube.com/watch?v=bAyrObl7TYE
https://www.youtube.com/watch?v=H4bf_uuMC-g
With all this talk of Big Data, we got Rebecca Tickle to explain just what makes data into Big Data.
System design+
https://roadmap.sh/system-design
Everything you need to know about designing large scale systems.
https://www.youtube.com/watch?v=F2FmTdLtb_4
This complete system design tutorial covers scalability, reliability, data handling, and high-level architecture with clear explanations, real-world examples, and practical strategies.
Related positions
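Experienced hire JMX2P
1. Build the real-time data warehouse for Douyin's short-video and e-commerce businesses; 2. Own data modeling, architecture design, and development for real-time metric computation; 3. Optimize the performance of real-time computing jobs; 4. Take part in big data application planning and provide application guidance to the data product and data mining teams.
Updated 2020-12-21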
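Experienced hire J7098
1. Build the real-time data warehouse for Douyin's short-video and live-streaming businesses; 2. Own data modeling, architecture design, and development for real-time metric computation; 3. Optimize the performance of real-time computing jobs; 4. Take part in big data application planning and provide application guidance to the data product and data mining teams.
Updated 2022-06-01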
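Experienced hire X9WV
1. Design and implement sound stream computing systems for large-scale recommendation systems; 2. Design and implement flexible, scalable, stable, high-performance storage systems and computing models; 3. Troubleshoot production systems, and design and implement the mechanisms and tools needed to keep them running stably; 4. Build industry-leading distributed systems such as stream computing frameworks, providing reliable infrastructure for massive data volumes and large-scale business systems.
Updated 2021-12-31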