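ByteDance Big Data Development Engineer
Experienced hire | Full-time | A166444A | Location: Beijing | Status: Hiring
Requirements
1. Familiar with one open-source framework in the big data field: Hadoop/Hive/Flink/FlinkSQL/Spark/Kafka/HBase/Redis/RocksDB/Elasticsearch/Parquet; 2. Familiar with Java, …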
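Responsibilities
1. Design and implement sound offline and real-time data architectures for large-scale recommendation systems; 2. Design and implement flexible, scalable, stable, high-performance storage systems and computing models; 3. Troubleshoot production systems, and design and implement the mechanisms and tools needed to keep production systems running stably as a whole; 4. Build industry-leading distributed systems, including offline/online storage and batch/stream computing frameworks, to provide reliable infrastructure for massive data volumes and large-scale business systems.
Includes English-language materials
Big Data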
https://www.youtube.com/watch?v=bAyrObl7TYE
https://www.youtube.com/watch?v=H4bf_uuMC-g
With all this talk of Big Data, we got Rebecca Tickle to explain just what makes data into Big Data.
Hadoop
https://www.runoob.com/w3cnote/hadoop-tutorial.html
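Hadoop provides reliable, scalable application-layer computing and storage support for very large computer clusters. It allows distributed processing of large data sets across clusters of computers using a simple programming model, and it scales from a single machine to several thousand machines.
[English] Hadoop Tutorial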
https://www.tutorialspoint.com/hadoop/index.htm
Hadoop is an open-source framework for storing and processing big data in a distributed environment across clusters of computers using simple programming models.
Hive
[English] Hive Tutorial
https://www.tutorialspoint.com/hive/index.htm
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
https://www.youtube.com/watch?v=D4HqQ8-Ja9Y
Flink
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
Spark
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Kafka
https://developer.confluent.io/what-is-apache-kafka/
https://www.youtube.com/watch?v=CU44hKLMg7k
https://www.youtube.com/watch?v=j4bqyAMMb7o&list=PLa7VYi0yPIH0KbnJQcMv5N9iW8HkZHztH
In this Apache Kafka fundamentals course, we introduce you to the basic Apache Kafka elements and APIs, as well as the broader Kafka ecosystem.
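Related Positions
Experienced hire | Accommodation business development
1. Collect, clean, and load offline and online data; 2. Carry out dedicated analyses and produce analysis reports that provide data support for business decisions and monitoring; 3. Analyze and distill Ctrip's large volume of merchant and user data.
Updated 2025-03-31 | Shanghai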
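Experienced hire | 2+ years | Grocery retail
1. Take part in offline and real-time data warehouse development for Xiaoxiang (小象), building up data assets. 2. Work with product, business analysis, and other departments to deliver business requirements with high quality and efficiency. 3. Develop a deep understanding of the self-operated fresh-food instant retail business, drive the building of data applications, and improve the quality and efficiency of business decisions.
Updated 2025-04-03 | Beijing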
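Experienced hire | 4+ years | Core Local Commerce-基
1. Own data warehouse design and development for Meituan's service retail business line; 2. Own the building and development of application-layer data for business teams; 3. Own data governance for the service retail business across data quality, cost, security, and other areas; 4. Serve as the single point of contact and end-to-end solution provider for business teams' data issues, offering one-stop service; 5. Communicate across teams and drive fixes for problems along the data production pipeline.
Updated 2025-04-03 | Shanghai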