ByteDance Big Data R&D Engineer, Experiment Evaluation Track
Experienced hire, full-time, A111007. Location: Beijing. Status: Hiring
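Requirements
1. Familiar with big data frameworks such as Presto, Hive, Spark, Flink, ClickHouse, and Hadoop, with experience in large-scale data processing;
2. Familiar with Python, SQL, Java, Sca…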
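Responsibilities
1. Support and guide the development of business metrics for product lines under ByteDance;
2. Build a PB-scale data warehouse, taking part in and owning its design, modeling, and development;
3. Build ETL data pipelines and an automated ETL data pipeline system;
4. Build an expert system for metric data processing that combines offline, online, and real-time workloads.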
Includes English-language materials
Presto
[English] What is Presto?
https://prestodb.io/what-is-presto/
https://www.tutorialspoint.com/apache_presto/index.htm
Hive
[English] Hive Tutorial
https://www.tutorialspoint.com/hive/index.htm
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
https://www.youtube.com/watch?v=D4HqQ8-Ja9Y
Spark
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Flink
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
ClickHouse
[English] Advanced Tutorial
https://clickhouse.com/docs/tutorial
Learn how to ingest and query data in ClickHouse using the New York City taxi example dataset.
https://www.youtube.com/watch?v=FtoWGT7kS-c
ClickHouse is an open-source column-oriented DBMS for online analytical processing that allows users to generate analytical reports using SQL queries in real-time.
https://www.youtube.com/watch?v=Rhe-kUyrFUE&list=PL0Z2YDlm0b3gcY5R_MUo4fT5bPqUQ66ep
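Hadoop
https://www.runoob.com/w3cnote/hadoop-tutorial.html
Hadoop provides reliable, scalable application-layer computing and storage support for large computer clusters. It allows distributed processing of large data sets across clusters of computers using simple programming models, and it scales from a single machine to several thousand machines.
[English] Hadoop Tutorial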
https://www.tutorialspoint.com/hadoop/index.htm
Hadoop is an open-source framework that allows you to store and process big data in a distributed environment across clusters of computers using simple programming models.
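Related positions
Experienced hire, J5LM1
1. Support and guide the development of business metrics for product lines under ByteDance;
2. Build a PB-scale data warehouse, taking part in and owning its design, modeling, and development;
3. Build ETL data pipelines and an automated ETL data pipeline system;
4. Build an expert system for metric data processing that combines offline, online, and real-time workloads.
Updated 2019-07-28, Shanghai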
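Experienced hire, PM62
1. Take part in the design, development, and documentation of big data server-side services, and independently handle requirements analysis, testing, and release;
2. Select technologies for each requirement and develop against them, ensuring the design is sound;
3. Research, explore, and select new technologies for the big data platform, and design, review, and develop new features.
Updated 2021-10-21, Beijing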
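Experienced hire, 5+ years, technical
1. Take part in developing DiDi's big data analytics platform products, mainly building capabilities such as data visualization and data interpretation, to support the company's data-driven operations and decision-making;
2. Engage deeply in product requirement reviews and design, offer your own views on product design, and carry out abstraction and architecture design based on an understanding of the product;
3. Continuously optimize and improve the performance of the modules you own, and actively explore new, innovative scenarios.
Updated 2025-11-25, Beijing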