
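Huolala Big Data Development Engineer (Offline) (J18341)
Experienced hire | Full-time | 3+ years | Location: Beijing | Status: Hiring
Requirements
1. Bachelor's degree or above, with 3+ years of big data experience;
2. Familiar with common big data ecosystem components such as Hadoop, Hive, Spark, Flink, HBase, and Kafka;
3. Strong coding skills and high personal standards for code style; expert in writing and tuning HiveSQL; proficient in at least one of Java or Python;
4. Expert in ETL development; familiar with data warehouse construction methodology and with the architecture and model design of large-scale data warehouses;
5. Has a well-formed understanding of data quality and can strengthen the platform's data quality auditing capabilities;
6. Excellent communication, coordination, and teamwork skills; proactive in driving requirements through to delivery;
7. Experience with risk-control models or knowledge graphs is a plus;
Responsibilities
1. Build the data warehouse and the metrics system;
2. Design, develop, and performance-tune ETL pipelines;
3. Communicate and collaborate with both upstream systems and downstream applications to get all kinds of requirements implemented in the data model;
4. Design and implement solutions for data governance, data quality, and related areas;
5. Help algorithm engineers build data models and turn algorithm features into production data;
6. Take part in data mining and model analysis for risk-control scenarios;
Includes English-language materials
Education
Big Data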
https://www.youtube.com/watch?v=bAyrObl7TYE
https://www.youtube.com/watch?v=H4bf_uuMC-g
With all this talk of Big Data, we got Rebecca Tickle to explain just what makes data into Big Data.
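Hadoop
https://www.runoob.com/w3cnote/hadoop-tutorial.html
Hadoop provides reliable, scalable application-layer computing and storage for large computer clusters. It lets you process large datasets in a distributed way across clusters of computers using a simple programming model, and it scales from a single computer up to thousands of machines.
[English] Hadoop Tutorial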
https://www.tutorialspoint.com/hadoop/index.htm
Hadoop is an open-source framework that allows you to store and process big data in a distributed environment across clusters of computers using simple programming models.
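The "simple programming models" mentioned above are, in classic Hadoop, MapReduce: you write a map function and a reduce function, and the framework handles splitting the input, shuffling intermediate pairs by key, and running the work across nodes. The sketch below simulates that flow for a word count in a single Python process. It illustrates the model only and is not Hadoop's API (real jobs use the Java MapReduce API or Hadoop Streaming); the function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle/sort: group every value emitted for the same key, sorted by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


def run_job(splits: list[list[str]]) -> dict[str, int]:
    """Simulated job: in Hadoop, each input split would be mapped on its own node."""
    mapped = (pair for split in splits for line in split for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(run_job([["big data is big"], ["data is everywhere"]]))
    # {'big': 2, 'data': 2, 'everywhere': 1, 'is': 2}
```

Because the mapper and reducer only see one line or one key at a time, the framework is free to run many copies of them in parallel, which is what lets the same program scale from one machine to many.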
Hive
[English] Hive Tutorial
https://www.tutorialspoint.com/hive/index.htm
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
https://www.youtube.com/watch?v=D4HqQ8-Ja9Y
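Hive's appeal is that summarizing data in Hadoop becomes a SQL (HiveQL) query rather than a hand-written MapReduce job; Hive compiles the query into distributed jobs for you. As a rough illustration of that kind of summary query, the sketch below runs a GROUP BY aggregation with Python's built-in sqlite3 module standing in for Hive. The `orders` table, its columns, and the `dt` date filter are hypothetical; in Hive, a column like `dt` would often be a partition column.

```python
import sqlite3

# Hypothetical daily summary query. The `?` is sqlite3's parameter placeholder;
# a Hive script would supply the date another way (e.g. a literal in the WHERE clause).
SUMMARY_SQL = """
SELECT city, COUNT(*) AS order_cnt, SUM(amount) AS revenue
FROM orders
WHERE dt = ?
GROUP BY city
ORDER BY revenue DESC
"""


def daily_summary(rows: list[tuple[str, str, float]], dt: str) -> list[tuple[str, int, float]]:
    """Load (dt, city, amount) rows into a scratch table and aggregate one day per city."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (dt TEXT, city TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
        return conn.execute(SUMMARY_SQL, (dt,)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [
        ("2025-06-01", "Beijing", 30.0),
        ("2025-06-01", "Beijing", 20.0),
        ("2025-06-01", "Shanghai", 40.0),
        ("2025-06-02", "Beijing", 99.0),
    ]
    print(daily_summary(sample, "2025-06-01"))
    # [('Beijing', 2, 50.0), ('Shanghai', 1, 40.0)]
```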
Spark
[English] Learning Spark (book)
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Flink
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
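To make "react to these streaming events as they occur" concrete, here is a plain-Python sketch of one idea the Flink training covers: counting events per key in tumbling (fixed, non-overlapping) event-time windows and emitting each window's result as soon as it closes. It is not the Flink API; PyFlink and the Java DataStream API add watermarks, managed state, and fault tolerance. The `Event` shape and window size are assumptions, and events are assumed to arrive in timestamp order so no watermark handling is needed.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Event:
    ts: int   # event time in seconds (hypothetical field)
    key: str  # event type, e.g. "order" or "click" (hypothetical)


def tumbling_counts(events: Iterable[Event], size: int) -> Iterator[tuple[int, dict[str, int]]]:
    """Count events per key in non-overlapping windows of `size` seconds.

    Assumes events arrive in timestamp order. A window's result is yielded as
    soon as the first event of a later window arrives, so output streams out
    incrementally. Windows that receive no events produce no output.
    """
    current_start: Optional[int] = None
    counts: Counter = Counter()
    for event in events:
        start = event.ts - event.ts % size  # start of the window this event falls in
        if current_start is not None and start != current_start:
            yield current_start, dict(counts)  # close the finished window
            counts = Counter()
        current_start = start
        counts[event.key] += 1
    if current_start is not None:
        yield current_start, dict(counts)  # flush the last open window


if __name__ == "__main__":
    stream = [Event(1, "order"), Event(3, "click"), Event(4, "order"),
              Event(11, "order"), Event(25, "click")]
    for window_start, window_counts in tumbling_counts(stream, size=10):
        print(window_start, window_counts)
    # 0 {'order': 2, 'click': 1}
    # 10 {'order': 1}
    # 20 {'click': 1}
```

Using a generator mirrors the streaming style: results for the window starting at 0 are available as soon as the event at time 11 arrives, without waiting for the rest of the stream.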
HBase
[English] HBase Tutorial
https://www.tutorialspoint.com/hbase/index.htm
HBase is a data model similar to Google's Bigtable, designed to provide quick random access to huge amounts of structured data. This tutorial provides an introduction to HBase, the procedures to set up HBase on the Hadoop Distributed File System (HDFS), and ways to interact with the HBase shell.
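In the Bigtable-style model HBase follows, each cell is addressed by a row key, a column written as family:qualifier, and a timestamp (version); rows are kept sorted by row key, which is what makes both fast point lookups and row-key range scans possible. The class below is a hypothetical in-memory toy that models only that addressing scheme, nothing of HBase's storage engine. Its method names merely echo the HBase shell's `put`, `get`, and `scan` commands.

```python
import bisect
import time
from collections import defaultdict
from typing import Optional


class ToyTable:
    """Hypothetical toy model of HBase addressing:
    row key -> "family:qualifier" -> {timestamp: value}."""

    def __init__(self, families: set[str], max_versions: int = 3) -> None:
        self.families = families          # column families are declared up front
        self.max_versions = max_versions  # cell versions to retain (assumed >= 1)
        self.row_keys: list[str] = []     # kept sorted, as HBase sorts rows by key
        self.rows: dict[str, defaultdict] = {}

    def put(self, row: str, column: str, value: str, ts: Optional[int] = None) -> None:
        """Write one cell version; `column` must look like "family:qualifier"."""
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if row not in self.rows:
            bisect.insort(self.row_keys, row)
            self.rows[row] = defaultdict(dict)
        versions = self.rows[row][column]
        versions[time.time_ns() if ts is None else ts] = value
        for old in sorted(versions)[:-self.max_versions]:
            del versions[old]  # drop versions beyond the retention limit

    def get(self, row: str, column: str) -> Optional[str]:
        """Point lookup: newest version of one cell, or None if absent."""
        versions = self.rows.get(row, {}).get(column)
        return versions[max(versions)] if versions else None

    def scan(self, start: str, stop: str) -> list[str]:
        """Row keys in [start, stop), in sorted order, like a range scan."""
        lo = bisect.bisect_left(self.row_keys, start)
        hi = bisect.bisect_left(self.row_keys, stop)
        return self.row_keys[lo:hi]


if __name__ == "__main__":
    table = ToyTable({"info"}, max_versions=2)
    table.put("user#002", "info:name", "Bob", ts=1)
    table.put("user#001", "info:name", "Ann", ts=1)
    table.put("user#001", "info:name", "Anna", ts=2)
    print(table.get("user#001", "info:name"))  # Anna
    print(table.scan("user#000", "user#002"))  # ['user#001']
```

Because scans follow sorted row-key order, row-key design (for example, prefixing keys by entity type as in `user#...`) largely determines which queries are cheap.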
Kafka
https://developer.confluent.io/what-is-apache-kafka/
https://www.youtube.com/watch?v=CU44hKLMg7k
https://www.youtube.com/watch?v=j4bqyAMMb7o&list=PLa7VYi0yPIH0KbnJQcMv5N9iW8HkZHztH
In this Apache Kafka fundamentals course, we introduce you to the basic Apache Kafka elements and APIs, as well as the broader Kafka ecosystem.
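Among the basic Kafka elements are topics split into partitions, each partition an append-only log whose records receive sequential offsets; records with the same key land in the same partition, and each consumer tracks its own read position. The standalone sketch below models just those ideas. It is not a Kafka client (real applications use a client library such as the Java client or confluent-kafka), and the class names and CRC32-based partitioner are assumptions; Kafka's default Java partitioner hashes keys with murmur2.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Hypothetical in-memory topic: a fixed number of append-only partitions."""
    name: str
    num_partitions: int
    partitions: list[list[tuple[str, str]]] = field(init=False)

    def __post_init__(self) -> None:
        self.partitions = [[] for _ in range(self.num_partitions)]

    def produce(self, key: str, value: str) -> tuple[int, int]:
        """Append a record and return (partition, offset); equal keys share a partition."""
        p = zlib.crc32(key.encode()) % self.num_partitions  # stand-in for murmur2
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1


@dataclass
class Consumer:
    """Reads one partition and tracks its own position (offset) in it."""
    topic: Topic
    partition: int
    offset: int = 0

    def poll(self, max_records: int = 10) -> list[tuple[int, str, str]]:
        """Return up to `max_records` new (offset, key, value) records and advance."""
        log = self.topic.partitions[self.partition]
        batch = log[self.offset:self.offset + max_records]
        records = [(self.offset + i, k, v) for i, (k, v) in enumerate(batch)]
        self.offset += len(batch)
        return records


if __name__ == "__main__":
    orders = Topic("orders", num_partitions=3)
    p, _ = orders.produce("driver-42", "accepted")
    orders.produce("driver-42", "picked_up")  # same key -> same partition
    consumer = Consumer(orders, partition=p)
    print(consumer.poll())  # [(0, 'driver-42', 'accepted'), (1, 'driver-42', 'picked_up')]
    print(consumer.poll())  # []
```

Keying by an entity ID (here a hypothetical driver ID) is what keeps that entity's events in order, since ordering is only guaranteed within a partition.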
Java
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, for complete beginners, with complete examples, based on the latest Python 3.
https://www.learnpython.org/
A free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction to all of the core concepts in Python.
ETL
https://www.ibm.com/think/topics/etl
ETL—meaning extract, transform, load—is a data integration process that combines, cleans and organizes data from multiple sources into a single, consistent data set for storage in a data warehouse, data lake or other target system.
https://www.youtube.com/watch?v=OW5OgsLpDCQ
It explains what ETL is and how it can improve your data analysis and productivity.
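The extract, transform, load sequence described above can be sketched as three small functions. This is a minimal illustration assuming CSV input with hypothetical `order_id`, `city`, and `amount` columns and an in-memory SQLite table as the load target; in a role like the one above, the same steps would more likely run as scheduled HiveSQL or Spark jobs.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract: messy casing/whitespace, a missing amount, a duplicate row.
RAW_CSV = """order_id,city,amount
1, beijing ,30.5
2,Shanghai,
3,BEIJING,12
3,BEIJING,12
"""


def extract(text: str) -> list[dict[str, str]]:
    """Extract: read raw records from the source (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict[str, str]]) -> list[tuple[int, str, float]]:
    """Transform: drop incomplete rows, deduplicate by order_id, standardize fields."""
    seen: set[int] = set()
    clean = []
    for row in rows:
        if not row["amount"].strip():
            continue  # drop records missing a required field
        order_id = int(row["order_id"])
        if order_id in seen:
            continue  # drop duplicate orders
        seen.add(order_id)
        clean.append((order_id, row["city"].strip().title(), float(row["amount"])))
    return clean


def load(rows: list[tuple[int, str, float]], conn: sqlite3.Connection) -> int:
    """Load: write cleaned records into the target table; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER PRIMARY KEY, city TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    loaded = load(transform(extract(RAW_CSV)), conn)
    print(loaded, conn.execute("SELECT city, SUM(amount) FROM orders GROUP BY city").fetchall())
    # 2 [('Beijing', 42.5)]
```

Keeping the three steps separate makes each one testable on its own, and the transform step is also the natural place to hang data quality checks such as the missing-value and duplicate rules shown here.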
Data Warehouse
https://www.youtube.com/watch?v=9GVqKuTVANE
From Zero to Data Warehouse Hero: A Full SQL Project Walkthrough and Real Industry Experience!
https://www.youtube.com/watch?v=k4tK2ttdSDg
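Related Positions
Experienced hire | A259456A
Team introduction: ByteDance's Recommendation Architecture team designs and develops the architecture of the recommendation systems behind ByteDance products with more than one billion users and keeps those systems stable and highly available; optimizes the performance of online services and offline data pipelines, removing system bottlenecks and cutting costs; abstracts common system components and services to build recommendation and data middle platforms that support the rapid incubation of new products and empower ToB offerings; and builds flexible, extensible, high-performance storage systems and computation models that connect offline and online data flows into a unified data middle platform supporting recommendation, search, and advertising.
1. Design and implement sound offline/real-time data architectures for large-scale recommendation systems; build industry-leading distributed systems such as online/offline storage and batch/stream computing frameworks; and provide reliable platform infrastructure for massive data volumes and large-scale business systems;
2. Dig deep into the recommendation system and explore how data architecture can empower the business and improve online performance;
3. Push past existing boundaries by exploring the evolution of core frameworks, the application of new technologies, and the deployment of large recommendation models;
4. Troubleshoot production systems and optimize costs; design and implement the mechanisms and tools needed to keep production systems running stably and efficiently.
Updated 2025-05-13
Experienced hire | Accommodation Business Development
1. Collect, clean, and load offline and online data;
2. Produce targeted analysis reports that provide data support for business decisions and monitoring;
3. Analyze and distill insights from Ctrip's large volumes of merchant and user data.
Updated 2025-03-31
Experienced hire | A98746
1. Help build offline and real-time data warehouses that support the growth of the international local-services business;
2. Get deep into the business, understand business requirements and abstract them sensibly, own their implementation, and work closely with business teams to provide data solutions;
3. Take part in data model design, ETL implementation, ETL performance optimization, ETL data monitoring, and resolving related technical issues;
4. Take part in big data application planning and support, providing technical support to data product and data mining teams;
5. Take part in data governance work to improve data usability and data quality.
Updated 2025-06-05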