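Kuaishou Data Warehouse Development Engineer (Risk Control) - [Commercialization]
Experienced hire | Full-time | 3-5 years | D11903 | Location: Beijing | Status: Hiring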
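Requirements
1. At least two years of hands-on experience with two or more of Hive, Kafka, Spark, Flink, HBase, or similar technologies.
2. Familiarity with data warehouse theory and methodology and with ETL-related technologies; considered views on data architecture and design; strong mathematical and modeling thinking.
3. Proficiency in Java and Python.
4. Familiarity with distributed computing frameworks, the ability to design and optimize distributed computations, and an understanding of stream processing.
5. Strong learning, analytical, and problem-solving abilities, a good sense of teamwork, and strong communication skills.
6. A background in risk control or user profiling is a plus.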
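Responsibilities
1. Build the commercialization risk-control data warehouse and the data marts for each vertical application.
2. Define and develop core business metrics, and own data modeling for the vertical businesses.
3. Provide big-data computing services according to business needs, and continuously optimize and improve them.
4. Take part in application data development for the risk-control data warehouse to support business needs.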
Includes English-language materials
Hive
[English] Hive Tutorial
https://www.tutorialspoint.com/hive/index.htm
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
https://www.youtube.com/watch?v=D4HqQ8-Ja9Y
Kafka
https://developer.confluent.io/what-is-apache-kafka/
https://www.youtube.com/watch?v=CU44hKLMg7k
https://www.youtube.com/watch?v=j4bqyAMMb7o&list=PLa7VYi0yPIH0KbnJQcMv5N9iW8HkZHztH
In this Apache Kafka fundamentals course, we introduce you to the basic Apache Kafka elements and APIs, as well as the broader Kafka ecosystem.
Spark
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Flink
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
HBase
[English] HBase Tutorial
https://www.tutorialspoint.com/hbase/index.htm
HBase is a data model that is similar to Google's big table designed to provide quick random access to huge amounts of structured data. This tutorial provides an introduction to HBase, the procedures to set up HBase on Hadoop File Systems, and ways to interact with HBase shell.
Data Warehouse
https://www.youtube.com/watch?v=9GVqKuTVANE
From Zero to Data Warehouse Hero: A Full SQL Project Walkthrough and Real Industry Experience!
https://www.youtube.com/watch?v=k4tK2ttdSDg
ETL
https://www.ibm.com/think/topics/etl
ETL—meaning extract, transform, load—is a data integration process that combines, cleans and organizes data from multiple sources into a single, consistent data set for storage in a data warehouse, data lake or other target system.
https://www.youtube.com/watch?v=OW5OgsLpDCQ
It explains what ETL is and what it can do for you to improve your data analysis and productivity.
Java
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, for absolute beginners, with complete examples, based on the latest Python 3 release.
https://www.learnpython.org/
A free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction to all of the core concepts in Python.
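Related Positions
Experienced hire | 2+ years | Technical - Data
1. Build the data assets for the risk-control business of Ant Digital Technologies (蚂蚁数科), and support risk-control data development for ToC and ToB scenarios.
2. Own the modeling of core business data pipelines and both offline and real-time data development, supporting data architecture planning and implementation for the business line; safeguard and build up the stability, timeliness, and quality of the business line's data services; govern and safeguard the business line's data assets and data resources.
3. Work closely with algorithm, product, operations, and backend teams to bring business requirements into production quickly.
Updated 2025-08-29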
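Experienced hire | 2+ years | D7229
1. Build the Kuaishou risk-control data warehouse, integrating and constructing the foundational data for each business scenario.
2. Provide big-data computing services according to business needs, and continuously optimize and improve them.
3. Design and process data for complex scenarios in a sound way, making business data analysis and applications faster and more efficient.
Updated 2025-06-17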
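Experienced hire | 3+ years | JMXF2
1. Develop risk-control data pipelines, and take part in the architecture design, modeling, and ETL development of the business data warehouse.
2. Take part in data governance to improve data usability and data quality, working closely with the algorithm team.
3. Understand and sensibly abstract business requirements to bring out the value of the data, working closely with business teams.
Updated 2021-09-13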