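Kuaishou Data Development Engineer - [Risk Control]
Experienced hire · Full-time · 2+ years of experience · D7229 · Location: Beijing · Status: Hiring
Requirements
1. Familiar with distributed computing frameworks, able to design and optimize distributed computations, and have a working knowledge of stream processing.
2. At least two years of hands-on experience with two or more of Hive, Kafka, Spark, Flink, and HBase.
3. Familiar with data warehouse theory and methodology and with ETL-related technologies; have given thought to data architecture and design; strong mathematical and modeling mindset.
4. Strong learning, analytical, and problem-solving skills; good sense of teamwork; strong communication skills.
5. A risk control background is preferred.
Responsibilities
1. Build Kuaishou's risk control data warehouse, integrating and constructing the foundational data for each business scenario.
2. Provide big data computing and application services based on business needs, and continuously optimize and improve them.
3. Design and process data for complex scenarios appropriately, to speed up business data analysis and applications and make them more efficient.
Includes English-language materials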
Hive+
[English] Hive Tutorial
https://www.tutorialspoint.com/hive/index.htm
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
https://www.youtube.com/watch?v=D4HqQ8-Ja9Y
Kafka+
https://developer.confluent.io/what-is-apache-kafka/
https://www.youtube.com/watch?v=CU44hKLMg7k
https://www.youtube.com/watch?v=j4bqyAMMb7o&list=PLa7VYi0yPIH0KbnJQcMv5N9iW8HkZHztH
In this Apache Kafka fundamentals course, we introduce you to the basic Apache Kafka elements and APIs, as well as the broader Kafka ecosystem.
Spark+
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Flink+
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
HBase+
[English] HBase Tutorial
https://www.tutorialspoint.com/hbase/index.htm
HBase is a data model similar to Google's Bigtable, designed to provide quick random access to huge amounts of structured data. This tutorial provides an introduction to HBase, the procedures to set up HBase on Hadoop File Systems, and ways to interact with HBase shell.
Data Warehouse+
https://www.youtube.com/watch?v=9GVqKuTVANE
From Zero to Data Warehouse Hero: A Full SQL Project Walkthrough and Real Industry Experience!
https://www.youtube.com/watch?v=k4tK2ttdSDg
ETL+
https://www.ibm.com/think/topics/etl
ETL—meaning extract, transform, load—is a data integration process that combines, cleans and organizes data from multiple sources into a single, consistent data set for storage in a data warehouse, data lake or other target system.
https://www.youtube.com/watch?v=OW5OgsLpDCQ
It explains what ETL is and what it can do for you to improve your data analysis and productivity.
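Related positions
Experienced hire · 3-5 years · D11903
1. Build the commercialization risk control data warehouse and construct data marts for each vertical application.
2. Define and develop core business metric data, and own data modeling for the vertical businesses.
3. Provide big data computing and application services based on business needs, and continuously optimize and improve them.
4. Take part in application data development for the risk control data warehouse and support business needs.
Updated 2025-08-25
Experienced hire · Data development role
1. Build and refine the data marts that risk control requires, according to business needs; take part in model structure design, model mapping development, feature development, and related work.
2. Manage and process first-party and third-party data in layers, and build up reusable data assets through sound data abstraction and modeling.
3. Take part in building foundational data platforms and infrastructure, including data governance, data quality, data services, and data products.
Updated 2025-06-16
Experienced hire · 5-10 years · D11903
1. Develop requirements for Kuaishou's commercialization risk control platform, ensuring on-time, high-quality delivery.
2. Understand and analyze product requirements in depth, keep extending the boundaries of the business scenarios, write design documents, and carry out system development.
3. Be willing to take on the challenges of complex business logic, massive data volumes, and high service availability in the risk control domain, and drive improvements in system availability and scalability.
4. Have strong system abstraction skills, be able to propose innovative ideas and solutions for building the system's middle-platform capabilities, and introduce innovative technologies to the team.
Updated 2025-07-30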