Kuaishou [Return-Offer Internship] Data Mining Engineer
Internship / Part-time | J1018 | Location: Beijing | Status: Hiring
Requirements
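1. Master's degree or above. Majors in computer science, data science, statistics, finance, or related fields are preferred. A strong foundation in mathematical statistics and strong logical reasoning skills are required.
2. Proficiency in data mining and statistical methods such as regression, classification, clustering, association rules, time-series forecasting, and causal analysis. Good at abstracting and decomposing real-world problems, then solving them in practice with appropriate data features and algorithmic models.
3. Solid programming fundamentals, with strong command of at least one of Python, C++, or Java.
4. Familiarity with the Hadoop ecosystem and with at least one data processing technology such as HSQL, Spark, or Flink, plus a good understanding of data warehousing, feature engineering, and feature selection.
5. Strong business drive and business sense, and an interest in mining the value of data and using data to drive the business.
Bonus: familiarity with the short-video and live-streaming community; heavy user of short-video apps.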
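Of the methods listed in requirement 2, association rules are the easiest to show in a few lines. The sketch below is a brute-force, pairs-only illustration (not the full Apriori algorithm) of how support and confidence are computed. The interaction names in the example baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def association_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Find single-item rules A -> B that meet support and confidence thresholds.

    support(A -> B)    = share of transactions containing both A and B
    confidence(A -> B) = count(A and B) / count(A)
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # dedupe and fix an order so each pair is counted once
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), both in pair_counts.items():
        support = both / n
        if support < min_support:
            continue
        # a frequent pair can yield two rules, one in each direction
        for lhs, rhs in ((a, b), (b, a)):
            confidence = both / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical per-session interaction baskets
baskets = [
    {"like", "comment"},
    {"like", "share"},
    {"like", "comment", "share"},
    {"comment"},
]
for lhs, rhs, support, confidence in association_rules(baskets):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Here "share -> like" reaches confidence 1.00 because every basket containing "share" also contains "like", while "comment" and "share" co-occur too rarely to pass the support threshold. At real data volumes, Apriori or FP-Growth would be used to prune the candidate itemsets instead of counting every pair.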
Responsibilities
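1. Drawing on Kuaishou's massive content production, consumption, and traffic data across the short-video and live-streaming ecosystem, mine user profile attributes and build a company-wide user profile data system and platform products. Take a deep part in designing and executing business strategies for creators, live streaming, operations, and growth, directly improving business efficiency and creating new growth opportunities.
2. Based on business strategy and service plans, accumulate data tag assets covering all users, content, and communities on the platform into the central data platform, providing foundational tagging capabilities for business operations decisions while continuously mining and extracting value from the data.
3. Using massive, heterogeneous, high-dimensional spatiotemporal big data, build an accurate, platform-wide spatial entity data system and a comprehensive location-based service (LBS).
4. Build an industry-leading device fingerprinting engine, integrate Kuaishou's massive multi-source data, and build a unified, platform-wide ID-Mapping service framework.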
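Responsibility 4 mentions an ID-Mapping service. A common textbook building block for this kind of identity resolution is to treat every identifier as a graph node, link identifiers that are observed together, and take connected components with a union-find (disjoint-set) structure. The sketch below assumes hypothetical identifier strings and a hypothetical "smallest member wins" rule for choosing the unified ID. It does not describe how Kuaishou's service actually works; a real system would likely also need edge weighting, conflict handling, and incremental updates.

```python
class UnionFind:
    """Disjoint-set forest with path compression and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        # lazily register unseen identifiers as singleton sets
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # path compression: re-point every node on the path at the root
        while x != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra  # attach the smaller tree under the larger one
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def build_id_mapping(links):
    """Map every identifier to one unified ID per connected component.

    links: (id_a, id_b) pairs observed together, e.g. a device seen logged in
    to an account (hypothetical input format).
    """
    uf = UnionFind()
    for a, b in links:
        uf.union(a, b)
    groups = {}
    for node in list(uf.parent):
        groups.setdefault(uf.find(node), []).append(node)
    mapping = {}
    for members in groups.values():
        unified = min(members)  # hypothetical rule: smallest identifier wins
        for m in members:
            mapping[m] = unified
    return mapping


# Hypothetical links: device A and a phone number both tie back to account 1
links = [("dev:A", "acct:1"), ("acct:1", "phone:x"), ("dev:B", "acct:2")]
print(build_id_mapping(links))
```

Union-find makes each link almost constant-time to merge, which matters when link pairs arrive in the billions; the transitive step (dev:A and phone:x end up together via acct:1) is exactly what a plain lookup table would miss.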
Includes English-language materials
Education
Data Science
https://roadmap.sh/ai-data-scientist
Step by step roadmap guide to becoming an AI and Data Scientist
Data Mining
https://www.youtube.com/watch?v=-bSkREem8dM
Database vs Data Warehouse vs Data Lake
https://www.youtube.com/watch?v=7rs0i-9nOjo
Algorithms
https://roadmap.sh/datastructures-and-algorithms
Step by step guide to learn Data Structures and Algorithms in 2025
https://www.hellointerview.com/learn/code
A visual guide to the most important patterns and approaches for the coding interview.
https://www.w3schools.com/dsa/
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, assumes no prior experience, includes complete examples, and is based on the latest Python 3.
https://www.learnpython.org/
A free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction to all of the core concepts in Python.
C++
https://www.learncpp.com/
LearnCpp.com is a free website devoted to teaching you how to program in modern C++.
https://www.youtube.com/watch?v=ZzaPdXTrSb8
Java
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
Hadoop
https://www.runoob.com/w3cnote/hadoop-tutorial.html
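Hadoop provides reliable, scalable application-layer compute and storage for large computer clusters. It lets large datasets be processed in a distributed fashion across clusters of computers using simple programming models, and it scales from a single machine to thousands of machines.
[English] Hadoop Tutorial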
https://www.tutorialspoint.com/hadoop/index.htm
Hadoop is an open-source framework that allows users to store and process big data in a distributed environment across clusters of computers using simple programming models.
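The "simple programming models" mentioned above are, at Hadoop's core, MapReduce. The single-process Python sketch below only imitates the three stages (map, shuffle, reduce) on a word count, the classic introductory example; it uses no Hadoop API. On a real cluster, Hadoop runs many mappers and reducers in parallel across machines, and Hadoop Streaming can run Python scripts as the mapper and reducer.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    # mapper: emit a (word, 1) pair for every token in one input line
    return [(word.lower(), 1) for word in record.split()]


def shuffle_phase(pairs):
    # shuffle: group all intermediate values by key (Hadoop also sorts by key)
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # reducer: combine every value emitted for one key
    return key, sum(values)


def word_count(lines):
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(intermediate)
    return dict(sorted(reduce_phase(k, v) for k, v in grouped.items()))


print(word_count(["the quick fox", "The lazy dog", "the end"]))
# {'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

The point of the model is that the mapper and reducer only ever see one record or one key at a time, so the framework is free to spread them over as many machines as the data requires.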
Spark
[English] Learning Spark (book)
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Flink
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
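To make "process and react to streaming events as they occur" concrete, the sketch below imitates one of the most common streaming operations, a tumbling (fixed-size, non-overlapping) window count, in plain Python. It is not the Flink API. It assumes events arrive in timestamp order, whereas Flink handles out-of-order events with watermarks, and the event names are hypothetical.

```python
from collections import Counter


def tumbling_window_counts(events, window_size):
    """Yield (window_start, counts_by_key) each time a tumbling window closes.

    events: iterable of (timestamp, key) pairs, assumed to arrive in timestamp order
    window_size: window length in the same unit as the timestamps
    """
    current_start = None
    counts = Counter()
    for ts, key in events:
        start = ts - ts % window_size  # align the timestamp to its window boundary
        if current_start is not None and start != current_start:
            # an event from a later window closes the current one: emit its result
            yield current_start, dict(counts)
            counts = Counter()
        current_start = start
        counts[key] += 1
    if current_start is not None:
        yield current_start, dict(counts)  # flush the last window when the input ends


# Hypothetical (timestamp_seconds, event_type) stream
events = [(0, "click"), (3, "view"), (4, "click"), (10, "click"), (25, "view")]
for window_start, counts in tumbling_window_counts(events, window_size=10):
    print(window_start, counts)
```

Because the generator yields a result as soon as a later event closes a window, downstream code can react before the whole stream has been read. Windows that receive no events produce no output.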
Data Warehousing
https://www.youtube.com/watch?v=9GVqKuTVANE
From Zero to Data Warehouse Hero: A Full SQL Project Walkthrough and Real Industry Experience!
https://www.youtube.com/watch?v=k4tK2ttdSDg
Feature Engineering
https://www.ibm.com/think/topics/feature-engineering
Feature engineering preprocesses raw data into a machine-readable format. It optimizes ML model performance by transforming and selecting relevant features.
https://www.kaggle.com/learn/feature-engineering
Better features make better models. Discover how to get the most out of your data.
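The description above names two steps: transforming raw data and selecting relevant features. The plain-Python sketch below shows one simple instance of each under assumed inputs: a log transform for a heavy-tailed count, one-hot encoding for categorical columns, and a variance filter that drops constant columns. The column names are hypothetical, and a real pipeline would more likely use pandas or scikit-learn.

```python
import math
from statistics import pvariance


def engineer_features(rows, numeric_cols, categorical_cols):
    """Transform raw records (dicts) into numeric feature dicts."""
    # collect the category values observed for each categorical column
    categories = {c: sorted({r[c] for r in rows}) for c in categorical_cols}
    table = []
    for r in rows:
        feats = {}
        for c in numeric_cols:
            # log1p compresses heavy-tailed counts and keeps 0 mapped to 0
            feats[f"log1p_{c}"] = math.log1p(r[c])
        for c in categorical_cols:
            # one-hot encoding: one 0/1 column per observed category value
            for v in categories[c]:
                feats[f"{c}={v}"] = 1.0 if r[c] == v else 0.0
        table.append(feats)
    return table


def select_by_variance(table, threshold=0.0):
    """Feature selection: keep columns whose population variance exceeds threshold."""
    keep = [c for c in table[0] if pvariance([row[c] for row in table]) > threshold]
    return [{c: row[c] for c in keep} for row in table]


rows = [  # hypothetical raw records
    {"views": 0, "city_tier": "1", "platform": "android"},
    {"views": 99, "city_tier": "2", "platform": "android"},
    {"views": 9999, "city_tier": "1", "platform": "android"},
]
features = select_by_variance(engineer_features(rows, ["views"], ["city_tier", "platform"]))
# "platform=android" is constant across rows, so the variance filter drops it
print(sorted(features[0]))
# ['city_tier=1', 'city_tier=2', 'log1p_views']
```

A constant column carries no information a model can use, which is why even this crude filter is a reasonable first selection step before more expensive methods.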
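Related Positions
Internship J1001
1. Mine massive user data to build a user profiling system for audio/video scenarios, including but not limited to device-model profiles, network profiles, and user preference profiles for video resolution and playback smoothness, precisely characterizing users' audio/video attributes.
2. Build an audio/video quality-of-experience (QoE) model and optimize playback and edge-computing strategies such as preloading, CDN scheduling, and PCDN.
3. Develop audio/video user profiles with causal models, machine learning models, and similar frameworks, optimizing model performance end to end, including feature optimization and model architecture optimization.
4. Collaborate with internal and external teams, including monetization and e-commerce, to design experience and cost ROI optimization strategies based on user value, and drive them to launch.
Updated 2025-03-04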
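Internship J1018
1. Participate in designing and building Kuaishou's big data system, managing and building thousands of petabytes of data through data warehouse, metadata, and data management systems.
2. Participate in building thematic data systems (social, content production/consumption, live streaming, gaming, e-commerce, monetization, etc.), using your understanding of how data is built and applied to support business management decisions and operations.
3. Participate in developing Kuaishou's big data products, working on data capabilities such as insight analysis, performance monitoring, attribution analysis, and ABTest, and combining them with your own business sense to uncover the business value of data.
4. Receive guidance from leading experts in the data field and immerse yourself in world-leading big data processing and application technologies.
Updated 2025-06-23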
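The J1018 duties above mention ABTest. As an illustration of how a binary metric such as conversion or click-through might be compared between two experiment groups, the sketch below implements the standard two-proportion z-test with the Python standard library. The counts are hypothetical, and the sketch says nothing about how Kuaishou's own ABTest platform works.

```python
import math


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (normal approximation).

    Returns (absolute lift of B over A, z statistic, two-sided p-value).
    Assumes large samples, independent users, and 0 < pooled rate < 1.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # common rate under H0: p_a == p_b
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # two-sided p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for standard normal Z
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value


# Hypothetical experiment: control A vs. treatment B, 1000 users each
lift, z, p = two_proportion_ztest(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
print(f"lift={lift:.3f}, z={z:.2f}, p={p:.4f}")
```

With these hypothetical counts, z is about 2.1 and the p-value falls below the conventional 0.05 threshold. The pooled standard error is used because the null hypothesis assumes both groups share one underlying rate.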
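Internship J1004
1. Work on search across short video, live streaming, e-commerce, local services, social, and multilingual businesses, using large-scale machine learning, reinforcement learning, multimodal pre-training, and related techniques to improve core business metrics such as search quality, user retention, and click-through rate.
2. Own core search technologies, including query intent classification, query representation, query recommendation, video content understanding and multimodal representation, multimodal semantic retrieval, and relevance, to raise search user penetration and relevance.
3. Own search ranking technologies, including search user behavior analysis, hybrid semantic and behavioral retrieval, multi-sequence and multi-task pre-ranking, ranking, and re-ranking, to improve search quality and content consumption metrics.
4. Own the search ecosystem and its mechanisms, working on mechanisms and algorithms such as blended search ranking, multi-objective optimization, heterogeneous content blending, balancing short- and long-term goals, and cold start.
Updated 2025-05-15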