Kuaishou [Kuai Star-X Internship] User Profiling Algorithm Engineer
Internship / Part-time | J1001 | Location: Beijing | Status: Hiring
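Requirements
1. Master's degree or above; majors in computer science, data science, statistics, or finance preferred; strong foundation in mathematical statistics and strong logical thinking.
2. Proficient in data mining and statistical methods such as regression, classification, clustering, association rules, time-series forecasting, and causal analysis; skilled at abstracting and decomposing real-world problems and applying suitable data features and algorithmic models in practice.
3. Solid programming fundamentals; expert in at least one of Python, C++, or Java.
4. Familiar with the Hadoop ecosystem and with at least one data processing technology such as HSQL, Spark, or Flink; good understanding of data warehousing, feature engineering, and feature selection.
5. Strong business drive and business sensitivity; interested in mining value from data and in data-driven business.
Bonus:
1. Familiar with short-video and live-streaming communities; a heavy short-video user.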
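Responsibilities
1. Drawing on Kuaishou's massive content production, consumption, and traffic data, and building on the short-video and live-streaming business ecosystem, mine user profile attributes and build a company-level user profile data system and platform products; participate deeply in formulating and implementing business strategies for creators, live streaming, operations, and growth, directly improving business efficiency and delivering new growth points.
2. Based on business strategy service solutions, accumulate data tag assets covering all users, content, and communities site-wide into the data middle platform; provide foundational tagging capabilities that support business operations decisions; continually mine and extract data value.
3. Based on massive, heterogeneous, high-dimensional spatiotemporal big data, build a precise all-domain spatial entity data system and a comprehensive location-based service (LBS).
4. Build an industry-leading device fingerprinting engine, integrate Kuaishou's massive multi-source data, and build a site-wide unified ID-Mapping service framework.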
Learning materials (including English-language materials)
Data Science
https://roadmap.sh/ai-data-scientist
Step by step roadmap guide to becoming an AI and Data Scientist
Data Mining
https://www.youtube.com/watch?v=-bSkREem8dM
Database vs Data Warehouse vs Data Lake
https://www.youtube.com/watch?v=7rs0i-9nOjo
Algorithms
https://roadmap.sh/datastructures-and-algorithms
Step by step guide to learn Data Structures and Algorithms in 2025
https://www.hellointerview.com/learn/code
A visual guide to the most important patterns and approaches for the coding interview.
https://www.w3schools.com/dsa/
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese; free, starts from zero, with complete examples, based on the latest Python 3 version.
https://www.learnpython.org/
A free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction to all of the core concepts in Python.
C++
https://www.learncpp.com/
LearnCpp.com is a free website devoted to teaching you how to program in modern C++.
https://www.youtube.com/watch?v=ZzaPdXTrSb8
Java
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
Hadoop
https://www.runoob.com/w3cnote/hadoop-tutorial.html
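Hadoop provides reliable, scalable application-layer computing and storage support for large computer clusters. It allows distributed processing of large datasets across clusters of computers using simple programming models, and it scales from a single machine to several thousand machines.
[English] Hadoop Tutorial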
https://www.tutorialspoint.com/hadoop/index.htm
Hadoop is an open-source framework that allows storing and processing big data in a distributed environment across clusters of computers using simple programming models.
Spark
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Flink
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
Data Warehouse
https://www.youtube.com/watch?v=9GVqKuTVANE
From Zero to Data Warehouse Hero: A Full SQL Project Walkthrough and Real Industry Experience!
https://www.youtube.com/watch?v=k4tK2ttdSDg
Feature Engineering
https://www.ibm.com/think/topics/feature-engineering
Feature engineering preprocesses raw data into a machine-readable format. It optimizes ML model performance by transforming and selecting relevant features.
https://www.kaggle.com/learn/feature-engineering
Better features make better models. Discover how to get the most out of your data.
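Related Positions
Internship | J1006
1. Advertising algorithm strategy spans multiple directions, including deep learning, reinforcement learning, big data, data mining, parallel optimization, and strategy mechanisms.
2. Develop machine learning algorithms and models, including DNNs, hyperparameter optimization, and learning and optimization methods.
3. Analyze and mine massive data, build user profile models, and improve CTR and CVR.
4. Explore and research frontier problems in machine learning, especially deep learning.
5. Provide model support for recommender systems, natural language processing, image processing, and related fields.
Updated 2025-04-29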
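Internship | D7375
1. Mine massive user data and build basic-experience profiles using frameworks such as causal models and machine learning models, including but not limited to preference profiles for clarity / smoothness / low latency across user x content x scenario; drive the delivery of personalized audio-video strategies by precisely characterizing users' basic-experience attributes.
2. Build time-series models such as bandwidth curve forecasting and video popularity forecasting to drive optimization of audio-video resource scheduling (e.g., time-series modeling of video popularity).
3. Explore the application and deployment of large models in scenarios such as time-series forecasting, resource allocation, and audience preference profiling.
4. Collaborate with internal and external teams, including commercialization and e-commerce, to formulate experience and cost optimization strategies based on user value, and drive those optimizations to launch.
Updated 2025-03-31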
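Internship | J1005
1. Explore next-generation recommender system technology that combines large models with recommendation algorithms, making full use of large models' domain knowledge and learning paradigms to bring new energy to recommender systems, including but not limited to text/ID generative recommendation, model Scaling Law, and end-to-end modeling of ultra-long user sequences.
2. Explore efficient ways to process multimodal signals such as video, text, and speech, and to align them with recommender systems, so that recommender systems can see, hear, and understand the world.
3. Explore techniques such as mixture of experts, distillation, and pruning that balance model efficiency and quality.
4. Keep pace with industry and large-model developments, combine cutting-edge techniques with business needs, and build best practices for large-model applications.
Updated 2025-05-12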