Kuaishou Data Mining Algorithm Engineer
Campus recruitment · Full-time · J1002 · Location: Beijing · Status: Hiring
Requirements
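1. Master's degree or above in computer science, mathematics, physics, information science, or a related field.
2. Solid grounding in statistics; skilled in data analysis and scientific experimentation methods; deep understanding of, and hands-on proficiency with, mainstream machine learning and deep learning algorithms.
3. Proficiency in at least one programming language such as Python, C++, Java, or R; excellent coding skills and habits; a strong foundation in data structures and algorithms.
4. Familiarity with development on big data platforms such as Hive, Spark, ElasticSearch, and MongoDB.
5. Passion for the short-video industry; strong communication and logical expression skills; good at teamwork; a strong sense of responsibility and goal orientation.
Bonus points:
1. Papers published at top conferences such as KDD, NeurIPS, WWW, SIGIR, WSDM, ICML, IJCAI, AAAI, or RecSys.
2. Awards in data mining or machine learning competitions such as Kaggle.
3. Awards in programming competitions such as ACM.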
Responsibilities
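1. Optimize algorithms for the production pipeline of massive volumes of short videos: build models on large-scale data signals such as video effects, user profiles, behavior sequences, and consumption feedback, making video production features such as effects and beautification more intelligent.
2. Mine production and consumption data with algorithms such as anomaly detection, causal inference, and automatic attribution to uncover business pain points and guide the direction of business optimization.
3. Mine trending events and predict popularity trends to help video effects and related businesses operate and produce more effectively.
4. Mine user features for business scenarios such as improving ad-to-user matching efficiency, business anti-fraud, channel anti-cheating, and search indexing.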
Includes English-language materials
Education
Data Analysis
[English] Data Analyst Roadmap
https://roadmap.sh/data-analyst
Step by step guide to becoming a Data Analyst in 2025
Machine Learning
https://www.youtube.com/watch?v=0oyDqO8PjIg
Learn about machine learning and AI with this comprehensive 11-hour course from @LunarTech_ai.
https://www.youtube.com/watch?v=i_LwzRVP7bg
Learn Machine Learning in a way that is accessible to absolute beginners.
https://www.youtube.com/watch?v=NWONeJKn6kc
Learn the theory and practical application of machine learning concepts in this comprehensive course for beginners.
https://www.youtube.com/watch?v=PcbuKRNtCUc
Learn about all the most important concepts and terms related to machine learning and AI.
Deep Learning
https://d2l.ai/
Interactive deep learning book with code, math, and discussions.
Algorithms
https://roadmap.sh/datastructures-and-algorithms
Step by step guide to learn Data Structures and Algorithms in 2025
https://www.hellointerview.com/learn/code
A visual guide to the most important patterns and approaches for the coding interview.
https://www.w3schools.com/dsa/
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, for absolute beginners, with complete examples, based on the latest Python 3 release.
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
C++
https://www.learncpp.com/
LearnCpp.com is a free website devoted to teaching you how to program in modern C++.
https://www.youtube.com/watch?v=ZzaPdXTrSb8
Java
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
R
[English] R Tutorial
https://www.w3schools.com/r/
R is often used for statistical computing and graphical presentation to analyze and visualize data.
Data Structures
https://www.youtube.com/watch?v=8hly31xKli0
In this course you will learn about algorithms and data structures, two of the fundamental topics in computer science.
https://www.youtube.com/watch?v=B31LgI4Y4DQ
Learn about data structures in this comprehensive course. We will be implementing these data structures in C or C++.
https://www.youtube.com/watch?v=CBYHwZcbD-s
Data Structures and Algorithms full course tutorial java
Hive
[English] Hive Tutorial
https://www.tutorialspoint.com/hive/index.htm
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
https://www.youtube.com/watch?v=D4HqQ8-Ja9Y
Spark
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
ElasticSearch
https://www.youtube.com/watch?v=a4HBKEda_F8
Learn about Elasticsearch with this comprehensive course designed for beginners, featuring both theoretical concepts and hands-on applications using Python (though applicable to any programming language). The course is structured in two parts: first covering essential Elasticsearch fundamentals including index management, document storage, text analysis, pipeline creation, search functionality, and advanced features like semantic search and embeddings; followed by a practical section where you'll build a real-world website using Elasticsearch as a search engine, working with the Astronomy Picture of the Day (APOD) dataset to implement features such as data cleaning pipelines, tokenization, pagination, and aggregations.
MongoDB
https://learnxinyminutes.com/mongodb/
MongoDB is a NoSQL document database for high volume data storage.
https://studio3t.com/academy/#courses
The fastest way to learn MongoDB
https://www.youtube.com/watch?v=c2M-rlkkT5o
This video will give you an introduction to MongoDB in 1 hour. Afterwards I recommend exploring aggregation, replication, and sharding.
https://www.youtube.com/watch?v=ExcRbA7fy_A&list=PL4cUxeGkcC9h77dJ-QJlwGlZlTd4ecZOA
You'll learn how to use MongoDB (a NoSQL database) from scratch. You'll also learn how to integrate it into a simple Node.js API.
Big Data
https://www.youtube.com/watch?v=bAyrObl7TYE
https://www.youtube.com/watch?v=H4bf_uuMC-g
With all this talk of Big Data, we got Rebecca Tickle to explain just what makes data into Big Data.
NeurIPS
https://neurips.cc/
WSDM
https://www.wsdm-conference.org/
ICML
https://icml.cc/
RecSys
[English] Recommender Systems
https://recsys.acm.org/
This site contains information about the ACM Recommender Systems community, the annual ACM RecSys conferences, and more.
Kaggle
[English] Kaggle Learn
https://www.kaggle.com/learn
Gain the skills you need to do independent data science projects.
Data Mining
https://www.youtube.com/watch?v=-bSkREem8dM
Database vs Data Warehouse vs Data Lake
https://www.youtube.com/watch?v=7rs0i-9nOjo
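Related positions
Experienced hire · 2+ years · G341
1. Research cutting-edge techniques in data mining or statistical learning, and build and optimize user profiles and user attributes from massive user behavior and content data; 2. Drawing on user understanding and a large set of data features, take part in model building and domain research for risk control, precision marketing, personalized pricing, and similar areas to improve product performance; 3. Find and collect relevant data as the company requires, clean, screen, classify, and integrate the raw data, and automate the process.
Updated 2020-11-11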

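Experienced hire · 2+ years
1. Build an intelligent operations system for international air tickets, using data science methods to solve core supply chain problems such as quotation strategy and revenue management; 2. Optimize strategies for the core business of the international air ticket supply chain, improve business processes in a data-driven way, and raise overall efficiency and core metrics; 3. Perform quantitative analysis of international air ticket supply chain data, uncover the business patterns and value behind the data, identify optimization directions, and explore solutions.
Updated 2023-03-27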

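Experienced hire · 2+ years · Algorithm engineering
1. Develop data mining algorithms based on VLMs/multimodal large models to accurately identify long-tail autonomous driving scenarios (such as extreme weather, complex behavior of traffic participants, and rare obstacles). 2. Build efficient automated data mining pipelines to improve data label quality and reduce annotation costs. 3. Combine multimodal data such as point clouds, images, and text to design multimodal features that support cross-modal data retrieval.
Updated 2025-03-20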