Kuaishou Data R&D Intern
Internship / Part-time | D0599 | Location: Beijing | Status: Hiring
Requirements
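1. Bachelor's degree or above in computer science, mathematics, statistics, data science, big data technology, or a related field.
2. Strong hands-on and learning abilities; familiarity with at least one data-processing language such as SQL, Java, or Python.
3. Strong logical thinking and problem-solving skills, plus good communication and teamwork skills.
4. Understanding of the Hadoop ecosystem, large-scale data warehouse architecture, data modeling, ETL, etc.; proficiency in Hive and Spark. Candidates who also know Flink, Kafka, Redis, ClickHouse, etc. are preferred.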
Responsibilities
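1. Take part in designing and building Kuaishou's big data system, managing and building several thousand petabytes of data through the data warehouse, metadata, and data management systems.
2. Take part in building thematic data systems for a range of businesses (user growth, consumption, live streaming, content production/social, e-commerce, monetization, local services, gaming, overseas, large-model businesses, etc.), using your understanding of how data is built and applied to support business management decisions and operations.
3. Take part in building and applying Kuaishou's business data products, including data capabilities such as insight analysis, attribution analysis, A/B testing, and data asset applications, and use your business sense to uncover the business value of data.
4. A great team atmosphere: you will be mentored by leading experts across the data field and immersed in world-class big data processing and application technology.
5. Strong performers may be offered a full-time position.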
Includes English-language materials
Education
Data Science
https://roadmap.sh/ai-data-scientist
Step by step roadmap guide to becoming an AI and Data Scientist
Big Data
https://www.youtube.com/watch?v=bAyrObl7TYE
https://www.youtube.com/watch?v=H4bf_uuMC-g
With all this talk of Big Data, we got Rebecca Tickle to explain just what makes data into Big Data.
SQL
https://liaoxuefeng.com/books/sql/introduction/index.html
What is SQL? Simply put, SQL is the standard computer language for accessing and processing relational databases.
https://sqlbolt.com/
Learn SQL with simple, interactive exercises.
https://www.youtube.com/watch?v=p3qvj9hO_Bo
In this video we will cover everything you need to know about SQL in only 60 minutes.
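As a quick illustration of what "accessing and processing relational databases" looks like, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The `plays` table, its columns, and the sample rows are hypothetical, chosen only to show a typical `GROUP BY` aggregation.

```python
import sqlite3

def top_creators(rows):
    """Return (creator, total_views) pairs, highest total first."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        # Hypothetical table: one row per video play summary.
        conn.execute("CREATE TABLE plays (creator TEXT, video_id INTEGER, views INTEGER)")
        conn.executemany("INSERT INTO plays VALUES (?, ?, ?)", rows)
        cur = conn.execute(
            """
            SELECT creator, SUM(views) AS total_views
            FROM plays
            GROUP BY creator
            ORDER BY total_views DESC
            """
        )
        return cur.fetchall()
    finally:
        conn.close()

sample = [("alice", 1, 120), ("bob", 2, 300), ("alice", 3, 250)]
print(top_creators(sample))  # [('alice', 370), ('bob', 300)]
```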
Java
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, assumes no prior knowledge, with complete examples, based on the latest Python 3.
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
Hadoop
https://www.runoob.com/w3cnote/hadoop-tutorial.html
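Hadoop provides reliable, scalable application-layer computing and storage support for large computer clusters. It allows large data sets to be processed in a distributed fashion across clusters of computers using simple programming models, and it scales from a single computer to thousands of computers.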
[English] Hadoop Tutorial
https://www.tutorialspoint.com/hadoop/index.htm
Hadoop is an open-source framework that allows users to store and process big data in a distributed environment across clusters of computers using simple programming models.
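The classic example of Hadoop's "simple programming models" is MapReduce: a map step emits key/value pairs, the framework groups the pairs by key (the shuffle), and a reduce step combines each group. The sketch below simulates those three phases for a word count inside a single Python process. It does not use Hadoop's actual API, and the function names are hypothetical.

```python
from collections import defaultdict

def map_phase(line):
    """Mapper: emit (word, 1) for each whitespace-separated token."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group every value emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: combine the grouped values for one key."""
    return key, sum(values)

def word_count(lines):
    # On a real cluster, mappers and reducers run in parallel on many machines.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))

print(word_count(["big data big", "data lake"]))
# {'big': 2, 'data': 2, 'lake': 1}
```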
Data Warehouse
https://www.youtube.com/watch?v=9GVqKuTVANE
From Zero to Data Warehouse Hero: A Full SQL Project Walkthrough and Real Industry Experience!
https://www.youtube.com/watch?v=k4tK2ttdSDg
ETL
https://www.ibm.com/think/topics/etl
ETL—meaning extract, transform, load—is a data integration process that combines, cleans and organizes data from multiple sources into a single, consistent data set for storage in a data warehouse, data lake or other target system.
https://www.youtube.com/watch?v=OW5OgsLpDCQ
It explains what ETL is and what it can do for you to improve your data analysis and productivity.
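The three ETL stages can be sketched in a few lines of Python. This is a toy pipeline under stated assumptions: the source is a small CSV string of hypothetical data; the transform step trims and normalizes fields, drops rows with a missing amount, and removes exact duplicates; and the target "warehouse" is an in-memory SQLite table named `orders` (also hypothetical).

```python
import csv
import io
import sqlite3

# Hypothetical raw source: messy casing/whitespace, a missing value, a duplicate row.
RAW_CSV = """user_id,country,amount
1, cn ,10.5
2,US,
1, cn ,10.5
3,us,7
"""

def extract(text):
    """Extract: read raw records (dicts of strings) from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(records):
    """Transform: normalize fields, drop rows missing an amount, deduplicate."""
    seen, clean = set(), []
    for r in records:
        amount = r["amount"].strip()
        if not amount:
            continue  # incomplete row: skip it
        row = (int(r["user_id"]), r["country"].strip().upper(), float(amount))
        if row not in seen:  # keep only the first copy of an exact duplicate
            seen.add(row)
            clean.append(row)
    return clean

def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (user_id INTEGER, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute(
    "SELECT country, SUM(amount) FROM orders GROUP BY country ORDER BY country"
).fetchall())  # [('CN', 10.5), ('US', 7.0)]
```

Keeping each stage a separate function mirrors how real pipelines are organized: the transform logic can be tested without touching the source or the warehouse.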
Hive
[English] Hive Tutorial
https://www.tutorialspoint.com/hive/index.htm
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
https://www.youtube.com/watch?v=D4HqQ8-Ja9Y
Spark
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Flink
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
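To make "process and react to streaming events as they occur" concrete, the sketch below counts events per key in fixed, non-overlapping (tumbling) time windows and emits each window's result as soon as the stream moves past it. It is plain Python, not Flink's API, and it assumes events arrive in timestamp order; Flink itself handles out-of-order events with watermarks, which this toy omits.

```python
from collections import Counter

def tumbling_window_counts(events, size):
    """Yield (window_start, {key: count}) for fixed windows of `size` seconds.

    Assumes `events` is an in-order stream of (timestamp, key) pairs; a window
    is emitted when the first event of a later window arrives.
    """
    current_start = None
    counts = Counter()
    for ts, key in events:
        start = ts - ts % size  # align the timestamp to its window boundary
        if current_start is not None and start != current_start:
            yield current_start, dict(counts)  # window closed: emit its result
            counts = Counter()
        current_start = start
        counts[key] += 1
    if current_start is not None:
        yield current_start, dict(counts)  # flush the last open window

events = [(1, "click"), (3, "download"), (7, "click"), (12, "click"), (14, "click")]
for window in tumbling_window_counts(events, size=10):
    print(window)
# (0, {'click': 2, 'download': 1})
# (10, {'click': 2})
```

Because the function is a generator, results come out incrementally while the input is still being consumed, which is the key difference from running one batch query over all the data.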
Kafka
https://developer.confluent.io/what-is-apache-kafka/
https://www.youtube.com/watch?v=CU44hKLMg7k
https://www.youtube.com/watch?v=j4bqyAMMb7o&list=PLa7VYi0yPIH0KbnJQcMv5N9iW8HkZHztH
In this Apache Kafka fundamentals course, we introduce you to the basic Apache Kafka elements and APIs, as well as the broader Kafka ecosystem.
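Two of the basic Kafka elements are topics split into partitions (append-only logs in which each record gets an offset) and consumers that track how far they have read. The toy classes below model just that idea in memory: records with the same key land in the same partition, so their relative order is preserved. `ToyTopic` and `ToyConsumer` are hypothetical names, this is not the Kafka client API, and `zlib.crc32` is only a stand-in for the hash the real producer uses.

```python
import zlib
from collections import defaultdict

class ToyTopic:
    """In-memory stand-in for a topic: a fixed number of append-only partition logs."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key, value):
        """Append a record and return its (partition, offset)."""
        # Same key -> same partition, so records for one key stay in order.
        p = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1

class ToyConsumer:
    """Reads every partition, remembering its own offset per partition."""

    def __init__(self, topic):
        self.topic = topic
        self.offsets = defaultdict(int)  # partition -> next offset to read

    def poll(self):
        """Return all records not yet seen, then advance the offsets."""
        records = []
        for p, log in enumerate(self.topic.partitions):
            records.extend(log[self.offsets[p]:])
            self.offsets[p] = len(log)
        return records

topic = ToyTopic(num_partitions=3)
consumer = ToyConsumer(topic)
p1, o1 = topic.produce("user-1", "click")
p2, o2 = topic.produce("user-1", "like")
print(p1 == p2, o1, o2)  # True 0 1
print(consumer.poll())   # [('user-1', 'click'), ('user-1', 'like')]
print(consumer.poll())   # []
```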
Redis
[English] Developer Hub
https://redis.io/dev/
Get all the tutorials, learning paths, and more you need to start building—fast.
https://www.runoob.com/redis/redis-tutorial.html
REmote DIctionary Server (Redis) is a key-value storage system written by Salvatore Sanfilippo; it is a cross-platform non-relational database.
https://www.youtube.com/watch?v=jgpVdJB2sKQ
In this video I will be covering Redis in depth from how to install it, what commands you can use, all the way to how to use it in a real world project.
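The core idea of a key-value store with expiring keys (Redis's `SET` command accepts an `EX` expiry in seconds) can be sketched with two dictionaries. `TinyKV` is a hypothetical toy, not a Redis client: it only expires keys lazily when they are read, and the injectable `clock` parameter exists so expiry can be demonstrated without waiting.

```python
import time

class TinyKV:
    """Toy in-memory key-value store with optional per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._data = {}      # key -> value
        self._expires = {}   # key -> absolute expiry time
        self._clock = clock  # injectable clock, handy for tests

    def set(self, key, value, ex=None):
        """Store value; ex is an optional time-to-live in seconds."""
        self._data[key] = value
        if ex is None:
            self._expires.pop(key, None)  # a plain set clears any old expiry
        else:
            self._expires[key] = self._clock() + ex

    def get(self, key):
        """Return the value, or None if the key is missing or expired."""
        deadline = self._expires.get(key)
        if deadline is not None and self._clock() >= deadline:
            del self._data[key]  # lazy eviction on read
            del self._expires[key]
        return self._data.get(key)

now = [0.0]
kv = TinyKV(clock=lambda: now[0])
kv.set("session:42", "alice", ex=30)
print(kv.get("session:42"))  # alice
now[0] = 31.0
print(kv.get("session:42"))  # None
```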
ClickHouse
[English] Advanced Tutorial
https://clickhouse.com/docs/tutorial
Learn how to ingest and query data in ClickHouse using the New York City taxi example dataset.
https://www.youtube.com/watch?v=FtoWGT7kS-c
ClickHouse is an open-source column-oriented DBMS for online analytical processing that allows users to generate analytical reports using SQL queries in real-time.
https://www.youtube.com/watch?v=Rhe-kUyrFUE&list=PL0Z2YDlm0b3gcY5R_MUo4fT5bPqUQ66ep
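"Column-oriented" means the values of each column are stored together rather than row by row, so an analytical query that touches only a few columns can skip the rest. The sketch below models the two layouts with plain Python lists. The taxi-style fields and values are hypothetical, and it ignores the compression and on-disk format a real engine like ClickHouse uses.

```python
# Row-oriented layout: each record's fields are stored together (hypothetical data).
rows = [
    {"borough": "Manhattan", "trip_km": 2.1, "fare": 10.0},
    {"borough": "Brooklyn", "trip_km": 5.0, "fare": 25.0},
    {"borough": "Manhattan", "trip_km": 3.4, "fare": 30.0},
]

def to_columns(records):
    """Column-oriented layout: one list per column, values kept in row order."""
    return {name: [r[name] for r in records] for name in records[0]}

def avg_by(columns, group_col, value_col):
    """Average value_col per group, reading only the two columns involved."""
    sums, counts = {}, {}
    for group, value in zip(columns[group_col], columns[value_col]):
        sums[group] = sums.get(group, 0.0) + value
        counts[group] = counts.get(group, 0) + 1
    return {g: sums[g] / counts[g] for g in sums}

columns = to_columns(rows)
print(columns["fare"])                     # [10.0, 25.0, 30.0]
print(avg_by(columns, "borough", "fare"))  # {'Manhattan': 20.0, 'Brooklyn': 25.0}
```

This corresponds roughly to a query such as `SELECT borough, avg(fare) FROM trips GROUP BY borough` (the `trips` table is hypothetical); the `trip_km` column is never read.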
Related Positions
Internship | D7507
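1. Help build an R&D efficiency data warehouse serving all of Kuaishou, covering data across the full requirement lifecycle, including requirements, development, testing, release, operations, and services.
2. Help build a company-wide R&D efficiency metrics system, using data to drive R&D efficiency improvements in core businesses such as the main app and e-commerce.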
Updated 2025-02-17
Internship | Taotian Group Off-cycle Internship
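1. Own supply-chain data for Tmall's self-operated business, including the production, management, and application of data assets; establish asset standards and a value assessment system; and provide rich, stable foundational data services.
2. Build core supply-chain data products: abstract common business logic, accumulate reusable data-insight capabilities, and improve the extensibility of the data architecture through templating and componentization, supporting rapid iteration and horizontal expansion of data products.
3. Own supply-chain business applications, working closely with business scenarios to use data to cut costs and improve efficiency across the supply chain.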
Updated 2025-05-13
Internship | D6213
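1. Take part in designing and building Kuaishou's big data system, managing and building several thousand petabytes of data through the data warehouse, metadata, and data management systems.
2. Take part in building thematic data systems for a range of businesses (user growth, consumption, live streaming, content production/social, e-commerce, monetization, local services, gaming, overseas, large-model businesses, etc.), using your understanding of how data is built and applied to support business management decisions and operations.
3. Take part in building and applying Kuaishou's business data products, including data capabilities such as insight analysis, attribution analysis, A/B testing, and data asset applications, and use your business sense to uncover the business value of data.
4. A great team atmosphere: you will be mentored by leading experts across the data field and immersed in world-class big data processing and application technology.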
Updated 2025-03-14