小米高级大数据研发工程师
社招全职A94598地点:北京状态:招聘
任职要求
1、熟悉大数据相关技术:Spark/Flink/Hadoop/HBase/Hive/Iceberg等; 2、熟练使用Java、Scala、Python语言中的一种或者多种,熟悉java常见数据结构、多线程并发、JVM等; 3、熟悉常用基础组件:Mq/R…
登录查看完整任职要求
微信扫码,1秒登录
工作职责
1、负责小爱内容资源数据流及数据服务建设; 2、负责小爱内容数据、搜索推荐服务架构优化; 3、负责小爱用户画像、LLM通用知识库建设。
包括英文材料
大数据+
https://www.youtube.com/watch?v=bAyrObl7TYE
https://www.youtube.com/watch?v=H4bf_uuMC-g
With all this talk of Big Data, we got Rebecca Tickle to explain just what makes data into Big Data.
Spark+
[英文] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Flink+
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
Hadoop+
https://www.runoob.com/w3cnote/hadoop-tutorial.html
Hadoop 为庞大的计算机集群提供可靠的、可伸缩的应用层计算和存储支持,它允许使用简单的编程模型跨计算机群集分布式处理大型数据集,并且支持在单台计算机到几千台计算机之间进行扩展。
[英文] Hadoop Tutorial
https://www.tutorialspoint.com/hadoop/index.htm
Hadoop is an open-source framework that allows to store and process big data in a distributed environment across clusters of computers using simple programming models.
HBase+
[英文] HBase Tutorial
https://www.tutorialspoint.com/hbase/index.htm
HBase is a data model that is similar to Google's big table designed to provide quick random access to huge amounts of structured data. This tutorial provides an introduction to HBase, the procedures to set up HBase on Hadoop File Systems, and ways to interact with HBase shell.
Hive+
[英文] Hive Tutorial
https://www.tutorialspoint.com/hive/index.htm
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
https://www.youtube.com/watch?v=D4HqQ8-Ja9Y
Iceberg+
https://iceberg.apache.org/spark-quickstart/
This guide will get you up and running with Apache Iceberg™ using Apache Spark™, including sample code to highlight some powerful features.
https://www.baeldung.com/apache-iceberg-intro
This tutorial will discuss Apache Iceberg, a popular open table format in today’s big data landscape.
https://www.youtube.com/watch?v=TsmhRZElPvM
You’ve probably heard about Apache Iceberg™—after all, it’s been getting a lot of buzz.
Java+
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
Scala+
Python+
https://liaoxuefeng.com/books/python/introduction/index.html
中文,免费,零起点,完整示例,基于最新的Python 3版本。
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
数据结构+
https://www.youtube.com/watch?v=8hly31xKli0
In this course you will learn about algorithms and data structures, two of the fundamental topics in computer science.
https://www.youtube.com/watch?v=B31LgI4Y4DQ
Learn about data structures in this comprehensive course. We will be implementing these data structures in C or C++.
https://www.youtube.com/watch?v=CBYHwZcbD-s
Data Structures and Algorithms full course tutorial java
还有更多 •••
相关职位
社招
1. 参与围绕泛搜推领域数据的数据工程研发体系建设,通过数据+算法分析+工程化能力,高效赋能业务; 2. 通过抽象业务域,简历规范化的数据解决方案,通过构建实时、离线数仓为业务提供稳定、安全的数据源; 3. 参与湖仓一体的下一代大数据解决方案建设,提升效率,更好的支持业务; 4. 参与到实验效果的分析中,提供更科学的实验指导及实验分析。
更新于 2025-04-17杭州
社招1-5年MEG
-负责公司大数据项目的研发工作,包括数据预处理、数据清洗、数据分析和数据可视化等环节 -参与项目的需求分析和设计,并确保项目按照预期完成 -负责项目中关键模块的编写和调试,确保模块的稳定性和可靠性 -协助项目经理完成项目计划,并确保项目按时完成 -参与项目成果的验收和评估,并对项目成果提出改进建议 -协助团队完成其他与大数据项目研发相关的工作
更新于 2024-10-16北京
社招3-5年技术类
1. 负责集团业务流量、客户等大数据数仓建设工作; 2. 对业务数据进行离线和实时计算,推动数仓体系建设,服务于公司业务,实现数据驱动业务增长; 3. 持续对大数据系统技术架构进行优化,提升系统的性能和用户体验。
更新于 2025-09-11北京
社招3年以上机器学习平台
【业务介绍】 我们是小红书内稠密类模型(LLM/MLLM/SD/CV/NLP)统一的AI平台QuickSilver,负责调度公司内所有稠密类模型训练与推理资源,基于自建的训推引擎,为公司所有AI算法同学迭代业务模型提供端到端一站式AI服务;包括数据管理,模型管理,模型训练、压缩、推理、部署,服务管理,资源调度等一系列能力。 工作职责: 1、负责稠密类模型训练推理开发平台的架构设计和核心功能研发 2、设计和实现大模型训练部署流程,包括模型fine-tuning、推理服务化等 3、构建云原生架构,设计高可用、高性能的微服务体系 4、优化平台性能,提升系统稳定性和可扩展性
北京|上海|深圳