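Xiaohongshu (小红书): Data Engineering R&D Engineer - Commercial Technology
Experienced hire; Full-time; 2+ years of experience; Data Engine; Location: Beijing | Shanghai; Status: Hiring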
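Requirements
1. Degree in computer science or a related field, bachelor's or above, with at least two years of work experience.
2. Proficient in one or more of Java, Scala, and C++; familiar with common data structures and algorithms, with solid programming fundamentals.
3. Familiar with Flink, Kafka, Spark, Hive, HBase, and related technologies, with hands-on development experience; experience building applications on distributed data storage and computing platforms is preferred.
4. Strong sense of responsibility, proactive, with good communication and teamwork skills.
5. Sensitive to data, with excellent logical thinking; passionate about tackling challenging problems and skilled at analyzing and solving them.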
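Responsibilities
1. Build an industry-leading ad attribution and real-time data streaming system that provides highly reliable real-time data for mechanism strategies, model training, customer reports, and more.
2. Own development of the real-time ad billing system, ensuring its stability and accuracy.
3. Own development of the model sample and feature data pipelines and contribute to the feature platform, giving ad algorithm models efficient and stable learning capabilities.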
Includes English-language materials
Education
Java
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
Scala
C++
https://www.learncpp.com/
LearnCpp.com is a free website devoted to teaching you how to program in modern C++.
https://www.youtube.com/watch?v=ZzaPdXTrSb8
Data Structures
https://www.youtube.com/watch?v=8hly31xKli0
In this course you will learn about algorithms and data structures, two of the fundamental topics in computer science.
https://www.youtube.com/watch?v=B31LgI4Y4DQ
Learn about data structures in this comprehensive course. We will be implementing these data structures in C or C++.
https://www.youtube.com/watch?v=CBYHwZcbD-s
Data Structures and Algorithms: full course tutorial in Java
Algorithms
https://roadmap.sh/datastructures-and-algorithms
Step by step guide to learn Data Structures and Algorithms in 2025
https://www.hellointerview.com/learn/code
A visual guide to the most important patterns and approaches for the coding interview.
https://www.w3schools.com/dsa/
Flink
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
Kafka
https://developer.confluent.io/what-is-apache-kafka/
https://www.youtube.com/watch?v=CU44hKLMg7k
https://www.youtube.com/watch?v=j4bqyAMMb7o&list=PLa7VYi0yPIH0KbnJQcMv5N9iW8HkZHztH
In this Apache Kafka fundamentals course, we introduce you to the basic Apache Kafka elements and APIs, as well as the broader Kafka ecosystem.
Spark
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Hive
[English] Hive Tutorial
https://www.tutorialspoint.com/hive/index.htm
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
https://www.youtube.com/watch?v=D4HqQ8-Ja9Y
HBase
[English] HBase Tutorial
https://www.tutorialspoint.com/hbase/index.htm
HBase is a data model similar to Google's Bigtable, designed to provide quick random access to huge amounts of structured data. This tutorial provides an introduction to HBase, the procedures to set up HBase on Hadoop File Systems, and ways to interact with the HBase shell.
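Related Positions
Experienced hire; 2+ years of experience; D6323
1. Build an industry-leading data platform for ad algorithm models, including sample joining, feature construction, and an experimentation framework, giving Kuaishou's ad algorithm models efficient and stable learning capabilities.
2. Design and develop the core data-flow architecture of the ad system, supporting million-level QPS for ads and providing highly reliable real-time data to machine learning systems, customer reports, and internal analytics systems.
3. Develop the ad conversion attribution system, improving the efficiency of Kuaishou's ad algorithms through attribution strategies.
4. Use data and algorithm technology to build intelligent campaign tools for Kuaishou's advertisers, including campaign simulation, account diagnostics, and bid suggestions, improving advertisers' campaign efficiency.
Updated 2025-04-03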
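Experienced hire; A180560A
1. Lead core R&D of the large-model data intelligence platform, building systems for multimodal annotation, synthetic data generation, and high-quality data distillation, while studying in depth how large models work and exploring innovative application scenarios.
2. Lead the end-to-end construction of large-model applications, from requirements analysis and model design through implementation and optimization, ensuring the data platform's efficiency and accuracy and continuously tuning model performance.
3. Through data analysis and algorithm improvements, optimize the large-model data supply chain (annotation, synthesis, distillation), improve model quality and user experience, and work with cross-functional teams to deliver projects.
4. Follow frontier industry technology and bring in innovative algorithmic ideas, offering forward-looking recommendations for the data platform's technical development, especially for breakthroughs in code generation, model alignment, and continual learning.
Updated 2025-04-16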
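Experienced hire; 3-5 years of experience; D11430
1. Build data warehouses for each commercialization business line within the data middle platform, and build vertical data marts for commercialization.
2. Define and develop core business metrics, and own data modeling for vertical businesses, such as user profiling.
3. Design and implement suitable visualizations for specific problems, and build a platform for continuous data monitoring.
4. Participate in building the data platform and optimizing data processing workflows.
5. Develop data collection and the anti-fraud, user, UGC, and content-moderation data warehouses.
6. Develop real-time ETL for A/B testing and a conversion funnel analysis platform.
Updated 2025-10-11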