
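Dewu (得物) [Algorithm Engineering - Shanghai/Beijing] Search & Recommendation Engineering Technical Expert - Feature Engineering
Experienced hire · Full-time · Technical · Location: Shanghai | Beijing · Status: Hiring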
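Requirements
1. Bachelor's degree or above in computer science or a related field, with strong coding ability, solid technical fundamentals, and proficiency in C++.
2. Experience with high-TPS feature extraction and feature storage in e-commerce and community scenarios; a strong sense of stability, with experience building out high availability, consistency, and related mechanisms for the core feature data pipeline.
3. Proficiency with common offline big data components, including but not limited to mainstream compute frameworks such as Flink, Spark, and MR (MapReduce), and mainstream storage such as Kafka, HDFS, and HBase.
4. Good communication and collaboration skills, a strong willingness to share knowledge, and a good understanding of search and recommendation engineering and algorithms.
5. Experience in search, recommendation, or advertising is a plus; experience with feature engineering or model engineering in e-commerce and community scenarios is a plus.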
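Responsibilities
1. Lead the unified planning of the feature platform for Dewu's main search and recommendation scenarios, managing the production, processing, and storage of massive online and offline features.
2. Build an end-to-end online/offline feature pipeline, continuously improve its architecture, and optimize system stability, performance, and scalability.
3. Implement the management and analysis of the user, product, and content features needed for model training and prediction.
Includes English-language materials
Education
C++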
https://www.learncpp.com/
LearnCpp.com is a free website devoted to teaching you how to program in modern C++.
https://www.youtube.com/watch?v=ZzaPdXTrSb8
High availability
https://redis.io/blog/high-availability-architecture/
A highly available architecture is one in which a number of different components, modules, or services work together to maintain optimal performance, irrespective of peak-time loads.
https://www.ibm.com/think/topics/high-availability
High availability (HA) is a term that refers to a system’s ability to be accessible and reliable close to 100% of the time.
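Availability is usually quoted as a percentage ("nines"). As a rough illustration, the sketch below converts an availability figure into the downtime it allows per year and computes the availability of N redundant replicas as 1 - (1 - a)^N. It assumes replica failures are independent, which real deployments only approximate; the function names are hypothetical.

```cpp
#include <cmath>

// Minutes in a non-leap year (365 * 24 * 60).
constexpr double kMinutesPerYear = 365.0 * 24.0 * 60.0;

// Allowed downtime per year, in minutes, for an availability in [0, 1].
double downtimeMinutesPerYear(double availability) {
    return (1.0 - availability) * kMinutesPerYear;
}

// Combined availability of n replicas where the service stays up as long as
// at least one replica is up: 1 - (1 - a)^n.
// Assumes failures are independent (an idealization).
double parallelAvailability(double a, int n) {
    return 1.0 - std::pow(1.0 - a, n);
}
```

Under that assumption, "three nines" (99.9%) allows about 525.6 minutes of downtime per year, and two replicas at 99% each give about 99.99% combined; correlated failures (a shared rack, a bad config push) push the real figure lower.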
Big data
https://www.youtube.com/watch?v=bAyrObl7TYE
https://www.youtube.com/watch?v=H4bf_uuMC-g
With all this talk of Big Data, we got Rebecca Tickle to explain just what makes data into Big Data.
Flink
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
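A core Flink pattern for event streams is the keyed tumbling window: events are grouped by key and by fixed, non-overlapping time buckets. The sketch below is plain C++17, not the Flink API; it only shows the bucketing arithmetic (window start = timestamp - timestamp mod size) applied to a finite batch of events. The `Event` type and function names are hypothetical.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event: a key (e.g. a user id) and an event-time timestamp in ms.
struct Event {
    std::string key;
    int64_t timestampMs;
};

// Start of the tumbling window containing ts; assumes ts >= 0 and sizeMs > 0.
int64_t windowStart(int64_t ts, int64_t sizeMs) {
    return ts - (ts % sizeMs);
}

// Counts events per (key, window start), like a keyed tumbling-window count.
// This batch version sees all events at once, so unlike Flink it needs no
// watermarks, late-event handling, or state cleanup.
std::map<std::pair<std::string, int64_t>, int64_t>
tumblingCounts(const std::vector<Event>& events, int64_t sizeMs) {
    std::map<std::pair<std::string, int64_t>, int64_t> counts;
    for (const auto& e : events) {
        ++counts[{e.key, windowStart(e.timestampMs, sizeMs)}];
    }
    return counts;
}
```

With one-minute windows, events for `u1` at 1 s and 59 s fall into the window starting at 0, while an event at 61 s starts a new window at 60 000 ms. In Flink itself, windows fire incrementally as watermarks advance.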
Spark
[English] Learning Spark (book)
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
MapReduce
https://www.youtube.com/watch?v=bcjSe0xCHbE
https://www.youtube.com/watch?v=cHGaQz0E7AU
In this video I explain the basics of the MapReduce model, an important concept for any software engineer to be aware of.
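The MapReduce model splits a job into a map phase that emits key/value pairs, a shuffle that groups values by key, and a reduce phase that aggregates each group. A single-process word count in C++17 makes the three phases explicit; in a real cluster each phase runs in parallel across machines, and the function names here are illustrative only.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Map: split one input record (a line) into (word, 1) pairs.
std::vector<std::pair<std::string, int>> mapLine(const std::string& line) {
    std::vector<std::pair<std::string, int>> out;
    std::istringstream in(line);
    std::string word;
    while (in >> word) out.emplace_back(word, 1);
    return out;
}

// Shuffle: group intermediate values by key. std::map keeps keys sorted,
// mirroring the sorted keys a reducer receives.
std::map<std::string, std::vector<int>>
shuffleByKey(const std::vector<std::pair<std::string, int>>& pairs) {
    std::map<std::string, std::vector<int>> groups;
    for (const auto& [k, v] : pairs) groups[k].push_back(v);
    return groups;
}

// Reduce: sum the values for one key.
int reduceCounts(const std::vector<int>& values) {
    int sum = 0;
    for (int v : values) sum += v;
    return sum;
}

// Driver: map every line, shuffle, then reduce each group.
std::map<std::string, int> wordCount(const std::vector<std::string>& lines) {
    std::vector<std::pair<std::string, int>> intermediate;
    for (const auto& line : lines) {
        auto pairs = mapLine(line);
        intermediate.insert(intermediate.end(), pairs.begin(), pairs.end());
    }
    std::map<std::string, int> result;
    for (const auto& [word, values] : shuffleByKey(intermediate)) {
        result[word] = reduceCounts(values);
    }
    return result;
}
```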
Kafka
https://developer.confluent.io/what-is-apache-kafka/
https://www.youtube.com/watch?v=CU44hKLMg7k
https://www.youtube.com/watch?v=j4bqyAMMb7o&list=PLa7VYi0yPIH0KbnJQcMv5N9iW8HkZHztH
In this Apache Kafka fundamentals course, we introduce you to the basic Apache Kafka elements and APIs, as well as the broader Kafka ecosystem.
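Kafka's central abstraction is a topic split into partitions, each an append-only log whose records are identified by offsets; records with the same key land in the same partition, which preserves per-key ordering. The toy model below illustrates only those ideas in C++17. It is not a Kafka client, and it uses `std::hash` as a stand-in for Kafka's actual default partitioner, so its partition assignments will not match a real cluster. `ToyTopic` and its methods are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Toy Kafka topic: each partition is an append-only log, and each record gets
// a monotonically increasing offset within its partition.
class ToyTopic {
public:
    // Assumes numPartitions > 0.
    explicit ToyTopic(std::size_t numPartitions) : partitions_(numPartitions) {}

    // Same key -> same partition, which is what preserves per-key ordering.
    // std::hash is only a stand-in for Kafka's real partitioner.
    std::size_t partitionFor(const std::string& key) const {
        return std::hash<std::string>{}(key) % partitions_.size();
    }

    // Appends a record to its key's partition and returns the record's offset.
    int64_t produce(const std::string& key, const std::string& value) {
        auto& log = partitions_[partitionFor(key)];
        log.push_back(value);
        return static_cast<int64_t>(log.size()) - 1;
    }

    // A consumer reads one partition from an offset (e.g. its last committed
    // position); assumes fromOffset >= 0.
    std::vector<std::string> consume(std::size_t partition, int64_t fromOffset) const {
        const auto& log = partitions_.at(partition);
        if (fromOffset >= static_cast<int64_t>(log.size())) return {};
        return std::vector<std::string>(log.begin() + fromOffset, log.end());
    }

private:
    std::vector<std::vector<std::string>> partitions_;
};
```

Because the log is never modified in place, independent consumers can each track their own offset and replay from any point, which is what makes a topic usable both as a real-time feed and as a replayable source.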
HDFS
https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html
The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware.
https://www.ibm.com/cn-zh/think/topics/hdfs
The Hadoop Distributed File System (HDFS) is a file system for managing large datasets that can run on commodity hardware.
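HDFS stores each file as a sequence of fixed-size blocks and replicates every block across DataNodes. The arithmetic below estimates block count and raw disk usage. The example values are assumptions: the r1.2.1 design document linked above describes a typical block size of 64 MB, while Hadoop 2.x and later default to 128 MB; the default replication factor is 3. The names are hypothetical.

```cpp
#include <cstdint>

// Back-of-the-envelope HDFS sizing for one file.
struct BlockPlan {
    int64_t numBlocks;
    int64_t lastBlockBytes;   // the final block may be partial
    int64_t rawStorageBytes;  // logical size * replication
};

// Assumes fileBytes >= 0, blockBytes > 0, replication >= 1.
BlockPlan planBlocks(int64_t fileBytes, int64_t blockBytes, int replication) {
    BlockPlan plan{};
    plan.numBlocks = (fileBytes + blockBytes - 1) / blockBytes;  // ceiling division
    int64_t rem = fileBytes % blockBytes;
    plan.lastBlockBytes = (fileBytes == 0) ? 0 : (rem == 0 ? blockBytes : rem);
    // A partial last block only occupies its actual size on disk.
    plan.rawStorageBytes = fileBytes * replication;
    return plan;
}
```

For example, a 300 MB file with 128 MB blocks and replication 3 needs three blocks (the last holding 44 MB) and about 900 MB of raw disk.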
HBase
[English] HBase Tutorial
https://www.tutorialspoint.com/hbase/index.htm
HBase is a data model similar to Google's Bigtable, designed to provide quick random access to huge amounts of structured data. This tutorial provides an introduction to HBase, the procedures to set up HBase on the Hadoop Distributed File System, and ways to interact with the HBase shell.
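HBase's logical data model is a sorted map: row key → column (family:qualifier) → timestamp → value. Because rows are sorted by key, both a Get for a single row and a Scan over a key range are cheap. The in-memory sketch below (C++17) mimics that layout with `std::map`. It ignores regions, MemStores, HFiles, and column-family settings, and `ToyTable` and its methods are hypothetical, not the HBase client API.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Toy model of HBase's logical layout:
// row key -> "family:qualifier" -> timestamp -> value.
// Rows are sorted by key; versions are sorted newest-first.
class ToyTable {
public:
    void put(const std::string& row, const std::string& column,
             int64_t ts, const std::string& value) {
        rows_[row][column][ts] = value;
    }

    // Latest version of one cell, if present (like a Get).
    std::optional<std::string> get(const std::string& row,
                                   const std::string& column) const {
        auto r = rows_.find(row);
        if (r == rows_.end()) return std::nullopt;
        auto c = r->second.find(column);
        if (c == r->second.end() || c->second.empty()) return std::nullopt;
        return c->second.begin()->second;  // newest timestamp comes first
    }

    // Row keys in [startRow, stopRow), like a Scan with start/stop rows.
    std::vector<std::string> scanRows(const std::string& startRow,
                                      const std::string& stopRow) const {
        std::vector<std::string> keys;
        for (auto it = rows_.lower_bound(startRow);
             it != rows_.end() && it->first < stopRow; ++it) {
            keys.push_back(it->first);
        }
        return keys;
    }

private:
    using Versions = std::map<int64_t, std::string, std::greater<int64_t>>;
    std::map<std::string, std::map<std::string, Versions>> rows_;
};
```

Row-key design follows directly from this ordering: prefixing keys with the entity type (for example `user#001`, `user#002`) lets a single scan fetch all rows of that type as one contiguous range.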
Algorithms
https://roadmap.sh/datastructures-and-algorithms
Step-by-step guide to learning Data Structures and Algorithms in 2025
https://www.hellointerview.com/learn/code
A visual guide to the most important patterns and approaches for the coding interview.
https://www.w3schools.com/dsa/
Feature engineering
https://www.ibm.com/think/topics/feature-engineering
Feature engineering preprocesses raw data into a machine-readable format. It optimizes ML model performance by transforming and selecting relevant features.
https://www.kaggle.com/learn/feature-engineering
Better features make better models. Discover how to get the most out of your data.
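Two common transforms for numeric features are z-score standardization and bucketization. The sketch below (C++17) fits standardization parameters on training data and then applies the same parameters to new values, which is the property that keeps offline training and online serving consistent. `Standardizer` and `bucketize` are hypothetical names, not a specific library's API.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Z-score standardization: (x - mean) / stddev, with mean and stddev fitted
// once on training data and reused unchanged at serving time.
struct Standardizer {
    double mean = 0.0;
    double stddev = 1.0;

    void fit(const std::vector<double>& xs) {
        if (xs.empty()) return;
        double sum = 0.0;
        for (double x : xs) sum += x;
        mean = sum / xs.size();
        double sq = 0.0;
        for (double x : xs) sq += (x - mean) * (x - mean);
        stddev = std::sqrt(sq / xs.size());  // population standard deviation
        if (stddev == 0.0) stddev = 1.0;     // constant feature: avoid dividing by zero
    }

    double transform(double x) const { return (x - mean) / stddev; }
};

// Bucketization: index of the bucket containing x, given sorted boundaries.
// Boundaries {10, 100} define buckets (-inf, 10), [10, 100), [100, +inf).
std::size_t bucketize(double x, const std::vector<double>& sortedBoundaries) {
    return static_cast<std::size_t>(
        std::upper_bound(sortedBoundaries.begin(), sortedBoundaries.end(), x) -
        sortedBoundaries.begin());
}
```

Bucket indices are typically fed to a model as categorical IDs (e.g. embedding lookups), which makes heavy-tailed values such as prices or view counts easier to learn from than raw magnitudes.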
Related positions

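Experienced hire · Technical
1. Develop the sample data platform for Dewu's main search and recommendation scenarios, managing the joining (stitching), processing, and storage of sample data.
2. Own the management of historical training samples, feature backfilling, and related functions; develop the real-time sample system for online streaming training.
3. Build an end-to-end online/offline sample and feature pipeline, continuously improve its architecture, and optimize system stability, performance, and scalability.
Updated 2024-09-19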
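The sample-joining (stitching) duty above usually means joining logged impressions with later label events such as clicks. The sketch below is one plausible minimal version, not this team's design: it joins on a hypothetical (requestId, itemId) key, labels unmatched impressions 0, and reuses the feature snapshot logged at serving time. All struct and field names are assumptions (C++17).

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical log records; field names are illustrative only.
struct Impression {
    std::string requestId;
    std::string itemId;
    std::vector<float> features;  // feature snapshot logged at serving time
};

struct Click {
    std::string requestId;
    std::string itemId;
};

struct Sample {
    std::vector<float> features;
    int label;  // 1 = clicked, 0 = shown but not clicked
};

// Join impressions with clicks on (requestId, itemId). Using the features
// logged at impression time, rather than re-reading the feature store later,
// avoids leaking information from after the impression into training.
std::vector<Sample> stitchSamples(const std::vector<Impression>& impressions,
                                  const std::vector<Click>& clicks) {
    std::set<std::pair<std::string, std::string>> clicked;
    for (const auto& c : clicks) clicked.emplace(c.requestId, c.itemId);

    std::vector<Sample> samples;
    samples.reserve(impressions.size());
    for (const auto& imp : impressions) {
        int label = clicked.count({imp.requestId, imp.itemId}) ? 1 : 0;
        samples.push_back({imp.features, label});
    }
    return samples;
}
```

A streaming version for online training would also need a wait window for late clicks before emitting a negative label, trading sample freshness against label accuracy.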
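Experienced hire · 7+ years · Core Local Commerce-业
1. Own the design and evolution of the end-to-end technical architecture for the algorithm platform within Meituan's search and recommendation platform.
2. Manage the algorithm platform team within Meituan's search and recommendation platform.
3. Lead the team in building a multi-scenario, high-performance, extensible online/offline algorithm platform architecture that supports search and recommendation needs across Meituan's scenarios.
Updated 2025-08-14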
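Experienced hire · 3-5 years · Engine
[Business overview] As the company's unified model engine team, we support the engineering side of all search, advertising, and recommendation businesses, including model inference, training, parameter servers, and feature engineering services. By continuously building engine capabilities on diverse heterogeneous compute, we provide efficient, flexible, and stable model services to the business.
[Responsibilities]
1. Research and develop Xiaohongshu's machine learning training framework for search, advertising, and recommendation, serving products across the company.
2. Take part in abstracting, designing, optimizing, and shipping the underlying components of the training framework.
3. Work closely with algorithm teams across the company on joint algorithm and system optimization for key projects.
Updated 2025-10-18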