SF Express Big Data Mining and Analysis Engineer
Experienced hire · Full-time · 3-5 years · Location: Shenzhen · Status: Hiring
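Requirements
1. Master's degree or above; candidates with 3+ years of experience in data mining projects or in algorithm projects deployed to production are preferred.
2. Familiar with the use and underlying principles of Hive, Python, and Spark.
3. Proficiency with Flink is a plus.
4. Expert in data analysis and data mining techniques such as clustering, regression, and chi-square tests; able to analyze and model specific business problems effectively.
5. Familiar with commonly used machine learning and deep learning models such as Prophet, LightGBM, RNN, GCN, and Transformer.
6. Familiarity with current mainstream large models is a plus.
7. Big-picture awareness and the ability to design solutions independently from an end-to-end perspective; strong logical and analytical thinking and data sensitivity; proactive, good at spotting problems, insightful, a good communicator, pragmatic, and able to work well under pressure.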
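Responsibilities
1. Participate in data analysis and model building for forecasting models across SF Express's multiple business projects.
2. Become familiar with the business data for each stage of operations (pickup, transfer, transport, delivery); for fine-grained forecasting scenarios, identify key problems, draw valid conclusions, and design solutions that feed back into model optimization, achieving unified end-to-end development.
3. Build machine learning, deep learning, and other models, and set up and maintain production model services.
4. Build an evaluation framework for algorithm model performance, and build dashboards to monitor and present results, supporting continuous model optimization.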
Includes English-language materials
Education+
Data mining+
https://www.youtube.com/watch?v=-bSkREem8dM
Database vs Data Warehouse vs Data Lake
https://www.youtube.com/watch?v=7rs0i-9nOjo
Algorithms+
https://roadmap.sh/datastructures-and-algorithms
Step by step guide to learn Data Structures and Algorithms in 2025
https://www.hellointerview.com/learn/code
A visual guide to the most important patterns and approaches for the coding interview.
https://www.w3schools.com/dsa/
Hive+
[English] Hive Tutorial
https://www.tutorialspoint.com/hive/index.htm
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
https://www.youtube.com/watch?v=D4HqQ8-Ja9Y
Python+
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, for absolute beginners, with complete examples, based on the latest Python 3 release.
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
Spark+
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Flink+
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
Data analysis+
[English] Data Analyst Roadmap
https://roadmap.sh/data-analyst
Step by step guide to becoming a Data Analyst in 2025
Machine learning+
https://www.youtube.com/watch?v=0oyDqO8PjIg
Learn about machine learning and AI with this comprehensive 11-hour course from @LunarTech_ai.
https://www.youtube.com/watch?v=i_LwzRVP7bg
Learn Machine Learning in a way that is accessible to absolute beginners.
https://www.youtube.com/watch?v=NWONeJKn6kc
Learn the theory and practical application of machine learning concepts in this comprehensive course for beginners.
https://www.youtube.com/watch?v=PcbuKRNtCUc
Learn about all the most important concepts and terms related to machine learning and AI.
Deep learning+
https://d2l.ai/
Interactive deep learning book with code, math, and discussions.
LightGBM+
https://lightgbm.readthedocs.io/en/stable/
LightGBM is a gradient boosting framework that uses tree based learning algorithms.
https://www.youtube.com/watch?v=tSZxOd1TWZc
In this video, we explore LightGBM, a machine learning algorithm developed by Microsoft that offers superior speed, efficiency, and accuracy.
RNN+
https://d2l.ai/chapter_recurrent-neural-networks/rnn.html
A neural network that uses recurrent computation for hidden states is called a recurrent neural network (RNN).
https://www.deeplearningbook.org/contents/rnn.html
Recurrent neural networks, or RNNs (Rumelhart et al., 1986a), are a family of neural networks for processing sequential data.
https://www.ibm.com/think/topics/recurrent-neural-networks
A recurrent neural network or RNN is a deep neural network trained on sequential or time series data to create a machine learning (ML) model that can make sequential predictions or conclusions based on sequential inputs.
Transformer+
https://huggingface.co/learn/llm-course/en/chapter1/4
Breaking down how Large Language Models work, visualizing how data flows through.
https://poloclub.github.io/transformer-explainer/
An interactive visualization tool showing you how transformer models work in large language models (LLM) like GPT.
https://www.youtube.com/watch?v=wjZofJX0v4M
Breaking down how Large Language Models work, visualizing how data flows through.
Large models+
https://www.youtube.com/watch?v=xZDB1naRUlk
You will build projects with LLMs that will enable you to create dynamic interfaces, interact with vast amounts of text data, and even empower LLMs with the capability to browse the internet for research papers.
https://www.youtube.com/watch?v=zjkBMFhNj_g
Related positions
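Experienced hire · 3-5 years
1. Data mining and analysis: collect, clean, analyze, and mine logistics risk-control data; build user risk profiles, risk assessment models, and the like to identify potential risks.
2. Algorithm development and optimization: design and implement risk-control algorithms such as credit scoring, anti-fraud, and anomaly detection, and continuously optimize their performance.
3. Risk-control strategy: based on data analysis results, help formulate and optimize risk-control strategies to improve their efficiency and accuracy.
4. Cross-department collaboration: work with product, operations, engineering, and other teams to put risk-control strategies into practice.
5. Industry research: track the latest technologies and trends in logistics risk control; optimize and adapt algorithms for logistics scenarios, build up reusable algorithm capabilities, and propose innovative solutions.
Updated 2025-07-28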
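Experienced hire · 3-5 years
1. Participate in building the customer data foundation, mine customer behavior data, and design and build full-lifecycle customer profiles.
2. Analyze and monitor key business metrics, independently develop data reports and dashboards, spot anomalies in data trends, and produce data insight reports to support decision-making.
3. Apply data analysis and machine learning algorithms (clustering, forecasting, etc.) and take a deep role in formulating marketing strategies, evaluating their effect, and iterating on them, to improve marketing conversion and revenue attainment.
4. Explore large-model-based intelligent marketing applications, and work with the development team to support building the underlying data pipelines.
Updated 2025-07-17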
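Experienced hire · 3-5 years
1. Participate in building the customer data foundation, mine customer behavior data, and design and build full-lifecycle customer profiles.
2. Analyze and monitor key business metrics, independently develop data reports and dashboards, spot anomalies in data trends, and produce data insight reports to support decision-making.
3. Apply data analysis and machine learning algorithms (clustering, forecasting, etc.) and take a deep role in formulating marketing strategies, evaluating their effect, and iterating on them, to improve marketing conversion and revenue attainment.
4. Explore large-model-based intelligent marketing applications, and work with the development team to support building the underlying data pipelines.
Updated 2025-10-10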