
Sohu Data Development Engineer
Experienced hire | Full-time | Intelligent Platform | Location: Beijing | Status: Hiring
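Qualifications
1. Proficient in Python, SQL, Java, and similar languages, with strong engineering implementation skills;
2. Good business sense; familiar with data analysis and data mining, and able to independently discover effective anti-fraud rules;
3. Familiar with machine learning algorithms used in anti-fraud work; hands-on production experience is preferred;
4. Familiar with big data tools such as Hadoop, Spark, Kafka, and Flink;
5. Strong comprehension and communication skills, high sensitivity to data, and good at translating business problems into technical problems.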
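Responsibilities
1. Plan and build the ad traffic anti-fraud system, including but not limited to data collection, mining and deployment of risk-control rules and algorithms, and risk-level classification and alerting;
2. Based on business needs and issues identified in day-to-day operations, design anti-fraud strategies and solutions for each business scenario and ensure they are effective;
3. Explore algorithmic approaches for anti-fraud scenarios, such as semi-supervised, unsupervised, self-supervised, few-shot, reinforcement, and contrastive learning, and apply them to anti-fraud business scenarios;
4. Building on traffic quality assessment, feed the results back into ad delivery, user growth, and other business scenarios.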
Includes English-language materials
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, for absolute beginners, with complete examples, based on the latest Python 3 release.
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
SQL
https://liaoxuefeng.com/books/sql/introduction/index.html
What is SQL? Simply put, SQL is the standard computer language for accessing and manipulating relational databases.
https://sqlbolt.com/
Learn SQL with simple, interactive exercises.
https://www.youtube.com/watch?v=p3qvj9hO_Bo
In this video we will cover everything you need to know about SQL in only 60 minutes.
Java
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
Data Analysis
[English] Data Analyst Roadmap
https://roadmap.sh/data-analyst
Step by step guide to becoming a Data Analyst in 2025
Data Mining
https://www.youtube.com/watch?v=-bSkREem8dM
Database vs Data Warehouse vs Data Lake
https://www.youtube.com/watch?v=7rs0i-9nOjo
Machine Learning
https://www.youtube.com/watch?v=0oyDqO8PjIg
Learn about machine learning and AI with this comprehensive 11-hour course from @LunarTech_ai.
https://www.youtube.com/watch?v=i_LwzRVP7bg
Learn Machine Learning in a way that is accessible to absolute beginners.
https://www.youtube.com/watch?v=NWONeJKn6kc
Learn the theory and practical application of machine learning concepts in this comprehensive course for beginners.
https://www.youtube.com/watch?v=PcbuKRNtCUc
Learn about all the most important concepts and terms related to machine learning and AI.
Algorithms
https://roadmap.sh/datastructures-and-algorithms
Step by step guide to learn Data Structures and Algorithms in 2025
https://www.hellointerview.com/learn/code
A visual guide to the most important patterns and approaches for the coding interview.
https://www.w3schools.com/dsa/
Big Data
https://www.youtube.com/watch?v=bAyrObl7TYE
https://www.youtube.com/watch?v=H4bf_uuMC-g
With all this talk of Big Data, we got Rebecca Tickle to explain just what makes data into Big Data.
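Hadoop
https://www.runoob.com/w3cnote/hadoop-tutorial.html
Hadoop provides reliable, scalable application-layer computing and storage support for very large computer clusters. It allows distributed processing of large data sets across clusters of computers using simple programming models, and scales from a single machine to several thousand machines.
[English] Hadoop Tutorial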
https://www.tutorialspoint.com/hadoop/index.htm
Hadoop is an open-source framework for storing and processing big data in a distributed environment across clusters of computers using simple programming models.
Spark
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Kafka
https://developer.confluent.io/what-is-apache-kafka/
https://www.youtube.com/watch?v=CU44hKLMg7k
https://www.youtube.com/watch?v=j4bqyAMMb7o&list=PLa7VYi0yPIH0KbnJQcMv5N9iW8HkZHztH
In this Apache Kafka fundamentals course, we introduce you to the basic Apache Kafka elements and APIs, as well as the broader Kafka ecosystem.
Flink
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
Related Positions
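Experienced hire | Data Development
1. Build and refine the risk-control data mart required by the business, and take part in model structure design, model mapping development, feature development, and related work;
2. Manage and process first-party and third-party data in layers, and build reusable data assets through sound data abstraction and modeling;
3. Contribute to building foundational data platforms and infrastructure, including data governance, data quality, data services, and data products.
Updated 2025-06-16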
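Experienced hire | 3+ years | Data Development
1. Help build the PB-scale data warehouse for JD food delivery and instant delivery (京东外卖&秒送), providing complete and efficient data support to all business teams;
2. Build the offline data warehouse on the principles of simplicity, ease of use, efficiency, and reliability, supporting upstream data products and analysts;
3. Build a real-time data warehouse to serve real-time business scenarios;
4. Take a deep part in building data products, providing complete data solutions inside and outside the company;
5. Meet the day-to-day data needs of departments across the company.
Updated 2025-06-15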
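Experienced hire | Data Development
1. Develop a deep understanding of the e-commerce platform business, build analytical models around business scenarios, uncover latent problems and growth opportunities, and support business growth;
2. Carry out data architecture design and both real-time and offline data development for the platform business;
3. Design and implement the future data flow architecture and development process, continuously improving stability and development efficiency.
Updated 2025-06-15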