
Sohu: Security Algorithm Engineer
Experienced hire | Full-time | 2+ years of experience | Social Products Center | Location: Beijing | Status: Hiring
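Requirements
1. Full-time master's degree or above from the national unified enrollment system, in computer science, mathematics, AI, or a related field; at least 2 years of algorithm R&D experience. A PhD or experience in the security domain is preferred.
2. Proficient in Python/C++, with the ability to develop and tune high-performance code.
3. Solid command of machine learning fundamentals (LR/SVM/GBDT, etc.); familiar with the implementation principles and optimization strategies of deep learning models (CNN/RNN/GNN).
4. Hands-on experience in at least one of the following: NLP (text classification, entity recognition, semantic matching); CV (image recognition, multimodal fusion); graph computing (community detection, link prediction).
5. Skilled in processing large-scale data with Hadoop/Spark/Flink, with experience in feature engineering and distributed training.
Bonus points:
1. Led the construction of a risk control system handling data on the scale of hundreds of millions of records, with experience optimizing high-concurrency scenarios.
2. Published papers at top AI/security conferences (KDD/CVPR/ACL, etc.) or won awards in algorithm competitions.
3. Familiar with secure computation technologies such as federated learning and privacy-preserving computation.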
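Responsibilities
1. Develop intelligent risk control models:
a) Using massive business data, apply machine learning and deep learning techniques (such as XGBoost, Transformer, and GNN) to design high-precision risk control models covering fraud detection, anomalous behavior identification, and similar scenarios;
b) Combine expert knowledge with data mining to extract highly discriminative features, and drive efficient deployment of models in real-time adversarial scenarios.
2. Develop the multimodal risk identification engine:
a) Fuse multimodal data such as text, speech, and images to build end-to-end risk determination models (e.g., CV object detection, ASR speech analysis, NLP semantic understanding);
b) Explore in-depth applications of graph neural networks to complex relationship networks (e.g., uncovering organized illicit groups) to improve the robustness of the risk control system.
3. Optimize algorithm performance and drive technical innovation:
a) Lead the adaptation and performance improvement of cutting-edge NLP/CV algorithms (e.g., large-model prompt engineering, retrieval-augmented generation (RAG)) in risk control scenarios;
b) Continuously iterate on model architectures and optimize inference efficiency and resource consumption to meet real-time processing demands at the scale of hundreds of millions of records.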
Algorithms
https://roadmap.sh/datastructures-and-algorithms
Step by step guide to learn Data Structures and Algorithms in 2025
https://www.hellointerview.com/learn/code
A visual guide to the most important patterns and approaches for the coding interview.
https://www.w3schools.com/dsa/
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese; free; starts from zero; complete examples; based on the latest version of Python 3.
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
C++
https://www.learncpp.com/
LearnCpp.com is a free website devoted to teaching you how to program in modern C++.
https://www.youtube.com/watch?v=ZzaPdXTrSb8
Machine Learning
https://www.youtube.com/watch?v=0oyDqO8PjIg
Learn about machine learning and AI with this comprehensive 11-hour course from @LunarTech_ai.
https://www.youtube.com/watch?v=i_LwzRVP7bg
Learn Machine Learning in a way that is accessible to absolute beginners.
https://www.youtube.com/watch?v=NWONeJKn6kc
Learn the theory and practical application of machine learning concepts in this comprehensive course for beginners.
https://www.youtube.com/watch?v=PcbuKRNtCUc
Learn about all the most important concepts and terms related to machine learning and AI.
GBDT
https://developers.google.com/machine-learning/decision-forests/intro-to-gbdt
Like bagging and boosting, gradient boosting is a methodology applied on top of another machine learning algorithm.
https://scikit-learn.org/stable/modules/ensemble.html
Ensemble methods combine the predictions of several base estimators built with a given learning algorithm in order to improve generalizability / robustness over a single estimator.
Deep Learning
https://d2l.ai/
Interactive deep learning book with code, math, and discussions.
CNN
https://learnopencv.com/understanding-convolutional-neural-networks-cnn/
Convolutional Neural Network (CNN) forms the basis of computer vision and image processing.
[English] CNN Explainer
https://poloclub.github.io/cnn-explainer/
Learn Convolutional Neural Network (CNN) in your browser!
https://www.deeplearningbook.org/contents/convnets.html
Convolutional networks (LeCun, 1989), also known as convolutional neural networks, or CNNs, are a specialized kind of neural network for processing data.
https://www.youtube.com/watch?v=2xqkSUhmmXU
MIT Introduction to Deep Learning 6.S191: Lecture 3 Convolutional Neural Networks for Computer Vision
RNN
https://d2l.ai/chapter_recurrent-neural-networks/rnn.html
A neural network that uses recurrent computation for hidden states is called a recurrent neural network (RNN).
https://www.deeplearningbook.org/contents/rnn.html
Recurrent neural networks, or RNNs (Rumelhart et al., 1986a), are a family of neural networks for processing sequential data.
https://www.ibm.com/think/topics/recurrent-neural-networks
A recurrent neural network or RNN is a deep neural network trained on sequential or time series data to create a machine learning (ML) model that can make sequential predictions or conclusions based on sequential inputs.
GNN
https://distill.pub/2021/gnn-intro/
Neural networks have been adapted to leverage the structure and properties of graphs.
https://gnn.seas.upenn.edu/
Graph Neural Networks (GNNs) are information processing architectures for signals supported on graphs.
https://www.ibm.com/think/topics/graph-neural-network
Graph neural networks (GNNs) are a deep neural network architecture that is popular both in practical applications and cutting-edge machine learning research.
NLP
https://www.youtube.com/watch?v=fNxaJsNG3-s&list=PLQY2H8rRoyvzDbLUZkbudP-MFQZwNmU4S
Welcome to Zero to Hero for Natural Language Processing using TensorFlow!
https://www.youtube.com/watch?v=R-AG4-qZs1A&list=PLeo1K3hjS3uuvuAXhYjV2lMEShq2UYSwX
Natural Language Processing tutorial for beginners series in Python.
https://www.youtube.com/watch?v=rmVRLeJRkl4&list=PLoROMvodv4rMFqRtEuo6SGjY4XbRIVRd4
The foundations of the effective modern methods for deep learning applied to NLP.
Hadoop
https://www.runoob.com/w3cnote/hadoop-tutorial.html
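Hadoop provides reliable, scalable application-layer computing and storage support for large computer clusters. It allows large datasets to be processed in a distributed manner across clusters of computers using simple programming models, and it scales from a single machine to several thousand machines.
[English] Hadoop Tutorial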
https://www.tutorialspoint.com/hadoop/index.htm
Hadoop is an open-source framework that allows to store and process big data in a distributed environment across clusters of computers using simple programming models.
Spark
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Flink
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
Feature Engineering
https://www.ibm.com/think/topics/feature-engineering
Feature engineering preprocesses raw data into a machine-readable format. It optimizes ML model performance by transforming and selecting relevant features.
https://www.kaggle.com/learn/feature-engineering
Better features make better models. Discover how to get the most out of your data.
High Concurrency
https://www.baeldung.com/concurrency-principles-patterns
In this tutorial, we’ll discuss some of the design principles and patterns that have been established over time to build highly concurrent applications.
https://www.baeldung.com/java-concurrency
Handling concurrency in an application can be a tricky process with many potential pitfalls. A solid grasp of the fundamentals will go a long way to help minimize these issues.
https://www.oreilly.com/library/view/concurrency-in-go/9781491941294/
You’ll understand how Go chooses to model concurrency, what issues arise from this model, and how you can compose primitives within this model to solve problems.
https://www.oreilly.com/library/view/modern-concurrency-in/9781098165406/
With this book, you'll explore the transformative world of Java 21's key feature: virtual threads.
https://www.youtube.com/watch?v=qyM8Pi1KiiM
https://www.youtube.com/watch?v=wEsPL50Uiyo
CVPR
https://cvpr.thecvf.com/