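NetEase Yidun (网易易盾) - Web Crawler Development Engineer
Experienced hire · Full-time · 网易数智 (NetEase Shuzhi) · Location: Hangzhou · Status: Hiring
Requirements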
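1. Familiar with vertical-search crawlers and distributed web crawlers;
2. Solid Java/Python fundamentals, including I/O, multithreading, collections, and other core frameworks; familiarity with distributed systems, caching, messaging, and search mechanisms is a plus;
3. Familiar with mainstream scraping techniques and crawler frameworks/tools such as Selenium/Puppeteer/Scrapy/PhantomJS;
4. Familiar with common anti-crawling blocking strategies, with hands-on experience dealing with them;
5. Familiarity with network-layer protocols and networking technology is preferred;
6. Experience with client-side apps is a plus; experience in related security fields is a plus.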
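Item 4 mentions anti-crawling blocking strategies. One common way a crawler reacts to throttling is exponential backoff with random jitter. The minimal sketch below assumes the target site signals throttling with HTTP 429 or 503 and that `fetch` is a caller-supplied function returning `(status, body)`; `fetch_with_backoff` and that callback shape are hypothetical names for illustration, not anything specified in the posting.

```python
import random
import time
from typing import Callable, Tuple

# Assumption: these status codes mean "slow down / temporarily blocked".
RETRYABLE_STATUSES = {429, 503}


def fetch_with_backoff(
    fetch: Callable[[str], Tuple[int, str]],
    url: str,
    max_retries: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call fetch(url); on a retryable status, wait and try again.

    The wait after attempt n is base_delay * 2**n plus up to base_delay
    of random jitter. After max_retries retries the last response is
    returned as-is.
    """
    attempt = 0
    while True:
        status, body = fetch(url)
        if status not in RETRYABLE_STATUSES or attempt >= max_retries:
            return status, body
        # Jitter keeps many workers from retrying in lockstep.
        delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
        sleep(delay)
        attempt += 1
```

Injecting `sleep` lets tests record the delays instead of actually waiting; a real crawler would also slow its per-host request rate after being throttled.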
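Responsibilities
1. Collect publicly available information from the internet to meet the data needs of various business lines;
2. Build the distributed crawler system and optimize the full pipeline: data scheduling, fetching, parsing, and storage;
3. Help the team overcome difficult crawler problems and improve the coverage and performance of the large-scale data collection system.
Study materials (including English-language resources)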
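The responsibilities describe a pipeline of scheduling, fetching, parsing, and storage. The single-process sketch below shows how those four stages connect in a breadth-first crawl; a distributed system would replace the in-memory deque, set, and dict with shared services such as a message queue and a key-value store. `crawl`, `LinkParser`, and the injected `fetch` callback are hypothetical names for illustration, and the standard-library `html.parser` stands in for whatever parser a production crawler would use.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional
from urllib.parse import urldefrag, urljoin


class LinkParser(HTMLParser):
    """Parse stage: collect href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(
    seed: str,
    fetch: Callable[[str], Optional[str]],
    max_pages: int = 100,
) -> Dict[str, str]:
    """Breadth-first crawl: schedule -> fetch -> parse -> store."""
    frontier = deque([seed])      # scheduling: FIFO queue of URLs to visit
    seen = {seed}                 # de-duplication: each URL is queued once
    store: Dict[str, str] = {}    # storage: url -> raw HTML
    while frontier and len(store) < max_pages:
        url = frontier.popleft()
        html = fetch(url)         # fetch stage; network access is injected
        if html is None:          # treat None as a failed fetch and skip it
            continue
        store[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before de-duplicating.
            absolute, _ = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return store
```

For a quick offline check, a dict of `{url: html}` can serve as the fetcher by passing its `.get` method.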
Java+
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
Python+
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, for complete beginners, with full examples, based on the latest Python 3.
https://www.learnpython.org/
A free, interactive Python tutorial for people who want to learn Python fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
多线程+
https://liaoxuefeng.com/books/java/threading/basic/index.html
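Compared with single-threaded programming, what sets multithreaded programming apart is that threads frequently read and write shared data, so access to that data must be synchronized.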
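The point about shared data and synchronization can be illustrated with a counter incremented by several threads. This is a minimal sketch in Python rather than Java; `Counter` and `run` are hypothetical names, and `threading.Lock` plays the role of the synchronization the note describes.

```python
import threading


class Counter:
    """Shared mutable state guarded by a lock."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # `self.value += 1` is a read-modify-write; without the lock, two
        # threads can read the same old value and one update is lost.
        with self._lock:
            self.value += 1


def run(num_threads: int = 8, per_thread: int = 10_000) -> int:
    counter = Counter()

    def worker() -> None:
        for _ in range(per_thread):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every worker before reading the result
    return counter.value
```

With the lock held around each increment, `run()` always returns `num_threads * per_thread`.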
https://www.youtube.com/watch?v=_uQgGS_VIXM&list=PLsc-VaxfZl4do3Etp_xQ0aQBoC-x5BIgJ
https://www.youtube.com/watch?v=IEEhzQoKtQU
https://www.youtube.com/watch?v=mTGdtC9f4EU&list=PLL8woMHwr36EDxjUoCzboZjedsnhLP1j4
https://www.youtube.com/watch?v=TPVH_coGAQs&list=PLk6CEY9XxSIAeK-EAh3hB4fgNvYkYmghp
https://www.youtube.com/watch?v=xPqnoB2hjjA
This video is an introduction to multithreading in modern C++.
https://www.youtube.com/watch?v=YKBwKy5PrpQ
Rust threading is easy to implement and improves the efficiency of your applications on multi-core systems!
缓存+
https://hackernoon.com/the-system-design-cheat-sheet-cache
The cache is a layer that stores a subset of data, typically the most frequently accessed or essential information, in a location quicker to access than its primary storage location.
https://www.youtube.com/watch?v=bP4BeUjNkXc
Caching strategies, Distributed Caching, Eviction Policies, Write-Through Cache and Least Recently Used (LRU) cache are all important terms when it comes to designing an efficient system with a caching layer.
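As a concrete instance of the eviction policies mentioned above, here is a minimal Least Recently Used (LRU) cache built on `collections.OrderedDict`. The `LRUCache` class is a hypothetical illustration, not taken from the linked material: each access moves the key to the end, so the front of the dict is always the least recently used entry and is evicted first when capacity is exceeded.

```python
from collections import OrderedDict
from typing import Generic, Hashable, Optional, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class LRUCache(Generic[K, V]):
    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._data: "OrderedDict[K, V]" = OrderedDict()

    def get(self, key: K) -> Optional[V]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: K, value: V) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry
```

For memoizing function calls, Python's standard library already provides the same policy as `functools.lru_cache`.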
https://www.youtube.com/watch?v=dGAgxozNWFE
Selenium+
https://www.youtube.com/watch?v=j7VZsCCnptM
Learn Selenium by building a web scraping bot in Python.
https://www.youtube.com/watch?v=mOAXEQevCAE&list=PLhW3qG5bs-L_s9HdC5zNshE5Ti8jABwlU
Puppeteer+
https://oxylabs.io/blog/puppeteer-tutorial
There are a few methods for accessing and parsing web pages, but in this tutorial we will be covering how to do it with Google Puppeteer.
[English] Getting started
https://pptr.dev/guides/getting-started
You launch/connect a browser, create some pages, and then manipulate them with Puppeteer's API.
https://www.youtube.com/watch?v=nIJV-LbV_vM
This tutorial walks you through everything you need to know about Puppeteer and headless browsers, so you can automate website testing, web scraping, fetching and downloading content, and more.
https://www.youtube.com/watch?v=Sag-Hz9jJNg
Learn Puppeteer in less than one hour.
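Related positions
Experienced hire · 5+ years · 网易数智 (NetEase Shuzhi)
1. Gain a deep understanding of customer needs, design and deliver suitable products or solutions, and assist the sales team with pre-sales technical support.
2. Take part in customer visits, technical exchanges, and product demos; answer customers' technical questions and help convert sales opportunities.
3. Follow industry developments, gather market information, contribute to product optimization, and assist with marketing.
4. Assist with early-stage project planning and provide technical support to ensure smooth project delivery and improve customer satisfaction.
Updated 2025-06-06
Experienced hire · 网易数智 (NetEase Shuzhi)
1. Keep track of public-opinion trends, laws, regulations, and policy developments related to content security, and proactively identify the corresponding security risks;
2. Review and optimize the content-security business system, provide customers with effective operational solutions tailored to their business, and drive standardized operation across scenarios such as risk onboarding and risk identification;
3. Maintain existing customers and run day-to-day strategy operations: uncover business risk points through data analysis, deploy control strategies quickly, and continuously monitor and report on governance results to keep the platform and business healthy;
4. Based on the current business situation, research industry risk-control practices, document content-security problems and their countermeasures, and work with product and R&D teams to put them into practice, improving risk-control processes and capabilities;
5. Develop a customer-operations methodology suited to the characteristics of regional customers, and implement it effectively.
Updated 2025-06-06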