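Ctrip Web Crawler Development Engineer (MJ027783)
Experienced hire · Full-time · 2+ years · Accommodation business development · Location: Shanghai · Status: Hiring
Requirements
1. Bachelor's degree or above in computer science or a related field, with at least 2 years of web crawler experience.
2. Proficiency in at least one programming language (Java or Python), with strong coding skills and a solid grounding in data structures.
3. Familiarity with TCP/IP, HTTP, and related network protocols.
4. Understanding of web front-end technologies, including XHTML/XML/CSS/JavaScript/AJAX.
5. Experience handling anti-crawling issues, and familiarity with common crawler techniques and architecture design.
6. Some understanding of common internet technologies such as distributed systems, multithreading, caching, and message queues.
7. Passion for technical development, eagerness to learn, good teamwork, and willingness to take an active part in the company's product R&D and related work.
8. Preference given to candidates with experience in mobile app security and reverse engineering, image recognition, WebKit or other browser engines, or JavaScript reverse engineering.
Responsibilities
1. Design and develop distributed web crawler systems.
2. Collect and analyze data from multi-platform sources (WEB/APP/H5/mini programs, etc.).
3. Advance breakthroughs in core crawler technology through reverse engineering, image recognition, behavior analysis, and similar techniques.
4. Design data collection strategies to improve collection efficiency and quality.
Includes English-language materials
Education+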
Java+
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
Python+
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, for absolute beginners, with complete examples, based on the latest Python 3.
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
Data structures+
https://www.youtube.com/watch?v=8hly31xKli0
In this course you will learn about algorithms and data structures, two of the fundamental topics in computer science.
https://www.youtube.com/watch?v=B31LgI4Y4DQ
Learn about data structures in this comprehensive course. We will be implementing these data structures in C or C++.
https://www.youtube.com/watch?v=CBYHwZcbD-s
Data Structures and Algorithms full course tutorial java
TCP/IP+
[English] What is TCP/IP?
https://www.techtarget.com/searchnetworking/definition/TCP-IP
TCP/IP stands for Transmission Control Protocol/Internet Protocol and is a suite of communication protocols used to interconnect network devices on the internet.
HTTP+
https://developer.mozilla.org/zh-CN/docs/Web/HTTP
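The Hypertext Transfer Protocol (HTTP) is an application-layer protocol for transmitting hypermedia documents such as HTML. It was designed for communication between web browsers and web servers, but it can also be used for other purposes.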
Web+
https://web.dev/learn
Explore our growing collection of courses on key web design and development subjects.
XML+
https://developer.mozilla.org/zh-CN/docs/Web/XML/Guides/XML_introduction
XML (Extensible Markup Language) is a markup language similar to HTML, but without predefined tags.
CSS+
JavaScript+
https://developer.mozilla.org/zh-CN/docs/Learn_web_development/Core/Scripting
[English] Learn JavaScript
https://learnjavascript.online/
The easiest way to learn & practice modern JavaScript
[English] Learn JavaScript
https://web.dev/learn/javascript
https://www.youtube.com/watch?v=zuKbR4Q428o
Write bulletproof JavaScript code with unit testing!
AJAX+
System design+
https://roadmap.sh/system-design
Everything you need to know about designing large scale systems.
https://www.youtube.com/watch?v=F2FmTdLtb_4
This complete system design tutorial covers scalability, reliability, data handling, and high-level architecture with clear explanations, real-world examples, and practical strategies.
Multithreading+
https://liaoxuefeng.com/books/java/threading/basic/index.html
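Compared with single-threaded programming, multithreaded programming is distinctive in that threads often need to read and write shared data and must be synchronized.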
https://www.youtube.com/watch?v=_uQgGS_VIXM&list=PLsc-VaxfZl4do3Etp_xQ0aQBoC-x5BIgJ
https://www.youtube.com/watch?v=IEEhzQoKtQU
https://www.youtube.com/watch?v=mTGdtC9f4EU&list=PLL8woMHwr36EDxjUoCzboZjedsnhLP1j4
https://www.youtube.com/watch?v=TPVH_coGAQs&list=PLk6CEY9XxSIAeK-EAh3hB4fgNvYkYmghp
https://www.youtube.com/watch?v=xPqnoB2hjjA
This video is an introduction to multithreading in modern C++.
https://www.youtube.com/watch?v=YKBwKy5PrpQ
Rust threading is easy to implement and improves the efficiency of your applications on multi-core systems!
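As a small illustration of the point above that threads sharing data need synchronization, here is a minimal Python sketch. The function name `count_with_lock` and its parameters are hypothetical, chosen only for this example; it uses the standard-library `threading.Lock` to guard a shared counter.

```python
import threading


def count_with_lock(num_threads: int, increments: int) -> int:
    """Increment one shared counter from several threads, guarded by a lock."""
    total = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal total
        for _ in range(increments):
            # Only one thread at a time may run the read-modify-write step.
            with lock:
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # Wait for every worker before reading the result.
    return total


if __name__ == "__main__":
    print(count_with_lock(num_threads=4, increments=1000))
```

Without the lock, the `total += 1` step is not guaranteed to be atomic, so concurrent updates could be lost; the lock makes the final count equal to `num_threads * increments`.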
Caching+
https://hackernoon.com/the-system-design-cheat-sheet-cache
The cache is a layer that stores a subset of data, typically the most frequently accessed or essential information, in a location quicker to access than its primary storage location.
https://www.youtube.com/watch?v=bP4BeUjNkXc
Caching strategies, Distributed Caching, Eviction Policies, Write-Through Cache and Least Recently Used (LRU) cache are all important terms when it comes to designing an efficient system with a caching layer.
https://www.youtube.com/watch?v=dGAgxozNWFE
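The Least Recently Used (LRU) eviction policy mentioned in the resources above can be sketched in a few lines of Python. This is only an illustrative sketch, not taken from any of the linked materials: the class name `LRUCache` and its `get`/`put` methods are hypothetical, and the implementation relies on the standard-library `collections.OrderedDict`.

```python
from collections import OrderedDict


class LRUCache:
    """A small fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._items: OrderedDict = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        # A read counts as a use, so move the key to the "most recent" end.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self._capacity:
            # Drop the entry at the "least recent" end.
            self._items.popitem(last=False)


if __name__ == "__main__":
    cache = LRUCache(capacity=2)
    cache.put("a", 1)
    cache.put("b", 2)
    cache.get("a")     # "a" is now the most recently used key
    cache.put("c", 3)  # exceeds capacity, so "b" is evicted
    print(cache.get("b"), cache.get("a"), cache.get("c"))
```

For caching the results of a pure function, Python's built-in `functools.lru_cache` decorator applies the same policy without a hand-written class.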
Message queues+
https://www.youtube.com/watch?v=xErwDaOc-Gs
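Related positions
Experienced hire · 3-5 years · D5201
1. Help build capabilities for mining value from massive data and for breakthroughs in attack-and-defense techniques across the engineering system, using strong technology to help merchants and consumers complete transactions through the platform.
2. Build out the massive product catalog for Kuaishou live-streaming e-commerce promotions, create a complete data operations support system, explore new ways to engage users in new scenarios, and serve hundreds of millions of users.
3. Study the new consumer habits, product selection strategies, and monetization approaches of live-streaming e-commerce, along with new application scenarios for big data, artificial intelligence, and engineering technology, and explore the new technical challenges implied by these new consumption habits.
Updated 2025-08-07
Campus hire
1. Develop crawlers according to business needs and plans, including data parsing and cleaning and data pipeline optimization, while maintaining and improving existing crawlers.
2. Take part in research on core crawler technology, maintain and upgrade the existing technical stack, quickly locate and fix software defects, and respond to and resolve production issues promptly.
3. Help build the crawler monitoring system, promptly monitor and resolve issues that arise during operation, and ensure data stability and accuracy.
4. Take part in the architecture design and development of the company's internal crawler platform, and productize it by combining business scenarios with technologies such as NLP.
Updated 2025-07-18
Experienced hire · 3+ years · Content-Technical
1. Take part in the design and development of distributed crawler systems.
2. Own external data coverage for the platform, and solve problems such as high-concurrency crawling and massive storage.
3. Iterate on and improve the monitoring framework, maintain the required resource pools, and explore the latest technical capabilities.
4. Develop some hardware SDKs (Android or C++ track).
Updated 2025-09-02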