滴滴资深爬虫开发工程师(J250918019)
社招全职3年以上技术地点:北京状态:招聘
任职要求
1、本科及以上学历,三年以上后端开发经验,能力突出者可放宽要求; 2、精通java和python,掌握异步编程、多线程/多进程同步等技术; 3、熟悉web,app抓取的技术原理,熟悉常用的爬虫框架和组件; 4、熟悉反爬技术,对设备指纹、验证码识别、js混淆等有一定的了解; 5、熟悉常用的数据库:Mysql、Redis、ES、Kafka等,有大数据开发经验优先; 加分项 1、有前端开发经验者优先; 2、有通用爬虫设计经验者优先; 3、有手机群控后端开发经验者优先;
工作职责
1、负责分布式爬虫系统架构的设计和开发; 2、负责对采集数据进行清洗、去重、结构化处理; 3、负责对反爬机制的研究分析,提升采集数据的成功率; 4、负责对系统的性能监控和调优,确保系统的高可用性和稳定性;
包括英文材料
学历+
后端开发+
https://www.youtube.com/watch?v=tN6oJu2DqCM&list=PLWKjhJtqVAbn21gs5UnLhCQ82f923WCgM
Learn what technologies you should learn first to become a back end web developer.
Java+
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
Python+
https://liaoxuefeng.com/books/python/introduction/index.html
中文,免费,零起点,完整示例,基于最新的Python 3版本。
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
多线程+
https://liaoxuefeng.com/books/java/threading/basic/index.html
和单线程相比,多线程编程的特点在于:多线程经常需要读写共享数据,并且需要同步。
https://www.youtube.com/watch?v=_uQgGS_VIXM&list=PLsc-VaxfZl4do3Etp_xQ0aQBoC-x5BIgJ
https://www.youtube.com/watch?v=IEEhzQoKtQU
https://www.youtube.com/watch?v=mTGdtC9f4EU&list=PLL8woMHwr36EDxjUoCzboZjedsnhLP1j4
https://www.youtube.com/watch?v=TPVH_coGAQs&list=PLk6CEY9XxSIAeK-EAh3hB4fgNvYkYmghp
https://www.youtube.com/watch?v=xPqnoB2hjjA
This video is an introduction to multithreading in modern C++.
https://www.youtube.com/watch?v=YKBwKy5PrpQ
Rust threading is easy to implement and improves the efficiency of your applications on multi-core systems!
Web+
https://web.dev/learn
Explore our growing collection of courses on key web design and development subjects.
JavaScript+
https://developer.mozilla.org/zh-CN/docs/Learn_web_development/Core/Scripting
[英文] Learn JavaScript
https://learnjavascript.online/
The easiest way to learn & practice modern JavaScript
[英文] Learn JavaScript
https://web.dev/learn/javascript
https://www.youtube.com/watch?v=zuKbR4Q428o
Write bulletproof JavaScript code with unit testing!
MySQL+
https://juejin.cn/post/7190306988939542585
这是一篇 MySQL 通关一篇过硬核经验学习路线,包括数据库相关知识,SQL语句的使用,数据库约束,设计等。
[英文] MySQL Tutorial
https://www.mysqltutorial.org/
your go-to resource for mastering MySQL in a fast, easy, and enjoyable way.
https://www.youtube.com/watch?v=5OdVJbNCSso
MySQL SQL tutorial for beginners
https://www.youtube.com/watch?v=7S_tz1z_5bA
This beginner-friendly course teaches you SQL from scratch.
Redis+
[英文] Developer Hub
https://redis.io/dev/
Get all the tutorials, learning paths, and more you need to start building—fast.
https://www.runoob.com/redis/redis-tutorial.html
REmote DIctionary Server(Redis) 是一个由 Salvatore Sanfilippo 写的 key-value 存储系统,是跨平台的非关系型数据库。
https://www.youtube.com/watch?v=jgpVdJB2sKQ
In this video I will be covering Redis in depth from how to install it, what commands you can use, all the way to how to use it in a real world project.
ElasticSearch+
https://www.youtube.com/watch?v=a4HBKEda_F8
Learn about Elasticsearch with this comprehensive course designed for beginners, featuring both theoretical concepts and hands-on applications using Python (though applicable to any programming language). The course is structured in two parts: first covering essential Elasticsearch fundamentals including index management, document storage, text analysis, pipeline creation, search functionality, and advanced features like semantic search and embeddings; followed by a practical section where you'll build a real-world website using Elasticsearch as a search engine, working with the Astronomy Picture of the Day (APOD) dataset to implement features such as data cleaning pipelines, tokenization, pagination, and aggregations.
Kafka+
https://developer.confluent.io/what-is-apache-kafka/
https://www.youtube.com/watch?v=CU44hKLMg7k
https://www.youtube.com/watch?v=j4bqyAMMb7o&list=PLa7VYi0yPIH0KbnJQcMv5N9iW8HkZHztH
In this Apache Kafka fundamentals course, we introduce you to the basic Apache Kafka elements and APIs, as well as the broader Kafka ecosystem.
大数据+
https://www.youtube.com/watch?v=bAyrObl7TYE
https://www.youtube.com/watch?v=H4bf_uuMC-g
With all this talk of Big Data, we got Rebecca Tickle to explain just what makes data into Big Data.
前端开发+
https://roadmap.sh/frontend
Step by step guide to becoming a modern frontend developer
相关职位
社招5年以上网易职能
1. 设计、开发、维护、重构分布式网络爬虫,从各种网站、APP中抓取并清洗结构化数据。 2. 负责持续运营和更新爬虫程序,识别和解决数据源变化和其他问题。 3. 爬虫性能优化,包括处理性能、爬取策略、占用带宽、反爬虫机制等方面。 4. 能够统计和分析爬虫数据,与其他团队合作,如数据工程师、数据分析师协作,以确保数据采集系统的有效性和可靠性。
更新于 2025-06-20
社招2年以上住宿业务开发
1.负责分布式网络爬虫系统的设计与开发工作。2.对多平台数据源(WEB/APP/H5/小程序等)进行数据采集及分析。3.通过逆向、图像识别、行为分析等技术提升爬虫核心技术突破。4.设计数据采集策略,提升数据采集效率及质量。
更新于 2025-03-21