
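Autohome (汽车之家): Senior Data Crawling Engineer
Experienced hire | Full-time | 2+ years | Technical | Location: Beijing | Status: Hiring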
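Requirements
1. Proficient in Python; familiar with MySQL, MongoDB, Redis, Kafka, and Git; comfortable in a Linux environment.
2. Solid grasp of distributed systems and multithreading; expert in at least one crawler framework such as Scrapy/Scrapy-Redis or Feapder.
3. Proficient with packet-capture tools and web page parsing (regular expressions, XPath); able to process structured and unstructured data.
4. Expert in JS …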
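Responsibilities
1. Develop, deploy, and iterate on the distributed crawler system, keeping data collection stable, efficient, comprehensive, and timely.
2. Optimize crawling strategies, anti-crawling countermeasures, scheduling mechanisms, and proxy IPs to improve crawl success rate and timeliness.
3. Monitor crawler operation, handle exception alerts, and maintain system stability and efficiency.
4. Carry out multi-platform data crawling, content parsing, data cleaning, and storage; optimize the data platform.
5. Take part in the design and continuous iteration of core crawler algorithms and the automation platform.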
Includes English-language materials
Python+
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, beginner-friendly, with complete examples, based on the latest Python 3 release.
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
MySQL+
https://juejin.cn/post/7190306988939542585
A single-article, hands-on MySQL learning roadmap covering database fundamentals, SQL statement usage, database constraints, design, and more.
[English] MySQL Tutorial
https://www.mysqltutorial.org/
your go-to resource for mastering MySQL in a fast, easy, and enjoyable way.
https://www.youtube.com/watch?v=5OdVJbNCSso
MySQL SQL tutorial for beginners
https://www.youtube.com/watch?v=7S_tz1z_5bA
This beginner-friendly course teaches you SQL from scratch.
MongoDB+
https://learnxinyminutes.com/mongodb/
MongoDB is a NoSQL document database for high volume data storage.
https://studio3t.com/academy/#courses
The fastest way to learn MongoDB
https://www.youtube.com/watch?v=c2M-rlkkT5o
This video will give you an introduction to MongoDB in 1 hour. Afterwards I recommend exploring aggregation, replication, and sharding.
https://www.youtube.com/watch?v=ExcRbA7fy_A&list=PL4cUxeGkcC9h77dJ-QJlwGlZlTd4ecZOA
You'll learn how to use MongoDB (a NoSQL database) from scratch. You'll also learn how to integrate it into a simple Node.js API.
Redis+
[English] Developer Hub
https://redis.io/dev/
Get all the tutorials, learning paths, and more you need to start building—fast.
https://www.runoob.com/redis/redis-tutorial.html
REmote DIctionary Server (Redis) is a key-value storage system written by Salvatore Sanfilippo; it is a cross-platform, non-relational database.
https://www.youtube.com/watch?v=jgpVdJB2sKQ
In this video I will be covering Redis in depth from how to install it, what commands you can use, all the way to how to use it in a real world project.
Kafka+
https://developer.confluent.io/what-is-apache-kafka/
https://www.youtube.com/watch?v=CU44hKLMg7k
https://www.youtube.com/watch?v=j4bqyAMMb7o&list=PLa7VYi0yPIH0KbnJQcMv5N9iW8HkZHztH
In this Apache Kafka fundamentals course, we introduce you to the basic Apache Kafka elements and APIs, as well as the broader Kafka ecosystem.
Git+
https://www.youtube.com/watch?v=rH3zE7VlIMs
Learn Git from start to finish in this full course written by ThePrimeagen.
Related positions
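Experienced hire | 2+ years | A09576
1. Own the architecture design and development of the company's data crawling solutions for AI scenarios, including high-volume API query services and the ingestion, storage, and querying of massive data.
2. Take part in deep customization of the Headless engine and complete core module development.
Updated 2023-12-19 | Shenzhen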
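Experienced hire | 2+ years | A159321A
1. Own the architecture design and development of the company's data crawling solutions for AI scenarios, including high-volume API query services and the ingestion, storage, and querying of massive data.
2. Take part in deep customization of the Headless engine and complete core module development.
Updated 2023-12-19 | Hangzhou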
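Experienced hire | 5-16 years | SOFTWARE
1. Reverse-engineer web and app API protocols, solving practical problems such as encryption and obfuscation encountered in website/app analysis.
2. Build out the completeness, accuracy, and timeliness of data crawling, and resolve the technical problems and challenges that arise during crawling.
3. Build proxy IP pools and fingerprint browser pools, and design resource scheduling to manage crawler resources.
Updated 2025-10-31 | Shenzhen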