JD.com E-commerce Data Development Engineer
Experienced hire; full-time; 3-5 years; data development role; Location: Beijing; Status: Hiring
Requirements
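1. 3-5 years of web crawler development experience; familiar with data-scraping scenarios on e-commerce platforms; hands-on experience with large-scale crawler systems (million-scale per day);
2. Proficient in Python (Scrapy, Requests, BeautifulSoup, etc.); familiar with asynchronous frameworks (such as aiohttp, Celery), databases (MySQL/MongoDB/Redis), and message queues (Kafka/RabbitMQ);
3. Deep understanding of anti-scraping techniques (User-Agent rotation, proxy IP pools, Selenium/Puppeteer simulation, etc.); able to independently overcome common anti-scraping restrictions;
4. Familiar with distributed crawling (Scrapy-Redis, Splash), data deduplication, and incremental crawling schemes; experienced in performance tuning;
5. Understanding of the HTTP/HTTPS protocols and web security mechanisms (such as token encryption, OAuth); able to reverse-engineer Ajax interfaces and dynamically rendered pages;
6. Bonus: experience with machine-learning-assisted parsing (OCR, NLP processing of product descriptions); familiarity with Kubernetes/Docker deployment.
Alignment with JD.com values: customer first, innovation, striving, responsibility, gratitude, integrity.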
Responsibilities
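1. Crawl, clean, and store in structured form data from overseas e-commerce platforms, supporting business needs such as price monitoring, competitor analysis, and product recommendation;
2. Handle anti-scraping mechanisms (such as CAPTCHAs, IP bans, and dynamic encryption) to keep data collection stable and efficient;
3. Take part in selecting and developing crawler frameworks, maintain the existing crawler systems, and improve code extensibility and robustness;
4. Analyze target site structures and data interfaces, and adjust crawling strategies dynamically in response to site redesigns or upgraded anti-scraping measures;
5. Work with the data team to ensure data quality and timeliness, and provide automated data monitoring and alerting.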
Includes English-language materials
Python+
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese; free, starts from zero, with complete examples, based on the latest Python 3 release.
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
MySQL+
https://juejin.cn/post/7190306988939542585
A hardcore, all-in-one MySQL learning roadmap covering database fundamentals, SQL statements, database constraints, schema design, and more.
[English] MySQL Tutorial
https://www.mysqltutorial.org/
your go-to resource for mastering MySQL in a fast, easy, and enjoyable way.
https://www.youtube.com/watch?v=5OdVJbNCSso
MySQL SQL tutorial for beginners
https://www.youtube.com/watch?v=7S_tz1z_5bA
This beginner-friendly course teaches you SQL from scratch.
MongoDB+
https://learnxinyminutes.com/mongodb/
MongoDB is a NoSQL document database for high volume data storage.
https://studio3t.com/academy/#courses
The fastest way to learn MongoDB
https://www.youtube.com/watch?v=c2M-rlkkT5o
This video will give you an introduction to MongoDB in 1 Hour. Afterwards I recommend exploring aggregation, replication, and sharding.
https://www.youtube.com/watch?v=ExcRbA7fy_A&list=PL4cUxeGkcC9h77dJ-QJlwGlZlTd4ecZOA
You'll learn how to use MongoDB (a NoSQL database) from scratch. You'll also learn how to integrate it into a simple Node.js API.
Redis+
[English] Developer Hub
https://redis.io/dev/
Get all the tutorials, learning paths, and more you need to start building—fast.
https://www.runoob.com/redis/redis-tutorial.html
REmote DIctionary Server (Redis) is a key-value store written by Salvatore Sanfilippo; it is a cross-platform, non-relational database.
https://www.youtube.com/watch?v=jgpVdJB2sKQ
In this video I will be covering Redis in depth from how to install it, what commands you can use, all the way to how to use it in a real world project.
Message Queues+
https://www.youtube.com/watch?v=xErwDaOc-Gs
Kafka+
https://developer.confluent.io/what-is-apache-kafka/
https://www.youtube.com/watch?v=CU44hKLMg7k
https://www.youtube.com/watch?v=j4bqyAMMb7o&list=PLa7VYi0yPIH0KbnJQcMv5N9iW8HkZHztH
In this Apache Kafka fundamentals course, we introduce you to the basic Apache Kafka elements and APIs, as well as the broader Kafka ecosystem.
RabbitMQ+
[English] RabbitMQ Tutorials
https://www.rabbitmq.com/tutorials
These tutorials cover the basics of creating messaging applications using RabbitMQ.
https://www.youtube.com/watch?v=bfVddTJNiAw
RabbitMQ is a powerful message broker that can help you create resilient and scalable applications.
AI agent+
https://www.ibm.com/think/ai-agents
Your one-stop resource for gaining in-depth knowledge and hands-on applications of AI agents.
Selenium+
https://www.youtube.com/watch?v=j7VZsCCnptM
Learn Selenium by building a web scraping bot in Python.
https://www.youtube.com/watch?v=mOAXEQevCAE&list=PLhW3qG5bs-L_s9HdC5zNshE5Ti8jABwlU
Puppeteer+
https://oxylabs.io/blog/puppeteer-tutorial
There are a few methods to accessing and parsing web pages, but in this tutorial we will be covering how to do it with Google Puppeteer.
[English] Getting started
https://pptr.dev/guides/getting-started
You launch/connect a browser, create some pages, and then manipulate them with Puppeteer's API.
https://www.youtube.com/watch?v=nIJV-LbV_vM
This tutorial walks you through everything you need to know about Puppeteer and headless browsers, so you can automate website testing, web scraping, fetching and downloading content, and more.
https://www.youtube.com/watch?v=Sag-Hz9jJNg
Learn puppeteer in less than one hour.
Performance Tuning+
https://goperf.dev/
The Go App Optimization Guide is a series of in-depth, technical articles for developers who want to get more performance out of their Go code without relying on guesswork or cargo cult patterns.
https://web.dev/learn/performance
This course is designed for those new to web performance, a vital aspect of the user experience.
https://www.ibm.com/think/insights/application-performance-optimization
Application performance is not just a simple concern for most organizations; it’s a critical factor in their business’s success.
https://www.oreilly.com/library/view/optimizing-java/9781492039259/
Performance tuning is an experimental science, but that doesn’t mean engineers should resort to guesswork and folklore to get the job done.
HTTP+
https://developer.mozilla.org/zh-CN/docs/Web/HTTP
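The Hypertext Transfer Protocol (HTTP) is an application-layer protocol for transmitting hypermedia documents such as HTML. It was designed for communication between web browsers and web servers, but it can also be used for other purposes.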
Web+
https://web.dev/learn
Explore our growing collection of courses on key web design and development subjects.
OAuth+
[English] Getting Started
https://oauth.net/getting-started/
Below are some guides to OAuth 2.0 which cover many of the topics needed to understand and implement clients and servers.
https://www.digitalocean.com/community/tutorials/an-introduction-to-oauth-2
OAuth 2 is an authorization framework that enables applications — such as Facebook, GitHub, and DigitalOcean — to obtain limited access to user accounts on an HTTP service.
https://www.youtube.com/watch?v=ZDuRmhLSLOY
Welcome to the ultimate guide on OAuth 2.0!
Machine Learning+
https://www.youtube.com/watch?v=0oyDqO8PjIg
Learn about machine learning and AI with this comprehensive 11-hour course from @LunarTech_ai.
https://www.youtube.com/watch?v=i_LwzRVP7bg
Learn Machine Learning in a way that is accessible to absolute beginners.
https://www.youtube.com/watch?v=NWONeJKn6kc
Learn the theory and practical application of machine learning concepts in this comprehensive course for beginners.
https://www.youtube.com/watch?v=PcbuKRNtCUc
Learn about all the most important concepts and terms related to machine learning and AI.
OCR+
https://www.ibm.com/think/topics/optical-character-recognition
Optical character recognition (OCR) is a technology that uses automated data extraction to quickly convert images of text into a machine-readable format.
https://www.youtube.com/watch?v=or8AcS6y1xg
Optical character recognition (OCR) is sometimes referred to as text recognition.
NLP+
https://www.youtube.com/watch?v=fNxaJsNG3-s&list=PLQY2H8rRoyvzDbLUZkbudP-MFQZwNmU4S
Welcome to Zero to Hero for Natural Language Processing using TensorFlow!
https://www.youtube.com/watch?v=R-AG4-qZs1A&list=PLeo1K3hjS3uuvuAXhYjV2lMEShq2UYSwX
Natural Language Processing tutorial for beginners series in Python.
https://www.youtube.com/watch?v=rmVRLeJRkl4&list=PLoROMvodv4rMFqRtEuo6SGjY4XbRIVRd4
The foundations of the effective modern methods for deep learning applied to NLP.
Kubernetes+
https://kubernetes.io/docs/tutorials/kubernetes-basics/
This tutorial provides a walkthrough of the basics of the Kubernetes cluster orchestration system.
https://kubernetes.io/zh-cn/docs/tutorials/kubernetes-basics/
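This tutorial introduces the basics of the Kubernetes cluster orchestration system. Each module contains background information on major Kubernetes features and concepts, along with an online tutorial to work through.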
https://www.youtube.com/watch?v=s_o8dwzRlu4
Hands-On Kubernetes Tutorial | Learn Kubernetes in 1 Hour - Kubernetes Course for Beginners
https://www.youtube.com/watch?v=X48VuDVv0do
Full Kubernetes Tutorial | Kubernetes Course | Hands-on course with a lot of demos
Docker+
https://www.youtube.com/watch?v=GFgJkfScVNU
Master Docker in one course; learn about images and containers on Docker Hub, running multiple containers with Docker Compose, automating workflows with Docker Compose Watch, and much more. 🐳
https://www.youtube.com/watch?v=kTp5xUtcalw
Learn how to use Docker and Kubernetes in this complete hands-on course for beginners.
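Related Positions
Experienced hire; data development role
1. Responsible for data BP work for JD.com cross-border e-commerce, including building data assets and data applications and using data to drive business growth;
2. Design the business data architecture and carry out real-time and offline data development;
3. Build data metrics and dashboards using the middle-platform PaaS tools;
4. Analyze data from the business and user perspectives and deliver analytical conclusions;
5. Build data assets covering all subject domains and all scenarios of cross-border e-commerce, along with an end-to-end data-intelligence solution.
Updated 2025-06-08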
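Experienced hire; 3+ years; information technology
1. Analyze business requirements, build the data warehouse, and provide data support to business departments;
2. Take part in data-source analysis and complete the data integration between the big data platform and the business systems;
3. Design the data warehouse and develop ETL on top of the big data technology platform;
4. Research relevant technologies, optimize the big data development workflow, and plan big data platform applications.
Updated 2025-04-16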
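Experienced hire; 3+ years
1. Responsible for the design, development, and performance optimization of real-time and offline data warehouses for Taobao product-catalog base data, and for developing the related business metrics; take part in planning and building the architecture, technical system, and data models for Taobao product base data, including data collection, data governance, data quality and stability assurance, and intelligent and automated data processing;
2. Responsible for mining data along dimensions such as products and users and accumulating data assets, providing unified, reliable, and efficient real-time and offline data services with rich dimensional drill-down support for interactive ad hoc analysis and A/B experiment analysis;
3. Able to explore and deliver big data solutions for business scenarios and turn data into products.
Updated 2025-08-20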