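JD.com Data Development Engineer (Business Applications Track)
Experienced hire | Full-time | Software development | Location: Beijing | Status: Hiring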
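Requirements
1. Strong interest in large-scale data processing, distributed storage and computing, and data modeling; able to proactively follow and learn cutting-edge technologies.
2. Deep understanding of common data modeling theory; experience building data warehouse models is preferred.
3. Development experience with Hive, Spark, Flink, ClickHouse, Presto, or Elasticsearch (ES) is preferred.
4. Able to use SQL to work with complex data models, with SQL optimization experience; familiar with data warehouse ETL development and data modeling.
5. Sensitive to data, with a rigorous working approach, strong logical thinking, and good cross-team collaboration and communication skills.
Alignment with JD.com's values: customer first, innovation, striving, responsibility, gratitude, and integrity.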
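Responsibilities
1. Based on an understanding of the end-to-end business of AI application innovation, build an enterprise-level, subject-oriented data warehouse model system and offline/real-time data models that uniformly support core data products and systems and provide analysis and decision support to the business.
2. Participate in the design and development of end-to-end architecture for massive data processing, from data collection, storage, and computation through to query applications, such as unified batch and stream processing, data lakes, and OLAP.
3. Own data modeling and analysis work oriented to business goals; develop solutions that fit the characteristics of the business and drive their implementation.
4. Continuously explore the industry's latest big data solutions to increase computing power, reduce costs, and expand diversified data service capabilities.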
Includes English-language materials
Data warehouse
https://www.youtube.com/watch?v=9GVqKuTVANE
From Zero to Data Warehouse Hero: A Full SQL Project Walkthrough and Real Industry Experience!
https://www.youtube.com/watch?v=k4tK2ttdSDg
Hive
[English] Hive Tutorial
https://www.tutorialspoint.com/hive/index.htm
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
https://www.youtube.com/watch?v=D4HqQ8-Ja9Y
Spark
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Flink
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
ClickHouse
[English] Advanced Tutorial
https://clickhouse.com/docs/tutorial
Learn how to ingest and query data in ClickHouse using the New York City taxi example dataset.
https://www.youtube.com/watch?v=FtoWGT7kS-c
ClickHouse is an open-source column-oriented DBMS for online analytical processing that allows users to generate analytical reports using SQL queries in real-time.
https://www.youtube.com/watch?v=Rhe-kUyrFUE&list=PL0Z2YDlm0b3gcY5R_MUo4fT5bPqUQ66ep
Presto
[English] What is Presto?
https://prestodb.io/what-is-presto/
https://www.tutorialspoint.com/apache_presto/index.htm
Elasticsearch
https://www.youtube.com/watch?v=a4HBKEda_F8
Learn about Elasticsearch with this comprehensive course designed for beginners, featuring both theoretical concepts and hands-on applications using Python (though applicable to any programming language). The course is structured in two parts: first covering essential Elasticsearch fundamentals including index management, document storage, text analysis, pipeline creation, search functionality, and advanced features like semantic search and embeddings; followed by a practical section where you'll build a real-world website using Elasticsearch as a search engine, working with the Astronomy Picture of the Day (APOD) dataset to implement features such as data cleaning pipelines, tokenization, pagination, and aggregations.
SQL
https://liaoxuefeng.com/books/sql/introduction/index.html
What is SQL? Simply put, SQL is the standard computer language for accessing and manipulating relational databases.
https://sqlbolt.com/
Learn SQL with simple, interactive exercises.
https://www.youtube.com/watch?v=p3qvj9hO_Bo
In this video we will cover everything you need to know about SQL in only 60 minutes.
ETL
https://www.ibm.com/think/topics/etl
ETL—meaning extract, transform, load—is a data integration process that combines, cleans and organizes data from multiple sources into a single, consistent data set for storage in a data warehouse, data lake or other target system.
https://www.youtube.com/watch?v=OW5OgsLpDCQ
It explains what ETL is and what it can do for you to improve your data analysis and productivity.
Related positions
Experienced hire | Software development
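1. Based on an understanding of the end-to-end business of AI application innovation, build an enterprise-level, subject-oriented data warehouse model system and offline/real-time data models that uniformly support core data products and systems and provide analysis and decision support to the business.
2. Participate in the design and development of end-to-end architecture for massive data processing, from data collection, storage, and computation through to query applications, such as unified batch and stream processing, data lakes, and OLAP.
3. Own data modeling and analysis work oriented to business goals; develop solutions that fit the characteristics of the business and drive their implementation.
4. Continuously explore the industry's latest big data solutions to increase computing power, reduce costs, and expand diversified data service capabilities.
Updated 2025-07-15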
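Campus hire | Data development
1. Responsible for big data development and maintenance for intelligent vehicle control data, traffic cost reduction, and factory projects.
2. Responsible for data presentation, metric analysis, and dashboards.
3. Responsible for exploring data science applications for the business.
4. Develop data solutions and handle integration and joint debugging and testing with the vehicle-cloud front-end and back-end systems.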