
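Sohu Big Data Development Engineer (Data Warehouse Track)
Experienced hire | Full-time | 3+ years | Group Commercial Department | Location: Beijing | Status: Hiring
Requirements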
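1. Full-time bachelor's degree or above (through the national unified admission system) in computer science or a related major, with solid computer science fundamentals and 3+ years of data development experience;
2. Familiar with building and optimizing data warehouses, data systems, and data value; hands-on data warehouse development experience;
3. Command of data management and governance theory; familiar with methodologies such as data governance, data standards, enterprise-level data modeling, master data, and metadata management;
4. Familiar with big data architecture and able to build real-time or offline (batch) data pipelines; familiar with, and experienced in developing with, Hive, Kafka, Spark, HBase, Flink, and related technologies;
5. Some data analysis ability, with sensitivity to data and curiosity, focused on discovering the value in data and turning it into results.
Pluses:
1. Practical experience in data warehouse development, data modeling, and building and optimizing data systems;
2. Experience developing and optimizing advertising-related data processing.
Responsibilities
1. Design data models, implement ETL, optimize performance, monitor ETL data, and resolve related technical issues;
2. For the advertising business, build subject-area data; on top of the data warehouse, construct user tags, core business tags, and feature-engineering data, integrate them closely with business scenarios, and provide data support to every business line;
3. Design, build, and implement the data warehouse system, establish a standardized layered warehouse architecture, accumulate enterprise-level data assets, help raise the efficiency of business support, and explore the incremental value of data;
4. Own the data governance and management system; combining business knowledge, metadata, and technology, drive resource cost optimization, improve the data quality of data services, and keep data output stable.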
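The layered-warehouse and ETL-monitoring duties above can be illustrated with a small plain-Python sketch. The posting does not name its layers. ODS (raw), DWD (cleaned detail), and DWS (summary) are assumed here as a common convention. The ad-event schema, the 25% reject threshold, and every function name are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

# ODS layer (assumed naming): raw ad events as ingested; schema is hypothetical.
ODS_AD_EVENTS = [
    {"user_id": "u1", "ad_id": "a1", "event": "impression", "dt": "2025-07-01"},
    {"user_id": "u1", "ad_id": "a1", "event": "click", "dt": "2025-07-01"},
    {"user_id": "u2", "ad_id": "a1", "event": "impression", "dt": "2025-07-01"},
    {"user_id": None, "ad_id": "a2", "event": "impression", "dt": "2025-07-01"},  # dirty row
    {"user_id": "u2", "ad_id": "a2", "event": "CLICK ", "dt": "2025-07-01"},      # unnormalized
]

@dataclass(frozen=True)
class DwdAdEvent:
    """DWD (detail) layer: one cleaned, standardized event."""
    user_id: str
    ad_id: str
    event: str
    dt: str

def ods_to_dwd(rows):
    """Drop rows missing key fields and normalize event names; count rejects."""
    cleaned, rejected = [], 0
    for r in rows:
        if not r.get("user_id") or not r.get("ad_id"):
            rejected += 1
            continue
        cleaned.append(DwdAdEvent(r["user_id"], r["ad_id"], r["event"].strip().lower(), r["dt"]))
    return cleaned, rejected

def dwd_to_dws(events):
    """DWS (summary) layer: daily impressions and clicks per ad."""
    summary = defaultdict(lambda: {"impression": 0, "click": 0})
    for e in events:
        if e.event in ("impression", "click"):
            summary[(e.dt, e.ad_id)][e.event] += 1
    return {k: dict(v) for k, v in summary.items()}

def check_reject_rate(total, rejected, max_rate=0.25):
    """Hypothetical ETL monitoring rule: pass only if the reject rate is acceptable."""
    return total > 0 and rejected / total <= max_rate

dwd, rejected = ods_to_dwd(ODS_AD_EVENTS)
dws = dwd_to_dws(dwd)
ok = check_reject_rate(len(ODS_AD_EVENTS), rejected)
```

A production pipeline would express each layer as scheduled Hive or Spark jobs rather than in-memory Python. The split of concerns is the same, though. Cleaning happens once at the detail layer, aggregation reads only cleaned data, and a monitoring rule gates each run.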
Includes English-language materials
Education+
Data Warehouse+
https://www.youtube.com/watch?v=9GVqKuTVANE
From Zero to Data Warehouse Hero: A Full SQL Project Walkthrough and Real Industry Experience!
https://www.youtube.com/watch?v=k4tK2ttdSDg
Data Governance+
https://www.ibm.com/think/topics/data-governance
Data governance is the data management discipline that focuses on the quality, security and availability of an organization’s data.
https://www.youtube.com/watch?v=uPsUjKLHLAg
Building data fabric eliminates the technological complexities of data governance so users can connect to the right data at the right time, regardless of where it resides.
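Data quality is the governance pillar most directly expressible in code. The sketch below assumes one common framing: quality rules for completeness, uniqueness, and validity against a data standard's allowed code list. Each rule is run over a table, and the results form a report. The rule names, the `users` table, and the gender code list are all hypothetical.

```python
from typing import Callable

Row = dict
Rule = Callable[[list[Row]], bool]

def not_null(column: str) -> Rule:
    """Completeness: every row has a non-empty value in `column`."""
    return lambda rows: all(r.get(column) not in (None, "") for r in rows)

def unique(column: str) -> Rule:
    """Uniqueness: no duplicate values in `column` (e.g. a primary key)."""
    return lambda rows: len({r[column] for r in rows}) == len(rows)

def in_set(column: str, allowed: set) -> Rule:
    """Validity: values conform to a (hypothetical) data standard's code list."""
    return lambda rows: all(r.get(column) in allowed for r in rows)

def run_checks(rows: list[Row], rules: dict[str, Rule]) -> dict[str, bool]:
    """Evaluate every named rule and return a pass/fail report."""
    return {name: rule(rows) for name, rule in rules.items()}

users = [
    {"user_id": "u1", "gender": "M", "city": "Beijing"},
    {"user_id": "u2", "gender": "F", "city": ""},
    {"user_id": "u2", "gender": "X", "city": "Shanghai"},
]

report = run_checks(users, {
    "user_id_not_null": not_null("user_id"),
    "user_id_unique": unique("user_id"),
    "gender_in_standard": in_set("gender", {"M", "F", "U"}),
    "city_not_null": not_null("city"),
})
```

Here `report` flags the duplicate `u2`, the nonstandard gender code `X`, and the empty city. Each would be a ticket for the owning data team in a real governance process.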
Big Data+
https://www.youtube.com/watch?v=bAyrObl7TYE
https://www.youtube.com/watch?v=H4bf_uuMC-g
With all this talk of Big Data, we got Rebecca Tickle to explain just what makes data into Big Data.
Hive+
[English] Hive Tutorial
https://www.tutorialspoint.com/hive/index.htm
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
https://www.youtube.com/watch?v=D4HqQ8-Ja9Y
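Hive's role of summarizing structured data with SQL can be shown with a typical partition-filtered aggregation. Python's standard-library `sqlite3` stands in here so the query actually runs. It is not Hive, and the `dt` column is an ordinary column rather than a real Hive partition. The `ad_events` table and its contents are hypothetical. The Hive DDL in the comment shows the shape such a table would usually take.

```python
import sqlite3

# In Hive, a table like this would typically be declared as (illustrative only):
#   CREATE TABLE ad_events (user_id STRING, ad_id STRING, event STRING)
#   PARTITIONED BY (dt STRING) STORED AS ORC;
# sqlite3 is used here only as a local stand-in; `dt` is a plain column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ad_events (user_id TEXT, ad_id TEXT, event TEXT, dt TEXT)")
conn.executemany(
    "INSERT INTO ad_events VALUES (?, ?, ?, ?)",
    [
        ("u1", "a1", "impression", "2025-07-01"),
        ("u1", "a1", "click", "2025-07-01"),
        ("u2", "a1", "impression", "2025-07-01"),
        ("u3", "a2", "impression", "2025-07-02"),
    ],
)

# Summarize one day: in Hive, filtering on the partition column lets the
# engine skip other days' files (partition pruning).
rows = conn.execute(
    """
    SELECT ad_id,
           SUM(CASE WHEN event = 'impression' THEN 1 ELSE 0 END) AS impressions,
           SUM(CASE WHEN event = 'click' THEN 1 ELSE 0 END) AS clicks
    FROM ad_events
    WHERE dt = '2025-07-01'
    GROUP BY ad_id
    ORDER BY ad_id
    """
).fetchall()
conn.close()
```

Only ad `a1` appears in `rows`, because `a2`'s single event falls on a different day.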
Kafka+
https://developer.confluent.io/what-is-apache-kafka/
https://www.youtube.com/watch?v=CU44hKLMg7k
https://www.youtube.com/watch?v=j4bqyAMMb7o&list=PLa7VYi0yPIH0KbnJQcMv5N9iW8HkZHztH
In this Apache Kafka fundamentals course, we introduce you to the basic Apache Kafka elements and APIs, as well as the broader Kafka ecosystem.
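The core Kafka elements can be modeled in a few lines: a topic split into append-only partitions, keyed messages routed to a partition, and consumers that track offsets. This is a toy, in-memory sketch and not the Kafka client API. Real Kafka's default partitioner hashes keys with murmur2, and `zlib.crc32` is swapped in here as a deterministic stand-in. The `Topic` and `Consumer` classes are hypothetical.

```python
import zlib
from collections import defaultdict

class Topic:
    """Toy topic: a fixed number of append-only partition logs."""
    def __init__(self, name: str, num_partitions: int):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key: str, value: str) -> tuple[int, int]:
        # Same key -> same partition, so per-key ordering is preserved.
        # crc32 is a stand-in for Kafka's murmur2-based default partitioner.
        p = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)

class Consumer:
    """Toy consumer that keeps its own position (offset) per partition."""
    def __init__(self, topic: Topic):
        self.topic = topic
        self.offsets = defaultdict(int)

    def poll(self, partition: int, max_records: int = 10):
        log = self.topic.partitions[partition]
        start = self.offsets[partition]
        batch = log[start:start + max_records]
        self.offsets[partition] = start + len(batch)  # advance ("commit") after reading
        return batch

topic = Topic("ad_clicks", num_partitions=3)
placements = [topic.produce("user-1", f"click-{i}") for i in range(3)]
consumer = Consumer(topic)
p = placements[0][0]
first = consumer.poll(p, max_records=2)
second = consumer.poll(p, max_records=2)
```

All three `user-1` messages land in the same partition at offsets 0, 1, and 2. The consumer resumes where it left off, which is why Kafka can replay or catch up from a stored offset.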
Spark+
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
HBase+
[English] HBase Tutorial
https://www.tutorialspoint.com/hbase/index.htm
HBase is a data model, similar to Google's Bigtable, designed to provide quick random access to huge amounts of structured data. This tutorial introduces HBase, explains how to set up HBase on the Hadoop file system, and shows ways to interact with the HBase shell.
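The Bigtable-style data model behind HBase's quick random access fits in a small structure. Rows are kept sorted by row key. Columns are addressed as `family:qualifier`, and each cell keeps timestamped versions. This is an in-memory illustration, not the HBase client or shell API. The `ToyTable` class, the row-key format, and the sample data are hypothetical.

```python
import bisect

class ToyTable:
    """Toy model of HBase's layout: row key -> "family:qualifier" -> {timestamp: value}.
    Row keys are kept sorted, which is what makes range scans cheap."""
    def __init__(self, families):
        self.families = set(families)  # column families are fixed when the table is created
        self.row_keys = []             # sorted list of row keys
        self.rows = {}                 # row key -> {column: {timestamp: value}}

    def put(self, row, column, value, ts):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if row not in self.rows:
            bisect.insort(self.row_keys, row)
            self.rows[row] = {}
        self.rows[row].setdefault(column, {})[ts] = value

    def get(self, row, column):
        """Random read of one cell: returns the latest version, or None."""
        versions = self.rows.get(row, {}).get(column)
        return versions[max(versions)] if versions else None

    def scan(self, start, stop):
        """Range scan: row keys in [start, stop), in sorted order."""
        lo = bisect.bisect_left(self.row_keys, start)
        hi = bisect.bisect_left(self.row_keys, stop)
        return self.row_keys[lo:hi]

t = ToyTable(families=["info", "stats"])
t.put("user#0001", "info:name", "Alice", ts=1)
t.put("user#0001", "info:name", "Alicia", ts=2)
t.put("user#0002", "stats:clicks", "7", ts=1)
t.put("user#0100", "info:name", "Bob", ts=1)
```

For example, `t.get("user#0001", "info:name")` returns the newer version `"Alicia"`, and `t.scan("user#0001", "user#0100")` returns the first two row keys. Because access is by sorted row key, row-key design is usually the main performance decision in HBase.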
Flink+
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
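Processing events "as they occur" in Flink usually means keyed windows driven by event time and watermarks. The plain-Python sketch below illustrates tumbling (fixed, non-overlapping) windows. A window's count is emitted once the watermark, the maximum event time seen minus a tolerated delay, passes the window's end. Events arriving after their window has fired are dropped. This is not Flink's DataStream API. The class name, the 60-second window, and the 5-second delay are hypothetical.

```python
from collections import defaultdict

class TumblingWindowCounter:
    """Toy streaming operator: per-key counts in fixed event-time windows.
    A window fires when watermark = max_event_time - max_delay reaches its end."""
    def __init__(self, size_s: int, max_delay_s: int = 0):
        self.size = size_s
        self.max_delay = max_delay_s        # tolerated out-of-orderness
        self.max_seen = None                # highest event time observed so far
        self.open = defaultdict(int)        # (key, window_start) -> running count

    def on_event(self, ts: int, key: str):
        start = ts - ts % self.size
        if self.max_seen is not None and start + self.size <= self.max_seen - self.max_delay:
            return []  # late event for an already-fired window: dropped in this sketch
        self.open[(key, start)] += 1
        self.max_seen = ts if self.max_seen is None else max(self.max_seen, ts)
        return self._fire()

    def _fire(self):
        watermark = self.max_seen - self.max_delay
        ready = sorted(k for k in self.open if k[1] + self.size <= watermark)
        return [(key, start, self.open.pop((key, start))) for key, start in ready]

counter = TumblingWindowCounter(size_s=60, max_delay_s=5)
stream = [(10, "a1"), (30, "a1"), (62, "a2"), (58, "a1"), (70, "a1"), (59, "a1"), (130, "a2")]
outputs = [counter.on_event(ts, key) for ts, key in stream]
```

The out-of-order event at t=58 is still counted because the watermark has not yet passed 60. The one at t=59 arrives after window [0, 60) has fired, so it is dropped. Flink offers allowed lateness and side outputs for that case.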
Data Analysis+
[English] Data Analyst Roadmap
https://roadmap.sh/data-analyst
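Step-by-step guide to becoming a Data Analyst in 2025
Related Positions
Experienced hire | Autonomous Driving Division
1. Data metric system: dig into the value of data; build and maintain the vehicle-side signal data warehouse and data metric system, providing a PB-scale shared platform and framework support for algorithms and the data closed loop; build, monitor, and operate the core data metric system (including business categories, production status, and feature metrics); quickly deliver and continuously refine a standardized product data system so that data-driven business operations are more efficient and convenient;
2. Data governance: map upstream and downstream data assets; define and promote data standards (e.g., development, quality, and assurance standards) and governance processes to ensure data accuracy, completeness, and consistency;
3. Data management: design, develop, and apply systems for metadata management, data quality checks, and data classification, improving data ease of use, availability, and stability;
4. Development support for business teams' data needs: e.g., event tracking (log instrumentation), connected-vehicle data, collection of internal and external data, data synchronization, data cleansing and standardization, data model design, offline data processing, real-time data processing, data services, and data visualization.
Updated 2025-07-08
Experienced hire
1. Own the collection, cleansing, transformation, and loading (ETL) of autonomous driving business data; build and maintain the vehicle-side signal data warehouse and data metric system;
2. Support the various data needs of consumer-facing (C-end) users and business-facing (B-end) analysis;
3. Participate in data governance (e.g., data quality checks, metadata management); establish monitoring and feedback metrics, continuously optimize product architecture and performance, and ensure the data quality and platform stability of a PB-scale data warehouse.
Updated 2025-06-04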

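Experienced hire | 2+ years | Technical Support
1. Build offline and real-time data warehouses; design and develop data models;
2. Develop, tune, and maintain the company's foundational data;
3. Build and maintain the game metric system;
4. Dive deep into the business, understand and sensibly abstract business requirements, unlock the value of data, and work closely with business teams;
5. Participate in data governance to improve data usability, standardization, and quality.
Updated 2025-03-31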