
SenseTime Senior Data Engineer
Job Requirements
• AI-first developer (>80% of coding done with AI tools); understands how to control and maximize the capabilities of AI programming tools; a super-charged developer.
• Experience: 5+ years in data engineering or software engineering, with 3+ years architecting cloud-scale data platforms.
• Flexible, with a passion for creating impactful products.
• Fluent English communication skills (spoken and written) are required.
• Technical skills: expert in SQL and one of Python/Scala/Java; deep hands-on knowledge of Spark, Kafka, and orchestration tools (Airflow, dbt, Prefect).
• Ability to understand business scenarios and design data acquisition solutions that directly impact AI model performance.
Job Responsibilities
• Understand business scenarios and design targeted data acquisition solutions, ensuring data is relevant, high-quality, and aligned with project goals.
• Architect, design, and maintain enterprise-grade databases, data warehouses, and lakehouse systems to support analytical, operational, and AI workloads.
• Model and optimize schema design, storage layouts, data partitioning, clustering, and indexing strategies for large-scale datasets (see the sketch after this list).
• Implement and maintain ETL/ELT pipelines feeding data warehouses (e.g., Snowflake, BigQuery, Redshift, Databricks, or open-lakehouse environments).
• Design, collect, and maintain high-quality datasets for AI inference and LLM optimization, fine-tuning, and testing, ensuring data is formatted and preprocessed to meet model requirements.
• Collaborate with AI application engineers to understand model performance requirements and translate them into targeted data collection and preparation strategies.
• Develop and implement automated data pipelines for efficient data processing, including data cleaning, labeling, augmentation, and transformation.
• Proactively identify data gaps based on model performance metrics, and design solutions to acquire, clean, and optimize data for enhanced model accuracy and efficiency.
• Build, clean, and manage diverse data sources, ensuring compliance with data security and privacy standards.
• Conduct exploratory data analysis to discover data patterns, anomalies, and optimization opportunities, directly impacting model performance.
• Continuously learn and adapt to the latest advancements in data engineering, AI, and large language model (LLM) technologies.
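For the partitioning and clustering point above, here is a minimal PySpark sketch of one common approach: derive a date partition column and sort within partitions so that date-filtered queries prune files instead of scanning the full dataset. The paths and column names are hypothetical; the listing does not prescribe this particular layout.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("events_layout").getOrCreate()

# Hypothetical raw events table; path and columns are illustrative only.
events = spark.read.parquet("s3://warehouse/raw/events/")

# Derive a partition column so queries filtering on event date prune partitions.
events = events.withColumn("event_date", F.to_date("event_ts"))

(
    events
    .repartition("event_date")            # group writer tasks by partition value
    .sortWithinPartitions("user_id")      # cluster rows inside each file for locality
    .write
    .mode("overwrite")
    .partitionBy("event_date")            # physical layout: .../event_date=YYYY-MM-DD/
    .parquet("s3://warehouse/curated/events/")
)
```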
You will design and build data warehouses on cloud to provide efficient analytical and reporting capabilities across Apple’s global and regional sales and finance teams. You will develop highly scalable data pipelines to load data from various source systems, using Apache Airflow to orchestrate, schedule, and monitor the workflows (a minimal DAG sketch follows this paragraph). You will build generic and reusable solutions meeting data warehousing design standards for complex business requirements, and will be expected to understand existing solutions, fine-tune them, and support them as needed. Data quality is our goal, and we expect you to meet our high standards on data and software quality. We are a rapidly growing team with plenty of interesting technical and business challenges to solve. We seek a self-starter who is willing to learn fast, adapts well to changing requirements, and works with cross-functional teams.
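Since the role calls out Apache Airflow for orchestration, the following is a minimal DAG sketch assuming Airflow 2.4+ (where `schedule` replaces the older `schedule_interval` argument). The DAG id, task names, and callables are hypothetical placeholders, not the team's actual pipeline.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_sales(**context):
    # Placeholder: pull the day's increment from a source system.
    print("extracting increment for", context["ds"])


def load_warehouse(**context):
    # Placeholder: merge the staged increment into the warehouse table.
    print("loading increment for", context["ds"])


with DAG(
    dag_id="daily_sales_load",                 # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
) as dag:
    extract = PythonOperator(task_id="extract_sales", python_callable=extract_sales)
    load = PythonOperator(task_id="load_warehouse", python_callable=load_warehouse)

    extract >> load    # load runs only after extract succeeds
```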
• Design and implement end-to-end data pipelines (ETL) to ensure efficient data collection, cleansing, transformation, and storage, supporting both real-time and offline analytics needs (see the streaming sketch after this list).
• Develop automated data monitoring tools and interactive dashboards to enhance business teams’ insights into core metrics (e.g., user behavior, AI model performance).
• Collaborate with cross-functional teams (e.g., Product, Operations, Tech) to align data logic, integrate multi-source data (e.g., user behavior, transaction logs, AI outputs), and build a unified data layer.
• Establish data standardization and governance policies to ensure consistency, accuracy, and compliance.
• Provide structured data inputs for AI model training and inference (e.g., LLM applications, recommendation systems), optimizing feature engineering workflows.
• Explore innovative AI-data integration use cases (e.g., embedding AI-generated insights into BI tools).
• Provide technical guidance and best practices on data architecture that meets both traditional reporting purposes and modern AI agent requirements.
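To illustrate the real-time side of the first bullet, here is a minimal Spark Structured Streaming sketch that reads user-behavior events from Kafka, applies a basic cleansing filter, and lands date-partitioned Parquet for offline analytics. The broker address, topic name, schema, and paths are all assumptions for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("behavior_stream").getOrCreate()

# Hypothetical schema for user-behavior events arriving on Kafka.
schema = StructType([
    StructField("user_id", StringType()),
    StructField("action", StringType()),
    StructField("event_ts", TimestampType()),
])

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
    .option("subscribe", "user_behavior")               # hypothetical topic
    .load()
)

# Kafka delivers raw bytes; parse the JSON payload into typed columns.
events = (
    raw.select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
       .select("e.*")
       .filter(F.col("user_id").isNotNull())            # basic cleansing step
)

# Land the cleansed stream as date-partitioned Parquet for offline analytics.
query = (
    events.withColumn("event_date", F.to_date("event_ts"))
    .writeStream
    .format("parquet")
    .option("path", "s3://warehouse/stream/user_behavior/")
    .option("checkpointLocation", "s3://warehouse/_checkpoints/user_behavior/")
    .partitionBy("event_date")
    .start()
)
```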
Key Responsibilities
1. Design and build batch/real-time data warehouses to support overseas e-commerce growth.
2. Develop efficient ETL pipelines to optimize data processing performance and ensure data quality and stability (see the quality-gate sketch after this list).
3. Build a unified data middleware layer to reduce business data development costs and improve service reusability.
4. Collaborate with business teams to identify core metrics and data requirements, delivering actionable data solutions.
5. Discover data insights through collaboration with business owners.
6. Participate in AI-driven efficiency enhancement initiatives, collaborating on machine learning algorithm development, feature engineering, and data processing workflows.
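For the data quality point (item 2), here is a minimal sketch of a quality gate in PySpark: compute a few invariants over the curated table and fail the run loudly before bad data is published downstream. The table path and column names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_quality_gate").getOrCreate()

# Hypothetical curated orders table produced by the ETL pipeline.
orders = spark.read.parquet("s3://warehouse/curated/orders/")

stats = orders.agg(
    F.count(F.lit(1)).alias("row_count"),
    F.sum(F.col("order_id").isNull().cast("int")).alias("null_keys"),
    F.countDistinct("order_id").alias("distinct_keys"),
).first()

# Fail the pipeline run rather than publishing bad data downstream.
assert stats.row_count > 0, "no rows loaded"
assert stats.null_keys == 0, f"{stats.null_keys} rows with null order_id"
assert stats.distinct_keys == stats.row_count, "duplicate order_id values"
```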
"1. Responsible for the research and development of data platfrom for xiaomi internet businesses. 2. Build the infrastructure and tools required for optimal extraction, transformation, and loading of data from a wide variety of data sources 3. Design and implement Data as a Service ( DaaS ) for analytics and data scientist team members that assist them in developing intelligent agile operation Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc."