Supercell Data Engineer, Analytics
Requirements
• 5+ years in Data Engineering or a related field.
• Expertise in Python and SQL, with the ability to guide others in querying and best practices (see the query sketch after this list).
• Proven track record of designing and maintaining large-scale ETL processes.
• Familiarity with modern data stacks (e.g., Databricks, Spark) and build/orchestration tools.
• Proactive, independent, and passionate about delivering high-quality data in a fast-paced environment.
• Strong communication skills and fluency in English.

Nice to Have
• Experience contributing to data tooling and automations (e.g., debugging ETLs, optimizing queries).
• Ability to diagnose issues, benchmark pipelines, and drive performance improvements.
• A willingness to push back on initiatives that don’t meet best practices, ensuring high data quality remains the standard.
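As a loose illustration of the Python-and-SQL guidance this role calls for, the sketch below runs a reviewable Spark SQL aggregation on a Databricks-style stack. The table, columns, and values are invented for the example; it is not Supercell code.

```python
# A minimal sketch, assuming a hypothetical events table, to show the kind of
# explicit, aggregate-close-to-the-data SQL an analytics engineer might recommend
# over SELECT * on large event tables.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("query-guidance-sketch").getOrCreate()

# Tiny in-memory stand-in for a large events table (invented schema).
events = spark.createDataFrame(
    [
        ("2024-06-01", "FI", 1),
        ("2024-06-01", "FI", 2),
        ("2024-06-02", "DE", 1),
    ],
    ["event_date", "country", "user_id"],
)
events.createOrReplaceTempView("events")

# Select only the columns needed and aggregate before anything leaves the cluster.
dau = spark.sql(
    """
    SELECT event_date, country, COUNT(DISTINCT user_id) AS dau
    FROM events
    GROUP BY event_date, country
    ORDER BY event_date, country
    """
)
dau.show()
```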
Responsibilities
• Own team-specific data pipelines and products end-to-end.
• Plan, execute, and maintain data engineering roadmaps, aligning with wider company initiatives.
• Define what data is collected to serve our evolving business needs.
• Develop pipelines to deliver new datasets, uncover insights, and improve decision-making.
• Continuously improve the scalability, reliability, and performance of our data systems.
• Support data analysts and other stakeholders with timely, accurate data.
• Participate in on-call rotations to maintain pipeline stability (a freshness-check sketch follows this list).
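As a hedged illustration of the kind of check an on-call rotation might lean on, the snippet below flags a stale output table. The table name, timestamp column, and six-hour threshold are assumptions, and a real setup would page or post to an alert channel rather than print.

```python
# A minimal staleness check, assuming a curated table analytics.sessions with an
# event_ts timestamp column and an illustrative six-hour freshness SLA.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("freshness-check-sketch").getOrCreate()

result = spark.sql(
    """
    SELECT max(event_ts) AS latest_ts,
           max(event_ts) < current_timestamp() - INTERVAL 6 HOURS AS is_stale
    FROM analytics.sessions   -- hypothetical table
    """
).collect()[0]

if result["is_stale"]:
    # A production check would alert (PagerDuty, Slack, etc.); printing keeps the sketch small.
    # Note: an empty table (NULL latest_ts) would need its own handling.
    print(f"ALERT: analytics.sessions is stale; latest event at {result['latest_ts']}")
else:
    print(f"analytics.sessions looks fresh; latest event at {result['latest_ts']}")
```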
• Design and implement end-to-end data pipelines (ETL) to ensure efficient data collection, cleansing, transformation, and storage, supporting both real-time and offline analytics needs (a batch ETL sketch follows this list).
• Develop automated data monitoring tools and interactive dashboards to enhance business teams’ insights into core metrics (e.g., user behavior, AI model performance).
• Collaborate with cross-functional teams (e.g., Product, Operations, Tech) to align data logic, integrate multi-source data (e.g., user behavior, transaction logs, AI outputs), and build a unified data layer.
• Establish data standardization and governance policies to ensure consistency, accuracy, and compliance.
• Provide structured data inputs for AI model training and inference (e.g., LLM applications, recommendation systems), optimizing feature engineering workflows.
• Explore innovative AI-data integration use cases (e.g., embedding AI-generated insights into BI tools).
• Provide technical guidance and best practices on data architecture and BI solutions that meet both traditional reporting purposes and modern AI agent requirements.
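The sketch below is a compact batch ETL step along the lines described above: extract raw events, cleanse them, normalise types, and write partitioned output. The input path, schema, and output location are illustrative assumptions, not a real pipeline.

```python
# A minimal batch ETL sketch, assuming raw JSON events with user_id, event_name,
# event_ts, and country fields; all paths and names are invented.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Extract: semi-structured events landed by an upstream collector (hypothetical path).
raw = spark.read.json("s3://example-bucket/raw/events/2024-06-01/")

# Cleanse: drop rows missing key fields and obvious duplicates.
clean = (
    raw.dropna(subset=["user_id", "event_name", "event_ts"])
       .dropDuplicates(["user_id", "event_name", "event_ts"])
)

# Transform: normalise types and keep only the columns downstream consumers need.
events = (
    clean.withColumn("user_id", F.col("user_id").cast("long"))
         .withColumn("event_ts", F.to_timestamp("event_ts"))
         .withColumn("event_date", F.to_date("event_ts"))
         .select("user_id", "event_name", "event_ts", "event_date", "country")
)

# Load: partitioned output serving both offline analytics and feature-engineering jobs.
(
    events.write
          .mode("overwrite")
          .partitionBy("event_date")
          .parquet("s3://example-bucket/curated/events/")
)
```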
1. Design, develop, and maintain scalable data pipelines to support ML model development and production deployment.
2. Implement and maintain CI/CD pipelines for the data and ML solutions (a sketch of the kind of data test such a pipeline might run follows this list).
3. Collaborate with data scientists and other team members to understand data requirements and implement efficient data processing solutions.
4. Create and manage data warehouses and data lakes, ensuring proper data governance and security measures are in place.
5. Collaborate with product managers and business stakeholders to understand data needs and translate them into technical requirements.
6. Stay current with emerging technologies and best practices in data engineering, and propose innovative solutions to improve data infrastructure and processes for ML models and analytics applications.
7. Participate in code reviews and contribute to the development of best practices for data engineering within the team.
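As a hedged example of what the CI/CD item above could exercise, the tests below check basic data-quality expectations on a small fixture dataset so they can run on a pull request without a cluster. The fixture data and allowed event names are invented.

```python
# A sketch of pytest-based data-quality tests a CI pipeline could run;
# fixture data and expectations are assumptions, not a real test suite.
import pytest
from pyspark.sql import SparkSession
from pyspark.sql import functions as F


@pytest.fixture(scope="session")
def spark():
    # Small local session so the tests run on a CI worker without a cluster.
    return SparkSession.builder.master("local[2]").appName("ci-data-tests").getOrCreate()


@pytest.fixture()
def events(spark):
    # Tiny fixture standing in for a sample of pipeline output.
    return spark.createDataFrame(
        [(1, "login"), (2, "purchase"), (3, "session_start")],
        ["user_id", "event_name"],
    )


def test_user_id_is_never_null(events):
    assert events.filter(F.col("user_id").isNull()).count() == 0


def test_event_names_are_known(events):
    known = {"login", "purchase", "session_start"}
    observed = {row["event_name"] for row in events.select("event_name").distinct().collect()}
    assert observed <= known
```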
"1. Responsible for the research and development of data platfrom for xiaomi internet businesses. 2. Build the infrastructure and tools required for optimal extraction, transformation, and loading of data from a wide variety of data sources 3. Design and implement Data as a Service ( DaaS ) for analytics and data scientist team members that assist them in developing intelligent agile operation Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc."