As a Data Engineer on the Services Data Science & Analytics team, you will be responsible for designing, developing, and maintaining robust data pipelines to support Apple Services analytics initiatives. Our team's task is to build a comprehensive aggregate data layer that enables efficient and flexible executive reporting, highly customized data applications, and powerful ML inference and analysis.
In this role, you will work closely with data scientists, BI engineers, and business and product teams to build scalable data pipelines and solutions. As a data engineer, you must collaborate effectively to bridge the gap between business needs, analytical solutions, and engineering requirements. Proactive collaboration with other data engineering teams is also essential for scaling solutions across teams and Apple Services.
This role requires expertise in data engineering tools and patterns, including developing PySpark jobs, using orchestration tools for scheduling and monitoring, understanding CI/CD processes, and managing a dynamic data lake. The team maintains pipelines that require on-call support and monitoring. A successful engineer will have a strong intuition for quickly identifying and resolving bugs and efficiently solving technical challenges.
Responsibilities:
- Independently design technical solutions to process massive datasets (billions of daily records) and unify analytics across Apple Services
- Build data pipelines using Python, PySpark, and SQL
- Manage an evolving data schema, ensuring that data structures can be extended over time while maintaining backward compatibility
- Collaborate with Business Intelligence and Data Science teams to design aggregate tables used across multiple teams and workflows
- Integrate new tools and packages into the team’s workflow and codebase
- Provide on-call support and monitoring
- Mentor others and promote data engineering best practices across the organization