The Data Engineer designs and maintains data pipelines supporting a production-grade data warehouse that spans multiple operational and functional departments and hundreds of users. They are a meticulous programmer who takes pride in clean, efficient, maintainable code, and who develops new ETL pipelines and maintains existing ones. The Data Engineer extracts company data from a variety of structured and unstructured sources and normalizes differing encoding methods and schemas into data sets for warehousing and subsequent data modeling. In addition, they recommend and develop changes to source data structures and systems, and assist with the implementation of new systems and updates to existing systems from a data-integrity and usability perspective by developing appropriate data schemas and structures for use in downstream models and reports.
What You'll Do
- Embraces and demonstrates alignment with Caliva Values – Integrity, Positive Energy, Bias to Action, Connectedness, Truth Seeking.
- Serves as primary owner of our ETL pipelines, which span dozens of source systems across all Caliva departments, in support of our data warehouse initiative.
- Develops programmatic tests of existing ETL pipelines to assert the quality of warehoused data.
- Becomes an evangelist of programming best practices within the Analytics team and leads by example on topics such as code clarity, judicious use of whitespace, and unit tests.
- Builds new ETL pipelines to warehouse new data sources as they are identified.
- Assembles large, complex data models to meet the needs of operational and strategic stakeholders.
- Works closely with our in-house analysts to integrate SQL data models into a dependency tree.
- Maintains user permissions for warehouse data sources and assists in user access training.
- Other duties and responsibilities as assigned by management.
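The programmatic data-quality tests mentioned above could take many forms; as a minimal sketch (assuming warehoused rows arrive as plain Python dicts, with `check_rows` and all field names being illustrative, not Caliva's actual pipeline), such a check might look like:

```python
# Hypothetical sketch of a data-quality assertion pass over warehoused rows.
# All names (check_rows, order_id, total) are illustrative assumptions.

def check_rows(rows, required_fields, unique_key):
    """Return a list of human-readable problems found in `rows`."""
    problems = []
    seen_keys = set()
    for i, row in enumerate(rows):
        # Every required field must be present and non-null.
        for field in required_fields:
            if row.get(field) is None:
                problems.append(f"row {i}: missing required field '{field}'")
        # The unique key must not repeat across rows.
        key = row.get(unique_key)
        if key in seen_keys:
            problems.append(f"row {i}: duplicate {unique_key} {key!r}")
        seen_keys.add(key)
    return problems

rows = [
    {"order_id": 1, "total": 19.99},
    {"order_id": 1, "total": 5.00},   # duplicate key
    {"order_id": 2, "total": None},   # missing total
]
issues = check_rows(rows, required_fields=["order_id", "total"],
                    unique_key="order_id")
for issue in issues:
    print(issue)
```

In practice checks like this would run as a scheduled task (e.g. an Airflow DAG step) after each load, failing the run when problems are found.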
What You Have
- Previous experience developing ETL pipelines using technologies such as Airflow (preferred), Luigi, Oozie, Azkaban, etc.
- Experience manipulating and de-normalizing data in JSON format for storage in relational databases.
- Experience with Google Cloud Platform or AWS cloud services.
- Knowledge and experience with Kubernetes and/or Docker (Preferred).
- Advanced knowledge of SQL and experience working with relational databases (Preferred).
- Previous experience developing data models to support a data warehouse (Preferred).
- Bachelor’s degree or higher in an engineering or technical field such as Computer Science, Physics, Mathematics, Statistics, Engineering, or Business Administration, or an equivalent combination of education and experience.
- 1-5 years of experience manipulating data using Python (experience with Pandas is a plus); extracting data from REST APIs; and managing a codebase in GitHub.
- This job operates in a professional office environment. This role routinely uses standard office equipment such as computers, phones, copiers and filing cabinets.
- May need to work overtime and weekends as needed, particularly during initial months with company.
- Must be able to sit for extended periods of time, use hands and fingers for data entry for 8 hours or more, and perform some lifting, squatting, bending, pushing, and pulling.
- Must be able to travel to other locations.
- Must be 21 years of age or older.
- Must comply with all legal and company regulations for working in the industry.
- Must pass a background check with the San Jose Police Department.
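The JSON de-normalization experience called for above can be sketched with a minimal flattening routine (assuming dicts parsed from API responses; the record and its field names are illustrative only):

```python
import json

# Hypothetical sketch: flatten a nested JSON record (e.g. from a REST API)
# into a single-level row suitable for a relational table.

def flatten(record, parent_key="", sep="_"):
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat

raw = json.loads('{"id": 7, "customer": {"name": "Ada", "city": "San Jose"}}')
row = flatten(raw)
print(row)
```

A routine like this turns each nested object into compound column names (`customer_name`, `customer_city`), which map directly onto columns of a warehouse table.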