Job Description

Please be aware of recruiting scams! All legitimate communication from our recruitment team will come by email from an official calstart.org address. We will not text you about a role you have not applied to or shown interest in, and we will not conduct interviews via text or Zoom chat. CALSTART does not ask for fees or personal information such as Social Security numbers or bank details during the recruitment process.

About Us

CALSTART is a mission-driven industry organization focused on transportation decarbonization and clean air for all. For over 30 years, it’s been CALSTART’s mission to develop, assess, and implement large-scale, zero-emission transportation solutions to mitigate climate change and support economic growth. CALSTART works with businesses, organizations, governments, and communities to create real-life impact toward clean air and equitable access to clean transportation for all. CALSTART provides scientific, technical and policy support for regulatory development and clean technology and infrastructure acceleration.

About the Role

Proposed Internship Project: Building a Data Lake and Advanced Data Pipelines for Clean Transportation Insights

Project Overview: CALSTART aims to enhance its data infrastructure and analytics capabilities by building a robust data lake that consolidates diverse data sources to support clean transportation initiatives. This project will focus on creating a data lake environment, developing automated data pipelines, and designing visualizations that provide insight into clean vehicle adoption, infrastructure planning, and sustainability efforts. The intern will contribute to building scalable data solutions that support CALSTART's mission while gaining hands-on experience in cloud-based data engineering and data science. This is a part-time (25 hours per week), six-month internship.

What You'll Do

Data Lake Architecture: Collaborate with the data engineering team to design and build a scalable data lake architecture on AWS, using services such as Amazon S3, RDS, or EC2.
Data Pipeline Development: Assist in building end-to-end ETL (Extract, Transform, Load) pipelines that pull data from various sources, process it, and store it in the data lake in an organized and efficient manner.
Data Transformation and Quality Assurance: Implement data cleansing, transformation, and validation processes to ensure data accuracy, completeness, and consistency before storing it in the data lake.
Data Visualization: Develop interactive dashboards and visualizations using tools like Power BI, Tableau, or open-source alternatives to present insights related to clean transportation, such as vehicle performance, infrastructure coverage, and funding distribution.
Documentation and Knowledge Sharing: Ensure proper documentation of the data lake architecture, pipeline processes, and visualization tools for knowledge transfer and future improvements.

What You'll Bring To The Table

Proficiency in Python for data processing and analysis
Experience with data science workflows and tools
Knowledge of ETL processes and pipeline development
Familiarity with AWS services for cloud-based data infrastructure
Strong communication skills, both written and verbal
Collaborative team player with a proactive mindset

Desired Qualifications

Bachelor's or Master's degree in Math, Data Science, Statistics, Engineering, Computer Science, or a related field
Experience with data science or analytics
Proficiency in SQL and experience with relational databases such as MySQL, PostgreSQL, or Microsoft SQL Server
Some experience with ETL pipelines is a plus
AWS experience
