Jaygovind Sahu

Senior Data Engineer · Downingtown, Pennsylvania

Results-driven Senior Data Engineer with over 15 years of experience developing, implementing, and optimizing data pipelines and platforms. Designs and delivers robust data solutions with Python, SQL, Airflow, and AWS services that improve efficiency, reduce costs, and support strategic decision-making across domains.

Experience

Amazon

Data Engineer · Malvern, PA

Mar 2022 – Present
  • Developed a robust data platform to measure and improve the efficiency of marketing campaigns across multiple channels, including offline (billboards, newspapers, gas station TVs), social media (Facebook, Instagram), and search engines (Google, Bing). The pipeline ingested and analyzed data from various vendors to generate key metrics such as cost per click, cost per application, impressions, and gross rating points (GRP), leading to a 30% reduction in marketing costs within one year of launch.
  • Built a data pipeline using Airflow to standardize addresses for millions of records via an address standardization API, benefiting multiple pipelines across domains. The automated process replaced a manual one, reducing standardization time by ~90% and eliminating manual errors.
  • Partnered with a third-party data provider to integrate hundreds of millions of sensitive records into the data lake. Designed and implemented a pipeline to transform and prepare the dataset for machine learning, and developed a reusable framework for running regression and classification models — expected to reduce marketing costs by an additional 25%.

Vanguard

Data Engineer · Malvern, PA

Apr 2021 – Mar 2022
  • Implemented and maintained data pipelines using AWS EMR, Glue, and other AWS services for big data analytics, powering ~15 Tableau dashboards that financial and business analysts used to support strategic decision-making by leadership teams.

Jornaya (Verisk Marketing Solutions)

Data Engineer · Conshohocken, PA

Dec 2019 – Apr 2021
  • Led development and enhancement of a core product that processed billions of client records and built reports for users, running on ~70 EMR clusters, ~10 Glue jobs, and serverless AWS Lambda functions.
  • Optimized Apache Spark and AWS EMR configurations for a data pipeline serving ~50 vendors, saving more than 55% in AWS costs in one year while lowering the failure rate by ~90%, resulting in faster and more resilient delivery of results.
  • Designed and implemented a data pipeline to asynchronously call an external API endpoint, transform the response, and write datasets to Amazon S3, using AWS SNS (messaging), Lambda (API calls with multiprocessing and transformation), and Step Functions (orchestration). The pipeline processed 1 billion records in ~25 minutes, enabling timely delivery of critical client reports.

Tata Consultancy Services

Data Developer · Multiple locations

Feb 2010 – Dec 2019
  • Built data solutions for two major clients in the banking and financial domain, contributing to more than 15 successful projects and numerous ad hoc analyses for business users. Developed data pipelines across technologies ranging from legacy IBM mainframes (COBOL, JCL, DB2) to modern stacks using Python, Apache Spark, and AWS services.

Projects

The Data Domain Blog

A personal blog that curates news and articles in Data Engineering, Data Analytics, and AI/ML from various sources and RSS feeds. The backend uses Airflow to automate content discovery, retrieval, and summarization using LLMs from OpenAI and Anthropic. WordPress APIs are used to auto-publish articles with concise summaries and links to originals.

Apache Airflow · Python · OpenAI API · Anthropic API · WordPress APIs · AWS

Skills
Languages
Python, SQL, COBOL, JCL
Data & Pipelines
Apache Airflow, Apache Spark, dbt, AWS Glue, AWS EMR, AWS Lambda, AWS Step Functions, AWS SNS, Amazon S3
Cloud & Infrastructure
Amazon Web Services (AWS), Terraform
Analytics & Modeling
Data Engineering, Data Modeling, Data Warehousing, Data Analysis, Machine Learning (Regression, Classification)
Tools & Platforms
Tableau, WordPress APIs, OpenAI API, Anthropic API, DB2

Education

Veer Surendra Sai University of Technology

Bachelor of Technology in Electrical Engineering — Burla, India

2009

Certifications

AWS Certified Data Analytics – Specialty

Amazon Web Services

Jul 2023