Amrutha Vamshi Goud

Data Engineer
Denton, TX, US

About

Data Engineer with over 3 years of experience designing and deploying scalable ETL pipelines, real-time data platforms, and ML-ready infrastructure across AWS and GCP. I orchestrate complex data workflows with Airflow, dbt, Apache Spark, and Kafka, and build analysis-ready pipelines with Python, SQL, and Terraform that improve data quality, governance, and availability. My experience with CI/CD automation using Docker, Kubernetes, and GitHub Actions supports cost-optimized, enterprise-grade data ecosystems for advanced analytics and business intelligence.

Skills

Programming & Scripting

Python (NumPy, Pandas, Matplotlib, Seaborn, SciPy, Scikit-learn, TensorFlow, PyTorch, Keras, NLTK, MLflow, OpenRefine), R, Java, Scala, SQL (MySQL, PostgreSQL, Oracle, PL/SQL), NoSQL (MongoDB, Cassandra, DynamoDB).

Machine Learning & AI

Supervised & Unsupervised Learning, Deep Learning (CNN, RNN), Natural Language Processing (BERT, spaCy, GPT-4, Claude), Time Series Forecasting, Feature Engineering, Anomaly Detection, Ensemble Methods (XGBoost, LightGBM, Random Forest), Reinforcement Learning, Generative Models (LLMs incl. LLaMA 2), Prompt Engineering, Retrieval-Augmented Generation (RAG).

MLOps & Deployment

Docker, Kubernetes, Flask, FastAPI, Jenkins, Git, CI/CD Pipelines, Terraform, Model Serving (FBLearner, TorchServe), Serverless (AWS Lambda).

Big Data & Data Engineering

Apache Spark, Hadoop, Hive, Kafka, Airflow, Prefect, EMR, MapReduce, Apache Flink, ETL/ELT Pipelines.

Cloud & Data Warehousing

AWS (S3, Lambda, Redshift, EventBridge, RDS), Google Cloud Platform (BigQuery, Dataflow), Azure (Azure Data Factory), IBM Cloud, Snowflake, Teradata, Databricks.

Data Analytics

Machine Learning, Predictive Modeling, Data Mining, Statistics, Statistical Analysis, ANOVA, Regression Analysis, Time-Series Forecasting, Forecast Accuracy, Classification, Cross-validation, Analytical Models, Exploratory Data Analysis (EDA), A/B Testing, Predictive Analytics.

Visualization & BI Tools

Power BI, Tableau, Google Data Studio, Excel, SSRS, Dashboard Design, Data Storytelling.

Data Modeling & Governance

Dimensional Modeling, Star & Snowflake Schema Design, dbt Tests, Data Contracts, Data Lineage (DataHub), Data Quality Checks, Database Design, Schema Versioning.

Tools & Environments

Jupyter Notebook, RStudio, Linux Shell, RESTful APIs, Apache NiFi, ELK Stack.

Work

Uber

Data Engineer

Denton, TX, US

Summary

Developed and optimized scalable ETL pipelines and data infrastructure for real-time processing and analytics within AWS and GCP environments.

Highlights

Developed 15 scalable ETL pipelines using Apache Spark, Airflow, and Python, processing over 50M daily transactions with 99.9% uptime.

Refactored core dbt models and modular SQL logic, improving maintainability and shortening the month-end close process by 3 days.

Optimized Amazon Redshift performance by redesigning schemas and partition strategies, boosting query speeds by 30%.

Architected cross-cloud ingestion frameworks across AWS (S3, Lambda) and GCP (BigQuery, Dataflow), reducing data latency by 35%.

Integrated CI/CD pipelines with GitHub Actions and dbt test automation, enhancing reliability and reducing downstream data defects by 25%.

Enabled analytics-ready outputs and data lineage tracking with DataHub, directly aiding a $2M budget variance investigation.

Trigent Software

Junior Data Analyst

Hyderabad, Telangana, India

Summary

Performed data analysis and dashboard design to uncover insights and improve reporting efficiency for marketing and product growth.

Highlights

Analyzed over 100K rows of data using SQL, Python, and Power BI, uncovering key insights for marketing and product growth.

Designed 15+ dynamic Tableau dashboards, reducing manual reporting workload by 35%.

Built segmentation logic with Python, scikit-learn, and schedulers, saving 20+ hours monthly in reporting.

Built custom ETL cleansing logic, improving the accuracy of cross-team reports by over 20%.

Delivered actionable insights through SQL joins, window functions, and dashboards, reducing failed transactions by nearly 10%.

Dell Technologies

Data Analyst Intern

Hyderabad, Telangana, India

Summary

Processed data and developed interactive dashboards to improve analytics workflows and identify cost-saving opportunities.

Highlights

Processed over 50K records using Python, SQL, and Jupyter, enhancing analytics workflows and operational reporting.

Built 10+ interactive Tableau dashboards, increasing data visibility and reducing decision-making time by over 25%.

Engineered data cleaning flows in OpenRefine, NumPy, and Python, cutting data preparation time by 40%.

Wrote efficient queries in Snowflake and MySQL, accelerating business dashboard updates by nearly 30%.

Automated recurring ETL tasks with Python scripts, eliminating 40% of manual data preparation time and enhancing workflow integrity.

Conducted regression analyses to uncover cost-saving opportunities in vendor contracts, contributing to strategic solutions.

Education

University of North Texas
Denton, Texas, United States of America

Master of Science

Advanced Data Analytics

Osmania University
Hyderabad, Telangana, India

Bachelor of Arts

Administration