Recruit Leading Data Scientist with Expertise in Data Analytics

Hire Data Scientists With Expertise in Spark, Streaming, Hadoop & Kafka

Experienced Data Scientist – 10+ Years of Experience in Machine Learning & Big Data

Summary

Experienced AI Solution Architect and Lead Data Scientist with a strong background in machine learning, deep learning, and big data technologies. Proven track record of designing and implementing end-to-end ML pipelines for predictive modelling, fraud detection, sentiment analysis, and anomaly detection. Proficient in cloud computing, NLP, MLOps, and CI/CD, with a focus on delivering scalable and production-ready solutions.

Work Experience

Leading Tech Data Science Company (Jun 2021 – Present)

AI Solutions Architect
  • Created and implemented the solution architecture for the end-to-end machine learning pipeline, model training, and model execution.
  • Made the data pipeline and model code production-ready, following engineering standards, using Python on Oracle Cloud.
  • Implemented feature engineering using Python and PySpark on Oracle Cloud Infrastructure (OCI) Data Flow.
  • Implemented an ML model to predict wind turbine power using PySpark and GBTRegressor (gradient-boosted trees) on Spark.
  • Implemented distributed model training and scoring using PySpark and MLlib's GBTRegressor.
  • Used the OCI Data Science service to develop the model.
  • Implemented distributed feature engineering on OCI Data Flow using PySpark.
  • Implemented distributed model training and scoring using PySpark MLlib's GBTRegressor on OCI Data Flow, an ephemeral distributed computing service.
  • Credit Card Fraud Detection
  • Designed and implemented the solution architecture for the end-to-end machine learning pipeline, model training, and model execution.
  • Made the data pipeline and model code production-ready, following engineering standards, using Python on Oracle Cloud.
  • Created the solution architecture for the credit card fraud detection model pipeline on the cloud.
  • Implemented an ML model to detect credit card fraud using PySpark and MLlib's GBTClassifier (gradient-boosted trees) on the OCI Data Flow service.
  • Leveraged the OCI Data Science service to develop the model.
  • Implemented distributed, scalable feature engineering on OCI Data Flow using PySpark.
  • Sentiment Analysis – NLP
  • Created and implemented the solution architecture for an end-to-end NLP pipeline for text analytics on Oracle Cloud.
  • Implemented a pre-processing solution using Python.
  • Built a deep learning solution that takes input statements or documents and gauges their overall sentiment using a state-of-the-art BERT model. Developed a solution to save the model output downstream in the cloud, i.e., to the object store and Oracle ADW.

Machine Learning Focused Company (Jan 2019 – Jun 2021)

Data Scientist (Associate VP)
  • Architected the end-to-end machine learning pipeline, model training, and model execution.
  • Made the data pipeline and model code production-ready, following engineering standards, using Python.
  • Mentored and supervised the team in project planning and execution, providing ongoing guidance to help the team achieve its targets.
  • Created high-level plans and epics and groomed the backlog for data science tasks.
  • Implemented feature engineering to identify and extract important features using pandas and NumPy.
  • Initiated and led backlog grooming, sprint planning, breakdown of epics into user stories, and estimation, following agile Scrum principles, to deliver machine learning and data science solutions.
  • Imputed missing data before model building to make the model more robust.
  • Performed keyword clustering of free-form text using NLP and deep learning.
  • Implemented an anomaly detection algorithm using scikit-learn and XGBoost, achieving 88% accuracy.
  • Implemented Isolation Forest for anomaly detection.
  • Initiated and conducted meetings with business and other stakeholders to present a dashboard of model results.
  • Coordinated with external vendors to procure required items.
  • Onboarded and trained the model on AWS.
  • Created statistical alerts using z-score and MAD.
  • Developed a model for expense type mismatch based on co-occurrence.
  • Wholesale Credit Risk
  • Designed and implemented the end-to-end data flow pipeline, model training, and model execution.
  • Made the ML pipeline and model code production-ready, following engineering standards, using Python and PySpark.
  • Developed a statistical model to raise alerts on clients that are likely to be downgraded.
  • Initiated and led backlog grooming, sprint planning, breakdown of epics into user stories, and estimation, following agile Scrum principles, to deliver machine learning and data science solutions.
  • Optimized the code, reducing the process running time from 26 hours to 3 hours.
  • Initiated and regularly coordinated stakeholder-level meetings to gather business requirements.
  • Created an execution plan and prepared high-level effort estimates and resource requirements.
  • Mentored and guided the team during the execution phase of the project to help it achieve its targets.
  • Took ownership of productionizing the statistical model and handing over the ML pipeline to the production support team.
  • Monitored the model, performed impact analysis, and incorporated business feedback into the model.
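
The statistical alerts based on z-score and MAD mentioned in the list above could look roughly like the following. This is a minimal sketch, not the candidate's actual implementation: it assumes MAD means median absolute deviation, and the function names, thresholds, and sample values are all hypothetical.

```python
from statistics import mean, median, pstdev


def zscore_alerts(values, threshold=3.0):
    """Return indices whose z-score exceeds the threshold (hypothetical helper)."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, nothing to flag
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


def mad_alerts(values, threshold=3.5):
    """Return indices whose modified z-score exceeds the threshold.

    Assumes MAD = median absolute deviation; 0.6745 rescales MAD so the
    score is comparable to a z-score for normally distributed data.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # more than half the values equal the median
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical daily amounts with one obvious outlier at index 7.
amounts = [10, 11, 9, 10, 12, 10, 11, 95]
z_flags = zscore_alerts(amounts)
mad_flags = mad_alerts(amounts)
```

On a small sample like this, a single large outlier inflates the mean and standard deviation enough that the z-score rule may not fire at the usual threshold of 3, while the median-based rule still flags it. That robustness is one common reason to pair the two checks.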

Machine Learning Startup (Aug 2016 – Jan 2019)

ML Engineer (Senior Developer)
  • Implemented an anomaly detection model for identifying anomalies in pharmaceutical products using random forest and scikit-learn.
  • Owned and developed the Projections module, which is based on Spark and Scala.
  • Led a team of 6 on the country-specific implementation of a statistical algorithm (projection), generating revenue for the product BDF R&D division.
  • Reduced the execution time of a big data algorithm (Phoenix) from 12 hours to 1.5 hours and improved its accuracy.
  • Designed and implemented the Projections algorithm (a statistical algorithm) from scratch using Spark, Scala, and Hive.
  • Improved the performance of big data and analytics products, bringing run time down from 6 hours to 25 minutes.

Big Data Analytics Firm (Jul 2014 – Jul 2016)

Associate
  • Designed and implemented a Lambda Architecture using the Spark context and Spark Streaming context, with Spark and Kafka integration.
  • Automated Hortonworks Hadoop deployment using Cloudbreak and Ambari via a one-click install, reducing deployment time from 4-5 hours to just 10 minutes.
  • Successfully integrated Kafka with Spark and added streaming capability to the big data analytics platform.
  • Mentored the team on big data technologies.

Global IT Services Leader (Sep 2010 – Jun 2014)

Software Engineer
  • Worked in big data analytics using MapReduce, Hive, Pig, Sqoop, and HBase.
  • Configured, scaled, and deployed Hadoop clusters (Cloudera and Apache).
  • Ran jobs on Hadoop.

Education

  • MS in Machine Learning and AI from Liverpool John Moores University in 2021.
  • Post-Graduation in Machine Learning and AI from IIIT-Bangalore in 2019.
  • B.E. in Electronics and Communication from CMRIT under VTU, Bangalore, in 2010.

Other

Proficient in machine learning techniques and libraries such as scikit-learn, Keras, and MLlib. Experienced with big data tools like Spark, Spark Streaming, Hadoop, Kafka, and Hive. Skilled in data manipulation using pandas and NumPy. Knowledgeable in ensemble learning (XGBoost, Isolation Forest, GBTRegressor) and databases (Oracle ADW, SQL Server, MySQL). Well-versed in statistical analysis and exploratory data analysis. Proficient in Python, Scala, Java, CI/CD (Jenkins, Gitblit, Bitbucket, TFS), Agile (Jira), and workflow orchestration (Apache Airflow). Expertise in NLP (Word2Vec, BERT) and cloud platforms (AWS, Oracle Cloud). Strong background in MLOps and MLflow.

Want to hire talent like this?

If yes, you've come to the right place! Tarmack can help you hire this person or others with a similar profile, wherever you are located in the world. We are a global platform that helps employers hire great talent across a whole range of skills and levels.

Want us to help you with your hiring needs?

Get Started

You can also reach us by sending us an email at employers@tarmack.com

A truly global HR platform with everything you need to build, grow & manage a global team.

  • Identifying & recruiting the best talent
  • Payroll with full compliance across 100+ countries
  • Employment agreements as per local laws
  • Contractor invoices & time management
  • Smooth remote onboarding of employees
  • Immigration & mobility services around the world