Skilled Data Scientist | ML, NLP, Image Processing, Sales Forecasting | 10+ Years of Experience
Summary
Experienced Data Science professional with 5.6 years of expertise in data science, machine learning, and advanced analytics. Skilled in working with structured and unstructured data sources for various pharma companies. Proficient in SQL, Python, TensorFlow, Keras, OpenCV, PySpark, Google Cloud APIs, and Power BI. Specializes in machine learning, natural language processing, image processing, predictive modelling, unstructured data analysis, convolutional neural networks, time-series forecasting, and propensity modelling, using NumPy, Pandas, SciPy, scikit-learn, NLTK, spaCy, Gensim, CoreNLP, and Amazon SageMaker. Additional experience includes Azure (automating Azure IaaS with PowerShell, CLI, and ARM templates), Databricks, Data Factory/Data Flow, and RDBMS.
Worked With Global-scale Companies
- Data Scientist in a leading IT Consulting Services company. (2021-Present)
- Data Scientist in a reputed Digital Transformation Services company. (2020-2020)
- Data Scientist – Artificial Intelligence and Analytics with a Multinational IT company. (2019-2020)
- Data Scientist in a leading IT Services and Consulting company. (2019-2019)
- Senior Software Engineer in a thriving company providing cloud computing, mobile solutions, global IT staffing and application development services. (2014-2019)
- Sr. Application Developer – Microsoft Technologies, C#, ASP.Net, MVC, Entity Framework, SQL Server in a top-tier AI, Automation and Hybrid Cloud Solutions company. 20
Involvement in a Multitude of Projects
AUTOMATED CHATBOT FOR INSURANCE APPLICATION
- Developed an automated chatbot for an insurance application, including configuration for application-related replies
- Tools Used: Python, PyTorch, CUDA, NLTK, spaCy, scikit-learn, TensorFlow, Keras, and DeepPavlov.
IMAGE DIAGNOSTICS DURING HOSPITAL ENDOSCOPIC SURGERY
- Developed the architecture for data flow and attribute extraction from organ images captured by an endoscopic surgery probe equipped with two fish-eye lens cameras.
- Processed the captured images through de-noising, dewarping, and stitching to improve clarity and enable detection of organ medical conditions, blood flow speed and direction, and the presence of fat globules for diagnostics.
- Utilized image processing techniques such as de-noising, fish-eye dewarping, image stitching, and homography, as well as attribute extraction, on-the-fly video preparation, and diagnostics using the Mask R-CNN model.
- Experimented with various models, including CNN, Fast R-CNN, Faster R-CNN, YOLO, and YOLOCOCO, and ultimately achieved a final accuracy of 96.54% using the Mask R-CNN model.
- Employed MongoDB for saving images and image features.
- Deployed the model in a Federated Cloud Architecture using DataRobot.
- Tools Used: Python, OpenCV, Kornia, PyTorch, TorchVision, and CUDA for the project.
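The homography step mentioned above can be illustrated with a minimal sketch in plain Python. The function name `apply_homography` and the matrix values are assumptions made for illustration and are not taken from the project; in practice a library such as OpenCV would estimate the matrix from matched keypoints between the two camera frames.

```python
def apply_homography(H, point):
    """Map an (x, y) pixel through a 3x3 homography H (row-major nested lists)."""
    x, y = point
    # Homogeneous coordinates: [x', y', w'] = H @ [x, y, 1]
    xp = H[0][0] * x + H[0][1] * y + H[0][2]
    yp = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    return (xp / w, yp / w)


# Hypothetical homography: a pure translation of 100 px right and 20 px down,
# the simplest case when aligning two overlapping frames for stitching.
H_SHIFT = [
    [1.0, 0.0, 100.0],
    [0.0, 1.0, 20.0],
    [0.0, 0.0, 1.0],
]

# Corners of an assumed 640x480 frame, mapped into the stitched canvas.
corners = [(0, 0), (640, 0), (640, 480), (0, 480)]
warped = [apply_homography(H_SHIFT, c) for c in corners]
```

Mapping the frame corners this way gives the footprint of one image inside the shared canvas, which is the information a stitching step needs before blending the overlap.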
TOPIC MODELING AND CORRELATED TOPIC MODELING OF MARKETING DATA
- Developed an end-to-end solution for finding similar documents using investigation numbers or key phrases for search.
- Implemented an algorithm that ranks documents by similarity scores calculated using cosine distance, Jaccard distance, or Word Mover's Distance.
- Utilized word2vec or FastText to vectorize the documents depending on the chosen distance metric.
- Implemented Latent Dirichlet Allocation and Hierarchical LDA, unsupervised ML models, for marketing data analysis, categorization, and sentiment analysis of client data.
- Developed a user interface (UI) to make the algorithm accessible to SMEs or auditors to streamline their audit process.
- Tools Used: Python, SQL, and scikit-learn as the main tools for the project.
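The similarity-based ranking described above can be sketched as follows. This is a simplified illustration, not the project's implementation: it swaps in bag-of-words counts for the word2vec/FastText vectors, and the corpus, the investigation numbers, and all function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for real preprocessing."""
    return text.lower().split()


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between bag-of-words count vectors."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_similarity(tokens_a, tokens_b):
    """Size of the shared vocabulary divided by the size of the combined one."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def rank_documents(query, documents, similarity=cosine_similarity):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    q = tokenize(query)
    scored = [(doc_id, similarity(q, tokenize(text)))
              for doc_id, text in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical mini-corpus keyed by made-up investigation numbers.
DOCS = {
    "INV-001": "tablet packaging defect reported by customer",
    "INV-002": "label misprint on tablet packaging",
    "INV-003": "delayed shipment to regional warehouse",
}

ranking = rank_documents("tablet packaging defect", DOCS)
```

Passing `similarity=jaccard_similarity` to `rank_documents` switches the metric without changing the ranking code, which mirrors the choice of distance measures listed above.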
CUSTOMER SENTIMENT ANALYSIS ON REVIEW OF SOLD PRODUCTS
- Utilized NLP for customer sentiment analysis and tagging of similar words/specs in product reviews.
- Implemented tagging techniques to improve clarity and categorization of the output.
- Worked as a data scientist to design the data flow architecture and select appropriate models for generating topics and labeling documents.
- Utilized LDA (Latent Dirichlet Allocation) and CTM (Correlated Topic Model) as the base models for topic creation.
- Prepared and cleaned the raw free-form text data before applying the models.
- Tools Used: Azure ML, Python, NLP, NLTK, and scikit-learn for the project.
CUSTOMER EMAIL CLASSIFICATION AND SENDING REPLY TO EMAILS
- Utilized NLP for customer email classification, categorization, and automated reply/action.
- Conducted sentiment analysis to improve categorization and clarity of the output.
- Developed methodologies to prepare unstructured data for statistical tests (ANOVA, Apriori rule, PCA) and machine learning techniques (logistic regression, random forest).
- Employed the Stanford CoreNLP package and a recursive neural network for sentiment analysis of incoming emails.
- Worked as a Data Scientist to design the data flow architecture and select models for topic generation and labeling of documents.
- Extracted attributes from emails with various attachments (text, images, PDFs, client signature) using OCR, language detection, translation, and attribute extraction processes.
- Tools Used: SQL, Python, Django, NLP, NLTK, spaCy, Gensim, Rasa-Core, scikit-learn, TensorFlow, Keras, and Greenplum in the project.
SALES FORECASTING OF MANUFACTURED AND MARKETED PRODUCTS
- Developed a hybrid Time Series forecasting model using ARIMA and Time Series decomposition.
- Consulted contact centre agents and analyzed 4 years of historical sales data.
- Successfully reduced variance to 5.1% with the forecasting model.
- Considered seasonality, weather, geographical region, and diversity in sales forecasting.
- Tools Used: Python, SQL, Power BI, and Digital Ocean.
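The decomposition half of the forecasting approach above can be sketched in plain Python. This is an illustrative simplification under stated assumptions: the ARIMA component is replaced by a straight-line extension of the trend, the sales figures are synthetic, and the function names are hypothetical.

```python
def moving_average_trend(series, period):
    """Centered moving average; None where the window does not fit."""
    n = len(series)
    half = period // 2
    trend = [None] * n
    for i in range(half, n - half):
        window = series[i - half:i + half + 1]
        if period % 2 == 0:
            # Even period: half-weight the two end points (2 x period average).
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)
        trend[i] = total / period
    return trend


def seasonal_components(series, trend, period):
    """Average the detrended values per season, centered to sum to zero."""
    buckets = [[] for _ in range(period)]
    for i, (y, t) in enumerate(zip(series, trend)):
        if t is not None:
            buckets[i % period].append(y - t)
    means = [sum(b) / len(b) for b in buckets]
    offset = sum(means) / period
    return [m - offset for m in means]


def forecast(series, period, steps):
    """Extend the trend linearly and add back the seasonal component."""
    trend = moving_average_trend(series, period)
    seasonal = seasonal_components(series, trend, period)
    known = [(i, t) for i, t in enumerate(trend) if t is not None]
    (i0, t0), (i1, t1) = known[0], known[-1]
    slope = (t1 - t0) / (i1 - i0)
    n = len(series)
    return [t1 + slope * (n + k - i1) + seasonal[(n + k) % period]
            for k in range(steps)]


# Synthetic quarterly sales: linear growth plus a repeating seasonal pattern.
SALES = [110, 97, 94, 111, 118, 105, 102, 119, 126, 113, 110, 127]

next_year = forecast(SALES, period=4, steps=4)
```

The series needs at least two full seasonal cycles so that every season has a detrended observation. In a hybrid model, the linear extension here is the part a fitted ARIMA model would replace.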
ENGLISH HANDWRITING RECOGNITION AND DIGITIZATION OF STUDENT EXAM SHEET [OLICR]
- Detected English handwritten digits and alphabets from photographs of handwritten papers taken with a mobile camera.
- Applied preprocessing techniques such as de-skewing and de-noising using Python, OpenCV, and Pillow.
- Precisely cut, normalized, and processed individual digits/alphabets to match the configuration of EMNIST data sets.
- Trained a CNN on the EMNIST training data set to recognize the test data set and other processed images.
- Prepared a large training data set with background noise using InfiMNIST.
- Compared and analyzed processed test images for recognition, achieving an accuracy of 94.8%.
- Developed a mobile application using Kivy.
- Tools Used: Python, TensorFlow, Keras, OpenCV, NumPy, Pandas, SciPy, scikit-learn, Kivy, Amazon SageMaker, MATLAB, and SQL.
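The cut-and-normalize step above can be sketched as follows, in plain Python on nested lists of pixel intensities. The 20x20 glyph box inside a 28x28 canvas is an assumed layout in the MNIST style, and the function names and sample image are hypothetical; a real pipeline would more likely use OpenCV or Pillow for cropping and resizing.

```python
def bounding_box(img):
    """Return (top, bottom, left, right) of non-zero pixels; bottom/right exclusive."""
    rows = [r for r, row in enumerate(img) if any(row)]
    cols = [c for c in range(len(img[0])) if any(row[c] for row in img)]
    if not rows:
        raise ValueError("image contains no ink")
    return rows[0], rows[-1] + 1, cols[0], cols[-1] + 1


def normalize_glyph(img, box=20, canvas=28):
    """Crop to the ink, pad to a square, resize to box x box, center on canvas."""
    top, bottom, left, right = bounding_box(img)
    crop = [row[left:right] for row in img[top:bottom]]
    h, w = len(crop), len(crop[0])

    # Pad the shorter side so the glyph keeps its aspect ratio.
    side = max(h, w)
    pad_top, pad_left = (side - h) // 2, (side - w) // 2
    square = [[0] * side for _ in range(side)]
    for r in range(h):
        for c in range(w):
            square[pad_top + r][pad_left + c] = crop[r][c]

    # Nearest-neighbour resize to box x box.
    resized = [[square[r * side // box][c * side // box] for c in range(box)]
               for r in range(box)]

    # Place the resized glyph in the middle of an empty canvas.
    margin = (canvas - box) // 2
    out = [[0] * canvas for _ in range(canvas)]
    for r in range(box):
        out[margin + r][margin:margin + box] = resized[r]
    return out


# Hypothetical 5x5 crop containing a short vertical stroke.
SAMPLE = [
    [0, 0, 0, 0, 0],
    [0, 0, 255, 0, 0],
    [0, 0, 255, 0, 0],
    [0, 0, 255, 0, 0],
    [0, 0, 0, 0, 0],
]

glyph = normalize_glyph(SAMPLE)
```

The output has the same shape regardless of how large the photographed character was, which is what lets a model trained on fixed-size data be applied to the processed images.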
Education
- M.Tech in CSE from WBUT
- MBA in IT from Sikkim Manipal University
- DOEACC “B” Level (equivalent to MCA) from CDAC
- B.Sc. Physics (Hons) from Calcutta University
- Higher Secondary from WBCHSE
- Madhyamik from WBBSE
Other
- Certified in multiple courses relating to AI, ML, Deep Learning, Data Science, and NLP from a trusted online platform – Udemy.