Experienced ML Engineer with a Springer-published paper and business-scaling expertise, driven by the goal of contributing to humanity.
I'm an experienced Artificial Intelligence Engineer with a proven track record of building AI for startups, leading machine learning teams, and developing innovative AI solutions. Alongside a strong commitment to contributing to humanity, I have also contributed to the research community with a paper published in a Springer journal. My expertise lies in AI, computer vision, NLP, and data science.
In this research paper, I introduced a convolutional neural network model named Flynet to address the challenge of automatic building detection in high-resolution satellite images. Existing methods are often time-consuming and incomplete because of the complexity of the visual features and the presence of other objects in the images. Flynet is designed with an encoder-decoder architecture, incorporating improvements that make it faster, lighter, and more accurate. The experimental results demonstrate that it outperforms U-Net, providing more accurate predictions while being three times faster and 70% smaller in size. Through this research paper, my aim was not only to contribute to the development of state-of-the-art algorithms for satellite image analysis but also to open new possibilities for real-world applications in remote sensing, urban planning, and disaster management.
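For illustration, a minimal encoder-decoder segmentation network in Keras might look like the sketch below; the layer counts, filter sizes, and input resolution are readability assumptions, not the published Flynet configuration.

    # Minimal encoder-decoder segmentation sketch (illustrative only; not the
    # published Flynet configuration).
    from tensorflow.keras import layers, Model

    def build_encoder_decoder(input_shape=(256, 256, 3)):
        inputs = layers.Input(shape=input_shape)

        # Encoder: downsample while increasing channels
        x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
        skip = x
        x = layers.MaxPooling2D(2)(x)
        x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(2)(x)

        # Bottleneck
        x = layers.Conv2D(128, 3, padding="same", activation="relu")(x)

        # Decoder: upsample back to the input resolution
        x = layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu")(x)
        x = layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(x)
        x = layers.Concatenate()([x, skip])  # skip connection preserves spatial detail

        # One-channel mask: building vs. background
        outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)
        return Model(inputs, outputs)

    model = build_encoder_decoder()
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])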
Illustration of predictions by the models on the validation dataset. Columns from left to right: raw satellite image, corresponding ground truth, prediction by the proposed model, and prediction by the U-Net model.
January 2022 - October 2023
1 yr 9 mos
Aftershoot is a software startup specializing in computer vision-powered solutions for photographers.
Played a key role as the first full-time employee in scaling the business to over 2M ARR and growing the ML team to 10 people.
Led the machine learning team and built computer vision solutions based on object detection, classification, segmentation, object recognition, and statistics-driven algorithms.
Developed a face detection algorithm achieving >99% accuracy, even on images with 6K resolution.
Trained new ML models with improved accuracy and faster inference times, leading to increased customer satisfaction and improved product performance.
June 2021 - January 2022
8 mos
Handled the development of build infrastructure and streamlined the CI/CD pipeline to ensure faster and more efficient product delivery.
Deployed a machine learning model to accurately predict ticket build time, leading to increased developer efficiency.
Designed and implemented an internal chatbot to instantly resolve queries from developers, improving productivity and collaboration within the team.
October 2018 - October 2019
1 yr
Oversaw machine, parts, and manpower planning, leading to significant improvements in efficiency and productivity.
Automated MIS reporting in Excel, reducing manual effort by more than 1 hour and improving accuracy.
Redesigned the coolant return tank of a machine, eliminating waiting and processing waste on the production line and achieving annual cost savings of Rs 100k.
Sept. 2020 – May 2022
2 yrs
Aug 2014 – May 2018
4 yrs
Problem Statement:
The project involved developing a face detection machine learning algorithm for high-resolution images that achieved an accuracy rate of over 99%. High-resolution images contain a substantial amount of detail, and the faces in them can be very small, making it harder for a face detection algorithm to accurately identify faces amidst the abundance of visual information.
Solution:
To address the challenges of developing a high-resolution face detection algorithm in TensorFlow, I took a comprehensive approach.
Dataset Preparation:
To prepare the dataset for training, I wrote a few Python scripts. I first divided each high-resolution image into 16 overlapping parts, ensuring that every possible face was covered. I then applied an open-source face detection model to these image segments. This process resulted in a well-annotated and diverse dataset, crucial for training a robust model.
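A minimal sketch of that tiling step is shown below, assuming a 4x4 grid with a 10% overlap between neighbouring tiles (the exact overlap used in the project is not specified).

    # Sketch: split a high-resolution image into a 4x4 grid of overlapping tiles
    # so faces near tile borders are not cut off. The 10% overlap is an assumption.
    import numpy as np
    from PIL import Image

    def tile_image(path, grid=4, overlap=0.1):
        img = np.array(Image.open(path))
        h, w = img.shape[:2]
        tile_h, tile_w = h // grid, w // grid
        pad_h, pad_w = int(tile_h * overlap), int(tile_w * overlap)

        tiles = []
        for row in range(grid):
            for col in range(grid):
                top = max(row * tile_h - pad_h, 0)
                left = max(col * tile_w - pad_w, 0)
                bottom = min((row + 1) * tile_h + pad_h, h)
                right = min((col + 1) * tile_w + pad_w, w)
                # keep the offset so detected boxes can be mapped back to the full image
                tiles.append((img[top:bottom, left:right], (left, top)))
        return tiles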
Model Architecture:
Afterward, I crafted the model architecture from scratch and trained the model on this dataset. This custom architecture allowed fine-tuning the algorithm to perform optimally on high-resolution images with tiny faces, a challenging task in itself.
Inferencing Optimization:
I also optimized the inferencing process by dividing the input image into 4 parts, ensuring efficient and accurate detection of faces during application usage.
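A minimal sketch of that inference-time split is shown below; detect_faces() is a hypothetical wrapper around the trained detector, not an actual API from the project.

    # Sketch: run detection on four quadrants and map the boxes back to
    # full-image coordinates. detect_faces() is a hypothetical stand-in.
    def detect_on_quadrants(img, detect_faces):
        h, w = img.shape[:2]
        boxes = []
        for top in (0, h // 2):
            for left in (0, w // 2):
                quad = img[top:top + h // 2, left:left + w // 2]
                for (x1, y1, x2, y2, score) in detect_faces(quad):
                    boxes.append((x1 + left, y1 + top, x2 + left, y2 + top, score))
        return boxes  # optionally follow with non-max suppression to merge duplicates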
The combination of these steps resulted in an exceptional high-resolution face detection algorithm, which not
only achieved an accuracy rate of over 99% but also significantly improved the overall speed and accuracy of
face detection in real-world scenarios.
Impact:
This project led to rapid growth for the company, as we received significantly fewer complaints about face
detection. Furthermore, people began recommending the product to others due to the enhanced accuracy and
reliability of the face detection algorithm.
Problem Statement:
The task was to classify images into three categories: 'kiss,' 'almost kiss,' and 'no kiss.' The primary challenge was the high resolution of the images, averaging around 6K pixels. This high resolution made it difficult for machine learning models to achieve accurate results, particularly because the kiss event typically occurred in a very small region of the entire image. The initial model had an accuracy of only 24%, lower than what a random model would provide (33% in this case).
Solution:
To address these challenges, I adopted a multi-step approach. First, I leveraged a face detection model's predictions to identify the region of interest within the image. Next, I implemented a smart cropping algorithm to extract and prepare the input for my machine learning model. I then trained a Convolutional Neural Network (CNN) on these cropped images.
One additional challenge was class imbalance in the dataset: I had a limited number of kiss images (15k) compared to a much larger number of 'almost kiss' and 'no kiss' images (approximately 100k each). To mitigate this imbalance, I employed weighted loss functions and down-sampled the dataset.
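A minimal sketch of the cropping and class-weighting ideas is shown below; the crop margin, class order, and weight formula are illustrative assumptions, not the exact production values.

    # Sketch: crop around detected faces and weight the loss toward the rare
    # 'kiss' class. Margin, class order, and weights are illustrative assumptions.
    def crop_around_faces(img, face_boxes, margin=0.3):
        # Union of the face boxes, expanded by a margin, so the classifier
        # only sees the region where a kiss could occur.
        x1 = min(b[0] for b in face_boxes)
        y1 = min(b[1] for b in face_boxes)
        x2 = max(b[2] for b in face_boxes)
        y2 = max(b[3] for b in face_boxes)
        dx, dy = int((x2 - x1) * margin), int((y2 - y1) * margin)
        h, w = img.shape[:2]
        return img[max(y1 - dy, 0):min(y2 + dy, h), max(x1 - dx, 0):min(x2 + dx, w)]

    # Inverse-frequency class weights for Keras (0: kiss, 1: almost kiss, 2: no kiss)
    counts = {0: 15_000, 1: 100_000, 2: 100_000}
    total = sum(counts.values())
    class_weight = {c: total / (len(counts) * n) for c, n in counts.items()}
    # model.fit(train_ds, epochs=20, class_weight=class_weight)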
Impact:
This approach yielded significant improvements. The model's accuracy increased to approximately 80%, a substantial improvement over the initial 24%. Additionally, the preprocessing step significantly improved inference speed; as a result, the overall speed of the application increased by 10%, leading to higher customer satisfaction.
Problem Statement:
The project's primary goal was to assess the composition quality of an image, specifically in terms of adherence to the rule of thirds and accurate identification of the subject's position within the image.
Solution:
To address this challenge, I employed the YOLO machine learning model to detect and locate the subjects within
the images. YOLO is known for its real-time object detection capabilities, making it suitable for identifying
the subjects swiftly and accurately.
Subsequently, I developed an algorithm that assigned a composition score to each image based on two key factors: the subject's position within the image and adherence to the rule of thirds. The algorithm considered the subject's placement relative to the rule-of-thirds grid and evaluated how well the image composition aligned with this fundamental principle of visual design.
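A minimal sketch of such a scoring function is shown below; the exponential falloff and its constant are illustrative assumptions rather than the exact production formula.

    # Sketch: score how close the subject centre is to the nearest rule-of-thirds
    # intersection. The exponential falloff is an illustrative choice.
    import math

    def composition_score(box, img_w, img_h):
        x1, y1, x2, y2 = box  # subject bounding box from the detector
        cx, cy = (x1 + x2) / 2 / img_w, (y1 + y2) / 2 / img_h  # normalised centre

        # Four rule-of-thirds intersections in normalised coordinates
        points = [(1/3, 1/3), (1/3, 2/3), (2/3, 1/3), (2/3, 2/3)]
        d = min(math.hypot(cx - px, cy - py) for px, py in points)

        # 1.0 when the subject sits on an intersection, decaying with distance
        return math.exp(-8 * d)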
Impact:
The implemented solution performed strongly in assessing image composition. By efficiently detecting subjects and evaluating composition quality, it provided valuable insights for selecting the best images.
Problem Statement:
The project's primary challenge was to correct the white balance of an image by accurately predicting the appropriate temperature and tint values for editing in Adobe Lightroom. Adobe Lightroom offers a wide range of white balance temperatures, from 2,000 K to 50,000 K, allowing photographers to adjust them to their preferences. Predicting the correct values within this extensive range can be a daunting task.
Solution:
To address this challenge, I conducted a detailed analysis of the impact of temperature values on images. Through this analysis, I observed that changes in temperature beyond 9,000 K and below 2,400 K had minimal visual impact, so I focused on developing a regression-based machine learning model within this temperature range.
The model's objective was to predict the ideal temperature and tint values for white balance correction, simplifying the editing process in Adobe Lightroom. I employed transfer learning and designed a CNN architecture that takes not only the image but also the current Temp and Tint values into account, and then trained the model on the prepared dataset.
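A minimal sketch of a two-input regression model of this kind is shown below; the MobileNetV2 backbone, input size, and head dimensions are assumptions, not necessarily the architecture used in production.

    # Sketch: transfer-learning regressor that takes an image plus the current
    # Temp/Tint values and predicts corrected ones. MobileNetV2 is an assumed
    # backbone; input preprocessing is omitted for brevity.
    import tensorflow as tf
    from tensorflow.keras import layers, Model

    image_in = layers.Input(shape=(224, 224, 3))
    temp_tint_in = layers.Input(shape=(2,))  # current temperature and tint

    backbone = tf.keras.applications.MobileNetV2(include_top=False, pooling="avg")
    backbone.trainable = False  # start with frozen pretrained weights
    features = backbone(image_in)

    x = layers.Concatenate()([features, temp_tint_in])
    x = layers.Dense(128, activation="relu")(x)
    outputs = layers.Dense(2)(x)  # predicted temperature and tint

    model = Model([image_in, temp_tint_in], outputs)
    model.compile(optimizer="adam", loss="mae")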
Impact:
The implementation of this machine learning solution had a significant impact on image editing efficiency. By
predicting temperature and tint values within a practical range, it streamlined the white balance correction
process for photographers, saving them time and improving the overall quality of image editing.
Problem Statement:
The challenge was to categorize products into 10,000 different categories based on their product title and description. The dataset posed a significant challenge due to its vast size, comprising 3 million data points.
Solution:
Given the immense dataset, my approach began with thorough data analysis and cleaning to gain a comprehensive understanding. I identified the primary challenge as data imbalance: roughly 95% of the data points belonged to approximately 2,700 categories, leaving the remaining 5% distributed across roughly 7,300 categories.
To address this, I adopted a two-fold solution. First, I trained a machine learning model using TensorFlow/Keras on the 95% of the dataset belonging to the 2,700 head classes, experimenting with various Natural Language Processing (NLP) techniques, starting with simple approaches like Bag of Words, TF-IDF, and Word2Vec, and using Python's NLTK library for data preparation. For the remaining 5% of the dataset, covering approximately 7,300 classes, I implemented a simple baseline that returned random predictions for these categories.
The attached graph shows the cumulative distribution of the target classes.
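For illustration, a head-class pipeline along these lines could be sketched with scikit-learn as below; the sample data, feature limits, and logistic-regression classifier are assumptions standing in for the actual TensorFlow/Keras models.

    # Sketch: TF-IDF over title + description, then a linear classifier for the
    # head categories. Sample data and the classifier choice are illustrative.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    titles = ["Wireless Mouse", "Cotton T-Shirt"]              # tiny illustrative sample
    descriptions = ["2.4GHz optical mouse", "Round-neck tee"]
    labels = ["electronics>accessories", "apparel>tops"]

    texts = [t + " " + d for t, d in zip(titles, descriptions)]
    pipeline = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), stop_words="english"),
        LogisticRegression(max_iter=1000),
    )
    pipeline.fit(texts, labels)            # labels restricted to the head categories
    predictions = pipeline.predict(texts)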
Impact:
While I was unable to submit my solution before the deadline, my approach demonstrated strong results. Even with the simplicity of careful data preparation and a basic Bag of Words model, the accuracy achieved was on par with the best solutions at the time, which used advanced models like BERT. This project underscored my ability to handle complex, large-scale datasets and devise effective strategies for data imbalance, leading to competitive results in a challenging machine learning competition.
Problem Statement:
The goal was to create a machine learning model to accurately predict ticket build times. The objective was to enhance developer efficiency by providing a more precise estimate of the time required to build the software after a commit was submitted.
Solution:
To achieve this, the initial challenge was to create a dataset, because a readily available one did not exist. I spoke with developers from different departments to determine whether the number of files affected build time and which kinds of changes took more or less time. I then conducted data analysis and feature engineering using Python libraries such as pandas and matplotlib to prepare the necessary features. I trained several regression models and found that a decision tree performed best; usefully, it not only predicted the build time but also provided the importance of each feature.
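A minimal sketch of a decision-tree regressor with feature importances is shown below; the feature names and sample data are hypothetical, not the internal dataset.

    # Sketch: decision-tree regressor for build-time prediction with feature
    # importances. The columns and values are hypothetical examples.
    import pandas as pd
    from sklearn.tree import DecisionTreeRegressor

    df = pd.DataFrame({                     # tiny illustrative sample, not real data
        "files_changed": [3, 12, 1, 25, 7, 4],
        "lines_added":   [40, 500, 5, 1200, 150, 60],
        "module":        ["api", "ui", "api", "core", "ui", "core"],
        "build_minutes": [6, 22, 4, 45, 14, 8],
    })

    X = pd.get_dummies(df[["files_changed", "lines_added", "module"]])  # one-hot encode
    y = df["build_minutes"]

    tree = DecisionTreeRegressor(max_depth=4, random_state=42)
    tree.fit(X, y)

    print(tree.predict(X[:2]))                                  # predicted build times
    print(sorted(zip(tree.feature_importances_, X.columns), reverse=True))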
Impact:
The deployment of the machine learning model delivered significant benefits. Developers now had access to precise predictions for ticket build times, leading to increased efficiency and better project management. This project showcased my ability to leverage machine learning to optimize processes and enhance productivity within the development team, ultimately contributing to more effective software development practices.
Problem Statement:
The challenge was to create a machine learning model capable of automatically removing backgrounds from images. This involves distinguishing the main subject from the background, which can be complex and vary significantly across different images.
Solution:
To tackle this problem, I leveraged a diverse dataset of images containing various subjects and backgrounds. The objective was to develop an efficient model that could accurately identify and isolate the main subject from the background.
Initially, I explored existing deep learning models, such as Mask R-CNN and U-Net, known for their segmentation capabilities. However, these models often come with computational overhead and larger model sizes. To optimize for speed and model size, I devised a custom CNN-based architecture, drawing inspiration from U-Net but tailoring it to the specific task of background removal. The challenge lay in achieving real-time or near-real-time processing while preserving accuracy.
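The network itself resembled the encoder-decoder pattern sketched earlier for Flynet; the snippet below is a minimal sketch of the post-processing step that turns a predicted foreground mask into a transparent cut-out, assuming a hypothetical trained model that outputs a 0-1 mask.

    # Sketch: turn a predicted foreground mask into a transparent PNG.
    # `model` is a hypothetical trained segmentation network returning a 0..1 mask.
    import numpy as np
    from PIL import Image

    def remove_background(model, path, size=(256, 256)):
        img = Image.open(path).convert("RGB")
        small = np.array(img.resize(size), dtype=np.float32) / 255.0
        mask = model.predict(small[None, ...])[0, ..., 0]         # HxW, values in [0, 1]
        alpha = Image.fromarray((mask * 255).astype(np.uint8)).resize(img.size)
        cutout = img.copy()
        cutout.putalpha(alpha)                                    # background becomes transparent
        return cutout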
Web Application Development:
In addition to the model development, I created a user-friendly web application using Flask. The web app allowed users to upload their images and receive instant background removal results, providing a seamless and intuitive interface for individuals and businesses to use this image editing capability without extensive technical expertise.
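A minimal sketch of such an upload endpoint is shown below; the route name is an assumption, and remove_background() and model refer to the hypothetical helper and trained network sketched above.

    # Sketch: Flask endpoint that accepts an image upload and returns the cut-out.
    # The route name and the remove_background()/model names are assumptions.
    import io
    from flask import Flask, request, send_file

    app = Flask(__name__)

    @app.route("/remove-background", methods=["POST"])
    def handle_upload():
        uploaded = request.files["image"]            # image field from the upload form
        result = remove_background(model, uploaded)  # hypothetical helper + model
        buf = io.BytesIO()
        result.save(buf, format="PNG")
        buf.seek(0)
        return send_file(buf, mimetype="image/png")

    if __name__ == "__main__":
        app.run(debug=True)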
Impact:
The result of this project was a highly efficient model for automatic background removal, coupled with a user-friendly web application. It offered real-time or near-real-time performance while effectively removing complex backgrounds.