I am a full-stack AI engineer with 3+ years of experience building AI solutions and SaaS products. Lately, I designed & trained my own GenAI diffusion model. I previously led the ML team at a successful startup (as first hire).
I hold an M.Tech in AI and have published research in Springer. My expertise spans generative AI, cloud deployment, machine learning, and product development, with a focus on delivering impactful solutions.
I also have extensive hands-on experience with API building, Next.js, React, AWS, Azure, and Docker.
Email: g.jain86078@gmail.com
In this research paper, I introduced a convolutional neural network model named Flynet to address the challenge of automatic building detection in high-resolution satellite images. Existing methods often suffer from time-consuming processes and incomplete solutions due to the complexity of visual features and the presence of other objects in the images. Flynet is designed with an encoder-decoder architecture, incorporating improvements that make it faster, lighter, and more accurate. The experimental results demonstrate that it outperforms U-Net, providing more accurate predictions while being three times faster and 70% smaller in size. Through this research paper, my aim was not only to contribute to the development of state-of-the-art algorithms for satellite image analysis but also to open new possibilities for real-world applications in remote sensing, urban planning, and disaster management.
Illustration of predictions by the models on the validation dataset. Columns, from left to right: raw satellite image, corresponding ground truth, prediction by the proposed model, and prediction by the U-Net model.
November 2023 – December 2024
1 yr 1 mo
cloth2life.ai is an AI-driven SaaS platform for fashion virtual try-on and AI photoshoots
Independently built the entire SaaS product, including AI models, APIs, a web app, a Shopify app, and cloud microservices, delivering end-to-end solutions and ensuring seamless integration across platforms.
Developed and trained a proprietary image generation model for Virtual Try-On, based on SD 1.5 inpainting and Self-Attention, enabling users to create AI-powered fashion and lifestyle visuals instantly.
Worked with the SOTA image generation model FLUX, fine-tuning it with LoRAs to achieve high-quality, tailored outputs.
January 2022 - October 2023
1 yr 9 mos
Aftershoot is a software startup specializing in computer vision-powered solutions for photographers.
Played a key role as the first full-time employee in scaling the business to over 2Mn ARR and growing the ML team to 10.
Led the machine learning team and built computer vision solutions based on object detection, classification, statistical algorithms, segmentation, and object recognition.
Developed a face detection algorithm achieving >99% accuracy, even on images with a resolution of 6k.
Trained new ML models with improved accuracy and faster inference times, leading to increased customer satisfaction and improved product performance.
June 2021 - January 2022
8 mos
Handled the development of build infrastructure and streamlined the CI/CD pipeline to ensure faster and more efficient product delivery.
Deployed a machine learning model to accurately predict ticket build time, leading to increased developer efficiency.
Designed and implemented an internal chatbot to instantly resolve queries from developers, improving productivity and collaboration within the team.
October 2018 - October 2019
1 yr
Oversaw machine, parts, and manpower planning, leading to significant improvements in efficiency and productivity.
Automated MIS reporting in Excel, reducing manual effort by more than 1 hour and improving accuracy.
Redesigned the coolant return tank of a machine, eliminating waste of waiting and processing on the production line, and achieving an annual cost savings of Rs 100k.
Sept. 2020 – May 2022
2 yrs
Aug 2014 – May 2018
4 yrs
Problem Statement:
The project involved developing a face detection machine learning algorithm for high-resolution images that achieved an accuracy of over 99%. High-resolution images contain a substantial amount of detail and very tiny faces, making it more challenging for a face detection algorithm to accurately identify faces amid the abundance of visual information.
Solution:
To address the challenges of developing a high-resolution face detection algorithm in TensorFlow, I took a comprehensive approach.
Dataset Preparation:
To prepare the dataset for training, I wrote a few Python scripts. I first divided each high-resolution image into 16 parts, with overlapping sections so that every possible face was covered. I then applied an open-source face detection model to these image segments. This process resulted in a well-annotated and diverse dataset, crucial for training a robust model.
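The tiling step can be illustrated with a minimal sketch. This is not the original script: the 4x4 grid matches the 16 parts mentioned above, but the overlap fraction, the function names `tile_boxes` and `to_full_image`, and the 6000x4000 example size are assumptions made for illustration.

```python
def tile_boxes(width, height, rows=4, cols=4, overlap=0.2):
    """Return (left, top, right, bottom) crop boxes that cover the image.

    Each tile is grown by `overlap` (a fraction of the base tile size,
    an assumed value) on every side and clipped to the image bounds, so
    neighbouring tiles share a margin.
    """
    base_w, base_h = width // cols, height // rows
    pad_w, pad_h = int(base_w * overlap), int(base_h * overlap)
    boxes = []
    for r in range(rows):
        for c in range(cols):
            left = max(0, c * base_w - pad_w)
            top = max(0, r * base_h - pad_h)
            # The last row/column always extends to the image edge.
            right = width if c == cols - 1 else min(width, (c + 1) * base_w + pad_w)
            bottom = height if r == rows - 1 else min(height, (r + 1) * base_h + pad_h)
            boxes.append((left, top, right, bottom))
    return boxes


def to_full_image(box, tile_box):
    """Shift a detection from tile coordinates to full-image coordinates."""
    x1, y1, x2, y2 = box
    left, top, _, _ = tile_box
    return (x1 + left, y1 + top, x2 + left, y2 + top)


tiles = tile_boxes(6000, 4000)
print(len(tiles))  # 16
print(tiles[5])    # (1200, 800, 3300, 2200)
```

A face that straddles a tile seam appears whole in at least one tile as long as it is smaller than the shared margin, which is why the overlap matters for tiny faces.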
Model Architecture: Afterward, I designed the model architecture from scratch and trained the model on the dataset. This custom architecture allowed me to tune the algorithm to perform well on high-resolution images with tiny faces, a challenging task in itself.
Inferencing Optimization: I also optimized the inferencing process by dividing the input image into 4 parts, ensuring efficient and accurate detection of faces during application usage.
The combination of these steps resulted in a high-resolution face detection algorithm that not only achieved an accuracy of over 99% but also significantly improved the overall speed and accuracy of face detection in real-world scenarios.
Impact:
This project contributed to rapid growth at the company, as we received significantly fewer complaints about face detection. Furthermore, people began recommending the product to others because of the improved accuracy and reliability of the face detection algorithm.
Problem Statement:
The task was to classify images into three categories: 'kiss,' 'almost kiss,' and 'no kiss.' The primary challenge was the high resolution of the images, which averaged around 6k pixels. This made it difficult for machine learning models to achieve accurate results, particularly because the kiss typically occurred in a very small region of the image. The initial model had an accuracy of only 24%, lower than the 33% that random guessing would provide in this case.
Solution:
To address these challenges, I adopted a multi-step approach. First, I used a face detection model's predictions to identify the region of interest within the image. Next, I implemented a smart cropping algorithm to extract and prepare the input for my machine learning model. I then trained a Convolutional Neural Network (CNN) on these cropped images.
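The cropping idea can be sketched as follows. The actual smart cropping algorithm is not described here, so this is only one plausible version: it takes the union of the detected face boxes, pads it by a hypothetical `margin`, and clips it to the image. The function name `crop_region` and the example boxes are illustrative.

```python
def crop_region(face_boxes, image_size, margin=0.5):
    """Return a (left, top, right, bottom) crop around all detected faces.

    `face_boxes` are (x1, y1, x2, y2) tuples in pixels. `margin` is an
    assumed padding, expressed as a fraction of the union box size.
    """
    width, height = image_size
    if not face_boxes:
        # No faces found: fall back to the full frame.
        return (0, 0, width, height)
    x1 = min(b[0] for b in face_boxes)
    y1 = min(b[1] for b in face_boxes)
    x2 = max(b[2] for b in face_boxes)
    y2 = max(b[3] for b in face_boxes)
    pad_x = int((x2 - x1) * margin)
    pad_y = int((y2 - y1) * margin)
    return (
        max(0, x1 - pad_x),
        max(0, y1 - pad_y),
        min(width, x2 + pad_x),
        min(height, y2 + pad_y),
    )


faces = [(2800, 1500, 3000, 1750), (3020, 1520, 3220, 1760)]
print(crop_region(faces, (6000, 4000)))  # (2590, 1370, 3430, 1890)
```

In this example the classifier would see an 840x520 crop instead of the full 6000x4000 frame, so the small region of interest fills most of the input.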
One additional challenge was class imbalance in the dataset. I had a limited number of 'kiss' images (15k) compared to a much larger number of images for the 'almost kiss' and 'no kiss' classes (approximately 100k each). To mitigate this imbalance, I employed weighted loss functions and down-sampled the larger classes.
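As a rough illustration of both mitigations, the sketch below computes inverse-frequency class weights from the class counts quoted above and down-samples oversized classes to a cap. The weighting formula, the cap, and the function names are assumptions, not the exact settings used in the project.

```python
import random


def inverse_frequency_weights(counts):
    """Weight each class by total / (num_classes * class_count)."""
    total = sum(counts.values())
    return {label: total / (len(counts) * n) for label, n in counts.items()}


def downsample(items_by_class, cap, seed=0):
    """Randomly keep at most `cap` items per class (cap is a hypothetical knob)."""
    rng = random.Random(seed)
    return {
        label: rng.sample(items, cap) if len(items) > cap else list(items)
        for label, items in items_by_class.items()
    }


counts = {"kiss": 15_000, "almost kiss": 100_000, "no kiss": 100_000}
weights = inverse_frequency_weights(counts)
for label, w in weights.items():
    print(f"{label}: {w:.2f}")
```

With these counts, a 'kiss' example is weighted about 6.7 times more heavily than a 'no kiss' example, which is the ratio of the class sizes. A dictionary like this can typically be passed to a training loop as per-class loss weights.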
Impact:
This approach yielded significant improvements. The model's accuracy increased to approximately 80%, a substantial improvement over the initial 24%. Additionally, the preprocessing step significantly improved inference speed. As a result, the overall speed of the application increased by 10%, leading to higher customer satisfaction.
Problem Statement:
The project's primary goal was to assess the composition quality of an image, specifically in terms of adherence to the rule of thirds and the subject's position within the image.
Solution:
To address this challenge, I employed the YOLO machine learning model to detect and locate the subjects within the images. YOLO is known for its real-time object detection capabilities, making it suitable for identifying the subjects swiftly and accurately.
Subsequently, I developed an algorithm that assigned a composition score to each image based on two key factors: the subject's position within the image and adherence to the rule of thirds. This algorithm considered the subject's placement in relation to the rule of thirds grid and evaluated how well the image composition aligned with this fundamental principle of visual design.
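One simple way to turn a detected subject box into a rule-of-thirds score is sketched below. The actual scoring algorithm is not specified here; this version is an assumption that measures how close the subject's centre lies to the nearest of the four grid intersections, scaled so that an intersection scores 1.0 and an image corner scores 0.0. The function name `thirds_score` is illustrative.

```python
import math

# The four rule-of-thirds intersections in normalized image coordinates.
POWER_POINTS = [(1 / 3, 1 / 3), (2 / 3, 1 / 3), (1 / 3, 2 / 3), (2 / 3, 2 / 3)]
# Largest possible distance to the nearest intersection (from an image corner).
MAX_DISTANCE = math.sqrt(2) / 3


def thirds_score(subject_box, image_size):
    """Score in [0, 1]: 1.0 when the subject centre sits on a grid intersection."""
    x1, y1, x2, y2 = subject_box
    width, height = image_size
    cx = (x1 + x2) / 2 / width
    cy = (y1 + y2) / 2 / height
    nearest = min(math.hypot(cx - px, cy - py) for px, py in POWER_POINTS)
    return 1.0 - nearest / MAX_DISTANCE


size = (3000, 3000)
print(round(thirds_score((900, 900, 1100, 1100), size), 3))    # on an intersection
print(round(thirds_score((1400, 1400, 1600, 1600), size), 3))  # dead centre
```

A subject placed dead centre lands halfway between the two extremes, so centred shots are penalized relative to shots composed on the grid.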
Impact:
The implemented solution performed well in assessing image composition. By efficiently detecting subjects and evaluating composition quality, it provided valuable insights for selecting the best images.
Problem Statement:
The project's primary challenge was to correct the white balance of an image by accurately predicting the appropriate temperature and tint values for editing in Adobe Lightroom. Adobe Lightroom offers a wide range of white balance temperature values, from 2,000K to 50,000K, allowing photographers to adjust them based on their preferences. Predicting the correct values within this extensive range can be a daunting task.
Solution:
To address this challenge, I conducted a detailed analysis of the impact of temperature values on images. Through this analysis, I observed that changes in temperature above 9,000K or below 2,400K had minimal impact on the image. Therefore, I focused on developing a regression-based machine learning model within the 2,400K to 9,000K range.
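Restricting the regression target to the useful range can be sketched as a clamp followed by scaling to [0, 1]. Whether the original pipeline normalized targets this way is an assumption; only the 2,400K and 9,000K bounds come from the analysis above, and the function names are illustrative.

```python
TEMP_MIN_K = 2400  # below this, changes had minimal visible impact
TEMP_MAX_K = 9000  # above this, changes had minimal visible impact


def normalize_temperature(kelvin):
    """Clamp a Lightroom temperature to the useful range and scale to [0, 1]."""
    clamped = min(max(kelvin, TEMP_MIN_K), TEMP_MAX_K)
    return (clamped - TEMP_MIN_K) / (TEMP_MAX_K - TEMP_MIN_K)


def denormalize_temperature(value):
    """Map a model output in [0, 1] back to Kelvin."""
    return TEMP_MIN_K + value * (TEMP_MAX_K - TEMP_MIN_K)


print(normalize_temperature(5700))   # 0.5
print(normalize_temperature(50000))  # 1.0
print(denormalize_temperature(0.5))  # 5700.0
```

Clamping shrinks the target from a 48,000K span to a 6,600K span, so the model's capacity is spent on the part of the range where a change is actually visible.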
The model's objective was to predict the ideal temperature and tint values for white balance correction, simplifying the editing process in Adobe Lightroom. I employed transfer learning and designed a CNN architecture that takes into account not only the image but also its current Temp and Tint values. I then trained the model using the prepared dataset.
Impact:
The implementation of this machine learning solution had a significant impact on image editing efficiency. By predicting temperature and tint values within a practical range, it streamlined the white balance correction process for photographers, saving them time and improving the overall quality of image editing.
Problem Statement:
The challenge was to categorize products into 10,000 different categories based on their title and description. The dataset posed a significant challenge due to its size, comprising 3 million data points.
Solution:
Given the size of the dataset, my approach began with thorough data analysis and data cleaning to gain a comprehensive understanding. I identified the primary challenge as data imbalance, with about 95% of the data points belonging to approximately 2,700 categories, leaving the remaining 5% distributed across roughly 7,300 categories.
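The split between frequent and rare categories can be found from the cumulative class distribution. The sketch below is illustrative only: the function name `head_classes` and the toy labels are assumptions, and the real dataset had roughly 10,000 categories rather than five.

```python
from collections import Counter


def head_classes(labels, coverage=0.95):
    """Return the most frequent classes that together reach `coverage`.

    Classes are added from most to least frequent until their cumulative
    share of the data meets the requested coverage.
    """
    counts = Counter(labels)
    total = len(labels)
    head, covered = [], 0
    for label, n in counts.most_common():
        if covered / total >= coverage:
            break
        head.append(label)
        covered += n
    return head, covered / total


# Toy data: three frequent classes and two rare ones.
labels = ["a"] * 60 + ["b"] * 30 + ["c"] * 6 + ["d"] * 3 + ["e"] * 1
head, share = head_classes(labels)
print(head, share)  # ['a', 'b', 'c'] 0.96
```

Everything outside `head` forms the long tail, which in this project was the 5% of data points spread over roughly 7,300 categories.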
To address this challenge, I used a two-fold solution. First, I trained a machine learning model using TensorFlow/Keras on the 95% of the dataset that belonged to the 2,700 classes. I experimented with various Natural Language Processing (NLP) techniques, starting with simple ones like Bag of Words, TF-IDF, and Word2Vec, and used Python's NLTK library for data preparation. Second, for the remaining 5% of the dataset, encompassing approximately 7,300 classes, I implemented a straightforward random model that provided random predictions for these categories.
The attached graph shows the cumulative distribution of the target classes.
Impact:
While I was unable to submit my solution before the deadline, my approach produced strong results. Even with just careful data preparation and a basic Bag of Words model, the accuracy achieved was on par with the best solutions at the time, which used advanced models like BERT. This project underscored my ability to handle complex, large-scale datasets and devise effective strategies to tackle data imbalance, leading to competitive results in a challenging machine learning competition.
Problem Statement:
The goal was to create a machine learning model that accurately predicts ticket build times. The objective was to enhance developer efficiency by providing a more precise estimate of the time required to build the software after a commit is submitted.
Solution:
To achieve this, the initial challenge was to create a dataset because there wasn't a readily available one. I talked to developers from different departments to figure out whether the number of files affected build time and which changes took more or less time. Afterward, I conducted data analysis and feature engineering using Python libraries like pandas and matplotlib to prepare the necessary features. I trained several regression models and found that the decision tree performed best. A further benefit was that it not only predicted the build time but also provided information about the importance of each feature.
Impact:
The deployment of the machine learning model delivered significant benefits. Developers now had access to precise predictions for ticket build times, leading to increased efficiency and better project management. This project showcased my ability to leverage machine learning to optimize processes and enhance productivity within the development team, ultimately contributing to more effective software development practices.
Problem Statement:
The challenge was to create a machine learning model capable of automatically removing backgrounds from images. This involves distinguishing the main subject from the background, which can be complex and vary significantly across different images.
Solution:
To tackle this problem, I leveraged a diverse dataset of images containing various subjects and backgrounds. The objective was to develop an efficient model that could accurately identify and isolate the main subject from the background.
Initially, I explored the use of existing deep learning models, such as Mask R-CNN and U-Net, known for their segmentation capabilities. However, these models often came with computational overhead and larger model sizes.
To optimize for speed and model size, I devised a custom CNN-based architecture, drawing inspiration from U-Net but tailoring it to the specific task of background removal. The challenge lay in achieving real-time or near-real-time processing while preserving accuracy.
Web Application Development:
In addition to the model development, I also created a user-friendly web application using Flask. This web app allowed users to upload their images and receive instant background removal results. It provided a seamless and intuitive interface for individuals and businesses to use this image editing capability without the need for extensive technical expertise.
Impact:
The result of this project was a highly efficient model for automatic background removal in images, coupled with a user-friendly web application. It offered real-time or near-real-time performance while effectively removing complex backgrounds.
2020
2020
2020
2020
2022