Updates
Oct'24: Successfully defended my Ph.D. dissertation!
Aug'24: SPAct patent approved! My first patent as the primary inventor 💥
Jul'24: Two first-author papers accepted at ECCV 2024, one as an oral presentation (top 3% of accepted papers)! 💥💥
Jun'24: Selected as Outstanding Reviewer of CVPR 2024! (top 2% among 10,000 reviewers)🥇
May'24: Started internship at Apple, Cupertino, CA
Dec'23: A first-author paper "No More Shortcuts" accepted to AAAI 2024 💥
Jul'23: A first-author paper "EventTransAct" accepted at IROS 2023 💥
Jul'23: A paper "TeD-SPAD" accepted at ICCV 2023
May'23: Started summer internship at Adobe, San Jose, CA
Mar'23: A first-author paper "TimeBalance" accepted to CVPR 2023 💥
Jan'23: A paper "TransVisDrone" accepted at ICRA 2023
May'22: Started summer internship at Adobe, USA (remote, Florida)
Mar'22: A first-author paper "TCLR" accepted to CVIU 2022 💥
Mar'22: A first-author paper "SPAct" accepted to CVPR 2022 💥
Jan'21: Our Gabriella paper has been awarded the best scientific paper award at ICPR 2020
|
|
PhD AI/ML Intern
Apple Inc., Cupertino, California, USA. May 2024 - Aug 2024
- Enhanced stable diffusion models for image editing by leveraging vision-language multimodal foundation models.
- Trained diffusion models on a large-scale, high-resolution dataset of 10M samples.
- Reproduced state-of-the-art image editing methods and outperformed them with a novel approach.
|
|
Research Scientist/Engineer Intern
Adobe Inc., San Jose, California, USA. May 2023 - Nov 2023
Hosts: Simon Jenni, Fabian Caba
- Enhanced the fine-grained capabilities of existing video retrieval methods.
- Worked on large-scale video galleries with millions of samples.
- Filed a patent and had a paper accepted at ECCV 2024.
|
|
Research Scientist Intern
Adobe Inc., Remote, USA. May 2022 - Nov 2022
Host: Simon Jenni
- Developed a novel self-supervised video representation framework by reformulating temporal self-supervision as frame-level recognition tasks and introducing an effective augmentation strategy to mitigate shortcuts.
- Achieved state-of-the-art performance on 10 video understanding benchmarks across linear classification (Kinetics400, HVU, SSv2, Charades), video retrieval (UCF101, HMDB51), and temporal correspondence (CASIA-B).
- Published a paper at AAAI 2024.
|
Research
I have a broad interest in computer vision and machine learning. My primary research focuses on video understanding: self-/semi-supervised learning and action recognition.
My recent research also includes enhancing the fine-grained video understanding of large foundation models and improving multi-modal generative AI for image editing applications. I have also worked on various robotics-related vision tasks, such as event-camera-based action recognition and drone-to-drone detection from videos.
Below is a selected list of my works (in reverse chronological order); representative papers are highlighted.
|
|
Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets
Ishan Rajendrakumar Dave,
Fabian Caba,
Mubarak Shah,
Simon Jenni.
The 18th European Conference on Computer Vision (ECCV), 2024
Oral presentation! (Top 3% of accepted papers)
project page
Temporal video alignment synchronizes key events like object interactions or action phase transitions in two videos, benefiting video editing, processing, and understanding tasks. Existing methods assume a given video pair, limiting applicability. We redefine this as a search problem, introducing Alignable Video Retrieval (AVR), which identifies and synchronizes well-alignable videos from a large collection. Key contributions include DRAQ, a video alignability indicator, and a generalizable frame-level video feature design.
|
|
FinePseudo: Improving Pseudo-Labelling through Temporal-Alignablity for Semi-Supervised Fine-Grained Action Recognition
Ishan Rajendrakumar Dave,
Mamshad Nayeem Rizve,
Mubarak Shah.
The 18th European Conference on Computer Vision (ECCV), 2024
project page
We introduce Alignability-Verification-based Metric learning for semi-supervised fine-grained action recognition. Using dynamic time warping (DTW) for action-phase-aware comparison, our learnable alignability score refines pseudo-labels of the video encoder. Our framework, FinePseudo, outperforms prior methods on fine-grained action recognition datasets. Additionally, it demonstrates robustness in handling novel unlabeled classes in open-world setups.
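As a rough illustration of the dynamic time warping comparison mentioned above, here is a generic textbook DTW sketch. It is not the FinePseudo implementation and does not include the learnable alignability score; the function name `dtw_distance`, the Euclidean frame distance, and the toy per-frame features are all assumptions made for this example.

```python
import math


def dtw_distance(a, b):
    """Classic DTW cost between two sequences of per-frame feature vectors.

    Hypothetical helper for illustration only: uses Euclidean distance
    between frames and the standard three-way recurrence.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # frame-to-frame distance
            cost[i][j] = d + min(
                cost[i - 1][j],      # frame of a matched to an already-used frame of b
                cost[i][j - 1],      # frame of b matched to an already-used frame of a
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


# Toy 1-D "features": the second clip lingers on the first phase,
# so the two clips still align perfectly despite different lengths.
clip_a = [(0.0,), (1.0,), (2.0,)]
clip_b = [(0.0,), (0.0,), (1.0,), (2.0,)]
print(dtw_distance(clip_a, clip_b))
```

A low DTW cost indicates that two clips pass through the same phases in the same order even when their timing differs, which is the intuition behind an action-phase-aware comparison.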
|
|
CodaMal: Contrastive Domain Adaptation for Malaria Detection in Low-Cost Microscopes
Ishan Rajendrakumar Dave,
Tristan de Blegiers,
Chen Chen,
Mubarak Shah.
31st IEEE International Conference on Image Processing (ICIP), 2024
Oral presentation!
project page  /  code
We propose a domain-adaptive contrastive objective to bridge the gap between high-cost and low-cost microscopes. On the publicly available large-scale M5 dataset, our method improves mean average precision (mAP) by 16% over state-of-the-art methods, provides a 21× speed-up during inference, and requires only half as many learnable parameters as prior methods.
|
|
No More Shortcuts: Realizing the Potential of Temporal Self-Supervision
Ishan Rajendrakumar Dave,
Simon Jenni,
Mubarak Shah.
AAAI Conference on Artificial Intelligence, Main Technical Track (AAAI), 2024
project page
We demonstrate experimentally that our more challenging frame-level task formulations and the removal of shortcuts drastically improve the quality of features learned through temporal self-supervision. Our extensive experiments show state-of-the-art performance across 10 video understanding datasets, illustrating the generalization ability and robustness of our learned video representations.
|
|
TeD-SPAD: Temporal Distinctiveness for Self-supervised Privacy-preservation for Video Anomaly Detection
Joseph Fioresi,
Ishan Rajendrakumar Dave,
Mubarak Shah.
Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023
paper  /  code  /  project page
We propose TeD-SPAD, a privacy-aware video anomaly detection framework that destroys visual private information in a self-supervised manner. In particular, we propose the use of a temporally-distinct triplet loss to promote temporally discriminative features, which complements current weakly-supervised VAD methods.
|
|
EventTransAct: A Video Transformer-based Framework for Event-camera Based Action Recognition
Tristan de Blegiers*,
Ishan Rajendrakumar Dave*,
Adeel Yousaf,
Mubarak Shah.
*= equal contribution
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023
paper  /  code  /  project page
We propose a video transformer-based framework for event-camera-based action recognition, which leverages an event-contrastive loss and augmentations to adapt the network to event data. Our method achieves state-of-the-art results on the N-EPIC Kitchens dataset and competitive results on the standard DVS Gesture recognition dataset, while requiring less computation time than competitive prior approaches.
|
|
TimeBalance: Temporally-Invariant and Temporally-Distinctive Video Representations for Semi-Supervised Action Recognition
Ishan Rajendrakumar Dave,
Mamshad Nayeem Rizve,
Chen Chen,
Mubarak Shah.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
paper  /  code  /  project page
We propose a student-teacher semi-supervised learning framework in which we distill knowledge from a temporally-invariant teacher and a temporally-distinctive teacher. Depending on the nature of the unlabeled video, we dynamically combine the knowledge of these two teachers using a novel temporal similarity-based reweighting scheme. The method achieves state-of-the-art results on Kinetics400, UCF101, and HMDB51.
|
|
TransVisDrone: Spatio-temporal Transformer for Vision-based Drone-to-drone Detection in Aerial Videos
Tushar Sangam,
Ishan Rajendrakumar Dave,
Waqas Sultani,
Mubarak Shah.
2023 IEEE International Conference on Robotics and Automation (ICRA), 2023
paper  /  code  /  project page
We propose a simple yet effective framework, TransVisDrone, that provides an end-to-end solution with higher computational efficiency. We utilize the CSPDarkNet-53 network to learn object-related spatial features and the VideoSwin model to improve drone detection in challenging scenarios by learning spatio-temporal dependencies of drone motion.
|
|
SPAct: Self-supervised Privacy Preservation for Action Recognition
Ishan Rajendrakumar Dave,
Chen Chen,
Mubarak Shah.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
paper  /  code
For the first time, we present a novel training framework that removes privacy information from input video in a self-supervised manner without requiring privacy labels. We train our framework using a minimax optimization strategy to minimize the action recognition cost function and maximize the privacy cost function through a contrastive self-supervised loss.
|
|
TCLR: Temporal Contrastive Learning for Video Representation
Ishan Dave,
Rohit Gupta,
Mamshad Nayeem Rizve,
Mubarak Shah.
Computer Vision and Image Understanding (CVIU), 2022
(100+ citations, Among the top-10 most downloaded papers in CVIU)
paper  /  code
We propose a new temporal contrastive learning framework for self-supervised video representation learning, consisting of two novel losses that aim to increase the temporal diversity of learned features. The framework achieves state-of-the-art results on various downstream video understanding tasks, including significant improvement in fine-grained action classification for visually similar classes.
|
|
Gabriellav2: Towards Better Generalization in Surveillance Videos for Action Detection
Ishan Dave,
Zacchaeus Scheffer,
Akash Kumar,
Sarah Shiraz,
Yogesh Singh Rawat,
Mubarak Shah.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2022
paper
We propose a real-time, online action detection system that generalizes robustly to surveillance videos from unknown facilities. We tackle the challenging nature of the action classification problem from several angles, such as handling class-imbalanced training with the PLM method and learning multi-label action correlations with the LSEP loss. To improve the computational efficiency of the system, we utilize knowledge distillation.
|
|
Gabriella: An Online System for Real-Time Activity Detection in Untrimmed Security Videos
Mamshad Nayeem Rizve,
Ugur Demir,
Praveen Tirupattur,
Aayush Jung Rana,
Kevin Duarte,
Ishan R Dave,
Yogesh S Rawat,
Mubarak Shah.
25th International Conference on Pattern Recognition (ICPR), 2021 (Best Paper Award)
paper
Gabriella consists of three stages: tubelet extraction, activity classification, and online tubelet merging. Gabriella utilizes a localization network for tubelet extraction, with a novel Patch-Dice loss to handle variations in actor size, and a Tubelet-Merge Action-Split (TMAS) algorithm to detect activities efficiently and robustly.
|
|
Self-Supervised Privacy Preservation Action Recognition System
Ishan Rajendrakumar Dave,
Chen Chen,
Mubarak Shah,
The University of Central Florida. Invention Track Code: 2023-019. (Status: Approved), 2023
Tech Sheet
|
|
2nd place, 2022 - ActivityNet ActEV Challenge (CVPR)
1st place, 2021 - PMiss@0.02tfa, ActivityNet ActEV SDL (CVPR)
1st place, 2020 - PMiss and nAUDC, ActivityNet ActEV SDL (CVPR)
2nd place, 2020 - TRECVID ActEV: Activities in Extended Video
ORCGS Doctoral Fellowship, 2019-2020
|
Professional Reviewing Experience
|
|
Reviewer, CVPR 2025, 2024, 2023, 2022
Reviewer, ECCV 2024
Program Committee, AAAI 2025
Reviewer, ICCV 2023
Technical Committee, BMVC Workshop 2024
Reviewer, WACV 2025, 2024
Reviewer, IEEE Robotics and Automation Letters
Reviewer, ICRA 2024
Reviewer, IROS 2024
Reviewer, IEEE Transactions on Image Processing
Reviewer, IEEE Transactions on Pattern Analysis and Machine Intelligence
Reviewer, IEEE Transactions on Multimedia
Reviewer, IEEE Transactions on Circuits and Systems for Video Technology
Reviewer, IEEE Transactions on Neural Networks and Learning Systems
Reviewer, Computer Vision and Image Understanding
Reviewer, Pattern Recognition
Reviewer, Expert Systems with Applications
Reviewer, Image and Vision Computing
Reviewer, Journal of Real-Time Image Processing
Reviewer, Multimedia Tools and Applications
|