Ishan Dave

I am a fifth-year Ph.D. student at the Center for Research in Computer Vision (CRCV), University of Central Florida (UCF), advised by Prof. Mubarak Shah.

Email  /  Google Scholar  /  GitHub  /  LinkedIn  /  CV (this is the public version; feel free to ask for a detailed one!)

Work Experience
Research Scientist/Engineer Intern
Adobe Inc., San Jose, California, USA. May 2023 - Nov 2023
Hosts: Simon Jenni, Fabian Caba

Developed fine-grained video retrieval systems for large-scale galleries containing millions of videos.

Research Scientist Intern
Adobe Inc., Remote, USA. May 2022 - Nov 2022
Host: Simon Jenni

Proposed a self-supervised video representation learning method suitable for both high-level and low-level downstream video tasks.

Research

I have a broad interest in computer vision and machine learning. My current research focuses mainly on video representation learning with limited labels (self-supervised and semi-supervised learning), action recognition, and privacy preservation in video understanding tasks. I have also worked on robotics-related vision tasks such as event-camera-based action recognition and drone-to-drone detection from videos. Below is a selected list of my works in reverse chronological order; representative papers are highlighted.

No More Shortcuts: Realizing the Potential of Temporal Self-Supervision
Ishan Rajendrakumar Dave, Simon Jenni, Mubarak Shah.
AAAI Conference on Artificial Intelligence, Main Technical Track (AAAI), 2024
project page

We demonstrate experimentally that our more challenging frame-level task formulations and the removal of shortcuts drastically improve the quality of features learned through temporal self-supervision. Our extensive experiments show state-of-the-art performance across 10 video understanding datasets, illustrating the generalization ability and robustness of our learned video representations.

TeD-SPAD: Temporal Distinctiveness for Self-supervised Privacy-preservation for Video Anomaly Detection
Joseph Fioresi, Ishan Rajendrakumar Dave, Mubarak Shah.
Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023
paper  /  code  /  project page

We propose TeD-SPAD, a privacy-aware video anomaly detection framework that destroys visual private information in a self-supervised manner. In particular, we propose the use of a temporally-distinct triplet loss to promote temporally discriminative features, which complements current weakly-supervised VAD methods.

EventTransAct: A Video Transformer-based Framework for Event-camera Based Action Recognition
Tristan de Blegiers*, Ishan Rajendrakumar Dave*, Adeel Yousaf, Mubarak Shah.
* = equal contribution
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023
paper  /  code  /  project page

We propose a video transformer-based framework for event-camera-based action recognition, which leverages an event-contrastive loss and augmentations to adapt the network to event data. Our method achieves state-of-the-art results on the N-EPIC Kitchens dataset and competitive results on the standard DVS Gesture recognition dataset, while requiring less computation time than competing prior approaches.

TimeBalance: Temporally-Invariant and Temporally-Distinctive Video Representations for Semi-Supervised Action Recognition
Ishan Rajendrakumar Dave, Mamshad Nayeem Rizve, Chen Chen, Mubarak Shah.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
paper  /  code  /  project page

We propose a student-teacher semi-supervised learning framework in which we distill knowledge from a temporally-invariant teacher and a temporally-distinctive teacher. Depending on the nature of the unlabeled video, we dynamically combine the knowledge of these two teachers using a novel temporal similarity-based reweighting scheme. Our method achieves state-of-the-art results on Kinetics400, UCF101, and HMDB51.

TransVisDrone: Spatio-temporal Transformer for Vision-based Drone-to-drone Detection in Aerial Videos
Tushar Sangam, Ishan Rajendrakumar Dave, Waqas Sultani, Mubarak Shah.
IEEE International Conference on Robotics and Automation (ICRA), 2023
paper  /  code  /  project page

We propose a simple yet effective framework, TransVisDrone, that provides an end-to-end solution with higher computational efficiency. We utilize the CSPDarkNet-53 network to learn object-related spatial features and the VideoSwin model to improve drone detection in challenging scenarios by learning spatio-temporal dependencies of drone motion.

SPAct: Self-supervised Privacy Preservation for Action Recognition
Ishan Rajendrakumar Dave, Chen Chen, Mubarak Shah.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
paper  /  code

For the first time, we present a novel training framework that removes privacy information from input video in a self-supervised manner without requiring privacy labels. We train our framework using a minimax optimization strategy to minimize the action recognition cost function and maximize the privacy cost function through a contrastive self-supervised loss.

TCLR: Temporal Contrastive Learning for Video Representation
Ishan Dave, Rohit Gupta, Mamshad Nayeem Rizve, Mubarak Shah.
Computer Vision and Image Understanding (CVIU), 2022
(100+ citations; among the top-10 most downloaded papers in CVIU)
paper  /  code

We propose a new temporal contrastive learning framework for self-supervised video representation learning, consisting of two novel losses that aim to increase the temporal diversity of learned features. The framework achieves state-of-the-art results on various downstream video understanding tasks, including significant improvement in fine-grained action classification for visually similar classes.

Gabriellav2: Towards Better Generalization in Surveillance Videos for Action Detection
Ishan Dave, Zacchaeus Scheffer, Akash Kumar, Sarah Shiraz, Yogesh Singh Rawat, Mubarak Shah.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2022
paper

We propose a real-time, online action detection system that generalizes robustly to surveillance videos from unknown facilities. We address several challenges of the action classification problem, such as handling class-imbalanced training using the PLM method and learning multi-label action correlations using the LSEP loss. To improve the computational efficiency of the system, we utilize knowledge distillation.

Gabriella: An Online System for Real-Time Activity Detection in Untrimmed Security Videos
Mamshad Nayeem Rizve, Ugur Demir, Praveen Tirupattur, Aayush Jung Rana, Kevin Duarte, Ishan R Dave, Yogesh S Rawat, Mubarak Shah.
25th International Conference on Pattern Recognition (ICPR), 2021 (Best Paper Award)
paper

Gabriella consists of three stages: tubelet extraction, activity classification, and online tubelet merging. Gabriella utilizes a localization network for tubelet extraction, with a novel Patch-Dice loss to handle variations in actor size, and a Tubelet-Merge Action-Split (TMAS) algorithm to detect activities efficiently and robustly.

Patent
Action Recognition System Preserves Privacy in Video Sharing.
Ishan Rajendrakumar Dave, Chen Chen, Mubarak Shah.
The University of Central Florida. Invention Track Code: 2023-019. (Status: Filed), 2023
Tech Sheet
Challenges Winner
2nd place, 2022 - ActivityNet ActEV Challenge (CVPR)



1st place, 2021 - PMiss@0.02tfa, ActivityNet ActEV SDL (CVPR)


1st place, 2020 - PMiss and nAUDC, ActivityNet ActEV SDL (CVPR)

2nd place, 2020 - TRECVID ActEV: Activities in Extended Video

ORCGS Doctoral Fellowship, 2019-2020

Professional Reviewing Experience
Reviewer, CVPR 2024, 2023, 2022
Reviewer, ICCV 2023
Reviewer, IEEE Transactions on Image Processing
Reviewer, IEEE Transactions on Pattern Analysis and Machine Intelligence
Reviewer, IEEE Transactions on Multimedia
Reviewer, IEEE Transactions on Circuits and Systems for Video Technology
Reviewer, IEEE Transactions on Neural Networks and Learning Systems
Reviewer, Computer Vision and Image Understanding
Reviewer, Pattern Recognition
Reviewer, Expert Systems with Applications
Reviewer, Image and Vision Computing
Reviewer, Journal of Real-Time Image Processing
Reviewer, Multimedia Tools and Applications


Mentor in NSF-REU
Kevin Chung, REU 2022
Ethan Thomas, REU 2021
Kali Carter, REU 2020

