OCTraN: 3D Occupancy Convolutional Transformer Network in Unstructured Traffic Scenarios

Aditya N G
Dhruval P B
Harshith M K
Priya S S
Dr. Surabhi Narayan
adityang5@gmail.com
dhruvalpb@gmail.com
hiharshith18@gmail.com
sspriya147@gmail.com
surabhinarayan@pes.edu

Department of Computer Science, PES University, Bengaluru
Spotlight presentation at CVPR's T4V 2023




Modern approaches to vision-centric environment perception for autonomous navigation make extensive use of self-supervised monocular depth estimation algorithms, which predict the depth of a 3D scene from a single camera image. However, when this depth map is projected into 3D space, errors in disparity are magnified, so the depth estimation error grows quadratically with distance from the camera. Light Detection and Ranging (LiDAR) can solve this issue, but it is expensive and not feasible for many applications. To address the challenge of accurate ranging with low-cost sensors, we propose a transformer architecture that uses iterative attention to convert 2D image features into 3D occupancy features and uses convolution and transpose convolution to operate efficiently on spatial information. We also develop a self-supervised training pipeline that generalizes the model to any scene: it eliminates the need for LiDAR ground truth by substituting pseudo-ground-truth labels obtained from neural radiance fields (NeRFs) and boosting monocular depth estimation. We will be making our code and dataset public.
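The quadratic growth of depth error mentioned above can be illustrated with a small sketch. It assumes the standard pinhole stereo relation Z = f·B/d between depth Z and disparity d, so a fixed disparity error δd gives a first-order depth error of about Z²/(f·B)·δd. The focal length, baseline, and disparity error below are hypothetical values chosen for illustration; they are not taken from the paper or the dataset.

```python
import numpy as np

def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Pinhole stereo relation: Z = f * B / d."""
    return focal_px * baseline_m / disparity_px

def depth_error(depth_m, focal_px, baseline_m, disparity_error_px):
    """First-order depth error: |dZ| ~= Z**2 / (f * B) * |dd|."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px

# Hypothetical values for illustration only (not from the paper).
FOCAL_PX = 1000.0          # focal length in pixels
BASELINE_M = 0.5           # (virtual) stereo baseline in metres
DISPARITY_ERROR_PX = 0.5   # constant disparity error in pixels

depths = np.array([5.0, 10.0, 20.0, 40.0])
errors = depth_error(depths, FOCAL_PX, BASELINE_M, DISPARITY_ERROR_PX)

for z, e in zip(depths, errors):
    print(f"depth {z:5.1f} m -> approx. error {e:.3f} m")
```

Under these assumptions, each doubling of the distance quadruples the depth error for the same disparity error, which is the effect the abstract describes.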


Bengaluru Driving Dataset

We gathered a dataset spanning 114 minutes and 165K frames in Bengaluru, India. Our dataset consists of video data from a calibrated camera sensor with a resolution of 1920×1080, recorded at a frame rate of 30 Hz. We utilize a Depth Dataset Generation pipeline that uses only videos as input to produce high-resolution disparity maps.
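As a rough illustration of how a depth map derived from such disparity maps could be lifted into a 3D occupancy grid, here is a minimal sketch. It is not the authors' pipeline: the camera intrinsics, grid extents, voxel size, and the helper names `backproject` and `voxelize` are all hypothetical, and the depth map is assumed to be already in metres.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Lift an HxW depth map (metres) to an (H*W, 3) array of camera-frame points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel columns (u) and rows (v)
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

def voxelize(points, grid_min, voxel_size, grid_shape):
    """Mark every voxel that contains at least one point as occupied."""
    idx = np.floor((points - grid_min) / voxel_size).astype(int)
    inside = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
    grid = np.zeros(grid_shape, dtype=bool)
    grid[tuple(idx[inside].T)] = True
    return grid

# Hypothetical toy input: a tiny 4x6 depth map of a flat surface 10 m away.
depth = np.full((4, 6), 10.0)
points = backproject(depth, fx=1000.0, fy=1000.0, cx=3.0, cy=2.0)

# Hypothetical grid: 0.5 m voxels covering x, y in [-8, 8) m and z in [0, 32) m.
grid = voxelize(points,
                grid_min=np.array([-8.0, -8.0, 0.0]),
                voxel_size=0.5,
                grid_shape=(32, 32, 64))
print("occupied voxels:", int(grid.sum()))
```

A grid built this way only marks surfaces the camera can see; anything behind them stays unobserved, which is one reason a direct projection of depth is only a starting point for occupancy labels.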




Short Presentation



Source Code

We have released the PyTorch implementation of OCTraN on GitHub. Try our code!
[GitHub]


Paper and Bibtex

[Paper]

Citation
 
Aditya N Ganesh, Dhruval Pobbathi Badrinath, Harshith Mohan Kumar, Priya S, and Surabhi Narayan. OCTraN: 3D Occupancy Convolutional Transformer Network in Unstructured Traffic Scenarios. Spotlight Presentation at the Transformers for Vision Workshop, CVPR, 2023.

[Bibtex]
@misc{analgund2023octran,
  title={OCTraN: 3D Occupancy Convolutional Transformer Network in Unstructured Traffic Scenarios},
  author={Ganesh, Aditya N and Pobbathi Badrinath, Dhruval and
    Kumar, Harshith Mohan and S, Priya and Narayan, Surabhi},
  year={2023},
  howpublished={Spotlight Presentation at the Transformers for Vision Workshop, CVPR},
  url={https://sites.google.com/view/t4v-cvpr23/papers#h.enx3bt45p649},
  note={Transformers for Vision Workshop, CVPR 2023}
}
                


Related Projects

Hardware Accelerated Stereo Vision



Say Hi!

Contact me at adityang5@gmail.com