I am a Ph.D. student at Tampere University, where I work with Esa Rahtu on multi-modal neural architectures for video understanding. I enjoy working at the intersection of deep learning, computer vision, sound, and natural language processing. In my free time, I like developing pet projects, some of them fairly serious and others just for fun.


Vladimir Iashin and Esa Rahtu.
Taming Visually Guided Sound Generation.
In British Machine Vision Conference (BMVC), 2021 (Oral).

Vladimir Iashin and Esa Rahtu.
A Better Use of Audio-Visual Cues:
Dense Video Captioning with Bi-modal Transformer.
In British Machine Vision Conference (BMVC), 2020.

Vladimir Iashin and Esa Rahtu.
Multi-modal Dense Video Captioning.
In Computer Vision and Pattern Recognition (CVPR) Workshops, 2020.


How Did You Build Your Object Detector?

A step-by-step guide on how I built my Online Object Detector as a full-stack web app. It describes how to design the front end and back end, rent a server in the cloud, use DNS, and rent a personal domain name.

IDE Customization

A note about VSCode customization ⚙️. It contains a list of cool extensions 🧩 and settings 🎛 that I found exceptionally useful in my daily coding routine.


Video Features

An easy and flexible API for extracting features and optical flow frames from raw videos, with multi-GPU acceleration.

Object Detector

It tells you what is in the image you upload.

The detector is based on YOLOv3 and implemented in PyTorch. The computation runs on a cloud server that hosts a Flask application.