I am a Ph.D. student at Tampere University, where I work with Esa Rahtu on multi-modal neural architectures for video understanding. I enjoy working at the intersection of deep learning, computer vision, sound, and natural language processing. In my free time, I like to build pet projects: some are fairly serious, others are just for fun.


Publications

Vladimir Iashin and Esa Rahtu.
Taming Visually Guided Sound Generation.
In British Machine Vision Conference (BMVC), 2021 (Oral).

Vladimir Iashin and Esa Rahtu.
A Better Use of Audio-Visual Cues:
Dense Video Captioning with Bi-modal Transformer.
In British Machine Vision Conference (BMVC), 2020.

Vladimir Iashin and Esa Rahtu.
Multi-modal Dense Video Captioning.
In Computer Vision and Pattern Recognition (CVPR) Workshops, 2020.

Notes

How Did You Build Your Object Detector?

A step-by-step guide to how I built my Online Object Detector as a full-stack web app. It describes how to design the front end and back end, rent a server in the cloud, set up DNS, and register a personal domain name.

IDE Customization

A note about VSCode customization ⚙️. It contains a list of cool extensions 🧩 and settings 🎛 that I found exceptionally useful in my daily coding routine.

Pet-Projects

Video Features

Provides an easy and flexible API for extracting features and optical flow frames from raw videos, with support for multi-GPU acceleration.

Object Detector

It tells you what is in the image you upload.

The detector is based on YOLOv3, implemented in PyTorch. The computation is done on a cloud server which runs a Flask application.