SK telecom’s AI Center participated in ICCV, a major conference in the field of computer vision, held from October 27 to November 2 in Seoul, Korea. To keep up with the latest trends, researchers from T-Brain, a team within the AI Center, attended tutorials on recent research, topic-specific workshops, the main conference, and keynote speeches, and also hosted a networking session with conference attendees.
At ICCV, T-Brain showcased its computer vision research by placing 3rd in the VisDrone Challenge and delivering an invited talk at a session on computer vision and natural language processing.
The VisDrone Challenge, short for “Vision Meets Drones: A Challenge,” is one of the world’s largest competitions for object detection and tracking in images and videos captured by drones. T-Brain ranked 3rd out of 46 teams in the “Object Detection in Images” track, in which participants compete to detect objects from specified categories (e.g., people, cars, buses, trucks, motorcycles, and bicycles) in the provided drone imagery, localizing each object and predicting its category. Unlike ordinary images, drone-shot images pose several challenges: high input resolution; varied shooting conditions such as distance, brightness, and camera angle; and class imbalance arising from large differences in the number of objects per category. To address these issues effectively, we proposed patch-level augmentation, which balances the number of objects across categories by creating hard examples, thereby aiding the training of object detection models.
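To make the class-imbalance issue concrete, here is a minimal sketch of how per-category object counts can be tallied from a dataset's annotations. The in-memory annotation structure below is hypothetical (real VisDrone labels live in per-image text files), but the counting logic is the same.

```python
from collections import Counter

# Hypothetical in-memory annotations: each image name maps to a list of
# (category, bbox) tuples. Real VisDrone annotations are stored in
# per-image text files, one object per line.
annotations = {
    "img_0001": [("car", (10, 20, 50, 30))] * 40 + [("bus", (5, 5, 80, 40))],
    "img_0002": [("car", (0, 0, 30, 20))] * 25 + [("bicycle", (3, 3, 10, 15))] * 2,
}

# Count objects per category across the whole dataset.
counts = Counter(
    category
    for objects in annotations.values()
    for category, _ in objects
)

# Categories with the fewest instances come first; these are the ones
# a balancing scheme such as patch-level augmentation would over-sample.
rare_first = sorted(counts.items(), key=lambda kv: kv[1])
print(rare_first)
```

With the toy numbers above, cars outnumber buses 65 to 1, the kind of skew the augmentation described next is designed to correct.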
The patch-level augmentation technique first extracts all object patches in the dataset to form an object pool, and then pastes those patches onto images in the existing dataset. Class imbalance is mitigated during pasting: many patches are attached for categories that are rare in the dataset, and few for categories that are abundant. In addition, standard image augmentations, such as rotation, horizontal flipping, and brightness adjustment, are applied before the patches are composited. After generating patch-level augmented images across the dataset, we added regions that a previously trained object detector had classified poorly and retrained the model, which effectively decreased the misclassification rate and improved overall object detection performance.
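The pipeline above can be sketched as follows. This is a simplified illustration under stated assumptions, not the team's actual implementation: images are NumPy arrays, annotations are hypothetical (category, bbox) tuples, sampling uses inverse-frequency weights to favor rare categories, and only flip and brightness jitter are shown.

```python
import random
from collections import Counter

import numpy as np


def build_object_pool(images, annotations):
    """Crop every annotated object out of its image to form a patch pool."""
    pool = []
    for name, objs in annotations.items():
        img = images[name]
        for category, (x, y, w, h) in objs:
            pool.append((category, img[y:y + h, x:x + w].copy()))
    return pool


def inverse_frequency_weights(pool):
    """Rare categories get proportionally higher sampling weight."""
    counts = Counter(cat for cat, _ in pool)
    return [1.0 / counts[cat] for cat, _ in pool]


def jitter(patch, rng):
    """Simple augmentations: random horizontal flip and brightness scaling."""
    if rng.random() < 0.5:
        patch = patch[:, ::-1]
    scale = rng.uniform(0.8, 1.2)
    return np.clip(patch.astype(np.float32) * scale, 0, 255).astype(np.uint8)


def paste_patches(image, pool, weights, n, rng):
    """Paste n weighted-sampled patches at random positions; return new boxes."""
    out = image.copy()
    new_boxes = []
    for category, patch in rng.choices(pool, weights=weights, k=n):
        patch = jitter(patch, rng)
        h, w = patch.shape[:2]
        H, W = out.shape[:2]
        if h >= H or w >= W:
            continue  # skip patches too large for this image
        y = rng.randrange(H - h)
        x = rng.randrange(W - w)
        out[y:y + h, x:x + w] = patch
        new_boxes.append((category, (x, y, w, h)))
    return out, new_boxes


# Tiny demo on a hypothetical 100x100 image with two annotated objects.
rng = random.Random(0)
img = np.zeros((100, 100, 3), dtype=np.uint8)
ann = {"img": [("bus", (10, 10, 20, 15)), ("car", (50, 50, 8, 8))]}
pool = build_object_pool({"img": img}, ann)
weights = inverse_frequency_weights(pool)
augmented, new_boxes = paste_patches(img, pool, weights, n=3, rng=rng)
```

The second stage described in the text, mining poorly classified regions from a trained detector and feeding them back for retraining, would operate on the detector's inference outputs and is omitted here.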
Finally, at the 3rd Workshop on Closing the Loop Between Vision and Language (https://sites.google.com/site/iccv19clvllsmdc/program), Dr. Jin-hwa Kim of T-Brain gave an invited talk titled “Learning Representations of Vision and Language.” The workshop was organized by researchers working at the intersection of computer vision and natural language processing, and participants shared recent research trends and knowledge through guest lectures, oral presentations, poster presentations, and panel discussions. T-Brain presented how to learn representations of visual and linguistic information, drawing on its recent work in multimodal deep learning, tensor operations, bilinear attention networks, and attention models. Dr. Kim also served on the program committee of the Workshop on Video Turing Test: Toward Human-Level Video Story Understanding (https://videoturingtest.github.io), contributing to the global research community in video understanding.