
Visual AI

AI for visual talk


One of the oldest methods of long-distance communication, the signal fire carries messages with smoke by day and light by night. As this example shows, visual communication is a fast and effective tool that is readily available in the moment of need. Our research on Visual AI aims to provide convenient everyday communication based on 5GX technology, which is faster and safer than ever before. We are currently focusing on developing reliable AI that can react immediately and provide services by processing given photos and videos.


Visual Question Answering (VQA) is both a multimodal system that answers diverse questions by referring to photos and a multitask system that answers different kinds of questions. Inspired by the Turing Test proposed by Alan Turing, a pioneer in the field of AI, the Visual Turing Test, and VQA in particular, combine visual and language information to test whether a system can reach human-level intelligence through question answering. Because VQA is grounded in visual information, it can show its inference process directly through visualization, and with further in-depth analysis we can also design AI that is interpretable and security-friendly.
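
As a rough illustration of how a VQA system might expose its inference process, the sketch below turns per-region relevance scores into attention weights that could be drawn over the photo. It assumes an attention-style model, which the text above does not specify. The region names and scores are hypothetical, hand-written values, not outputs of our system.

```python
import math
from typing import Dict, List


def softmax(scores: List[float]) -> List[float]:
    """Convert raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_report(region_names: List[str], scores: List[float]) -> Dict[str, float]:
    """Map each image region to its attention weight for one question."""
    if len(region_names) != len(scores):
        raise ValueError("each region needs exactly one score")
    return dict(zip(region_names, softmax(scores)))


# Hypothetical question: "What is the woman holding?"
# Hypothetical relevance scores for three image regions.
report = attention_report(["sky", "umbrella", "road"], [0.2, 2.5, 0.1])
most_attended = max(report, key=report.get)
print(most_attended, round(report[most_attended], 3))
```

Highlighting the most attended region next to the predicted answer is one simple way to let a user check what the answer was based on.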

Referring Relationships is a multimodal learning task that involves responding visually to language information and carrying out high-quality inference. The aim is to locate the two objects indicated by a triplet of subject, predicate, and object. Research on Referring Relationships has become an important foundation for deeper work in the field of Visual Understanding. Earlier work in Visual Understanding used Image Classification to assign one representative object to a whole picture. As a next step, Visual Relationship Detection was proposed to further understand the relationships among multiple objects. In this process, however, it is crucial to resolve the ambiguity that arises when multiple similar objects appear in the same picture. Referring Relationships pinpoints the exact object by constraining the relationships that can be observed in the picture. In this way, we aim to lay a research foundation for visual logical inference by clarifying the problem definition.
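
To make the triplet formulation concrete, here is a minimal sketch of how a subject-predicate-object query can single out one of two similar objects. It is a toy lookup over hand-written annotations, not a learned model: a real system would have to predict the regions from pixels. All names (`Entity`, `Relationship`, `refer`) and the example scene are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


@dataclass(frozen=True)
class Entity:
    entity_id: int
    category: str
    box: Box


@dataclass(frozen=True)
class Relationship:
    subject_id: int
    predicate: str
    object_id: int


def refer(
    query: Tuple[str, str, str],
    entities: List[Entity],
    relationships: List[Relationship],
) -> Optional[Tuple[Entity, Entity]]:
    """Return the (subject, object) entities matching the query triplet, if any."""
    by_id = {e.entity_id: e for e in entities}
    subject_category, predicate, object_category = query
    for rel in relationships:
        subject, obj = by_id[rel.subject_id], by_id[rel.object_id]
        if (
            subject.category == subject_category
            and rel.predicate == predicate
            and obj.category == object_category
        ):
            return subject, obj
    return None


# Hypothetical scene with two people: the category alone is ambiguous.
scene = [
    Entity(1, "person", (10, 20, 60, 120)),
    Entity(2, "person", (200, 30, 250, 130)),
    Entity(3, "ball", (65, 100, 80, 115)),
    Entity(4, "goal", (240, 10, 320, 140)),
]
relations = [Relationship(1, "kicking", 3), Relationship(2, "guarding", 4)]

match = refer(("person", "kicking", "ball"), scene, relations)
if match is not None:
    print(match[0].box, match[1].box)
```

Asking only for "person" would leave two candidates, whereas the full triplet narrows the answer to the person who is kicking the ball.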

Video Object Segmentation (VOS) is a technology that separates objects from a video at the pixel level. Compared with earlier Video Recognition and Video Object Tracking, VOS handles detailed image information and can be easily integrated into a wide range of applications, including autonomous driving and object removal. Furthermore, for VOS technology to be sustainable, Video Understanding must come first, so that the given visual data can be processed and analyzed quickly according to the situation. We study a range of VOS technologies that function robustly even in unpredictable environments, with Video Understanding as the foundation for diverse video applications. In this hyper-connected society, with its high demand for video information, we hope to design visual AI that can actively communicate with diverse users.
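
The difference between pixel-level separation and a tracking-style bounding box can be sketched on a single tiny frame. The example below assumes the segmentation mask is already given (producing it is the actual VOS problem) and shows that the mask is tighter than its bounding box and can drive object removal directly. The helper names and the 4x4 frame are hypothetical.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]   # grayscale intensities, one list per row
Mask = List[List[bool]]   # True where a pixel belongs to the object


def mask_area(mask: Mask) -> int:
    """Number of pixels that belong to the object."""
    return sum(sum(1 for value in row if value) for row in mask)


def mask_to_box(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Tightest (x_min, y_min, x_max, y_max) box around the mask, or None if empty."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, value in enumerate(row) if value]
    if not rows:
        return None
    return (min(cols), min(rows), max(cols), max(rows))


def box_area(box: Tuple[int, int, int, int]) -> int:
    """Number of pixels covered by an inclusive pixel box."""
    x_min, y_min, x_max, y_max = box
    return (x_max - x_min + 1) * (y_max - y_min + 1)


def remove_object(frame: Frame, mask: Mask, fill: int = 0) -> Frame:
    """Replace object pixels with a fill value; other pixels stay unchanged."""
    return [
        [fill if is_object else pixel for pixel, is_object in zip(frame_row, mask_row)]
        for frame_row, mask_row in zip(frame, mask)
    ]


# Hypothetical 4x4 frame containing an L-shaped object.
frame = [[9, 9, 9, 9], [9, 5, 9, 9], [9, 5, 5, 9], [9, 9, 9, 9]]
mask = [[p == 5 for p in row] for row in frame]

box = mask_to_box(mask)
print(mask_area(mask), box, box_area(box))
print(remove_object(frame, mask, fill=9))
```

A box-based removal would also overwrite the background pixel inside the box, which is why pixel-level masks matter for applications such as object removal.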


AI technology is attracting increasing attention. In this field, we research the most effective and reliable Visual AI, encompassing diverse aspects. Through our Visual AI technology, we sincerely hope to contribute to swift and safe everyday communication for everyone.



  • Clark
  • SE
  • James
  • Jayden
  • Jerome