Microsoft Research Asia – Tokyo

The Tokyo Talk Series is an initiative designed to foster intellectual exchange and collaboration within our research community. The series brings prominent speakers to Microsoft Research Asia – Tokyo to share their ideas and experiences. Talks are scheduled on an ad-hoc basis, allowing flexibility to accommodate the availability of our guest speakers.

June 30, 2025: Fast Multi-dimensional Imaging using Neuromorphic Cameras

Speaker: Professor Boxin Shi, Boya Young Fellow Associate Professor (with tenure) and Research Professor at Peking University

  • Abstract:

    Neuromorphic cameras, such as event cameras and spike cameras, are characterized by high dynamic range, high temporal resolution, and sparse data representation, and have gained momentum in computer vision research in recent years. Existing techniques for capturing object geometry, BRDF, and hyperspectral reflectance usually rely on abundant images captured by frame-based cameras as input, and therefore suffer from high data redundancy and long capture times. This talk introduces our recent research progress toward fast multi-dimensional imaging using neuromorphic cameras. We design active illumination patterns tailored to the high temporal resolution and data sparsity of event cameras. By analyzing the event signals triggered by rapidly changing active light, surface normals, roughness, metallic parameters, and hyperspectral reflectance are reconstructed in a time- and data-efficient manner.
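
    The reconstruction described above builds on the way event cameras report per-pixel brightness changes rather than full frames. As background only, here is a minimal Python sketch of the standard event-generation model (a pixel fires an event when its log-intensity changes by more than a contrast threshold since its last event); it is an illustrative toy model under assumed inputs, not the speaker's method, and the function and parameter names are hypothetical.

        import numpy as np

        def generate_events(log_frames, timestamps, contrast_threshold=0.2):
            """Toy simulation of the standard event-camera model: a pixel emits an
            event when its log-intensity changes by at least the contrast threshold
            since the last event at that pixel. Returns (t, y, x, polarity) tuples."""
            reference = log_frames[0].copy()   # log intensity at each pixel's last event
            events = []
            for t, frame in zip(timestamps[1:], log_frames[1:]):
                diff = frame - reference
                ys, xs = np.nonzero(np.abs(diff) >= contrast_threshold)
                for y, x in zip(ys, xs):
                    events.append((t, y, x, 1 if diff[y, x] > 0 else -1))
                    reference[y, x] = frame[y, x]  # reset the reference where an event fired
            return events

        # Example with synthetic data: a ramping illumination level over a 4x4 patch.
        frames = [np.full((4, 4), 0.1 * (i + 1)) for i in range(10)]
        log_frames = [np.log(f) for f in frames]
        events = generate_events(log_frames, timestamps=list(range(10)))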

    Bio:

    Boxin Shi is currently a Boya Young Fellow Associate Professor (with tenure) and Research Professor at Peking University, where he leads the Camera Intelligence Lab. He received his PhD from the University of Tokyo in 2013. From 2013 to 2017, he conducted research at the MIT Media Lab, Singapore University of Technology and Design, Nanyang Technological University, and the National Institute of Advanced Industrial Science and Technology. His research interests are computational photography and computer vision. He has published more than 200 papers, including 30 papers in TPAMI and 92 papers in CVPR/ICCV/ECCV. His papers received Best Paper Runner-Up awards at CVPR 2024 and ICCP 2015 and were selected as a Best Paper candidate at ICCV 2015. He received the Okawa Foundation Research Grant in 2021. He has served as an associate editor of TPAMI/IJCV and an area chair of CVPR/ICCV/ECCV. He is a Distinguished Lecturer of APSIPA, a Distinguished Member of CCF, and a Senior Member of IEEE/CSIG. Please visit his lab website for more information: http://camera.pku.edu.cn

June 30, 2025: Computer Vision: A Journey of Pursuing 3D World Understanding

Speaker: Dr. Xiaoming Liu, MSU Foundation Professor and Anil K. and Nandita Jain Endowed Professor of Engineering, Michigan State University

  • Abstract:

    The real world we live in is composed of 3D objects. When a camera takes a picture or video, much of the 3D information is inevitably lost due to the camera projection. As one of the most active fields in AI, computer vision aims to develop algorithms that can derive meaningful information from visual content. One fundamental quest of computer vision is to recover this 3D information and thus enable a faithful 3D understanding of the world through the lens of the camera. In this talk, I will share some of our experiences in pursuing 3D world understanding, addressing problems such as 3D reconstruction, 3D detection, depth estimation, camera calibration, pose estimation, and velocity estimation. The solutions to these problems have been applied in areas including biometrics, autonomous driving, and digital humans/faces.

    Bio:

    Dr. Xiaoming Liu is the MSU Foundation Professor and Anil and Nandita Jain Endowed Professor in the Department of Computer Science and Engineering at Michigan State University (MSU). He received his Ph.D. from Carnegie Mellon University in 2004. Before joining MSU in 2012, he was a research scientist at General Electric Global Research. He works on computer vision, machine learning, and biometrics, especially face-related analysis and 3D vision. Since 2012, he has helped build a strong computer vision group at MSU, which is ranked in the top 15 in the US according to csrankings.org. He is an Associate Editor of IEEE Transactions on Pattern Analysis and Machine Intelligence. He has authored more than 200 publications and has filed 35 patents. His work has been cited over 30,000 times, with an h-index of 82. He is a fellow of IEEE and IAPR. More information about Dr. Liu’s research can be found at http://cvlab.cse.msu.edu

June 24, 2025: Security, Privacy, and Ethics in Generative AI Systems

Speaker: Dr. Tsubasa Takahashi, Principal Researcher, Turing Inc.

  • Abstract:

    This talk explores key challenges and emerging solutions related to security, privacy, and ethics in modern generative AI systems. We begin by examining vulnerabilities to adversarial examples, which can lead to harmful or misleading outputs, and introduce recent techniques for copyright protection and ownership verification in generative content. We then present approaches for privacy-preserving data collection and learning, highlighting methods such as differential privacy and confidential computing. Finally, we discuss how to align generative AI systems with human values and ethical principles, and explore strategies to ensure these systems remain safe and trustworthy when deployed in open-world environments.
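
    To ground the privacy-preserving methods mentioned above, here is a minimal sketch of the Laplace mechanism, the textbook building block of differential privacy (add noise scaled to the query's sensitivity divided by the privacy budget epsilon). This is general background rather than the speaker's approach, and the names and numbers are hypothetical.

        import numpy as np

        def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
            """Release a statistic with epsilon-differential privacy by adding
            Laplace noise with scale sensitivity / epsilon."""
            rng = rng if rng is not None else np.random.default_rng()
            return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

        # Example: privately release a count query over a user dataset.
        # Adding or removing one user changes the count by at most 1, so sensitivity = 1.
        private_count = laplace_mechanism(true_value=4200, sensitivity=1.0, epsilon=0.5)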

    Bio:

    Tsubasa Takahashi is a Principal Researcher at Turing Inc. He earned his PhD in Computer Science from the University of Tsukuba in 2014. He has held research positions in industrial laboratories at NEC Corp., LINE Corp., LY Corp., and SB Intuitions Corp., and spent a year as a visiting scholar at Carnegie Mellon University. His research interests include AI security, data privacy, and generative AI systems, with applications in self-driving vehicles.

May 28, 2025: Towards Impactful Research: From Visual Domain Adaptation to Deep Video Compression

Speaker: Professor Dong Xu, The University of Hong Kong

  • Abstract:

    In this talk, I will first introduce our previous domain adaptation work, including our pioneering work on developing new domain adaptation (transfer learning) methods for video event recognition and a series of subsequent works on single-source domain adaptation, multi-domain adaptation, heterogeneous domain adaptation, domain generalization, and deep domain adaptation, as well as their applications in various computer vision tasks. Then I will describe our previous deep video compression work, including the first end-to-end optimized deep video compression (DVC) framework, the subsequent feature-space video coding (FVC) network, and our recent work on coding mode prediction and stereo video compression.

    Bio:

    Prof. Dong Xu is a Professor in the Department of Computer Science at The University of Hong Kong, where he serves as the Director of the JC STEM Lab of Multimedia and Machine Learning. After receiving his PhD from the University of Science and Technology of China in 2005, he worked as a postdoctoral research scientist at Columbia University, served as a tenure-track and then tenured faculty member at Nanyang Technological University, and held the Chair in Computer Engineering at The University of Sydney.

    Prof. Xu is an active researcher in the areas of computer vision, multimedia, and machine learning. He was selected as a Clarivate Analytics Highly Cited Researcher twice, in 2018 and 2021. He was also selected as an Australian Research Council Future Fellow (Level 3, Professorial Level) in 2018 and received the IEEE Computational Intelligence Society Outstanding Early Career Award in 2017. He has published more than 200 papers in IEEE Transactions and leading conferences including CVPR, ICCV, ECCV, ICML, ACM MM, and MICCAI. His co-authored works (with his former PhD students) received the Best Student Paper Award at CVPR 2010 and the IEEE Transactions on Multimedia Prize Paper Award in 2014.

    He is or has been on the editorial boards of ACM Computing Surveys (Senior Associate Editor since October 2022), IEEE Transactions including T-PAMI, T-IP, T-NNLS, T-CSVT, and T-MM, and five other journals, and he has served as a guest editor of more than ten special issues in multiple journals (e.g., IJCV, IEEE/ACM Transactions). He served as the Program Coordinator of ACM Multimedia 2024, a steering committee member of ICME (2016-2017), and a Program Co-chair of five international conferences/workshops (e.g., ICME 2014). He has also been involved in the organizing committees of many international conferences and has served as an area chair of leading conferences such as ICCV, CVPR, ECCV, ACM MM, and AAAI. He received the Best Associate Editor Award of T-CSVT in 2017. He is a Fellow of IEEE and IAPR (the International Association for Pattern Recognition) and a Foreign Member of the Academia Europaea (the Academy of Europe).

May 19, 2025: Recipe for Japanese Large Language Models

Speaker: Professor Naoaki Okazaki, Institute of Science Tokyo

  • Abstract:

    Our teams have developed and released a series of Swallow LLMs that exhibit strong performance in Japanese. For example, Llama 3.3 Swallow 70B v0.4 showed performance competitive with OpenAI GPT-4o (gpt-4o-2024-08-06) on 10 Japanese understanding and generation tasks. Our goal is to develop LLMs with strong Japanese capabilities and to unveil the recipe for building such strong LLMs. In this talk, I will briefly summarize the current state of Japanese LLM development and introduce our efforts to develop the Swallow LLMs, including preparation of data for (continual) pre-training and instruction tuning, evaluation of Japanese LLMs, and future directions of the project.

    Bio:

    Naoaki Okazaki, currently a professor at the School of Computing, Institute of Science Tokyo, earned his PhD in 2007 from the Graduate School of Information Science and Technology at the University of Tokyo. His early post-doctoral career included a research position at the same institution. Subsequently, he became an associate professor at the Graduate School of Information Sciences at Tohoku University. He specializes in Natural Language Processing and Artificial Intelligence. His career is marked by numerous awards, such as the Young Scientists’ Prize, the Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology. In addition, he was honored with the 2016 Microsoft Research Young Faculty Award from the Information Processing Society of Japan.

April 1, 2025: From Automation to Autonomy: Machine Learning for Next-generation Robotics

Speaker: Professor Sethu Vijayakumar FRSE, The University of Edinburgh, UK

  • Abstract:

    The new generation of robots works much more closely with humans and other robots, and interacts significantly with the surrounding environment. As a result, the key paradigms are shifting from isolated decision-making systems to ones that involve shared control, with significant autonomy devolved to the robot platform and end-users in the loop making only high-level decisions.

    This talk will briefly introduce powerful machine learning technologies, including robust multi-modal sensing, shared representations, scalable real-time learning and adaptation, and compliant actuation, that are enabling us to reap the benefits of increased autonomy while still feeling securely in control.

    This also raises a fundamental question: while the robots are ready to share control, what is the optimal trade-off between autonomy and control that we are comfortable with?

    Domains where this debate is relevant include the deployment of robots in extreme environments, self-driving cars, asset inspection, repair and maintenance, factories of the future, and assisted living technologies such as exoskeletons and prosthetics, to list a few.

    Bio:

    Sethu Vijayakumar is the Professor of Robotics at the University of Edinburgh, UK, and the Founding Director of the Edinburgh Centre for Robotics. He has pioneered the use of large-scale machine learning techniques in the real-time control of several iconic robotic platforms, such as the SARCOS and HONDA ASIMO humanoids, the KUKA LWR robot arm, and the iLIMB prosthetic hand. He has held adjunct faculty positions at the University of Southern California (USC), Los Angeles, and the RIKEN Brain Science Institute, Japan. One of his landmark projects (2016) involved a collaboration with the NASA Johnson Space Center on the Valkyrie humanoid robot being prepared for unmanned robotic pre-deployment missions to Mars. Professor Vijayakumar, who has a PhD from the Tokyo Institute of Technology, holds the Royal Academy of Engineering (RAEng) – Microsoft Research Chair at Edinburgh. He has published over 250 peer-reviewed and highly cited articles (h-index 50, more than 13,000 citations as of 2025) on topics covering robot learning, optimal control, and real-time planning in high-dimensional sensorimotor systems. He is a Fellow of the Royal Society of Edinburgh, a judge on BBC Robot Wars, and winner of the 2015 Tam Dalyell Prize for excellence in engaging the public with science. Professor Vijayakumar helps shape and drive the UK Robotics and Autonomous Systems (RAS) agenda in his recent role as the Programme Director for Robotics and Human AI Interfaces at The Alan Turing Institute, the UK’s national institute for data science and AI.

    Sethu also serves as Senior Independent Director (SID) on the Japan Small Caps fund with Baillie Gifford Investment and has been an advisor to several UK-Japan governmental research initiatives for the Japan Ministry of Finance as well as the Ministry of Economy, Trade and Industry.

    Webpage: https://web.inf.ed.ac.uk/slmc

    LinkedIn: https://www.linkedin.com/in/sethu-vijayakumar

April 1, 2025: Contact-rich and Whole-body Manipulation

Speaker: Dr. João Moura, Senior Researcher, The University of Edinburgh, UK

  • Abstract:

    Contact-rich manipulation, which involves intricate contact and force interactions with the environment, is crucial for enabling dexterous robotic capabilities. Non-prehensile manipulation tasks, such as pushing or catching objects, present unique challenges due to under-actuation, hybrid dynamics, and model uncertainty. This talk will explore both model-based trajectory optimization and model-free reinforcement learning approaches for addressing these challenges. Furthermore, whole-body manipulation can significantly enhance a robot’s reachability and mobility. However, the increased degrees of freedom and multiple contact points introduce additional complexities, including high-dimensionality and nonlinear dynamics. Advancing contact-rich whole-body manipulation holds great potential across various domains, from nuclear decommissioning and assistive healthcare to enhancing robots’ ability to support daily-life activities.

    Bio:

    João Moura is a senior researcher at The University of Edinburgh, UK, and a research fellow at the Robotics and AI Collaboration (RAICo) in Whitehaven, UK. He earned his PhD (2021) and MSc by Research (with distinction, 2016) in Robotics and Autonomous Systems, jointly awarded by Heriot-Watt University and The University of Edinburgh, UK. He also holds a diploma degree (BSc + MSc, 2012) in Mechanical Engineering from the University of Aveiro, Portugal, where he graduated top of his cohort. Before pursuing his PhD, João worked as a research assistant at INESC TEC Porto’s Centre for Robotics and Intelligent Systems, contributing to the FP7 European Project ICARUS (Integrated Components for Assisted Rescue and Unmanned Search operations). He also served as a teaching assistant at the University of Aveiro, lecturing courses in robotics, control systems, and programming. Currently, he collaborates with Waseda University as part of Goal 3 of Japan’s Moonshot programme. Previously, he participated in the H2020 European Project Harmony (Enhancing Healthcare with Assistive Robotic Mobile Manipulation). João has published in top-tier robotics conferences and journals, including IROS, ICRA, CoRL, R:SS, RA-L, IJRR, and T-RO, and has received nominations for the Best Student Paper Award and Best Paper Award at the R:SS conference. His research focuses on trajectory optimization (TO), model predictive control (MPC), learning from demonstration (LfD), and reinforcement learning (RL) for contact-rich and non-prehensile manipulation.

    Webpage: https://sites.google.com/view/joaomoura

    LinkedIn: https://www.linkedin.com/in/joaopousamoura