About Me

Hi there, I am a second-year Ph.D. student at the Australian Institute for Machine Learning (AIML), the University of Adelaide, supervised by A/Prof. Qi Wu and Dr. Yicong Hong. I am a member of the V3A Lab. Previously, I was a student at the Australian National University, where I completed my master's research project under the supervision of Prof. Stephen Gould. Before that, I received my Bachelor’s degree from Dalian University of Technology.

My research is dedicated to creating explainable and embodied AI systems that can interact dynamically with both humans and their environments. I aim to build an autonomous agent that can understand, reason, and navigate the physical world, while seamlessly communicating with humans in natural language. By integrating machine learning with visual and linguistic applications, I strive to enhance the transparency and interpretability of AI decision-making, fostering more natural and effective human-AI interactions.

Some topics that I currently focus on:

  • Self-Explainable and Communicative Vision-and-Language Navigation (VLN) with Language Models: NavGPT, NavGPT-2
  • Sim2Real Transfer for VLN with Large Vision-Language Models: NaVid

News

  • 2024.07.11   We are thrilled to see that @GoogleDeepMind shares the perspective of our earlier work NavGPT on instruction-following navigation agents and has built fascinating robots based on Gemini 1.5 Pro! [Details]
  • 2024.07.01   NavGPT-2 is accepted to ECCV 2024! Thanks to all collaborators.
  • 2024.05.14   NaVid is accepted to RSS 2024! Congratulations to Jiazhao, Kunyu and Rongtao!
  • 2023.12.09   Two papers are accepted to AAAI 2024. Congratulations and thanks to all collaborators.

Research

NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models

Gengze Zhou, Yicong Hong, Zun Wang, Xin Eric Wang, Qi Wu

European Conference on Computer Vision (ECCV), 2024

NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models

Gengze Zhou, Yicong Hong, Qi Wu

AAAI Conference on Artificial Intelligence (AAAI), 2024

WebVLN: Vision-and-Language Navigation on Websites

Qi Chen, Dileepa Pitawela, Chongyang Zhao, Gengze Zhou, Hsiang-Ting Chen, Qi Wu

AAAI Conference on Artificial Intelligence (AAAI), 2024

NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation

Jiazhao Zhang, Kunyu Wang, Rongtao Xu, Gengze Zhou, Yicong Hong, Xiaomeng Fang, Qi Wu, Zhizheng Zhang, He Wang

Proceedings of Robotics: Science and Systems (RSS), 2024

Experience

Teaching

  • Teaching Assistant: COMP8536 - Deep Learning, ANU, 2022

Services

  • Reviewer: CVPR 2024, MM 2024

Professional

  • Research Intern: SenseTime, Dec 2021 - Apr 2022