IEEE Fellow
Prof. James Tin-Yau Kwok, The Hong Kong University of Science and Technology, Hong Kong, China
Title of the speech: Enhancing Language Models through Improved Pre-Training and Fine-Tuning
Abstract: Language models (LMs) are essential in natural language processing and vision-language modeling. However, several challenges arise in the pre-training and fine-tuning of LMs. First, during unsupervised pre-training, semantically irrelevant information may hurt downstream tasks, leading to negative transfer. Second, cross-modal masked language modeling is often used to learn vision-language associations in vision-language models. However, existing masking strategies can be insufficient, because the masked tokens can sometimes be recovered from the language context alone, without using the visual inputs. Lastly, prompt tuning is effective for fine-tuning LMs on downstream tasks with limited labeled samples, but designing prompts is difficult.
To tackle these issues, we propose several measures. First, we introduce a new pre-training method that uses cluster-conditional gates to train each expert only on semantically relevant data. This allows each downstream task to be allocated to a customized model pre-trained on the data most similar to its own. Second, for pre-training vision-language models, we use a masking strategy based on the saliency of each language token with respect to the image. Lastly, we use meta-learning to learn an efficient prompt pool that can extract diverse knowledge from historical tasks. This allows instance-dependent prompts to be constructed from the pool without tuning the whole LM. Experimental results show that these measures can significantly improve the performance of LMs.
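The saliency-based masking idea can be sketched as follows. This is a minimal illustration, not the speaker's method: it assumes every token already has a scalar saliency score with respect to the image (how that score is obtained, for example from cross-attention, is left open), and the function name `saliency_mask` and its default masking ratio are hypothetical. The intent is that masking the most image-dependent tokens leaves targets that are harder to recover from language context alone.

```python
import math


def saliency_mask(tokens, saliency, mask_ratio=0.15, mask_token="[MASK]"):
    """Replace the tokens most salient to the image with a mask token.

    Hypothetical sketch: saliency[i] is assumed to score how strongly
    token i depends on the image (higher = more visually grounded).
    Returns the masked token list and the sorted masked positions, which
    would serve as targets for cross-modal masked language modeling.
    """
    if len(tokens) != len(saliency):
        raise ValueError("tokens and saliency must have the same length")
    if not tokens:
        return [], []
    k = max(1, math.ceil(mask_ratio * len(tokens)))
    # Rank positions by descending saliency; ties go to the earlier token.
    ranked = sorted(range(len(tokens)), key=lambda i: (-saliency[i], i))
    positions = sorted(ranked[:k])
    masked = list(tokens)
    for i in positions:
        masked[i] = mask_token
    return masked, positions


if __name__ == "__main__":
    tokens = ["a", "red", "car", "on", "the", "road"]
    saliency = [0.01, 0.6, 0.9, 0.02, 0.01, 0.5]
    print(saliency_mask(tokens, saliency, mask_ratio=0.5))
    # (['a', '[MASK]', '[MASK]', 'on', 'the', '[MASK]'], [1, 2, 5])
```

In this toy example, function words such as "a" and "the", which a language model can often fill in without looking at the image, receive low scores and stay unmasked.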
BIO: Prof. Kwok is a Professor in the Department of Computer Science and Engineering, Hong Kong University of Science and Technology. He received his B.Sc. degree in Electrical and Electronic Engineering from the University of Hong Kong and his PhD degree in Computer Science from the Hong Kong University of Science and Technology. Prof. Kwok has served or is serving as an Associate Editor for the IEEE Transactions on Neural Networks and Learning Systems, Neural Networks, Neurocomputing, Artificial Intelligence Journal, and International Journal of Data Science and Analytics; as an Editorial Board Member of Machine Learning; and as a Governing Board Member and Vice President for Publications of the Asia Pacific Neural Network Society. He has also served or is serving as a Senior Area Chair or Area Chair of major machine learning and AI conferences, including NeurIPS, ICML, ICLR, IJCAI, AAAI and ECML. He received the Most Influential Scholar Award Honorable Mention for "outstanding and vibrant contributions to the field of AAAI/IJCAI between 2009 and 2019". He is an IEEE Fellow.
Prof. Kenneth K.Y. Wong, University of Hong Kong, Hong Kong, China
Title of the speech: Clothed Human Model Reconstruction and Generation
Abstract: The rapid development of the Metaverse, catalyzed by advances in virtual reality (VR) and augmented reality (AR) technologies, has created a growing demand for 3D human models, which are commonly used in applications such as VR chat rooms and games. Historically, such 3D human models have been created with CAD software, a process that is both time-consuming and dependent on expert skill. In this talk, we first introduce a flexible framework, named SeSDF, that can reconstruct a clothed human model from an arbitrary number of images in an uncalibrated setting. At the core of our framework is a novel self-evolved signed distance field module, which allows the framework to learn to deform the signed distance field derived from an SMPL-X body model fitted to the image(s). We also introduce a simple method for self-calibrating multi-view images using the fitted SMPL-X parameters. This removes the need for tedious manual calibration. In the second half of this talk, we introduce a generative framework, named DreamAvatar, that can generate high-quality 3D human avatars with controllable poses from text prompts. DreamAvatar uses a trainable NeRF to predict density and color features for 3D points, and a pre-trained text-to-image diffusion model to provide 2D self-supervision. Specifically, it leverages an SMPL-X model to provide rough pose and shape guidance for the generation. DreamAvatar can generate avatars with detailed geometry and texture, establishing a new state of the art for text-and-shape guided 3D human avatar generation.
BIO: Dr. Wong received the PhD degree in computer vision from the University of Cambridge in 2001. Since then, he has been with the Department of Computer Science at The University of Hong Kong, where he is now an Associate Professor. His research interests are in computer vision and machine intelligence. His research works include camera calibration, 3D reconstruction, image super-resolution, inpainting and restoration, and text-to-3D. He has published over 150 peer-reviewed journal and conference papers. Many of his works appear in top venues, including CVPR, ECCV, ICCV, TPAMI, IJCV, and TIP. He is currently an Associate Editor of IJCV.
Prof. Seokwon Yeom, Daegu University, Gyeongsan, South Korea
Title of the speech: Drone application for search and rescue mission
Title of the speech: Multiple Ground Target Tracking with a Drone
Abstract: In this keynote speech, multiple ground target tracking with a small drone is introduced. The entire process consists of moving object detection and target tracking. Moving object detection consists of frame extraction and thresholding, morphological operations, and false alarm removal based on the size and shape of the object. In thermal video, segmentation is performed with k-means clustering. Target tracking consists of the following steps: track initialization, measurement-to-track association, state prediction and estimation, track-to-track association, track termination, and validation testing. The measurement that is statistically nearest to the state prediction is used to update the target's state. With the improved track-to-track association, the fittest track is selected within the track validation region, and the directions of the displacement vector and the velocity vectors of the two tracks are tested against an angular threshold. Image coordinates are converted to real-world coordinates based on the angular field of view, tilt angle, and altitude of the camera. In the experiments, various scenarios are tested, including drone flight, bird's-eye and oblique views, and thermal imaging. Tracking performance was evaluated by total track life (TTL) and mean track life (MTL). Promising results were obtained for 86 targets within approximately 1 km of the drone.
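Three steps of the pipeline above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the speaker's implementation: `pixel_to_ground` assumes an ideal pinhole camera over flat ground with the tilt angle measured from nadir; nearest-neighbor association uses the squared Mahalanobis distance with an assumed chi-square gate of 9.21 (about 99% for two degrees of freedom); and the angular test simply checks whether two 2-D vectors point in similar directions. All function names and default values are hypothetical.

```python
import math


def pixel_to_ground(u, v, width, height, fov_x, fov_y, tilt, altitude):
    """Project image pixel (u, v) onto flat ground (hypothetical model).

    Assumes an ideal pinhole camera, angles in radians, tilt measured from
    nadir (0 = looking straight down) and v increasing downward.
    Returns (x, y) in the units of `altitude`: x to the right of the
    camera heading, y forward along it, origin directly below the drone.
    """
    fx = (width / 2) / math.tan(fov_x / 2)  # focal lengths in pixels
    fy = (height / 2) / math.tan(fov_y / 2)
    du, dv = u - width / 2, v - height / 2
    denom = math.cos(tilt) + (dv / fy) * math.sin(tilt)
    if denom <= 0:
        raise ValueError("pixel ray does not reach the ground")
    t = altitude / denom  # scale at which the viewing ray hits z = 0
    return t * du / fx, t * (math.sin(tilt) - (dv / fy) * math.cos(tilt))


def mahalanobis_sq(z, z_pred, s):
    """Squared Mahalanobis distance of 2-D measurement z from the prediction,
    given the 2x2 innovation covariance s = ((a, b), (c, d))."""
    (a, b), (c, d) = s
    det = a * d - b * c
    ex, ey = z[0] - z_pred[0], z[1] - z_pred[1]
    # e^T S^-1 e with S^-1 = [[d, -b], [-c, a]] / det
    return (d * ex * ex - (b + c) * ex * ey + a * ey * ey) / det


def nearest_neighbor(measurements, z_pred, s, gate=9.21):
    """Index of the statistically nearest measurement inside the gate, or None."""
    best, best_d2 = None, math.inf
    for i, z in enumerate(measurements):
        d2 = mahalanobis_sq(z, z_pred, s)
        if d2 <= gate and d2 < best_d2:
            best, best_d2 = i, d2
    return best


def directions_agree(v1, v2, max_angle):
    """True if the angle between 2-D vectors v1 and v2 is at most max_angle
    (radians); zero-length vectors are treated as not agreeing."""
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return False
    cos_a = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    return math.acos(max(-1.0, min(1.0, cos_a))) <= max_angle


if __name__ == "__main__":
    # Camera 100 m up, tilted 45 degrees from nadir, 90-degree FOV, 640x480 frame:
    # the image center lands about 100 m straight ahead of the drone.
    print(pixel_to_ground(320, 240, 640, 480, math.radians(90),
                          math.radians(90), math.radians(45), 100.0))
```

In a full tracker, the predicted position and innovation covariance passed to `nearest_neighbor` would come from the state prediction step (for example a Kalman filter), and `directions_agree` would be applied to the displacement vector between two candidate tracks and to each track's velocity.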
BIO: Seokwon Yeom has been a faculty member of Daegu University since 2007 and is now a full professor in the university's School of AI. He received his Ph.D. in Electrical and Computer Engineering from the University of Connecticut in 2006.
His research interests are intelligent image/optical information processing, deep/machine learning, and target tracking. His research has covered multiple target tracking for airborne early warning systems, three-dimensional image processing with digital holography and integral imaging, photon-counting linear discriminant analysis and photon-counting nonlinear matched filtering, millimeter-wave and infrared image analysis, and long-distance target tracking for aerial surveillance and search and rescue missions with a small unmanned aerial vehicle.
He has been a guest editor of the MDPI journals Applied Sciences and Drones. He has served as a board member of the Korean Institute of Intelligent Systems and as a member of the board of directors of the Korean Institute of Convergence Signal Processing. He was program chair of ICCCS2015, ISIS2017, iFUZZY2018, ICCCS2019, ADIP2021-2023, and IPMV2024. He was vice director of the AI homecare center and head of the department of IT convergence engineering at Daegu University in 2020, a visiting scholar at the University of Maryland in 2014, and director of the Gyeongbuk techno-park specialization center in 2013.