Sitemap

A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.

Pages

Posts

Future Blog Post

less than 1 minute read

Published:

This post will show up by default. To disable scheduling of future posts, edit _config.yml and set future: false.

Blog Post number 4

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 3

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 2

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 1

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Publications

Distribution Shift Matters for Knowledge Distillation with Webly Collected Images

Published in ICCV, 2023

Knowledge distillation aims to learn a lightweight student network from a pre-trained teacher network. In practice, existing knowledge distillation methods are usually infeasible when the original training data is unavailable due to privacy issues and data management considerations. Therefore, data-free knowledge distillation approaches have been proposed that collect training instances from the Internet. However, most of them ignore the common distribution shift between the instances from the original training data and the webly collected data, which affects the reliability of the trained student network. To solve this problem, we propose a novel method dubbed “Knowledge Distillation between Different Distributions” (KD^3), which consists of three components. Specifically, we first dynamically select useful training instances from the webly collected data according to the combined predictions of the teacher network and the student network. Subsequently, we align both the weighted features and classifier parameters of the two networks for knowledge memorization. Meanwhile, we also build a new contrastive learning block called MixDistribution to generate perturbed data with a new distribution for instance alignment, so that the student network can further learn a distribution-invariant representation. Intensive experiments on various benchmark datasets demonstrate that our proposed KD^3 can outperform the state-of-the-art data-free knowledge distillation approaches.
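
For readers new to the setting, the basic teacher-to-student objective can be sketched as below. This is the generic temperature-scaled distillation loss, not the KD^3 method itself (instance selection, feature alignment, and MixDistribution are not shown); the function names and the default temperature are assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=4.0):
    """KL(teacher || student) on softened outputs, scaled by T^2.

    Generic distillation objective (assumed here for illustration);
    it is zero when the student matches the teacher exactly.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

A higher temperature flattens both distributions, so the student also receives signal from the teacher's low-probability classes.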

Recommended citation: Tang, Jialiang, Shuo Chen, Gang Niu, Masashi Sugiyama, and Chen Gong. "Distribution shift matters for knowledge distillation with webly collected images." In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 17470-17480. 2023.

Learning Student Network under Universal Label Noise

Published in IEEE Transactions on Image Processing, 2023

Data-free knowledge distillation aims to learn a small student network from a large pre-trained teacher network without the aid of the original training data. Recent works propose to gather alternative data from the Internet for training the student network. In a more realistic scenario, the data on the Internet contains two types of label noise, namely: 1) closed-set label noise, where some examples belong to the known categories but are mislabeled; and 2) open-set label noise, where the true labels of some mislabeled examples are outside the known categories. However, the latter is largely ignored by existing works, leading to limited student network performance. Therefore, this paper proposes a novel data-free knowledge distillation paradigm that utilizes a webly-collected dataset under universal label noise, which means both closed-set and open-set label noise should be tackled. Specifically, we first split the collected noisy dataset into a clean set, a closed noisy set, and an open noisy set based on the prediction uncertainty of various data types. For the closed-set noisy examples, their labels are refined by the teacher network. Meanwhile, noise-robust hybrid contrastive learning is performed on the clean set and the refined closed noisy set to encourage the student network to learn the categorical and instance knowledge inherited from the teacher network. For the open-set noisy examples unexplored by previous work, we regard them as unlabeled and conduct self-supervised learning on them to enrich the supervision signal for the student network. Intensive experimental results on image classification tasks demonstrate that our approach can achieve superior performance to state-of-the-art data-free knowledge distillation methods.
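
The three-way split described above can be illustrated with a small sketch. The abstract does not give the exact uncertainty measure or thresholds, so the rule below is an assumption: predictive entropy of the teacher's output is used as the uncertainty score, and `entropy_threshold` is a hypothetical parameter.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def split_by_uncertainty(examples, entropy_threshold=0.5):
    """Split web examples into clean / closed-noisy / open-noisy index lists.

    `examples` is a list of (teacher_probs, web_label) pairs.
    Assumed rule (not taken from the paper):
      - high entropy                       -> open-set noise
      - confident and agrees with label    -> clean
      - confident but disagrees with label -> closed-set noise
    """
    clean, closed_noisy, open_noisy = [], [], []
    for idx, (probs, label) in enumerate(examples):
        if entropy(probs) > entropy_threshold:
            open_noisy.append(idx)
        elif max(range(len(probs)), key=probs.__getitem__) == label:
            clean.append(idx)
        else:
            closed_noisy.append(idx)
    return clean, closed_noisy, open_noisy
```

Under this reading, closed-noisy examples would be relabeled with the teacher's prediction, while open-noisy examples would be treated as unlabeled.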

Recommended citation: Tang, Jialiang, Ning Jiang, Hongyuan Zhu, Joey Tianyi Zhou, and Chen Gong. "Learning student network under universal label noise." IEEE Transactions on Image Processing (2024).

Direct Distillation between Different Domains

Published in ECCV, 2024

Knowledge Distillation (KD) aims to learn a compact student network using knowledge from a large pre-trained teacher network, where both networks are trained on data from the same distribution. However, in practical applications, the student network may be required to perform in a new scenario (i.e., the target domain), which usually exhibits significant differences from the known scenario of the teacher network (i.e., the source domain). The traditional domain adaptation techniques can be integrated with KD in a two-stage process to bridge the domain gap, but the ultimate reliability of two-stage approaches tends to be limited due to the high computational consumption and the additional errors accumulated from both stages. To solve this problem, we propose a new one-stage method dubbed “Direct Distillation between Different Domains” (4Ds). We first design a learnable adapter based on the Fourier transform to separate the domain-invariant knowledge from the domain-specific knowledge. Then, we build a fusion-activation mechanism to transfer the valuable domain-invariant knowledge to the student network, while simultaneously encouraging the adapter within the teacher network to learn the domain-specific knowledge of the target data. As a result, the teacher network can effectively transfer categorical knowledge that aligns with the target domain of the student network. Intensive experiments on various benchmark datasets demonstrate that our proposed 4Ds method successfully produces reliable student networks and outperforms state-of-the-art approaches. Code is available at https://github.com/tangjialiang97/4Ds.
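
As background for the Fourier-based adapter, the sketch below splits a 1-D signal into amplitude and phase spectra and recombines them. Treating amplitude as domain-specific "style" and phase as domain-invariant "content" is a heuristic from some Fourier-based domain adaptation work and is an assumption here; the adapter in 4Ds is learnable and its design is not described in this abstract. All function names are illustrative.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform of a real-valued list."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [
        sum(c * cmath.exp(2j * math.pi * k * t / n) for k, c in enumerate(spectrum)).real / n
        for t in range(n)
    ]


def split_amplitude_phase(signal):
    """Return (amplitude, phase) spectra of the signal."""
    spectrum = dft(signal)
    return [abs(c) for c in spectrum], [cmath.phase(c) for c in spectrum]


def recombine(amplitude, phase):
    """Rebuild a signal from amplitude and phase spectra."""
    return idft([cmath.rect(a, p) for a, p in zip(amplitude, phase)])
```

Recombining a signal's own amplitude and phase reproduces it; pairing one signal's phase with another's amplitude is the usual way such heuristics swap "style" while keeping "content".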

Recommended citation: Tang, Jialiang, Shuo Chen, Gang Niu, Hongyuan Zhu, Joey Tianyi Zhou, Chen Gong, and Masashi Sugiyama. "Direct distillation between different domains." In European Conference on Computer Vision, pp. 154-172. Cham: Springer Nature Switzerland, 2024.

Open-World Semi-Supervised Learning under Compound Distribution Shifts

Published in 2024

Open-world Semi-Supervised Learning (OSSL), which assumes that the scarce labeled data and the abundant unlabeled data used for classifier training are sampled from different distributions, has drawn significant attention recently. Existing methods typically assume that all unlabeled examples are drawn from the same domain and follow the same distribution. Nevertheless, this assumption may be violated, as in practice the unlabeled data are often collected from multiple unknown domains. Therefore, this paper tries to solve the OSSL problem under compound distribution shifts, in which the unlabeled data come from multiple unknown domains that may deviate from the distribution of the labeled data. Specifically, we propose a novel Adversarial Mutual Information Disentanglement (AMID) framework to capture domain-invariant features for classifier training without knowledge of the domains. In particular, we find that the class tokens of the pre-trained Vision Transformer (ViT) carry critical cues reflecting the styles of the unlabeled data, which can be used to assign the unlabeled data to different discovered domains. Subsequently, we train a feature encoder that captures the domain-invariant features shared among the assigned domains via a designed adversarial confusion loss, so that the trained feature encoder can accurately represent the semantic information of unlabeled examples regardless of their domains. To further enhance feature disentanglement and enlarge the gap between the useful domain-invariant features and the interfering domain-specific features, we minimize the mutual information between the outputs of the encoders corresponding to domain-invariant features and domain-specific features. Comprehensive experiments conducted on various benchmark datasets demonstrate the effectiveness and generalizability of our approach in resolving the issue of compound distribution shifts in OSSL.
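
To make the quantity being minimized concrete, the sketch below computes mutual information from a discrete joint distribution table. This is only the textbook definition; the encoder outputs in AMID are continuous features, so the actual method would need an estimator that is not described in this abstract. The function name and table layout are assumptions.

```python
import math


def mutual_information(joint):
    """Mutual information (in nats) of a discrete joint distribution.

    `joint[i][j]` is P(X = i, Y = j); entries must sum to 1.
    I(X; Y) = sum_ij P(i, j) * log(P(i, j) / (P(i) * P(j))).
    """
    row_marginal = [sum(row) for row in joint]
    col_marginal = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:
                mi += p * math.log(p / (row_marginal[i] * col_marginal[j]))
    return mi
```

Independent variables give a value of zero, which is the target when disentangling domain-invariant from domain-specific features; fully dependent binary variables give log 2.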

Recommended citation: Xu, Shijia, Lin Zhao, Jialiang Tang, Guangyu Li, and Chen Gong. "Open-World Semi-Supervised Learning under Compound Distribution Shifts." (2024).

Hybrid Data-Free Knowledge Distillation

Published in AAAI, 2025

Data-free knowledge distillation aims to learn a compact student network from a pre-trained large teacher network without using the original training data of the teacher network. Existing collection-based and generation-based methods train student networks by collecting massive real examples and generating synthetic examples, respectively. However, they inevitably become weak in practical scenarios due to the difficulties in gathering or emulating sufficient real-world data. To solve this problem, we propose a novel method called Hybrid Data-Free Distillation (HiDFD), which leverages only a small amount of collected data and generates sufficient examples for training student networks. Our HiDFD comprises two primary modules, i.e., the teacher-guided generation and the student distillation. The teacher-guided generation module uses the teacher network to guide a Generative Adversarial Network (GAN) to produce high-quality synthetic examples from very few real-world collected examples. Specifically, we design a feature integration mechanism to prevent the GAN from overfitting and to facilitate reliable representation learning from the teacher network. Meanwhile, we use the teacher network to drive a category frequency smoothing technique that balances the generative training of each category. In the student distillation module, we explore a data inflation strategy to properly utilize a blend of real and synthetic data to train the student network via a classifier-sharing-based feature alignment technique. Intensive experiments across multiple benchmarks demonstrate that our HiDFD can achieve state-of-the-art performance using 120 times less collected data than existing methods.
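
The abstract does not say how category frequency smoothing is computed. One plausible reading, sketched below purely as an assumption, is to count how often the teacher assigns generated examples to each category and then reweight categories by a tempered inverse frequency. The function name, the add-one smoothing, and the `power` parameter are all hypothetical.

```python
def smoothed_category_weights(counts, power=0.5):
    """Turn per-category counts into normalized sampling weights.

    Hypothetical rule: weight is proportional to (1 / (count + 1)) ** power.
    power = 0 gives uniform weights; power = 1 gives full inverse-frequency
    weighting; values in between soften the correction.
    """
    raw = [(1.0 / (c + 1)) ** power for c in counts]
    total = sum(raw)
    return [w / total for w in raw]
```

With counts such as `[90, 9, 0]`, the rarest category receives the largest weight, so a generator sampling categories from these weights would be pushed toward under-represented classes.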

Recommended citation: Tang, Jialiang, Shuo Chen, and Chen Gong. "Hybrid Data-Free Knowledge Distillation." In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 19, pp. 20805-20813. 2025.

Exploring Neural Radiance Fields for Thermal View Synthesis Solely with Thermal Inputs

Published in Chinese Journal of Electronics, 2025

Novel View Synthesis (NVS) for thermal scenes aims to generate thermal images from unseen viewpoints, which shows great potential in various applications, such as nighttime autonomous driving, industrial inspection, and agricultural monitoring. Recently, Neural Radiance Fields (NeRF) have emerged as a powerful approach for NVS in thermal scenes, which typically require paired RGB and thermal images to produce realistic thermal images from new views. However, practical limitations, such as insufficient lighting, the prohibitive cost of RGB image acquisition, or the lack of RGB cameras, make it challenging or even impossible to obtain high-quality RGB images, which prevents the existing NeRF methods from generating realistic thermal images. To address this problem, we devise a simple yet effective NeRF framework based on Thermal Radiation Prediction, which is termed ‘NeRF-TRP’, for NVS in thermal scenes. Unlike the existing NeRF techniques that rely on paired RGB and thermal images, NeRF-TRP exclusively utilizes thermal images as input. By leveraging the principle of thermal imaging, NeRF-TRP predicts the thermal radiation emitted by objects to render thermal images from novel views. Meanwhile, motivated by the thermal equilibrium observed in thermal scenes, we design a patch-based regularization to enhance the realism of the generated thermal images. Extensive experiments on thermal images demonstrate that NeRF-TRP not only produces more accurate thermal image synthesis, but also shows superior efficiency in both training and rendering when compared with various representative baseline approaches.
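
The rendering step can be pictured with the standard NeRF volume-rendering quadrature along a single ray. The sketch assumes that a scalar emitted thermal value per sample simply takes the place of the RGB colour; NeRF-TRP's exact radiation model and its patch-based regularization are not given in this abstract and are not shown. All names are illustrative.

```python
import math


def render_ray(densities, emissions, deltas):
    """Composite one scalar value along a ray (standard NeRF quadrature).

    densities: volume density sigma_i at each sample
    emissions: scalar emitted value at each sample (assumed thermal radiation)
    deltas:    distance between consecutive samples
    """
    transmittance = 1.0  # fraction of the ray not yet absorbed
    value = 0.0
    for sigma, emitted, delta in zip(densities, emissions, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        value += transmittance * alpha * emitted
        transmittance *= 1.0 - alpha
    return value
```

A ray through empty space (all densities zero) renders as zero, while a ray that hits a dense sample first returns that sample's emitted value and hides everything behind it.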

Recommended citation: Ding, Haixuan, Jialiang Tang, Sheng Wan, and Chen Gong. "Exploring Neural Radiance Fields for Thermal View Synthesis Solely with Thermal Inputs." Chinese Journal of Electronics (2025).

Empowering Large Language Models for Time Series Forecasting with Patterns and Semantics

Published in ICDM, 2025

Time Series Forecasting (TSF) is critical in many real-world domains like financial planning and health monitoring. Recent studies have revealed that Large Language Models (LLMs), with their powerful in-contextual modeling capabilities, hold significant potential for TSF. However, existing LLM-based methods usually perform suboptimally because they neglect the inherent characteristics of time series data. Unlike the textual data used in LLM pre-training, the time series data is semantically sparse and comprises distinctive temporal patterns. To address this problem, we propose LLM-PS to empower the LLM for TSF by learning the fundamental Patterns and meaningful Semantics from time series data. Our LLM-PS incorporates a new multi-scale convolutional neural network adept at capturing both short-term fluctuations and long-term trends within the time series. Meanwhile, we introduce a time-to-text module for extracting valuable semantics across continuous time intervals rather than isolated time points. By integrating these patterns and semantics, LLM-PS effectively models temporal dependencies, enabling a deep comprehension of time series and delivering accurate forecasts. Intensive experimental results demonstrate that LLM-PS achieves state-of-the-art performance in both short- and long-term forecasting tasks, as well as in few- and zero-shot settings.
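
The idea of viewing a series at several scales can be sketched with fixed averaging kernels. LLM-PS uses a learned multi-scale convolutional network; the moving averages below are a stand-in chosen for illustration, and the window sizes and function names are assumptions.

```python
def moving_average(series, window):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        lo = max(0, i - half)
        hi = min(len(series), i + half + 1)
        smoothed.append(sum(series[lo:hi]) / (hi - lo))
    return smoothed


def multi_scale_features(series, windows=(3, 7)):
    """Return per-window trends and the residual from the widest window.

    Small windows keep short-term structure, large windows keep the
    long-term trend, and the residual holds the remaining fluctuation.
    """
    trends = {w: moving_average(series, w) for w in windows}
    widest = trends[max(windows)]
    fluctuation = [x - t for x, t in zip(series, widest)]
    return trends, fluctuation
```

For a constant series every trend equals the series and the fluctuation is zero; for real data the trend from the widest window and the residual always sum back to the original values.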

Recommended citation: Tang, Jialiang, Shuo Chen, Chen Gong, Jing Zhang, and Dacheng Tao. "LLM-PS: Empowering Large Language Models for Time Series Forecasting with Temporal Patterns and Semantics." arXiv preprint arXiv:2503.09656 (2025).